AWS Beefs Up Its EC2 Cloud For Big Data Crunching, Munching & Analytics

#AWS, #AWSreinvent, AWS re:Invent 2013, AWS re:Invent

Amazon has strengthened its claim to being king of all things data with the unveiling of two new EC2 instance types, plus a new data workflow and analytics service called AWS Data Pipeline.

The announcements came during a keynote by CTO Werner Vogels at re:Invent, Amazon's inaugural conference, earlier this week. Speaking to the Las Vegas crowd, Vogels chose his words carefully; they were as prescriptive as they were persuasive, and his keynote consistently tied Amazon's software developments to business processes.

According to Vogels, 21st-century cloud architectures need to be controllable, adaptive, resilient and data-driven. He said that businesses have been held back for too long by old-world constraints, which forced them to focus on the resources available to them at the expense of their customers and their business. Much of that constraint shows up in business processes, and the advent of the cloud is rapidly changing them.

Amazon's new High Storage and Cluster High Memory instances are much bigger than anything we've seen so far, offering up to 48 TB of local storage and 244 GiB of memory respectively. They dwarf the new Compute Engine instances Google announced just last week and further strengthen Amazon's already dominant lineup. Vogels said the new instances are designed to help businesses with the critical data analysis that has become essential to their continued growth.

Automated Data Workflows

Even more exciting was the announcement of AWS Data Pipeline, so called because it helps organizations automate their analytics processes while moving data between different storage locations. This fits with the point Vogels stressed throughout his keynote: the cloud gives companies much more freedom to operate in a business-driven, unconstrained way.

Vogels explained that because most companies pay for their computing power on a per-usage basis, there's no reason why they shouldn't use the cloud as much as possible. AWS Data Pipeline supports this by automatically moving data between Amazon DynamoDB and Amazon S3. Data sources are added through a simple drag-and-drop interface, and templates are provided for common workflows.

Workflow demo of AWS Data Pipeline

The service can be used in dozens of ways. For example, users could run a daily report on the data in one store, then write that same report to a second store. Operators can also schedule reports to run at preset intervals, or only after a specified amount of data has been collected, depending on the kind of information the company needs.

So is Amazon fit to be called the King of Big Data? Given that the combined power of its compute instances overwhelms anything its rivals can offer, it certainly has a very strong case. Such dominance might make Amazon a target, but with very few competitors able to challenge it and no end to the flow of companies moving into the cloud, it's hard to argue otherwise.