Putting Big Data in the Cloud to Work for the Benefit of Humanity

Not to lead with hyperbole, but it's going to be hard to argue with the promise of big data analytics and the scalability of the cloud when the combination starts solving one world problem after another. Okay, maybe that's a ways off. But just yesterday, two major cloud service providers made announcements that bring us one step closer.

First off, Amazon Web Services (AWS) announced a partnership with 1000 Genomes, the "international public-private consortium that aims to build the most detailed map of human genetic variation available," which has moved the entirety of its 2010 pilot data set, along with its most recent data sets, into the Amazon S3 storage cloud. That's around 200 terabytes of data, containing the genome sequences of more than 2,661 people from 26 global populations.

Access to the data is free, writes AWS high-performance computing (HPC) guru Matt Wood in a blog entry. You pay only for the cloud compute time needed to analyze the data to your satisfaction. And because the data sits in an S3 storage bucket, AWS customers can easily run Apache Hadoop jobs against it through Amazon Elastic MapReduce (EMR) to crunch the genetic research data.

Meanwhile, Microsoft Research highlighted three experimental products that combine machine learning, big data analytics and the Windows Azure cloud platform in some impressive, if early-stage, ways.

Microsoft Translator Hub is a self-service model for building customized automatic translation services between any two languages. Users upload their own data sets, which helps keep less widely spoken languages in use, and the resulting translation services can be made accessible via a widget or the Microsoft Translator APIs. ChronoZoom is a tool for organizing historical data from many libraries and data sets in one place. FetchClimate!, well, fetches current and historical climate data for any point on Earth, either through a few lines of .NET code or through a web interface.

Services Angle

This week, we've already seen how the White House plans to use big data to solve research and infrastructure problems. And companies like Foursquare are starting to leverage big data to serve customers better. Genetic research in particular is benefiting from an analytics-driven approach, and the sciences at large are starting to reap the rewards.

These announcements are just two examples of how the cloud can spur that kind of innovation. By removing scaling and infrastructure hurdles, it lets big data power technologies that could, some day, save the planet, or a life.