Big Data’s one of the hottest topics of discussion in the tech world at the moment, what with all the hype about its potential and its problems, and how it’s going to transform our world. All well and good, but actions speak louder than words, and for all the talk of how it’s going to affect our future, some people prefer to just knuckle down and quietly get on with it.
Big Data is not just something we have to look forward to. As Wikibon’s Jeff Kelly stated on the Wikibon blog earlier this week, “The reality is that Big Data is today – here and now – delivering on its many promises.” Big Data is very, very real, and more importantly it isn’t just sitting there wasting away in some server eating up electricity. It’s being used right now in an astonishing variety of ways, and it’s already having a huge impact on our lives.
To show you what I’m talking about, take a look at this selection of real-world Big Data deployments that are happening right now:
Netflix
We all know how hugely successful Netflix has become – today it’s the largest commercial video streaming network in the US, with more than 30 million customers. But what many don’t know is that it’s also become one of the world’s biggest Big Data hoarders, keeping track of what viewers watch, where they’re watching it, when they’re watching it, and what device they’re watching it on. Whenever you hit play, rewind or fast-forward, or stop watching a movie altogether, it’s gathering data.
What’s more, it’s actually putting that data to use. Netflix has begun to produce its own original TV shows, and it’s leveraging all of its data to do so. Netflix used its data to decide that the BBC’s “House of Cards” was the best fit for a remake, and its data also showed that fans of the original overlapped with fans of actor Kevin Spacey and director David Fincher, which in turn was what led to them being hired.
National Oceanic and Atmospheric Administration (NOAA)
The NOAA can actually claim to be one of the first organizations in the world to harness the power of Big Data. After all, it’s been doing so for more than 50 years. Today, it gathers more than 30 petabytes of atmospheric, oceanographic and terrestrial data each year, drawing on more than 3.5 billion observations from satellites, aircraft, ships, buoys and other sensors.
NOAA uses this data to produce millions of highly complex, high-fidelity predictive models each day, in order to generate forecasts for the National Weather Service, as well as warnings and guidance for the private sector and government agencies like NASA and the Department of Defense.
NARA Electronic Records Archive
As the United States’ official record keeper, NARA manages one of the largest Big Data sets in existence – more than 142TB of data representing over seven billion objects, and growing. Its records include data from just about every government and public agency there is, including Congress, several presidential libraries, and the entire ecosystem of federal agencies.
NARA’s digitized records are spread across some 4,800 different formats, and it isn’t stopping there: it’s also in the process of digitizing a whopping four million cubic feet of traditional archival holdings. With the government insisting that NARA must make 95% of its data available to researchers by 2016, it’s developed the Electronic Records Archive to perform numerous archival functions and to manage its records according to different legal frameworks.
Vestas Wind Energy Turbines
It’s always good to hear examples of how technological innovation is helping the environment, and in Big Data’s case there are few better than what the Danish wind energy firm Vestas is doing. In order to get the most bang for its buck, Vestas uses IBM supercomputers to help pinpoint the most suitable locations for its wind turbines. To date, its wind turbines have collected a staggering 2.8 petabytes of wind data, allowing it to map weather systems right across the globe.
Parameters for its data library include barometric pressure, temperature, precipitation, humidity, and of course wind velocity and direction. Crunching this data, Vestas is able to identify where to build new wind turbines in order to maximize power generation whilst reducing costs. Its library is going to become even more extensive too, as it adds data on global deforestation, satellite images, geospatial data, historical metrics and data on sea tides and the phases of the moon.
Ancestry.com
Google and Facebook aren’t the only web companies sitting on enormous mountains of data. One of the web’s unsung Big Data hogs is undoubtedly Ancestry.com, which sits on a pile of 11 billion records – amounting to roughly 4 petabytes of content.
Ancestry.com’s data includes all the obvious stuff – birth records, death records and historical records – together with records on immigration, war, and even yearbooks from schools, organizations and companies. Many of these are well over a hundred years old. Even more astonishing, many of its records are handwritten. In order to index this content and make it searchable, it has to rely on some of the most advanced content processing technology. Ancestry.com is planning to increase its data haul too, by adding DNA processing to help clients establish connections. By collecting saliva samples and storing them, Ancestry.com aims to create a huge database that’ll one day be able to help people connect with distant or estranged family members, such as long-lost cousins or adopted siblings.