Big Data has been gaining a lot of momentum over the last 12 months, with a new report from Wikibon this morning putting the movement’s total value at a whopping $11.4 billion in 2012. Much of this cash has come from big businesses that are keen to exploit Big Data for economic reasons, but much less has been said about the massive investments that government agencies have also been making in Big Data over the last few months.
One of the crucial differences with government projects is that instead of using Big Data as a vehicle for delivering insights and making money – as big business normally does – agencies have been pursuing much more progressive ideas, aiming to use Big Data to improve our understanding of the world, and with it, better our lives.
Government Big Data projects are far more numerous than most people realize. Chances are that you’ve already benefited from one of these projects, though you probably never knew it. For example, anyone who’s ever asked Siri for directions has already taken part in a government Big Data project. Having begun life as a Pentagon-funded research project, Apple’s automated personal assistant is one of the most obvious examples of Big Data being used today, accessing and analyzing petabytes of information to deliver the answers we need in real time.
Siri is one of the most recognizable Big Data projects, but governments across the world have backed dozens of other innovative ideas, ranging from the Large Hadron Collider at CERN in Switzerland to smaller-scale but equally beneficial efforts like the ExpressLanes project in Los Angeles, which aims to keep traffic flowing on the city’s most important highways. More recently, though, governments have really begun to push the envelope with some of their Big Data projects, taking advantage of the latest advances in data collection and computation to pursue strikingly ambitious goals.
1000 Genomes Project
One of the most talked-about and most ambitious government Big Data projects is the complex international effort to catalog genetic variation across the roughly three billion bases of the human genome. The effort began back in 2008, with 75 companies and organizations from around the world collaborating to amass more than 200 terabytes of data from over 2,500 individuals. The project completed its pilot phase back in 2010, and most recently, in October 2012, announced that it had successfully sequenced the genomes of 1,092 people.
The 1000 Genomes Project is still ongoing, and the information it has produced has proven invaluable to thousands of researchers from across the world. Its stated goal is to create a complete and detailed catalog of human genetic variation, which researchers studying genetic disease can then use.
Scientists from across the world have already benefited from the 1000 Genomes Project, but accessing the enormous data set it has created has presented another huge problem in itself, as many research labs lack sufficient storage and computation infrastructure to hold and analyze it. To get around this problem, scientists have once again turned to Big Data, contracting Amazon Web Services to provide infrastructure in the cloud to make the 1000 Genomes Project accessible to all.
XDATA
The Defense Advanced Research Projects Agency, or DARPA for short, is one of the most active government agencies as far as Big Data is concerned, undertaking several unique projects in recent years. One of the most ambitious is XDATA, which kicked off last March and is being funded to the tune of $25 million a year for the next four years.
So what’s XDATA about? DARPA is somewhat secretive about the nuts and bolts of the project, but essentially, the aim is to support the government’s efforts to coordinate Big Data technology management and to use the petabytes of data collected by its federal agencies more efficiently. To meet these goals, DARPA is trying to develop new computational techniques and software programs that can analyze structured and unstructured Big Data sets faster and more effectively than current technologies allow.
Announcing the project, DARPA underscored just how critical XDATA will be to the government and national defense in the future:
“Current DoD systems and processes for handling and analyzing information cannot be efficiently or effectively scaled to meet this challenge. The volume and characteristics of the data, and the range of applications for data analysis, require a fundamentally new approach to data science, analysis and incorporation into mission planning on timelines consistent with operational tempo.”
Atmospheric Radiation Measurement (ARM) Climate Research Facility
The ARM project is an initiative backed by the Department of Energy’s Biological and Environmental Research Program (BER), and ultimately hopes to serve as a multi-platform science hub that collects and acts on large climate data sets.
Its stated goals are complex, but the primary objective is to improve our understanding of the physics of our atmosphere, in particular with regard to the way that clouds, aerosols and radiative feedback processes interact with each other. Essentially, the project aims to further our understanding of the Earth’s climate through Big Data, and with that knowledge perhaps come up with some answers to the threat of global warming and other climate change issues.
According to its website:
“ARM provides the national and international research community unparalleled infrastructure for obtaining precise observations of key atmospheric phenomena needed for the advancement of atmospheric process understanding and climate models.”
The data set created by ARM is massive, with numerous collection sites scattered around the world that take in diverse data types. This means that the project faces one of the most demanding problems in dealing with Big Data: managing variable data sets in a distributed fashion. Thankfully, the project receives ample funding from government agencies and other organizations to stay at the forefront of the latest sensor and data handling technologies.
BioSense 2.0
BioSense 2.0 builds on the success of its predecessor, BioSense, which kicked off back in 2003 as an effort to build an “integrated national public health surveillance system for early detection and rapid assessment of potential bioterrorism-related illness.” The initial project has since been taken over and expanded by the Centers for Disease Control and Prevention (CDC), which has established new aims that cover every aspect of public health tracking at local, state and national levels.
The BioSense 2.0 project is based on a collaborative cloud-hosted database that draws on data from all levels of government, with the aim of making that data instantly accessible to end users across numerous departments of government. The program uses symptomatic data gathered from around the country, tracking health problems like the recent flu outbreak in real time as they evolve, and giving health professionals the insights they need to prepare the best possible response to such incidents.
Such is the program’s success that the Centers for Disease Control and Prevention recently announced its second Big Data project, aimed at identifying and classifying unknown pathogens and bacteria in order to help it detect outbreaks of disease more quickly. Known as the Special Bacteriology Reference Laboratory, the lab relies on something called “networked phylogenomics” to identify new bacteria as they appear:
“Phylogenomics will bring the concept of sequence-based ID to an entirely new level in the new future with profound implications on public health.”
Human Brain Project (HBP)
Without doubt one of the most ambitious Big Data projects ever launched, the Human Brain Project recently won backing to the tune of a ceiling-shattering €1 billion from the European Commission.
Leading the project is Henry Markram, a South African-born brain electrophysiologist who works at the Swiss Federal Institute of Technology in Lausanne. Markram has been saying for years that it would be possible to simulate the human brain, and now he has been given the chance to do exactly that after winning the two-year-long Future and Emerging Technologies Flagship Initiatives contest.
So what is Markram going to do exactly? The idea is to model everything that scientists know about the human brain, including its chemistry, its cells and its connectivity, using a supercomputer. One of the most important goals that Markram hopes to achieve is to build up a complete inventory of which genes are active in which neurons. We already know that not all neurons are the same: they come in many different shapes and sizes and play different roles within the brain, each type expressing a different set of genes.
Markram wants to create a full list – what he calls the “single-cell transcriptome” – which he says will help scientists to deduce which kind of neuron is used in different parts of the brain, and ultimately discover insights into how neurodegenerative diseases might be treated.