Future Big Data: Mapping Our Universe, and Beyond!


Mankind has made some pretty big strides over the last fifty years or so as we strive for a better understanding of the universe and how it came to be. We’ve landed men on the moon, we’ve sent numerous robots to Mars, Venus and other bodies in the Solar System, and we now have a permanent presence in space.

But despite all of these impressive achievements, we’ve barely gotten any closer to answering the two most important questions – how did the universe come about, and is mankind alone?

These are questions that have troubled our greatest scientists and thinkers for centuries, and aside from a whole bunch of theories, very few people have come up with any substantial answers. But while we may not be able to definitively answer these questions any time soon, our understanding of the universe is set to improve radically in the next few years, and it will all be thanks to big data.

Mapping The Universe

Many scientists argue that before we can really understand the universe and figure out what caused everything to be, we first need to know exactly what it is we’re dealing with. To this end, a group of 23 academic institutions from countries including the US, UK, Canada, France and China has created an organization called the Next Generation Virgo Cluster Survey (NGVS), which has announced extremely ambitious plans to map the Virgo Cluster, which consists of as many as 2,000 galaxies.

That might sound simple enough to you and me, but in truth it’s anything but. Mapping the Virgo Cluster in the kind of detail scientists want is a truly monumental task that will necessitate the collection of vast amounts of data – we’re talking hundreds of terabytes here, according to InfoWorld.com.

Much of the data has already been collected using the Canada-France-Hawaii Telescope (CFHT) located atop Hawaii’s Mauna Kea, which from 2009 to 2012 spent 140 nights carefully compiling photos of the Virgo Cluster. But the problem is that it’s one thing to collect all of these images of the cluster, and quite another to piece them all together into a coherent and accurate map.

The biggest problem with mapping a cluster accurately is that scientists can’t be 100% sure which objects belong to the cluster, and which do not. Looking at the cluster through even the most powerful telescopes provides few clues as to which objects are actually a part of it, and which are located millions and millions of light years behind or in front of it.

To solve this problem, scientists are using a big data analysis technique known as ‘Machine Learning’ to identify which celestial objects in the photos belong to the Virgo Cluster and which do not. They have adopted a highly advanced analysis engine known as SkyTree to do the legwork for them.

The advantage of SkyTree is that it can analyze data from just about any source – structured or otherwise – before massaging it and visualizing it in dozens of different ways.

With regard to the NGVS project, SkyTree applied its machine learning algorithm to known data from more than 20 million galaxies that have already been mapped to some extent. Using this data, SkyTree can then look at the images of the Virgo Cluster and automatically dismiss millions of objects that appear to fall outside of the distance range that would place them within the Virgo Cluster, saving scientists hundreds of man-hours of manual work.
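
To make the idea a little more concrete, here is a toy sketch of that kind of filtering in Python. It is not SkyTree's actual algorithm, which isn't described here. It simply assumes that each object has two measured "color" values, uses a nearest-neighbor average over an already-mapped catalog to guess its distance, and then keeps only objects whose estimated distance falls inside an assumed range for the cluster. The catalog values, the feature names, the distance range and the function names are all made up for illustration.

```python
from math import dist

# Hypothetical training catalog: (color_a, color_b) -> known distance in megaparsecs.
# These numbers are invented for illustration and are not NGVS data.
TRAINING = [
    ((0.10, 0.12), 4.0),    # foreground objects
    ((0.12, 0.09), 5.0),
    ((0.08, 0.11), 6.0),
    ((0.50, 0.52), 16.0),   # objects at roughly cluster distance
    ((0.48, 0.49), 17.0),
    ((0.52, 0.50), 16.5),
    ((0.90, 0.88), 120.0),  # background objects
    ((0.92, 0.91), 150.0),
    ((0.89, 0.93), 135.0),
]


def estimate_distance(features, training=TRAINING, k=3):
    """Average the known distances of the k most similar catalog objects."""
    nearest = sorted(training, key=lambda row: dist(row[0], features))[:k]
    return sum(known for _, known in nearest) / len(nearest)


def is_cluster_member(features, near=14.0, far=19.0, training=TRAINING, k=3):
    """Keep an object only if its estimated distance is inside the assumed range."""
    return near <= estimate_distance(features, training, k) <= far


if __name__ == "__main__":
    # Three unlabeled objects, described by the same two hypothetical colors.
    for obj in [(0.50, 0.50), (0.90, 0.90), (0.10, 0.10)]:
        label = "member" if is_cluster_member(obj) else "dismissed"
        print(obj, round(estimate_distance(obj), 1), label)
```

A real survey would work with far more features, millions of catalog entries and a much more careful model, but the shape of the job is the same: learn from objects whose distances are already known, then let the machine discard everything that clearly sits in front of or behind the cluster.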

There’s much work to be done, and even once SkyTree has managed to sort through all of the images, it will likely be some time before the scientists can piece together their final map of the Virgo Cluster. Even so, applying big data to astronomy in this way represents an important step forward, and will hopefully help to advance our understanding of the universe in ways that we previously thought were impossible.