UPDATED 18:31 EST / OCTOBER 16 2015

MIT developing a system that replaces human intuition for big data analysis

Massachusetts Institute of Technology (MIT) researchers at the Computer Science and Artificial Intelligence Laboratory (CSAIL) are looking to take human intuition out of big data analysis by letting computers choose the features used to find predictive patterns in the data. The project is called the Data Science Machine, and so far its prototype has outperformed 615 of the 908 teams it competed against across three data science competitions.

Big data is a huge, complex ecosystem that brings together processes from across data analysis, storage, networking, curation, search and many other functions. Much of big data analysis is automated and algorithmic, but in the end data scientists and business users still have to decide which features of the data sets matter, so that the final analysis and visualizations communicate the data and make it actionable.

When looking at a huge amount of data, experts often disagree over which features of that data are needed to produce actionable results. Got a lot of consumer data about what people buy and how they use it? Within it is a constellation of price points, locations, ethnographic information, returns, upgrades and more, all of which reveal patterns in purchasers and purchases. In the end, though, a human needs to choose which combinations of those data points will tell them what they want to know.

“We view the Data Science Machine as a natural complement to human intelligence,” says Max Kanter, whose MIT master’s thesis in computer science is the basis of the Data Science Machine, speaking to Phys.org. “There’s so much data out there to be analyzed. And right now it’s just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving.”

Feature engineering and big data analysis

As described above, picking the features that reveal actionable patterns is often the job of the data scientist writing the analysis code. That code then guides the big data engine as it predicts or reveals what the people looking at the data need.

The goal is a big data algorithm that doesn't simply answer questions asked about the data, but suggests questions of its own based on the data set.
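The article doesn't show how such an algorithm proposes features. The Deep Feature Synthesis paper cited at the end of this piece describes generating candidate features automatically by applying aggregation functions across relationships between tables. The Python sketch below is a loose, single-level illustration of that idea, not the project's code: the `synthesize_features` helper, the chosen primitives, and the student and event tables are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Illustrative aggregation primitives; the names and choices are assumptions,
# loosely modeled on the aggregation features described in the DFS paper.
PRIMITIVES = {"SUM": sum, "MEAN": mean, "MAX": max, "MIN": min}


def synthesize_features(parents, children, key, columns, child_name="events"):
    """Build one candidate feature per (primitive, child column) for every parent row.

    parents:  list of dicts, each with a unique value under `key`
    children: list of dicts that point to a parent through `key`
    columns:  numeric child columns to aggregate
    """
    # Group child rows under their parent so each parent only sees its own records.
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row)

    table = {}
    for parent in parents:
        rows = grouped.get(parent[key], [])
        feats = {f"COUNT({child_name})": len(rows)}
        for column in columns:
            values = [r[column] for r in rows]
            for name, fn in PRIMITIVES.items():
                # Most primitives are undefined on an empty group, so this sketch
                # records None for all of them when a parent has no child rows.
                feats[f"{name}({child_name}.{column})"] = fn(values) if values else None
        table[parent[key]] = feats
    return table


# Hypothetical example data: students (parent table) and their activity events (child table).
students = [{"student_id": 1}, {"student_id": 2}, {"student_id": 3}]
events = [
    {"student_id": 1, "minutes_spent": 30, "hours_before_deadline": 48},
    {"student_id": 1, "minutes_spent": 10, "hours_before_deadline": 2},
    {"student_id": 2, "minutes_spent": 5, "hours_before_deadline": 1},
]
features = synthesize_features(
    students, events, "student_id", ["minutes_spent", "hours_before_deadline"]
)
for student_id, feats in features.items():
    print(student_id, feats)
```

No one wrote "average minutes per event" by hand here; it falls out of enumerating every primitive against every column. DFS goes further by stacking primitives across multiple relationships, and a separate selection step would still be needed to decide which of the generated columns actually help a model.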

The researchers already intend to use the technology as a proof of concept for finding feature sets in problems such as estimating the power-generating capacity of wind farms or predicting which students will drop out of online courses.

According to Phys.org, dropout prediction tends to hinge on two key data points: how long before a deadline a student begins working on a problem set, and how much time the student spends on the course relative to her classmates. MIT's online learning platform, MITx, doesn't record these data points directly, but its wealth of other data may contain interactions from which this information can be inferred. A system such as the Data Science Machine could be used to engineer feature sets likely to capture it.
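Phys.org names the two signals but not how they would be derived. The Python sketch below shows one hypothetical way to compute both from timestamped activity records. The record layout, the problem-set deadline, and the helper names are assumptions for illustration, not MITx data or the Data Science Machine's code.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical activity records: (student_id, problem_set, timestamp, minutes_active).
# This layout is an assumption for illustration, not the MITx log format.


def hours_started_before_deadline(events, deadlines):
    """Hours between each student's first recorded activity on a problem set and its deadline."""
    first_seen = {}
    for student, problem_set, ts, _ in events:
        key = (student, problem_set)
        # Keep only the earliest timestamp per (student, problem set).
        if key not in first_seen or ts < first_seen[key]:
            first_seen[key] = ts
    return {
        key: (deadlines[key[1]] - ts).total_seconds() / 3600
        for key, ts in first_seen.items()
    }


def time_relative_to_classmates(events):
    """Each student's total active minutes divided by the class mean."""
    totals = defaultdict(float)
    for student, _, _, minutes in events:
        totals[student] += minutes
    class_mean = mean(totals.values())
    return {student: total / class_mean for student, total in totals.items()}


deadlines = {"ps1": datetime(2015, 10, 16, 23, 0)}
events = [
    ("alice", "ps1", datetime(2015, 10, 14, 23, 0), 90),
    ("alice", "ps1", datetime(2015, 10, 15, 12, 0), 30),
    ("bob", "ps1", datetime(2015, 10, 16, 21, 0), 40),
]
leads = hours_started_before_deadline(events, deadlines)
relative = time_relative_to_classmates(events)
print(leads)     # {('alice', 'ps1'): 48.0, ('bob', 'ps1'): 2.0}
print(relative)  # {'alice': 1.5, 'bob': 0.5}
```

Neither value is stored as a field in these records; both are inferred from raw timestamps and durations, which is the kind of derived feature the article suggests an automated system could propose.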

An emerging science within big data at MIT CSAIL

Many corporations, institutions, businesses and governments already collect a great deal of data, and they often must avoid collecting particular data because of network, storage and sensor constraints. Breakthroughs in machine learning and meta-analysis such as the Data Science Machine would add another layer of automation to the already interesting job of being a big data scientist. Big data scientists would still write the analysis code for the engines to turn over in their computerized brains; they would just have another tool in their belt when working out the questions that deliver the answers they need.

Readers interested in the nitty-gritty academic details of this MIT project can find them in "Deep Feature Synthesis: Towards Automating Data Science Endeavors" by James Max Kanter and Kalyan Veeramachaneni.

Featured image credit: via Pixabay
