The term “data scientist” has become popular since it first appeared around 2009. But while the term has been applied to numerous people, exactly what a data scientist is remains unclear, Big Data visionary and Tresata founder and CEO Abhi Mehta said in an interview on The Cube from the Strata + Hadoop World 2012 conference in late October. And no actual degree in data science exists today. Tresata, he said, plans to help fill this gap with an effort led by its new Chief Scientist, Roy Lowrance, Ph.D., to create the first advanced degree program in data science this summer, offering both master’s and Ph.D. degrees.
The term has almost as many definitions as there are experts using it. Mehta says he sees data scientists as divided into two groups, which he calls the “marketing quants” and the “data quants”. The first group does the “sexy work” of researching questions using established Big Data analysis tools and then applying the answers to define better products and services that meet customer needs. That, he says, is about 20% of the work.
The other 80% is the “boring but important” work of the data quants. This “role is incredibly ill-defined,” he says, “but it has the most value, which is people who have the ability and resources” to close the gap in the “last mile” of Big Data analytics. Nine-tenths of that mile, he says, can and absolutely has to be automated using machine learning to “mimic smart people and … do the work of those people in a smarter way…. I have to have machines do it because there’s no way I can build an analytics engine, a business, manually around people manipulating data for 250 million units [the U.S. population]. Not possible.”
But, he says, that last one-tenth has to be done by humans, and those are the data scientists. “That skill set is incredibly ill-defined and not available on the market. It is a sort of cyborg from a skills perspective. So training data quants is a very interesting problem that no one today is solving. But a trained data quant with the right machine learning tools can function with data that’s growing at I don’t know what, a new word, celabytes.” To do that they have to be “part physicist, part mechanical engineer, part statistician, and a total reservoir of common sense.”
And these people are in demand. Mehta says Tresata clients constantly come to him who grasp the potential of Big Data analysis to answer critical questions and provide the keys to a new level of business, but who then add, “I don’t have the people to write the new models for all this data. So sampling is dead, and we need to analyze the entire population. Who’ll write that for me?” That is the question that Tresata hopes to answer with the new data science degree program.