Don’t believe me? At last week’s Strata conference, throngs of attendees were jockeying so intensely for position to listen to bitly Data Scientist Hilary Mason that I thought I might be at a Bon Jovi concert circa 1986.
OK, perhaps I’m exaggerating just a bit (not to mention showing my age). Still, there’s no denying that demand for Data Scientists like Mason is at an all-time high. Why? Enterprises are desperate to turn the mountains of data they collect internally and the avalanche of data available externally – a.k.a. Big Data – into meaningful insights to increase operational efficiencies, identify potential new market opportunities and generally gain competitive advantage.
Traditional database and analytics tools – and DBAs and analysts with traditional skills – simply aren’t up to the task. Big Data technologies and approaches – including Hadoop, but more generally distributed data processing and advanced data analytic methods – require a new set of skills from a new type of specialist: the Data Scientist.
But Data Scientists are hard to come by.
There are simply very few formal training and educational programs focused on analytics and data science. As a consequence, Data Scientists often develop their skills over the course of their careers, picking up the necessary experience in bits and pieces over a number of years.
“There’s no one true path to being a data scientist,” said Guy LeMar, a data scientist at Quest Software, during an interview at Strata. “It’s really a process of accumulating over time little tools that you add to your toolbox to help you analyze data.”
A handful of data science-related programs have begun to emerge over the last couple of years, however, some with the backing of vendors. The DePaul Center for Data Mining and Predictive Analytics at DePaul University offers an M.S. in predictive analytics, for example, that draws on faculty from both the university’s computer science school and department of marketing.
But becoming a data scientist isn’t just a matter of taking some classes. It takes a special type of person to make the grade, as data scientists need to combine a mix of diverse talents and expertise with a persistent, curious, creative attitude. They must be well versed in statistics, data visualization, mathematics and computer programming. They must be willing to experiment with data again and again, try novel approaches, accept constructive criticism from peers and even be willing to fail from time to time.
“It’s a lot about brainstorming and trying to see patterns in the data and sitting down together with a team and trying to tease out the signal from the noise,” said Monica Rogati, a Senior Data Scientist at LinkedIn. Data Scientists can make data say just about anything they want, “even lie,” she said, so Data Scientists must also be willing to question their own findings.
“Data lies. It lies because we let it,” Rogati said during a presentation at Strata. “So let’s not let it. Let’s ask the right questions.”
Successful Data Scientists are also able to put themselves in the shoes of the business. While experimentation is core to the Data Scientist’s mission, they must also be able to turn high-minded, abstract projects into tangible business value such as new products or improved operational efficiency. This is particularly important at this phase of the Big Data market’s lifecycle. C-level executives are starting to take notice of the buzz surrounding Big Data, and some are even funding new Big Data initiatives. But those initiatives need to display cold, hard results – even small ones – in a timely fashion in order to keep the funding coming.
So with so few Data Scientists available, what’s an enterprise with Big Data ambitions, but not the internal staff to bring them to fruition, to do? For now, the answer is to turn to Big Data vendors, third-party consultants and/or the open source community for help getting Big Data Analytics programs off the ground. Ideally, enterprises should tap two or three of these groups, as each has its benefits and drawbacks.
Most Big Data vendors offer services and training to help deploy their particular technologies, for example, but lack breadth of knowledge regarding competing and complementary technologies. The open source community of Big Data developers is extremely knowledgeable about the latest technology developments, but may lack perspective on building a Big Data business case and getting executive buy-in. Third-party service providers, meanwhile, may understand both the technology and business sides of the Big Data equation, but there are few Big Data services practices at the moment.
Of course, another option is to hire an internal team of Data Scientists. But enterprises that decide to go this route must be prepared to pay top dollar. After all, rock stars don’t come cheap.