Data Scientist. The term was coined when two people, DJ Patil and Jeff Hammerbacher, were trying to name their data teams working on big data and did not want to limit those teams’ functions with an existing job title such as business analyst or research scientist. Ever since, the title has become extremely popular. The role of a data scientist is designed to help organizations make sense of the many disparate data sources that analytics platforms can interrogate.
So what makes a data scientist? What skill set does it require?
Here it goes!
Diverse Technologies – To be a data scientist, you need hands-on experience with a number of tools and technologies, especially open source ones, such as Hadoop, Java, Python, C++, and ECL. A good understanding of database technologies, including NoSQL databases such as HBase and CouchDB, is a plus.
Mathematics – A conventional computer science degree alone is no longer enough for a data scientist. The job requires someone who, on the one hand, understands large-scale machine learning algorithms and programming and, on the other, is a statistician. So the profile is also well suited to experts in scientific and mathematical disciplines other than computer science.
On this point, Kirk Dunn, chief operating officer at Hadoop specialist Cloudera, said:
“Cloudera has been training up scientists from inside and outside the IT industry to become data scientists – teaching statistical experts the necessary computer science and computer scientists the necessary statistical skills. You can’t hire this generation of data scientists, you have to build them. Academics that specialize in data analytics and research, such as econometrics or epidemiologists, are well suited to the job.”
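To make that mix of statistics and programming concrete, here is a minimal sketch in Python of an ordinary least-squares line fit, one of the most basic statistical models. The function name `fit_line` and the ad-spend and sign-up numbers are made up purely for illustration. It is a toy example under those assumptions, not a recipe for large-scale machine learning.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: ad spend versus new sign-ups (illustrative values only).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
signups = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(ad_spend, signups)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # slope=2.00, intercept=1.00
```

The statistician’s half of the job is knowing when a straight line is a sensible model at all. The programmer’s half is turning that model into code that runs reliably on real data.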
Business Skills – As data scientists wear multiple hats, they need strong business skills, including communication, planning, organizing, and managing. A data scientist has to communicate with diverse people across an organization. That means understanding business and application requirements, and interpreting the patterns and relationships mined from data for people in marketing groups, product development teams, and the executive suite. All of this requires good business and communication skills to get things done right.
Visualization – You may be able to mine and model data, but are you able to visualize it? If not, keep in mind that you should be able to work with at least a few data visualization tools. Some of these include Flare, HighCharts, AmCharts, D3.js, Processing, Google Visualization API, and Raphael.js.
Innovation – It is not enough to look at the data and work with what is there. You have to think creatively and innovate. A data scientist should be eager to learn more, curious to find new things, and able to think outside the box. They should be focused on making products real and on making clean, well-prepared data available to users. They should be able to see where data can add value and how it can bring better results.
But that’s not the end! If you are really keen on this profile, there are some official courses, either coming up or already available, that aim to make you a data scientist. One of them is from Coursera, taught by Bill Howe, Director of Research for Scalable Data Analytics at the UW eScience Institute. The course is intended for undergraduate students with some programming experience in Java or Python and some familiarity with databases.