Astronomer Tobias Mayer, born in 1723, was the first data scientist, John Rauser, a data scientist at Amazon.com, explained in a talk at Strata.
Mayer explained the motion of the moon using spherical trigonometry. To do this, he compiled nine times as many data points as necessary (27 instead of three) and claimed that, because he had more data, his calculations were more accurate. Although he was wrong about just how much more accurate his results were (he claimed they were nine times better, but they were probably only about three times better), Rauser says, it was the first time someone made a quantitative argument that more data is better.
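One plausible reading of the "three times" figure, which the talk summary does not spell out, is the familiar square-root rule: if each observation carries independent error of about the same size, the uncertainty of a combined estimate shrinks with the square root of the number of observations, not with the number itself. The sketch below works through that arithmetic. The `standard_error` helper and the noise level `sigma` are hypothetical names and values chosen for illustration, and the independence assumption is a simplification of what Mayer actually did.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean of n independent observations,
    each with the same noise level sigma (an assumed, idealized model)."""
    return sigma / math.sqrt(n)


sigma = 1.0  # hypothetical per-observation noise, arbitrary units

minimal = standard_error(sigma, 3)   # the three observations strictly needed
mayer = standard_error(sigma, 27)    # the 27 observations Mayer compiled

# How many times smaller the uncertainty becomes with nine times the data.
improvement = minimal / mayer

# What a linear "nine times the data, nine times better" claim would predict.
claimed = 27 / 3

print(f"square-root rule: about {improvement:.2f}x better")
print(f"linear claim:     {claimed:.0f}x better")
```

Under these assumptions nine times the data buys roughly a threefold gain in accuracy, which matches the correction Rauser describes.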
Not long before, Leonhard Euler had written that more data increased the chance of error. How did Mayer get it right while Euler, one of the greatest mathematicians of all time, got it wrong? Rauser speculates that the difference was that Mayer had an engineering background, while Euler was a pure mathematician. Rauser says that data scientists must have both a background in engineering (by which he mostly means software engineering) and a background in mathematics (by which he mostly means applied mathematics and statistics).
Rauser said that schools don’t offer a clear path toward becoming a data scientist; as far as he knows, no school offers a data science degree yet. The closest you can come is studying computer science along with a lot of math and machine learning. You could also major in statistics with a heavy emphasis on computation. Rauser says he has degrees in aerospace engineering and computer science. He worked for 10 years as a software engineer and along the way picked up some skills in machine learning and statistics. Once he joined Amazon.com, he started to combine coding with using data to answer business questions.
“If you want to be a data scientist, grow into the role,” Rauser says. And if you want to hire data scientists, you should probably grow them on your own. It’s going to be easier to find a promising computer scientist or statistician and give them interesting problems to work on than it is to find someone who is already a data scientist.
Rauser cites the following core skills for data scientists:
- Writing. Rauser calls writing “the key to having impact” and says that if no one can find or understand your work, it might as well have never happened. “The written word scales,” he says.
- Skepticism. You need to look just as hard for evidence that refutes your hypothesis as for evidence that supports it. Rauser believes that skepticism is a skill that can be learned, not a trait that you are born with.
(hat tip Data Science 101)