UPDATED 11:22 EDT / MARCH 05 2012

Kaggle Sees Data Science as a Sport

With the tagline “Data science as a sport,” Kaggle helps companies and government agencies, including NASA and Allstate Insurance, develop predictive algorithms for a wide variety of data-dependent questions. It does this by creating contests and enlisting large numbers of independent data scientists worldwide to submit entries.

Kaggle was inspired by the Netflix Prize, company President and Chief Scientist Jeremy Howard told Wikibon’s Jeff Kelly in a live webcast interview in The Cube at Strata 2012 (full video below). The Netflix Prize was a $1 million award offered by Netflix for the best improvement to its recommendation system. It attracted about 50,000 entries and was mentioned several times in The New York Times. The winning solution improved the accuracy of Netflix’s recommendations by about 10%, the threshold the contest required.

“We realized that this was actually a great way to design predictive modeling for all kinds of problems in science, industry, and government,” Howard said. “So we created a site that helps organizations design and run their own predictive modeling competitions. Rather than having a team of experts spend a year setting up your predictive modeling system, you just fill out a five-step wizard and create your competition.”

In an environment where demand runs very high for the comparatively few data scientists available, Kaggle’s approach makes predictive modeling available to organizations that lack the skills in-house. But Kaggle’s clients are not limited to those organizations.

For instance, its internal data scientists worked with domain experts at NASA to design a competition to find the best method of identifying dark matter in the universe from the huge amounts of raw data NASA has captured. The result of the competition was a methodology that improved dark matter mapping threefold. NASA has some of the best data scientists in the world, but they could not do what the 30,000 PhD-level data scientists who participate in Kaggle competitions accomplished in a few weeks.

Allstate Insurance, which has its own internal team of some of the world’s best actuaries, had Kaggle run a contest to find a better way to predict which drivers would be most likely to crash their cars. The contest yielded a method that was three times as accurate as what Allstate had developed internally.

Kaggle contest winners have four primary characteristics, Howard said: open-mindedness to new and sometimes oddball ideas, creativity and curiosity about what others are doing in the field, tenacity to stick with the problem even when someone else is ahead in the contest, and top data science skills.

He said that the attraction of this methodology is that it creates “a meritocracy outside the world of sports.

“We all believe in using data, rather than where someone went to school, who talks the loudest, or someone’s title in the organization, to drive decisions. It is a meritocracy in which the person or team with the best solution wins.”

