UPDATED 11:22 EDT / MARCH 05 2012

Kaggle Sees Data Science as a Sport

With the tagline “Data science as a sport,” Kaggle helps companies and government agencies, including NASA and Allstate Insurance, develop big data predictive analytics algorithms for a wide variety of data-dependent questions. It does so by creating contests and enlisting large numbers of independent data scientists worldwide to submit entries.

Kaggle was inspired by the Netflix Prize, company President and Chief Scientist Jeremy Howard told Wikibon’s Jeff Kelly in a live webcast interview in The Cube at Strata 2012. Netflix created the $1 million prize for the best improvement to its recommendation system; the contest attracted about 50,000 entries and was mentioned several times in The New York Times. The winning solution improved the accuracy of Netflix’s recommendations by about 10%, the threshold the prize required.

“We realized that this was actually a great way to design predictive modeling for all kinds of problems in science, industry, and government,” Howard said. “So we created a site that helps organizations design and run their own predictive modeling competitions. Rather than having a team of experts spend a year setting up your predictive modeling system, you just fill out a five-step wizard and create your competition.”

In an environment where demand runs very high for the comparatively few data scientists available, Kaggle’s approach makes predictive modeling available to organizations that lack the skills in-house. But Kaggle’s clients are not limited to those organizations.

For instance, Kaggle’s internal data scientists worked with domain experts at NASA to design a competition to find the best method for identifying dark matter in the universe from the huge amounts of raw data NASA has captured. The competition produced a methodology that improved dark matter mapping threefold. NASA has some of the best data scientists in the world, but they could not do what the 30,000 PhD-level data scientists who participate in Kaggle competitions accomplished in a few weeks.

Allstate Insurance, which has its own internal team of the world’s best actuaries, had Kaggle run a contest to find a better way to predict which drivers would be most likely to crash their cars. The contest yielded a method that was three times as accurate as what Allstate had developed internally.

Kaggle contest winners have four primary characteristics, Howard said: open-mindedness to new and sometimes oddball ideas, creativity and curiosity about what others are doing in the field, tenacity to stick with the problem even when someone else is ahead in the contest, and top data science skills.

He said that the attraction of this methodology is that it creates “a meritocracy outside the world of sports.

“We all believe in using data, rather than where someone went to school, who talks the loudest, or someone’s title in the organization, to drive decisions. It is a meritocracy in which the person or team with the best solution wins.”

