NEWS
Yahoo has just handed over what it claims is the world’s largest-ever machine learning dataset to the academic research community through its ongoing program, Yahoo Labs Webscope. The company said it’s hopeful that the release will encourage more people, not just data scientists and researchers, to try their hand at machine learning.
“Our goals are to promote independent research in the fields of large-scale machine learning and recommender systems, and to help level the playing field between industrial and academic research,” said Suju Rajan, director of research for personalization science at Yahoo Labs, in the announcement.
The whopping 13.5TB dataset contains the anonymized data Yahoo has accumulated from the interactions of around 20 million of its users, from February 2015 to May 2015.
Yahoo Labs Webscope is a data-sharing project where Yahoo stores massive amounts of anonymized data. The company has now authorized its use for non-commercial purposes.
Much of the data concerns Yahoo users’ interactions with the news feeds on Yahoo properties such as Yahoo News, Yahoo Sports, Yahoo Movies and its home page. Yahoo is also providing anonymized demographic data, such as the age, gender and location of a subset of its anonymized users. The data also includes timestamps and other data from the end user’s device, as well as the title, summary and key phrases from the articles users have interacted with.
Yahoo’s donation comes as interest in machine learning rapidly gathers pace. A number of big Web companies, including Google and IBM, have recently open-sourced their machine learning algorithms to help researchers get closer to building machines and applications that can show true artificial intelligence.
“Machine learning is a core transformative way by which we are rethinking everything we are doing,” said Google CEO Sundar Pichai in October, shortly before the company open-sourced its TensorFlow machine learning software.
More recently Microsoft got in the game, open-sourcing its DMTK machine learning toolkit. But Yahoo’s release is important in a different way, because it makes it possible for individuals and small organizations that don’t have the compute resources to begin using machine learning too.
“We hope that this data release will similarly inspire our fellow researchers, data scientists, and machine learning enthusiasts in academia, and help validate their models on an extensive, ‘real-world’ dataset,” Rajan said.