UPDATED 15:48 EST / JANUARY 23 2020

Google launches Dataset Search out of beta with new capabilities

After more than a year of testing, Google LLC today launched its Dataset Search service out of beta test mode with new capabilities aimed at enabling users to find information faster.

Dataset Search is a version of the company’s search engine designed specifically for browsing collections of scientific and technical information. Google has to date indexed close to 25 million datasets that span topics ranging from volcano activity to the social behaviors of puppies. The information comes from governments, universities and other organizations engaged in research activities.

Open-source data is playing an increasingly important role in the technology landscape amid the rapid spread of artificial intelligence. The more sophisticated the AI, the more training data it needs to crunch to become production-ready. A portal such as Dataset Search where AI developers can search records in a centralized manner has the potential to be a valuable tool for machine learning projects. 

Google is marking Dataset Search’s launch from beta with the introduction of new features meant to make the service even more useful. To start, the company claims it has “significantly improved” the quality of the descriptions for information repositories. There are also new filters that allow users to narrow down search results based on what kind of dataset they require.

“You can now filter the results based on the desired types of dataset that you want (e.g., tables, images, text), or whether the dataset is available for free from the provider,” Google research scientist Natasha Noy wrote in a blog post. “If a dataset is about a geographic area, you can see the map.”

Finally, the service is now accessible on mobile devices. Noy told The Verge that Google plans to continue improving Dataset Search by adding features to let users explore datasets “when they don’t necessarily know what they are looking for.”

Dataset search - skiing

AI developers are far from the only knowledge workers who can take advantage of the service in their projects. Dataset Search is used by several hundred thousand people worldwide, including academic researchers, business analysts and students.

The groundwork for the service was laid back in 2011, when Google, Yahoo! and Microsoft Corp. launched a joint open-source project called Schema.org. The companies set out to create a universal standard for marking up web pages that contain structured data such as research files. Schema.org has since been adopted by the majority of the world’s governments, along with numerous academic institutions, and Dataset Search employs the standard to index the records it serves up to users.
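For readers curious what that markup looks like in practice, here is a minimal Python sketch that builds a Schema.org `Dataset` description as JSON-LD and wraps it in the script tag a publisher would embed in a dataset's web page. The property names (`name`, `description`, `isAccessibleForFree`, `distribution`, `encodingFormat`, `spatialCoverage`) come from the Schema.org vocabulary. The helper functions, the example.org URLs and the sample values are hypothetical, and the sketch does not claim to show how Google processes the markup internally.

```python
import json


def build_dataset_jsonld(name, description, url, content_url,
                         encoding_format="text/csv", free=True, place=None):
    """Return a dict describing a dataset with Schema.org vocabulary.

    Hypothetical helper for illustration; only the dictionary keys are
    taken from the Schema.org Dataset and DataDownload types.
    """
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        # Whether the provider offers the dataset at no cost.
        "isAccessibleForFree": free,
        # One downloadable file, with its media type.
        "distribution": [{
            "@type": "DataDownload",
            "encodingFormat": encoding_format,
            "contentUrl": content_url,
        }],
    }
    if place is not None:
        # Geographic area the dataset is about, if any.
        record["spatialCoverage"] = {"@type": "Place", "name": place}
    return record


def to_script_tag(record):
    """Wrap the record in a JSON-LD script tag for embedding in HTML."""
    body = json.dumps(record, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # All values below are made up for the example.
    record = build_dataset_jsonld(
        name="Example ski resort snowfall records",
        description="Daily snowfall measurements for an example resort.",
        url="https://example.org/datasets/snowfall",
        content_url="https://example.org/datasets/snowfall/data.csv",
        place="Example Valley",
    )
    print(to_script_tag(record))
```

Fields such as `encodingFormat`, `isAccessibleForFree` and `spatialCoverage` plausibly relate to the new format, price and map features Noy describes, though the article does not spell out that mapping.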

Image: Unsplash
