UPDATED 14:55 EDT / JUNE 21 2018

EMERGING TECH

Microsoft launches an online research hub for sharing AI and science datasets

In a bid to foster scientific collaboration, Microsoft Corp. today launched an online hub that will provide a place for researchers to share the datasets they produce as part of their work.

The company is leading by example. At launch, the Microsoft Research Open Data portal features dozens of datasets that its own staff produced as part of published research studies. The repository covers a variety of fields ranging from computer science to biology.

“I am often asked to share my research data and the public sharing I have done in the past has been popular,” commented Microsoft principal researcher John Krumm. “Coordinating and cataloging these datasets in one place with Azure will be helpful for both internal and external researchers, giving them easy access, encouraging collaboration, and providing convenient cloud-based access to the wealth of Microsoft Research shared data.”

Microsoft Research Open Data has a strong computer science slant, with a particular focus on artificial intelligence fields such as natural language processing. That’s not surprising given that the company’s research division has dedicated much of its work to these areas in recent years. Microsoft is investing heavily in building out its AI capabilities as rivals such as Alphabet Inc. do the same.

The sections of the new data hub that are dedicated to other fields, such as physics, currently contain only a handful of items. But that could change over time as Microsoft works to draw researchers from outside its ranks. Another strong motivation for adding more domain-specific datasets is that such information can be useful in AI projects, mainly when it comes to training models. 

Microsoft hopes that the hub will complement the existing research data repositories out there. “The goal is to provide a simple platform to Microsoft researchers and collaborators to share datasets and related research technologies and tools,” Vani Mandava, a director of data science outreach at Microsoft, wrote in a blog post.

“Microsoft Research Open Data is designed to simplify access to these datasets, facilitate collaboration between researchers using cloud-based resources and enable reproducibility of research,” Mandava added. 

To help researchers put the datasets to use, the hub provides integration with Microsoft’s Azure cloud platform. Users can download information onto preconfigured virtual machines that feature popular data science and development tools.

Microsoft isn’t the only tech giant that has made internal AI datasets public in a bid to advance research. Alphabet is prolific on this front as well, having made contributions to areas such as computer vision, natural language processing and geospatial analysis.

Photo: Microsoft
