UPDATED 21:16 EST / MARCH 16 2020


Tech firms publish massive CORD-19 dataset to help fight the coronavirus

A consortium of America’s leading technology firms and organizations has come together to create a new artificial intelligence-enabled dataset on the coronavirus to help facilitate research into the disease.

The COVID-19 Open Research Dataset, called CORD-19, is meant to give researchers faster access to the most popular scientific resources on the coronavirus to support their efforts to stop it. The dataset, created following a request by the White House’s Office of Science and Technology Policy, pools more than 24,000 articles that have been written so far about COVID-19.

One of the main contributors to the dataset was the Allen Institute for Artificial Intelligence, also known as AI2, which played a key role in curating its content. “We think that AI has an important part to play in solving this problem,” said Doug Raymond, general manager of the Semantic Scholar academic search engine at AI2.

The CORD-19 dataset is hosted on the Semantic Scholar website and available for anyone to download. The idea is to make it easier for medical researchers to access the wealth of information on the coronavirus that’s currently scattered over the world wide web.

AI2 has developed natural language processing and other machine learning tools that can be used to extract the key points from scientific research literature, and help academics to find studies or other documents that are most useful for the problems they’re trying to solve.

Microsoft Corp. was another major contributor to the CORD-19 dataset. “It’s all hands on deck as we face the COVID-19 pandemic,” Eric Horvitz, chief scientific officer at Microsoft, said in a press release. “We need to come together as companies, governments and scientists, and work to bring our best technologies to bear across biomedicine, epidemiology, AI and other sciences.”

Also involved was the National Library of Medicine at the National Institutes of Health, which provided access to more than 10,000 scholarly articles that relate to the coronavirus. That content was transformed into a machine-readable format by AI2, and an adaptive feed has been created to keep users up to date on the research fields they’re most interested in.

Georgetown University’s Center for Security and Emerging Technology and the Chan Zuckerberg Initiative were also involved in the effort.

The CORD-19 dataset will be continually updated as new research about the coronavirus is published. The dataset will also link data from clinical trials, GitHub data archives and other non-academic research.

Analysts heaped praise on the tech companies involved, noting that this is just another example of technology being used for a good cause.

“AI and machine learning depends on the data and so we should be thankful for these companies bringing together all of the data we have on the coronavirus,” said Holger Mueller of Constellation Research Inc. “Now it will be interesting to see who will provide the most value at extracting insights from the data set. We shall see how fast and soon, let’s hope for sooner rather than later.”

“The collaboration between the Allen Institute, Microsoft and Facebook is a good example of the kinds of innovative projects that vendors can facilitate by working together,” Pund-IT Inc.’s Charles King told SiliconANGLE. “The results may be some ways down the road but that’s not surprising when you’re dealing with a novel or essentially unknown virus.”

Researchers who use the dataset can participate by sharing the data mining tools they use, and any insights they gain, through the Kaggle data science community.

“We’re putting this dataset up in front of our community of 4.3 million data scientists in the hope that the world’s AI community can help find answers to a key set of questions about COVID-19,” said Anthony Goldbloom, co-founder and chief executive officer of Kaggle.

Michael Kratsios, the White House’s chief technology officer, added in a statement that he believes the scientific community will play a crucial role in stopping the coronavirus in its tracks. He has called on researchers to embrace the dataset as much as they can. “The White House will continue to be a strong partner in this all-hands-on-deck approach,” Kratsios said.

Image: Badafest/Pixabay
