Tech firms publish massive CORD-19 dataset to help fight the coronavirus
A consortium of America’s leading technology firms and organizations has come together to create a new artificial intelligence-enabled dataset on the coronavirus to help facilitate research into the disease.
The COVID-19 Open Research Dataset, called CORD-19, is meant to give researchers faster access to the most important scientific resources on the coronavirus to aid their work on how to stop it. The dataset, created at the request of the White House’s Office of Science and Technology Policy, pools more than 24,000 articles that have been written so far about COVID-19.
One of the main contributors to the dataset was the Allen Institute for Artificial Intelligence, also known as AI2, which played a key role in curating its content. “We think that AI has an important part to play in solving this problem,” said Doug Raymond, general manager of the Semantic Scholar academic search engine at AI2.
The CORD-19 dataset is hosted on the Semantic Scholar website and is available for anyone to download. The idea is to make it easier for medical researchers to access the wealth of information on the coronavirus that is currently scattered across the web.
AI2 has developed natural language processing and other machine learning tools that can extract the key points from scientific literature and help academics find the studies and other documents most useful for the problems they’re trying to solve.
Microsoft Corp. was another major contributor to the CORD-19 dataset. “It’s all hands on deck as we face the COVID-19 pandemic,” Eric Horvitz, chief scientific officer at Microsoft, said in a press release. “We need to come together as companies, governments and scientists, and work to bring our best technologies to bear across biomedicine, epidemiology, AI and other sciences.”
Also involved was the National Library of Medicine at the National Institutes of Health, which provided access to more than 10,000 scholarly articles that relate to the coronavirus. That content was transformed into a machine-readable format by AI2, and an adaptive feed has been created to keep users up to date on the research fields they’re most interested in.
Georgetown University’s Center for Security and Emerging Technology and the Chan Zuckerberg Initiative were also involved in the effort.
The CORD-19 dataset will be continually updated as new research about the coronavirus is published. It will also link to data from clinical trials, GitHub data archives and other non-academic research.
Analysts praised the tech companies involved, calling the project another example of technology being put to good use.
“AI and machine learning depends on the data and so we should be thankful for these companies bringing together all of the data we have on the coronavirus,” said Holger Mueller of Constellation Research Inc. “Now it will be interesting to see who will provide the most value at extracting insights from the data set. We shall see how fast and soon, let’s hope for sooner rather than later.”
“The collaboration between the Allen Institute, Microsoft and Facebook is a good example of the kinds of innovative projects that vendors can facilitate by working together,” Pund-IT Inc.’s Charles King told SiliconANGLE. “The results may be some ways down the road but that’s not surprising when you’re dealing with a novel or essentially unknown virus.”
Researchers who use the dataset can participate by sharing their data mining tools, and any insights they gain, through the Kaggle data science community.
“We’re putting this dataset up in front of our community of 4.3 million data scientists in the hope that the world’s AI community can help find answers to a key set of questions about COVID-19,” said Anthony Goldbloom, co-founder and chief executive officer of Kaggle.
Michael Kratsios, the White House’s chief technology officer, said in a statement that he believes the scientific community will play a crucial role in stopping the coronavirus in its tracks, and he called on researchers to make full use of the dataset. “The White House will continue to be a strong partner in this all-hands-on-deck approach,” Kratsios said.