ASAPP’s CLIP dataset helps train AI to read clinical notes more efficiently
Artificial intelligence startup ASAPP Inc. has created what it says is a more targeted way of training AI models that can help improve outpatient care.
ASAPP, which sells AI-powered customer care and sales platforms for healthcare organizations, said today that one of the biggest headaches for primary care physicians is dealing with outpatient care.
The problem is that hospital discharge notes for each patient often run to thousands of words and are structured with factors such as billing and compliance in mind. That makes it very difficult for a physician to pick up a patient’s notes and scan through them to understand what follow-up actions must be taken. On top of that, most physicians are already pressed for time.
The obvious solution is to use AI to read those notes instead, and ASAPP is trying to make that process more efficient with a newly open-sourced and annotated data set for clinical natural language processing. Called CLInical Follow-uP, or CLIP, it’s a special data set that’s carefully annotated to help AI extract specific action items from clinical notes more easily.
CLIP is built upon MIMIC-III, which is said to be the world’s largest de-identified, open-access data set. ASAPP dug into that data and found 718 full discharge summaries within it. It then labeled each sentence within those summaries, specifying whether it contained a follow-up item or not. The follow-up items were further classified according to seven action item types, such as scheduling an appointment, following a medical prescription or reviewing pending laboratory results.
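To make the labeling scheme concrete, here is a minimal sketch in Python of how one annotated sentence might be represented. It is not CLIP's actual file format. The class names, the label names and the example sentences are all hypothetical, only the three action types named above are included out of the seven, and allowing more than one type per sentence is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class ActionType(Enum):
    # Hypothetical names for the three types the article mentions;
    # the real data set defines seven types in total.
    APPOINTMENT = "appointment"
    MEDICATION = "medication"
    PENDING_LAB = "pending_lab"


@dataclass
class LabeledSentence:
    text: str
    # An empty set means the sentence holds no follow-up item.
    action_types: Set[ActionType] = field(default_factory=set)

    @property
    def is_action_item(self) -> bool:
        return bool(self.action_types)


# Invented sentences for illustration, not taken from MIMIC-III.
summary = [
    LabeledSentence("Patient was admitted with chest pain."),
    LabeledSentence(
        "Please follow up with cardiology in two weeks.",
        {ActionType.APPOINTMENT},
    ),
    LabeledSentence(
        "Blood cultures were pending at discharge.",
        {ActionType.PENDING_LAB},
    ),
]

# Keep only the sentences that carry at least one action type.
action_items = [s.text for s in summary if s.is_action_item]
```

With labels in this shape, a model can be trained to predict the set of action types for each sentence, and an empty prediction means the sentence can be skipped.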
“CLIP makes the task of action item extraction tractable, by enabling us to train machine learning models to select the sentences in a document that contain action items,” ASAPP research engineer James Mullenbach wrote in a blog post.
ASAPP has gone a step further too. Along with CLIP, it has come up with a method it calls “task-targeted pre-training” that can be used to build even larger data sets capable of training more accurate models.
Using this method, healthcare providers will be able to select sentences within their own private patient data that look most like those in the annotated CLIP data set, and then use them to bulk up CLIP with more relevant data. It’s an important advance because pre-training is the most costly step in AI model development, owing to the massive amounts of data that must be processed.
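The selection step can be sketched roughly as follows. The article does not say how ASAPP scores similarity, so this example swaps in a plain bag-of-words cosine similarity as a stand-in. The function names, the `keep` parameter and the sentences are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_similar(candidates, annotated, keep):
    """Return the `keep` candidates most similar to the annotated set."""
    # Pool all annotated sentences into one reference vector.
    reference = Counter()
    for sentence in annotated:
        reference.update(bag_of_words(sentence))
    ranked = sorted(
        candidates,
        key=lambda s: cosine(bag_of_words(s), reference),
        reverse=True,
    )
    return ranked[:keep]


# Invented sentences for illustration only.
annotated = [
    "Please follow up with your cardiologist in two weeks.",
    "Continue taking metoprolol as prescribed.",
]
unlabeled = [
    "The patient tolerated the procedure well.",
    "Follow up with your primary care doctor in one week.",
    "Insurance information was verified at admission.",
]
selected = select_similar(unlabeled, annotated, keep=1)
```

Here the only candidate that shares vocabulary with the annotated sentences is the follow-up instruction, so it is the one selected. A production system would likely use a learned scoring model rather than raw word overlap.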
“We find that it’s possible, and maybe even advantageous, to select data for pre-training in this way, saving time and computational resources while maintaining model performance,” Mullenbach said.
AI models built using ASAPP’s methods can significantly reduce the administrative burden for physicians, enabling them to focus more of their time on actual patient care, he added.
“Our methods can condense notes down to what a primary care physician really needs to know, reducing note size by at least 80% while keeping important action items readily available,” Mullenbach wrote. “This reduction in ‘information overload’ can reduce physicians’ likelihood of missing important information, improving their accuracy and the well-being of their patients.”
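As a rough illustration of what condensing a note means in practice, the sketch below keeps only the sentences flagged as action items and reports how much shorter the result is. The flags are hard-coded here, whereas in a real system they would come from a trained model. Measuring note size in words is an assumption, since the article does not say how the reduction is calculated, and the sentences are invented.

```python
def condense(sentences, is_action_item):
    """Keep flagged sentences and report the fraction of words removed."""
    kept = [s for s, flag in zip(sentences, is_action_item) if flag]
    original_words = sum(len(s.split()) for s in sentences)
    kept_words = sum(len(s.split()) for s in kept)
    # Guard against an empty note to avoid dividing by zero.
    reduction = 1 - kept_words / original_words if original_words else 0.0
    return kept, reduction


# Invented discharge note for illustration only.
note = [
    "Patient admitted with pneumonia and treated with antibiotics.",
    "Hospital course was uncomplicated.",
    "Follow up with pulmonology in two weeks.",
    "Vital signs were stable at discharge.",
    "Billing codes were reviewed by the attending.",
]
# Hypothetical model output: only the third sentence is an action item.
flags = [False, False, True, False, False]

kept, reduction = condense(note, flags)
print(f"Kept {len(kept)} of {len(note)} sentences, {reduction:.0%} fewer words")
```

The reduction achieved on any real note depends on how many of its sentences the model flags.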
ASAPP said it has demonstrated the effectiveness of the CLIP data set and task-targeted pre-training in a paper published on arXiv.org. The paper shows that its AI algorithms outperformed other models and achieved a level of performance close to that of humans, the company said.
Image: rawpixel/Pixabay