Ed. Note: This is the first of a three-part series looking at how IBM is applying its advanced Watson natural-language cognitive system technology, arguably the world’s most sophisticated Big Data query engine, to the challenge of improving the level of care and outcomes for patients with some of the most intractable and most complex cancers, including lung and blood cancers. This part defines the problem. The second part will look at what Watson is, and the third will discuss IBM’s vision for Watson in healthcare.
Especially with the recent Strata Big Data Conference (see SiliconAngle.TV’s coverage here), everybody is talking about Big Data. But when the question can be a Gbyte long, the database is the entire corpus of medical research and patient outcomes for lung cancer, and the answer could well be a matter of life or death for a patient, then that is Big Data!
Now imagine being able to run this huge query without programming, knowledge of MapReduce or any other technical language, merely by asking the computer a question, much as Captain Picard did on Star Trek. Except instead of asking “Computer, where is Mr. Data?”, a doctor can ask “What are the optimal treatment strategies for this patient, and what is your confidence in their ranking?” And Watson’s reply will be a customized, personal treatment plan, based on that individual’s entire medical record, including genetic and epigenetic data, doctors’ notes, and recorded observations over the years. And the computer will include a list of suggested medical tests that are most likely to produce significant new data to refine those recommendations.
That is exactly what the IBM Watson team has spent the last two years working to develop with experts at Memorial Sloan-Kettering Cancer Center (MSK) and WellPoint, based on the Watson engine unveiled two years ago on Jeopardy. In February they announced the first commercial Watson-based cognitive computing breakthroughs, a major step toward fulfilling the dream of personalized treatment for cancer and other life-threatening diseases.
Watson was developed initially to meet the challenge of answering spoken, natural-language questions against a very large knowledge base with human levels of accuracy, says Watson CTO, IBM Fellow, and VP Rob High. That was the Jeopardy challenge placed before IBM Research, and Watson’s victory on live TV against two Jeopardy champions proved they had done that. Having accomplished that, they looked around for other challenges.
“We wanted to apply Watson where we could make a difference,” High said. That was why they chose oncology and, specifically, lung cancer, one of the hardest cancers to treat successfully. The team is also working on breast and blood cancers.
The corpus of medical data (studies, drug and treatment trial results, experiences in actual practice, medical articles and papers, etc.) is in the multiple-exabyte range. And it is growing rapidly, as new studies and papers are published faster than oncologists can digest them.
Simultaneously, patient information is growing at a similarly astronomical rate. A decade ago a typical medical record might be a few pages long and include results of a small number of tests. Today for a cancer patient it will include the person’s entire DNA with notes, which can be multiple megabytes of data. And researchers are increasingly realizing that even that is not enough. Epigenetics, the influences that determine what genes are turned on or off at a given time, is becoming the new frontier. Those epigenetic influences are the reason that every person, even an identical twin, has different fingerprints, and why identical twins do not often die of the same diseases. The result is that a patient record can be a gigabyte or more. And that, High says, is what the project uses for the question.
The goal of the analysis is to build an individualized treatment plan for each patient that uses all the findings, including the latest treatments and published advances in evidence-based medicine, matched in some cases gene-by-gene with the patient record. The plan is then adjusted over time to reflect the patient’s response to the treatment, along with any other medical or personal issues and advances in medical knowledge.
This is particularly important for cancer treatments because each patient and each cancer responds differently, and many of the treatments involve poisons. Basically, cancer treatment is an attempt to poison the cancer faster than the patient. So individualized treatment plans are vital to create the maximum chance of a cure. But developing those plans is so complicated that today, at best, a team picks what appears to be the best standard treatment regimen for the patient and adjusts it to reflect the patient’s reactions to that treatment. Watson promises true individualized treatment for the first time.
The system must also include advanced data cleansing, because ultimately the quality of the results can only be guaranteed if the quality of the data is assured. And of course the database must meet the stringent patient privacy and security requirements of the U.S. Health Insurance Portability and Accountability Act (HIPAA) and similar regulations in Canada, the European Union, Japan and other jurisdictions.