The data integration debacle: beyond ‘nirvana’ solutions | #MITCDOIQ
“Data integration is the 800-pound gorilla in the corner, and everyone’s got it in spades,” according to Mike Stonebraker, MIT professor and data scientist. A recent recipient of the Turing Award, which is presented for major contributions of lasting importance to computing, Stonebraker sat down with theCUBE, SiliconANGLE Media’s video team, during the MIT CDOIQ Symposium.
Stonebraker noted that many in the industry believe that introducing and adhering to standards will solve current data integration issues. However, he dismissed such expectations as a kind of “nirvana.” For the time being, Stonebraker said he was “in favor of after-the-fact data integration to capture value in the short run.”
Stonebraker gave the example of the Beth Israel Deaconess Medical Center, which has data for “26k intensive care unit patients creating real-time data” from monitoring equipment, as well as prescriptions and notes from doctors and nurses. The hospital is currently working to incorporate imaging as well. The goal of such a system, according to Stonebraker, is to be able to access all of that data at once, even when the information was generated at a different hospital.
Privacy considerations more challenging than technologies
Eventually, Stonebraker envisions, a patient with chest pains would be X-rayed, and his or her physician would be able to run a query against every X-ray worldwide that resembles that patient’s images. Nevertheless, Stonebraker declared, “Privacy considerations are more challenging than technologies.” He expressed concerns not only about HIPAA, the regulatory framework governing patient data, but also said that a national record system would be blocked by politics between hospitals.
Stonebraker discussed his view of Big Data as a “marketing buzz word” covering three distinct problems: too much data, an issue of “volume”; data arriving too fast, a problem of “velocity”; and data from too many sources, which Stonebraker calls “variety.” Tamr, a data unification tool that Stonebraker helped create, helps “scale in variety.” This kind of data integration cannot be done using standard techniques, according to Stonebraker. Instead, Tamr isolates source number one and “de-duplicates” it using statistical techniques. The program also sets a “threshold for accuracy,” which ultimately comes down to an “accuracy versus cost” choice.
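To make that “accuracy versus cost” choice concrete, here is a minimal sketch of threshold-based de-duplication. The record fields, the similarity measure, and the threshold are illustrative assumptions, not Tamr’s actual implementation; the point is that raising the threshold buys precision at the cost of missed matches, which is the trade-off Stonebraker describes.

```python
# Hypothetical sketch of statistical de-duplication with an accuracy threshold.
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a: dict, b: dict) -> float:
    """Average string similarity across the fields two records share."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    scores = [SequenceMatcher(None, str(a[f]), str(b[f])).ratio() for f in shared]
    return sum(scores) / len(scores)

def deduplicate(records: list, threshold: float = 0.9) -> list:
    """Return index pairs of records whose similarity clears the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if similarity(a, b) >= threshold
    ]

patients = [
    {"name": "Jon Smith", "dob": "1970-01-02"},
    {"name": "John Smith", "dob": "1970-01-02"},
    {"name": "Jane Doe", "dob": "1985-07-14"},
]
print(deduplicate(patients))  # [(0, 1)] -- likely the same person
```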
Stonebraker said Tamr “organizes their human labor differently”: when a company chooses not to rely on automated processes, a suspected duplicate is routed to a domain expert for review. Tamr uses “crowd-sourcing for domain experts” for such questions as well. “Tamr gets smarter over time and begins to use the parameters previously used for duplications automatically,” he concluded.
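The human-in-the-loop step he describes might look something like the following hedged sketch, in which confident matches and non-matches are resolved automatically, gray-zone pairs are escalated to a domain expert, and expert verdicts are remembered for later runs. The thresholds and the `ask_expert` hook are hypothetical, not Tamr’s API.

```python
# Hypothetical human-in-the-loop classification of candidate duplicate pairs.
resolved = {}  # expert verdicts keyed by record pair, reused on later runs

def classify(pair, score, ask_expert, auto_hi=0.95, auto_lo=0.50):
    """Decide whether a candidate pair is a duplicate."""
    if pair in resolved:          # reuse a prior expert verdict: the system
        return resolved[pair]     # "gets smarter over time"
    if score >= auto_hi:          # confident match: merge automatically
        return True
    if score <= auto_lo:          # confident non-match: keep records separate
        return False
    verdict = ask_expert(pair)    # gray zone: crowd-source a domain expert
    resolved[pair] = verdict
    return verdict

# Example: route uncertain pairs to a (stand-in) expert prompt.
is_dup = classify((0, 1), 0.78, ask_expert=lambda p: True)
```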
Watch the full interview below, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of the MIT CDOIQ Symposium 2015.