BIG DATA
To get their real-time data acts together, companies might take a tip from a guy who helped build Twitter, a site synonymous with always-on streams.
First off, Apache Hadoop’s big data framework doesn’t have the brains for real-time, according to Karthik Ramaswamy (pictured), formerly an engineering manager at Twitter and now co-founder of Streamlio, an enterprise real-time data project. He is also a member of the faculty in the EECS Department at UC Berkeley.
“It kind of becomes a storage sea where all the data comes and stores there,” Ramaswamy said of Hadoop during the Data Platforms event in Litchfield Park, Arizona.
Hadoop’s strength is in sheer capacity for data — its abilities in real-time data and especially real-time analytics are quite limited, he told George Gilbert (@ggilbert41) and Jeff Frick (@JeffFrick), co-hosts of theCUBE, SiliconANGLE Media’s mobile live streaming studio. (* Disclosure below.)
The reason, Ramaswamy stated, is that Hadoop cannot make data visible at every stage, from the point of creation to where it lands. “You can kind of dump the data in real-time into Hadoop, but until you close the file, you cannot see the data at all, right?” he said.
To gain real-time visibility into data, there must be a distributed log that exposes data from its entry point onward, he explained. “The moment the data comes in, the data is immediately visible within the three to five millisecond time frame,” he said.
Streaming data platform Apache Kafka uses a distributed log in this way, Gilbert noted.
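To make the contrast concrete, here is a minimal, hypothetical Python sketch. AppendOnlyLog stands in for a distributed log such as Kafka's, reduced to a single in-memory partition, and VisibleOnCloseFile models the Hadoop behavior Ramaswamy describes, where readers see nothing until the file is closed. Both class names are invented for illustration; neither reflects the real Kafka or HDFS APIs, and replication, persistence and networking are left out.

```python
from typing import List, Tuple


class AppendOnlyLog:
    """Hypothetical single in-memory partition of an append-only log.

    Each appended record gets the next offset and is readable by any
    consumer as soon as append() returns.
    """

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, record: str) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to this record

    def read_from(self, offset: int) -> List[Tuple[int, str]]:
        # A consumer remembers its own offset and polls for newer records.
        return list(enumerate(self._records[offset:], start=offset))


class VisibleOnCloseFile:
    """Hypothetical model of the Hadoop behavior described in the interview:
    writes are buffered and readers see nothing until close() is called."""

    def __init__(self) -> None:
        self._buffer: List[str] = []
        self._visible: List[str] = []
        self.closed = False

    def write(self, record: str) -> None:
        if self.closed:
            raise ValueError("file is already closed")
        self._buffer.append(record)

    def close(self) -> None:
        self._visible = list(self._buffer)  # contents become readable only now
        self.closed = True

    def read_all(self) -> List[str]:
        return list(self._visible)


if __name__ == "__main__":
    log, f = AppendOnlyLog(), VisibleOnCloseFile()
    for event in ["click:ad1", "view:ad2"]:
        log.append(event)
        f.write(event)
    print(log.read_from(0))  # [(0, 'click:ad1'), (1, 'view:ad2')]
    print(f.read_all())      # [] (nothing is visible before close)
    f.close()
    print(f.read_all())      # ['click:ad1', 'view:ad2']
```

The same two events are written to both structures: the log exposes them immediately by offset, while the file exposes them only after close(), which is the gap Ramaswamy points to.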
This highly visible streaming data can help rejigger analytics models on the fly, Ramaswamy stated.
“Once the model is built, the model is pre-loaded into the real-time compute environment like Heron [Twitter’s open-source data streaming engine],” he said.
The next step is to refine the model as real-time data streams reveal changes in users’ behavior. The model can then be looked up to serve content, such as a relevant ad for a user to click on, he added.
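The pre-load, update and look-up cycle can be illustrated with a deliberately simple, hypothetical Python sketch. The "model" here is just a table of per-user, per-ad click counts built offline, then incremented by click events arriving from a stream and consulted to pick an ad. The class and method names are invented for illustration, the scoring rule is an assumption, and nothing here uses Heron's actual API.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

Key = Tuple[str, str]  # (user, ad)


class StreamingAdModel:
    """Hypothetical model held in memory by a streaming worker."""

    def __init__(self, preloaded: Dict[Key, int]) -> None:
        # Step 1: pre-load the model built offline (e.g. from batch data).
        self.scores: DefaultDict[Key, int] = defaultdict(int, preloaded)

    def update(self, events: Iterable[Key]) -> None:
        # Step 2: refine the model from live (user, ad) click events.
        for user, ad in events:
            self.scores[(user, ad)] += 1

    def best_ad(self, user: str, candidates: Iterable[str]) -> str:
        # Step 3: look up the model and serve the highest-scoring ad.
        # .get() avoids inserting zero entries; max() keeps the first
        # candidate on ties.
        return max(candidates, key=lambda ad: self.scores.get((user, ad), 0))


if __name__ == "__main__":
    model = StreamingAdModel({("alice", "shoes"): 2, ("alice", "books"): 1})
    print(model.best_ad("alice", ["shoes", "books"]))  # shoes
    model.update([("alice", "books"), ("alice", "books")])
    print(model.best_ad("alice", ["shoes", "books"]))  # books
```

A production system would use a learned model rather than raw counts, but the shape of the loop (load once, update per event, look up per request) is the point being illustrated.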
(* Disclosure: TheCUBE is a paid media partner for Data Platforms 2017. Neither Qubole Inc. nor other sponsors have editorial influence on theCUBE or SiliconANGLE.)