UPDATED 15:30 EDT / MAY 31 2017

BIG DATA

Real-time data is as good as its analytics, says Twitter alumnus

To get their real-time data acts together, companies might take a tip from a guy who helped build Twitter, a site synonymous with always-on streams.

First off, Apache Hadoop’s big data framework doesn’t have the brains for real-time, according to Karthik Ramaswamy (pictured), formerly engineering manager at Twitter and now co-founder of Streamlio, an enterprise real-time data project. He is also a member of the faculty in the EECS Department at UC Berkeley.

“It kind of becomes a storage sea where all the data comes and stores there,” Ramaswamy said of Hadoop during the Data Platforms event in Litchfield Park, Arizona.

Hadoop’s strength is its sheer capacity for data; its abilities in real-time data, and especially real-time analytics, are quite limited, he told George Gilbert (@ggilbert41) and Jeff Frick (@JeffFrick), co-hosts of theCUBE, SiliconANGLE Media’s mobile live streaming studio. (* Disclosure below.)

The reason for this is that visibility of data at all stages, from creation point to landing, is not possible in Hadoop, Ramaswamy stated. “You can kind of dump the data in real-time into Hadoop, but until you close the file, you cannot see the data at all, right?” he said.

In order to gain real-time visibility of data, there must be a distributed log that shows data from its entrance point onward, he explained. “The moment the data comes in, the data is immediately visible within the three to five millisecond time frame,” he said.
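The contrast Ramaswamy draws can be illustrated with a minimal in-memory sketch. It is not Hadoop or Kafka code; the class names `BatchFile` and `AppendLog` and their methods are hypothetical, chosen only to show the difference between data that becomes readable when a file is closed and data that is readable the moment it is appended to a log.

```python
class BatchFile:
    """Hypothetical stand-in for a file whose contents are visible only after close()."""

    def __init__(self):
        self._buffer = []
        self._closed = False

    def write(self, record):
        if self._closed:
            raise ValueError("cannot write to a closed file")
        self._buffer.append(record)

    def close(self):
        self._closed = True

    def read(self):
        # Readers see nothing until the writer has closed the file.
        return list(self._buffer) if self._closed else []


class AppendLog:
    """Hypothetical stand-in for a log whose records are visible as soon as they are appended."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # Return the offset assigned to the new record.
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset):
        # Readers can fetch everything from a given offset at any time.
        return self._records[offset:]


batch = BatchFile()
batch.write("click:alice")
visible_before_close = batch.read()   # nothing visible yet
batch.close()
visible_after_close = batch.read()    # visible only now

log = AppendLog()
first_offset = log.append("click:alice")
visible_immediately = log.read_from(first_offset)  # visible right after the append
```

A real distributed log also replicates and partitions records across machines, which this single-process sketch leaves out.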

Streaming data platform Apache Kafka uses a distributed log in this way, Gilbert noted.

Model behavior

This highly visible streaming data can help rejigger analytics models on the fly, Ramaswamy stated.

“Once the model is built, the model is pre-loaded into the real-time compute environment like Heron [Twitter’s open-source data streaming engine],” he said.

The next step is to enhance the model based on analysis of users’ changing behavior, as shown through real-time data streams. The updated model can then be looked up to serve data, such as a relevant ad for a user to click on, he added.
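The load, update and lookup cycle described above can be sketched in a few lines. This is an illustration under stated assumptions, not Heron's API: the `AdModel` class, its per-user category scores, and the additive update rule are all hypothetical, chosen only to show a pre-built model being adjusted by streaming events and then queried to pick an ad.

```python
class AdModel:
    """Hypothetical model: per-user interest scores by ad category."""

    def __init__(self, prebuilt_scores, default_ad):
        # Pre-load the model that was built offline (copied so updates stay local).
        self._scores = {user: dict(scores) for user, scores in prebuilt_scores.items()}
        self._default_ad = default_ad

    def update(self, user, category, weight=1.0):
        # Enhance the model with one real-time event, e.g. a click in a category.
        scores = self._scores.setdefault(user, {})
        scores[category] = scores.get(category, 0.0) + weight

    def serve(self, user):
        # Look up the model and return the highest-scoring category for this user.
        scores = self._scores.get(user)
        if not scores:
            return self._default_ad
        return max(scores, key=scores.get)


model = AdModel({"alice": {"shoes": 2.0, "books": 1.0}}, default_ad="generic")
ad_before = model.serve("alice")

# Two streaming events shift alice's interests toward books.
for event in [("alice", "books"), ("alice", "books")]:
    model.update(*event)
ad_after = model.serve("alice")
```

In this sketch the served ad changes as soon as the events are applied, which is the effect the real-time pipeline is meant to achieve.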

(* Disclosure: TheCUBE is a paid media partner for Data Platforms 2017. Neither Qubole Inc. nor other sponsors have editorial influence on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
