UPDATED 18:30 EST / APRIL 05 2022

New database architectures needed for improved real-time analytics, says Imply Data CTO

Analytics must be thought of as an intrinsic software layer, says an executive behind a real-time analytics database geared specifically toward powering analytical applications. 

The reason is that as companies shift business to the internet, they’re generating massive amounts of valuable data. Companies need to take advantage of that data and develop it as part of the entire data management process, according to Gian Merlino (pictured), co-founder and chief technology officer of Imply Data Inc.

Traditional databases, such as the kind some organizations use to power transactional applications, won’t work for that, however, Merlino added, in part because queries become more complicated the more deeply the data is analyzed.

“The requirements of that kind of application have sort of given rise to a new kind of database,” Merlino said. He spoke with theCUBE industry analyst John Furrier during the AWS Startup Showcase: “Data as Code — The Future of Enterprise Data and Analytics” event, an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed the up-and-coming database Apache Druid, how it’s helping enterprises shift to digital, and the complex datasets that result. (* Disclosure below.)

Real-time and historical data merges

Imply Data’s premise is that a database powering transactional apps isn’t the same as one powering analytical applications.

“We integrate this real-time processing and this historical processing,” Merlino said, referring to the mechanics of the open-source back-end database Apache Druid, which Imply Data uses to create its offering.

Two elements must come into play to get value out of the kind of scenario Merlino is referring to: a historical system that pulls data out of an indexed store, and a real-time component that operates on the streams of data arriving from the distributed streaming platform Kafka, for instance, or Amazon’s Kinesis cloud data processing service.

“This system is responsible for all the data that’s recent, maybe the last hour or two of data,” Merlino said of the real-time component. The indexed data is then handled by the historical system. The Druid query layer blends these two information sources together seamlessly, at rates of thousands of queries per second.
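The blending step can be pictured with a minimal Python sketch. It is an illustration only, not Druid’s implementation: the `Event` record, the `count_by_region` and `blended_query` helpers, and the sample data are all hypothetical names invented here. The sketch assumes each store can answer the same aggregate query over its own data, and that the query layer simply adds the partial results together.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; the field names are illustrative assumptions."""
    timestamp: int  # seconds since some arbitrary start
    region: str
    count: int = 1


def count_by_region(events, start, end):
    """Sum event counts per region for start <= timestamp < end."""
    totals = Counter()
    for event in events:
        if start <= event.timestamp < end:
            totals[event.region] += event.count
    return totals


def blended_query(historical, realtime, start, end):
    """Hypothetical query layer: ask both stores, then merge the partial results."""
    merged = count_by_region(historical, start, end)
    # Counter.update adds counts for keys that appear in both partial results.
    merged.update(count_by_region(realtime, start, end))
    return dict(merged)


# Older, already indexed data lives in the historical store (made-up sample data).
historical = [Event(100, "us-east"), Event(200, "eu-west"), Event(300, "us-east")]
# The most recent events are still held by the real-time component.
realtime = [Event(7200, "us-east"), Event(7300, "ap-south")]

result = blended_query(historical, realtime, 0, 8000)
print(result)
```

The caller sees a single answer and does not need to know which store held which rows, which is the sense in which the two sources are blended.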

That’s how Druid works today. Extending Druid’s query engine is key to Imply Data’s future product development, though, according to Merlino. The Druid query stack is currently what’s called single-stage, but he intends to convert it to more advanced multi-stage distributed queries. That change would allow queries with very large result sets to run without a bottleneck, and would support more complex query structures.
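To see why a single stage can become a bottleneck, consider the rough Python sketch below. It does not reflect Druid’s actual engine or Imply Data’s planned design; `partial_counts`, `single_stage`, `multi_stage` and the sample rows are hypothetical, and the sketch assumes a simple sum-by-key query. In the single-stage version every partial result funnels through one merge point, while in the multi-stage version partial results are first shuffled by key so that several workers each merge a disjoint slice of the output.

```python
from collections import Counter


def partial_counts(rows):
    """Stage run on each data node: aggregate that node's local rows by key."""
    counts = Counter()
    for key, value in rows:
        counts[key] += value
    return counts


def single_stage(node_data):
    """All partial results are merged at one point, which must hold the full result."""
    merged = Counter()
    for rows in node_data:
        merged.update(partial_counts(rows))
    return dict(merged)


def multi_stage(node_data, num_workers):
    """Partial results are shuffled by key hash, so each worker merges only its slice."""
    partitions = [Counter() for _ in range(num_workers)]
    for rows in node_data:
        for key, value in partial_counts(rows).items():
            # The same key always lands on the same worker within one run.
            partitions[hash(key) % num_workers][key] += value
    # Workers own disjoint key sets, so their outputs can simply be concatenated.
    result = {}
    for partition in partitions:
        result.update(partition)
    return result


# Made-up rows spread across two data nodes.
node_data = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
print(single_stage(node_data))
print(multi_stage(node_data, num_workers=3))
```

Both functions return the same totals; the difference is that no single worker in the second version has to hold or merge the entire result set.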

The idea is to build on Druid’s already rapid-fire handling of simple queries by adding the ability to perform more complex queries at the same high speeds. In the future, Apache Druid will cater more to thousand-line-type SQL queries, according to Merlino, and support for reporting queries will also be developed.

Another future offering for Apache Druid via Imply Data will be a cloud-oriented database solution. By doing that, the company promises to remove some of the complications of setup. 

“Much more developer friendly is what we’re going for,” Merlino said. “Really easy to get started with.”

Performance analysis

Expanding on why organizations need the kind of database Imply Data offers, Merlino recounted how Netflix Inc. was one of the early adopters of Druid. Performance analysis for Netflix (distinct from simply monitoring) was one of the original use cases for Apache Druid, which was created in 2011 and is now more than 10 years old.

Back then, the movie streamer was unusual in needing those kinds of metrics. It used Druid not only to analyze response times in a particular region, but also to drill down into a particular app’s performance by the kind of device it was loaded on or the operating system version used. It was “the ability to get really deep in the data,” Merlino said.
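A drill-down of that kind amounts to grouping the same measurements by progressively more dimensions. The Python sketch below illustrates the idea with made-up records; the field names (`region`, `device`, `os`, `response_ms`) and the `avg_response_ms` helper are assumptions for illustration, not Netflix’s or Druid’s schema or API.

```python
from collections import defaultdict
from statistics import mean


def avg_response_ms(records, *dimensions):
    """Average response time grouped by the given dimensions (hypothetical helper)."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[d] for d in dimensions)
        groups[key].append(record["response_ms"])
    return {key: mean(values) for key, values in groups.items()}


# Made-up sample measurements.
records = [
    {"region": "us-east", "device": "tv", "os": "1.0", "response_ms": 120},
    {"region": "us-east", "device": "tv", "os": "2.0", "response_ms": 80},
    {"region": "us-east", "device": "phone", "os": "2.0", "response_ms": 160},
    {"region": "eu-west", "device": "tv", "os": "2.0", "response_ms": 90},
]

# Start broad, then drill down by adding dimensions to the grouping.
by_region = avg_response_ms(records, "region")
by_region_device = avg_response_ms(records, "region", "device")
by_region_device_os = avg_response_ms(records, "region", "device", "os")
print(by_region)
print(by_region_device)
```

Each added dimension splits a regional average into finer groups, so a slow region can be traced to a specific device type or operating system version.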

The higher level of business being done on the internet is driving interest in Imply Data and Apache Druid. “There’s more and more happening there,” Merlino said, adding that Alibaba, eBay and Airbnb are also Apache Druid users.

“Anything that is connected to the internet, anything that’s serving customers on the internet, it’s going to generate an absolute mountain of data. People want to try to get value out of this,” Merlino stated. “Real-time data matters, but also historical context matters.”

Watch the complete video interview below, part of SiliconANGLE’s and theCUBE’s coverage of the AWS Startup Showcase: “Data as Code — The Future of Enterprise Data and Analytics” event:

(* Disclosure: TheCUBE is a paid media partner for the AWS Startup Showcase: “Data as Code — The Future of Enterprise Data and Analytics” event. Neither Imply Data Inc., a sponsor for theCUBE’s event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
