UPDATED 18:30 EDT / MAY 18 2021

NEWS

ChaosSearch aims to disrupt data lake log analytics at scale with ‘indexing in place’

Indexing data lakes in situ, rather than running extract, transform and load (ETL) processes, is the best way to get value out of pools of raw data, says a patent holder who has come up with a deep-tech method for doing it.

Cloud data indexing specialist ChaosSearch Inc. believes that moving data around and out of the storage repositories known as data lakes just to analyze it is so onerous at scale that it keeps many data lake owners from getting anywhere close to fully capturing and analyzing their log data.

Digital enterprises use log analytics to gain insight into customer interactions with a website or into the behavior of enterprise applications, for example. In cases like those, analytics can identify trouble such as reliability issues that come up during high-volume usage.

“They just can’t keep up with it,” said Ed Walsh (pictured), chief executive officer of ChaosSearch. “They can’t handle the scale.”

In anticipation of the AWS Startup Showcase: The Next Big Thing in AI, Security, & Life Sciences event, set to kick off on June 16, Dave Vellante, host of SiliconANGLE Media’s livestreaming studio theCUBE, spoke with Walsh for a special CUBE Conversation on how ChaosSearch is aiming to disrupt data lake log analytics at large scale by using Amazon S3 cloud storage and open application programming interfaces. (* Disclosure below.)

‘Indexing in place’

“What we do is allow you to literally keep it in place. We index it in place,” Walsh said.

The startup’s vision is to make the raw data deposited in Amazon S3 or S3 Glacier available for analysis through open APIs with multi-model access: search and SQL today, with machine learning to follow.

Doing that brings multiple benefits, according to Walsh. Primarily, the data stays in the lake and, importantly, stays a lake. In other words, because bits of data are not pulled out of the lake through ETL to be transformed and worked on, you don’t end up with separated datasets spread all over the place that are difficult to manage and govern.

“Datasets end up being data puddles” if one pulls out too much, according to Walsh. That problem of big chunks of data moving around in the enterprise becomes particularly prevalent in the cloud.

“Once you go cloud-native, that mound of machine-generated data that comes from the environment dramatically just explodes,” he said. “You’re not managing hundreds or thousands or maybe 10,000 endpoints. You’re dealing with millions or billions. So, logs become one of the things you can’t keep up with.”
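As a rough illustration of what indexing in place can mean in general, the sketch below builds a small inverted index whose entries point back at byte offsets inside the raw log objects, so a search reads matching lines from the original data rather than from a copy. This is a generic technique, not ChaosSearch’s patented method; the object keys, log lines and function names are hypothetical, and an in-memory dictionary stands in for an S3 bucket.

```python
from collections import defaultdict

# Hypothetical stand-in for objects in a bucket: key -> raw bytes.
# A real system would read these from object storage instead.
RAW_OBJECTS = {
    "logs/app-1.log": b"GET /cart 200\nGET /checkout 500\n",
    "logs/app-2.log": b"POST /login 200\nGET /checkout 200\n",
}


def build_index(objects):
    """Map each token to the (object key, byte offset) of every line containing it."""
    index = defaultdict(list)
    for key, data in objects.items():
        offset = 0
        for line in data.splitlines(keepends=True):
            for token in set(line.decode("utf-8").split()):
                index[token].append((key, offset))
            offset += len(line)  # advance to the start of the next line
    return index


def search(index, objects, token):
    """Resolve index hits by reading each line from the untouched raw object."""
    hits = []
    for key, offset in index.get(token, []):
        data = objects[key]
        end = data.find(b"\n", offset)
        end = len(data) if end == -1 else end
        hits.append(data[offset:end].decode("utf-8"))
    return hits


index = build_index(RAW_OBJECTS)
print(search(index, RAW_OBJECTS, "/checkout"))
```

The index holds only pointers, so the raw objects remain the single copy of the data and nothing is extracted into a separate store.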

How data lake analytics is accomplished

Avoiding transformation was one of the ideas behind traditional data lakes, Walsh pointed out: A data lake was where you put your data in a scalable, resilient environment so that you did not have to transform it.

“It’s too hard to structure for databases and data warehouses,” Walsh said. But it hasn’t really worked like that: It’s all too cumbersome at large scale.

“What we avoid is the ETL process,” he said. Looking at the index and doing a full schema discovery is part of the process. Sample sets can be provided, followed by advanced transformations in code that pull the data apart, and then role-based access is given to the end user, but “in a format that their tools understand,” Walsh added. Importantly, this happens while the data is still in the lake as read-only; the data isn’t changed.
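The general idea of discovering a schema and exposing a transformed view without changing the source can be sketched as follows. This is only an illustration under stated assumptions, not the ChaosSearch implementation: the log lines, field names and the `discover_schema` and `view` helpers are all hypothetical, and the logs are assumed to be JSON.

```python
import json

# Hypothetical raw log lines; they are only read, never rewritten.
RAW_LINES = (
    '{"ts": "2021-05-18T18:30:00", "status": 200, "path": "/cart"}',
    '{"ts": "2021-05-18T18:30:01", "status": 500, "path": "/checkout", "latency_ms": 812.5}',
)


def discover_schema(lines):
    """Collect every field name and the Python type names observed for it."""
    schema = {}
    for line in lines:
        for field, value in json.loads(line).items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


def view(lines, fields):
    """Yield a projected, virtual view; the raw lines stay as they are."""
    for line in lines:
        record = json.loads(line)
        yield {field: record.get(field) for field in fields}


schema = discover_schema(RAW_LINES)
errors = [row for row in view(RAW_LINES, ["path", "status"]) if row["status"] >= 500]
print(schema)
print(errors)
```

In this sketch the list of fields passed to `view` is where a role-based restriction could be applied, for example by mapping each role to the fields it is allowed to see.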

The way ChaosSearch gets there is by never moving the data out of S3. A traditionally created, out-of-S3 schema doesn’t have to be generated. “The big bang theory of ‘do data lake and put everything in it’ has been proven not to work,” Walsh said. ChaosSearch, though, fixes that, he added.

“Just put it in S3, and we activate it with APIs and the tools your analysts use today, or what they want to use in the future,” Walsh explained. That transformation, within S3, is performed by ChaosSearch’s patented technology. It’s done virtually and is available immediately.

In the past, moving data using big teams — creating a pipeline into Elasticsearch, for example — could have taken an organization weeks, according to Walsh. “Which becomes kind of brutal at scale,” he said.

ETL work on a data source can take three weeks to three months in an enterprise. “We do it virtually in five minutes,” Walsh claimed.

ChaosSearch makes S3 a hot analytic environment with open APIs. What sets it apart from everybody else is mainly that you don’t have to put the data into some form of schema format to access it, according to Walsh.

“Just put it there, and I’ll give you access to it,” he said. “No one else does that.”

Here’s the complete video interview, one of many CUBE Conversations from SiliconANGLE and theCUBE. And tune in to theCUBE’s live coverage of the AWS Startup Showcase: The Next Big Thing in AI, Security, & Life Sciences event on June 16. (* Disclosure: ChaosSearch sponsored this CUBE Conversation. Neither ChaosSearch nor other sponsors have editorial control over the content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE

