UPDATED 09:00 EDT / JANUARY 23 2019

BIG DATA

Varada lands $7.5M to make data lakes more easily accessible

Israel-based startup Varada has scooped up $7.5 million in a seed funding round that it plans to use to make big data more accessible to enterprises.

The company is touting a big-data inline indexing tool that allows information stored in cloud-based data lakes to be analyzed without preparing or modeling it first.

Data lakes are storage repositories that hold vast amounts of raw data in its native format until it’s needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data. Each data element in a lake is assigned a unique identifier and tagged with a set of extended metadata tags. When a business question arises, the data lake can be queried quickly for relevant data, and that smaller set of data can then be analyzed to help answer the question.
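To make the flat layout concrete, here is a minimal Python sketch of the idea: every raw element gets a unique identifier and a set of metadata tags, and a query narrows the lake to the elements whose tags match. The `FlatLake` and `LakeObject` names, the tag keys and the sample records are all hypothetical, and a real data lake sits on distributed object storage rather than an in-memory dictionary.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class LakeObject:
    """One raw data element plus the metadata tags attached to it."""
    payload: bytes
    tags: dict
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class FlatLake:
    """Hypothetical in-memory stand-in for a data lake: one flat namespace keyed by ID."""

    def __init__(self):
        self._objects = {}

    def put(self, payload, **tags):
        """Store raw bytes as-is, tag them, and return the generated identifier."""
        obj = LakeObject(payload=payload, tags=tags)
        self._objects[obj.object_id] = obj
        return obj.object_id

    def find(self, **wanted):
        """Return the objects whose tags match every requested key/value pair."""
        return [
            obj for obj in self._objects.values()
            if all(obj.tags.get(key) == value for key, value in wanted.items())
        ]


lake = FlatLake()
# Data stays in its native format (JSON, CSV); only the tags describe it.
lake.put(b'{"user": 1, "action": "click"}', source="web", fmt="json", day="2019-01-22")
lake.put(b"2,view\n3,click\n", source="mobile", fmt="csv", day="2019-01-22")
lake.put(b'{"user": 4, "action": "view"}', source="web", fmt="json", day="2019-01-23")

# A business question first narrows the lake to a smaller, relevant set.
web_objects = lake.find(source="web")
```

Note that `find` still walks every stored object to check its tags, which hints at the efficiency problem discussed later in this article.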

Data lakes have proved popular with enterprises because they provide more flexibility and more speed than traditional databases. Because information remains in its native format, a far greater and timelier stream of data is available for analysis.

On paper, data lakes seem to be the most efficient way of storing data for easy access because they eliminate the need to perform costly and time-consuming extract, transform and load, or ETL, operations first. But data lakes still aren’t as efficient as some users would like, since their structure means that the data stored within them is no longer modeled for a particular analytic need, Varada co-founder and Chief Technology Officer David Krakov told SiliconANGLE.

“So analytics on the data lake employ a ‘brute force’ approach and scan all of the data for a query,” Krakov said. “This is the approach used by the likes of Amazon Web Services’ Athena or EMR Presto, for example, and comes with a high cost and low performance.”

Data teams can use some tricks to reduce time to insight, such as copying data, partitioning it or pre-aggregating it. But data lakes hold much larger volumes of data with more complex schemas, and the sheer number of data sources makes it difficult to maintain coherency and consistency among the various copies of that data.

“The result is that most data lakes are a single copy partitioned by a couple of ‘big dimensions’ such as dates and geography,” he said. “Analytics are still mostly brute force that requires custom and time-consuming development and costly maintenance of the ETL flow specific to the task.”
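The difference between a brute-force scan and a lake partitioned by one "big dimension" can be illustrated with a toy Python sketch. The rows, column names and helper functions below are invented for illustration and do not reflect how Athena, Presto or Varada work internally.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for records in a data lake.
rows = [
    {"date": "2019-01-21", "region": "EU", "amount": 10},
    {"date": "2019-01-21", "region": "US", "amount": 20},
    {"date": "2019-01-22", "region": "EU", "amount": 30},
    {"date": "2019-01-22", "region": "US", "amount": 40},
    {"date": "2019-01-23", "region": "EU", "amount": 50},
]


def brute_force_total(rows, date):
    """Scan every row and keep the matches; returns (total, rows_scanned)."""
    total = 0
    scanned = 0
    for row in rows:
        scanned += 1
        if row["date"] == date:
            total += row["amount"]
    return total, scanned


def partition_by_date(rows):
    """Group rows into one partition per date, the single 'big dimension' here."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["date"]].append(row)
    return partitions


def pruned_total(partitions, date):
    """Read only the partition for the requested date; returns (total, rows_scanned)."""
    selected = partitions.get(date, [])
    return sum(row["amount"] for row in selected), len(selected)


partitions = partition_by_date(rows)
full = brute_force_total(rows, "2019-01-22")      # touches all 5 rows
pruned = pruned_total(partitions, "2019-01-22")   # touches only 2 rows
```

Both calls return the same total, but the partitioned version reads fewer rows. The benefit only applies to filters on the partition key: a question about `region` would still have to scan every partition, which is the limitation Krakov describes.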

Varada offers a compromise, Krakov said. Users choose their high-value datasets and define them with Structured Query Language commands. Varada then materializes each high-value dataset, keeps it synchronized with the data lake and enables much faster analytics on it.
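As a rough illustration of the general pattern (a dataset defined in SQL, materialized as an indexed copy and refreshed when the source changes), here is a small sketch using Python's built-in `sqlite3` module. It is not Varada's implementation, which the article does not describe in detail. The table names, the `DEFINITION` query and the naive full-rebuild `refresh` function are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source table standing in for raw data in the lake.
conn.execute("CREATE TABLE lake_events (day TEXT, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO lake_events VALUES (?, ?, ?)",
    [("2019-01-22", "EU", 30), ("2019-01-22", "US", 40), ("2019-01-23", "EU", 50)],
)

# The "high-value dataset" is defined by an ordinary SQL query (invented example).
DEFINITION = "SELECT day, region, amount FROM lake_events WHERE region = 'EU'"


def refresh(conn):
    """Rebuild the materialized copy and its index from the SQL definition."""
    conn.execute("DROP TABLE IF EXISTS eu_events")  # also drops the old index
    conn.execute(f"CREATE TABLE eu_events AS {DEFINITION}")
    conn.execute("CREATE INDEX eu_events_day ON eu_events (day)")


refresh(conn)

# New data lands in the source; a naive full rebuild brings the copy back in step.
conn.execute("INSERT INTO lake_events VALUES ('2019-01-23', 'EU', 5)")
refresh(conn)

# Queries now hit the smaller, indexed copy instead of scanning the source.
total = conn.execute(
    "SELECT SUM(amount) FROM eu_events WHERE day = ?", ("2019-01-23",)
).fetchone()[0]
```

A production system would presumably update the copy incrementally rather than rebuilding it, but the sketch shows why queries against a materialized, indexed subset can avoid scanning the full source.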

“By virtue of our inline indexing and distributed architecture we can offer 100 times faster performance than brute-force analytics on any materialized data, and we make that materialization something that can be easily used ad-hoc,” Krakov said.

Varada’s seed funding round was led by Lightspeed Venture Partners, with participation by StageOne Ventures and F2 Capital.

Photo: Pok_Rie/Pixabay
