UPDATED 07:00 EDT / FEBRUARY 06 2018

BIG DATA

Podium Data takes its data lake catalog to the cloud

With the new 3.2 release of its Data Marketplace, Podium Data Inc. is taking its first steps outside the on-premises world and bringing self-service big data to the cloud.

The Data Marketplace is essentially a data catalog for data lakes. It eliminates the need for the extensive extraction and massaging procedures that characterize pure-Hadoop models. Podium promotes the software as providing self-service, on-demand access to quality data.

It uses a proprietary data loader to pull information quickly from internal systems, including notoriously difficult-to-access platforms such as mainframes. The information is then converted into a standard format that business users can access.

The architecture requires no cluster-side installation, so it works with the most popular big data platforms. With the 3.2 release, users can now mix on-premises and cloud data in any combination, the company said. Podium's architecture separates storage from computing, which lets data delivery teams support multiple variations of an analytical application from a single store. With version 3.2, sources now include the Amazon Web Services Inc. and Microsoft Corp. Azure clouds.

“When you look at how data gets into the hands of business people, the traditional data supply chain was heavy on data engineering and programming,” said Chief Executive Paul Barth. “This is a turnkey solution that lets users search for data, put it in shopping carts, combine it and compare it.”

Using a metadata-driven catalog enables the repository to “know what data has been used and the production processes, and it gets smarter over time,” Barth said. Machine learning works on the supply side to learn about data quality and governance standards during the load process.

“When we ingest data into the Marketplace, we build out metadata about that information, pull out dirty data records and set up access control policies,” he said. The platform uses parallel-processing Hadoop engines and a patent-pending algorithm that “looks at every byte and compares it to any technical constraints customers have defined about what is an acceptable record.”

Nonconforming “ugly records” are set aside and kept out of the production data set. Podium claims its platform accelerates delivery of new data to business users by up to 25-fold and reduces data delivery costs by 40 percent. Version 3.2 permits assets inside and outside the cloud to be merged and joined. Barth said elasticity features support hundreds of users and concurrent workloads.
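The general idea Barth describes, checking each incoming record against customer-defined constraints and setting nonconforming "ugly records" aside, can be illustrated with a minimal sketch. This is not Podium's patent-pending algorithm, which the company has not published. The `Constraint` type, the `split_records` function, and the sample fields and rules below are all hypothetical, and the sketch assumes simple comma-delimited input.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical rule: a field name, a predicate its value must satisfy,
# and a human-readable description used when the rule fails.
@dataclass(frozen=True)
class Constraint:
    field: str
    check: Callable[[str], bool]
    description: str


def is_number(value: str) -> bool:
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def split_records(lines, fields, constraints, delimiter=","):
    """Parse delimited lines; keep conforming records, quarantine the rest.

    Returns (good, ugly): good is a list of dicts, ugly is a list of
    (line_number, raw_line, reason) tuples kept out of the production set.
    """
    good, ugly = [], []
    for lineno, line in enumerate(lines, start=1):
        values = line.rstrip("\n").split(delimiter)
        if len(values) != len(fields):
            ugly.append((lineno, line, "wrong field count"))
            continue
        record = dict(zip(fields, values))
        # Collect every rule this record violates, not just the first one.
        failed = [c.description for c in constraints if not c.check(record[c.field])]
        if failed:
            ugly.append((lineno, line, "; ".join(failed)))
        else:
            good.append(record)
    return good, ugly


# Hypothetical schema and rules for illustration only.
FIELDS = ["id", "amount", "country"]
CONSTRAINTS = [
    Constraint("id", str.isdigit, "id must be numeric"),
    Constraint("amount", is_number, "amount must be a number"),
    Constraint("country", lambda v: len(v) == 2 and v.isalpha() and v.isupper(),
               "country must be a 2-letter code"),
]

if __name__ == "__main__":
    raw = ["1,19.99,US", "2,abc,DE", "3,5.00", "x4,7.5,gb"]
    good, ugly = split_records(raw, FIELDS, CONSTRAINTS)
    print(len(good), len(ugly))
    for lineno, line, reason in ugly:
        print(lineno, line, reason)
```

Recording the reason alongside each rejected line, rather than silently dropping it, mirrors the article's point that bad records are set aside rather than discarded, so they can be reviewed later.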

“All of that can be done using our single-node application without having to spin up a new cluster,” he said. “We can right-size the cluster for the size of the load that’s running.”

Pricing was not released. Founded in 2014, Podium Data has raised nearly $12 million in funding.

Image: Podium Data
