Okera brings intelligent schema management to S3 data lakes
Okera Inc., a startup founded by two former Cloudera Inc. executives to simplify the management of large heterogeneous data stores at scale, today is introducing a schema management tool designed to make it easier for companies to find, access and structure data from popular data analytics tools running on top of Amazon Web Services Inc.’s S3 cloud storage service.
The company, which launched out of stealth mode in May with $14.6 million in venture financing, specializes in data governance for data lakes, which are collections of largely unstructured data that aren’t organized according to a schema. A schema is a formal description of how data is structured, such as the tables in a database, their fields and the relationships between them. Schemas are typically applied to structured data before it is used in production, but unstructured data can defy such rigid classification.
“All of the functionality that we’ve become used to in the world of relational databases has been missing from data lakes,” said Okera CEO Amandeep Khurana. “We’re bringing that functionality.”
The new release of Okera’s Active Data Access Platform features what the company calls “intelligent schema management,” which it says enables data administrators to automatically discover new data sets, infer their schemas and assign universal access permissions at a fine-grained level.
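To make the idea of schema inference concrete, here is a minimal sketch in Python. It is not Okera's implementation or API, which the company did not describe. The function names, the type labels and the sample records are all hypothetical, and the approach shown (scan sample records, then widen a field's type when records disagree) is only one simple way such a tool might work.

```python
from typing import Any, Dict, Iterable


def infer_type(value: Any) -> str:
    """Map a Python value to a coarse, hypothetical column type label."""
    if value is None:
        return "null"
    # bool is checked before int because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"


def merge_types(a: str, b: str) -> str:
    """Pick a type wide enough to hold values of both observed types."""
    if a == b:
        return a
    if a == "null":
        return b
    if b == "null":
        return a
    if {a, b} == {"bigint", "double"}:
        return "double"
    # Any other disagreement falls back to the most permissive type
    return "string"


def infer_schema(records: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """Infer a field -> type mapping from sample records."""
    schema: Dict[str, str] = {}
    for record in records:
        for field, value in record.items():
            observed = infer_type(value)
            if field in schema:
                schema[field] = merge_types(schema[field], observed)
            else:
                schema[field] = observed
    return schema


# Hypothetical sample of newly discovered records
sample = [
    {"user_id": 1, "email": "a@example.com", "spend": 10},
    {"user_id": 2, "email": None, "spend": 12.5},
]
print(infer_schema(sample))
```

In this sketch the "spend" field is widened from an integer type to a floating-point type because the two records disagree, and the missing email in the second record does not change the inferred string type.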
It also features a new file system manager that the company said streamlines the discovery, access, governance and use of unstructured data in S3 data stores. Supported analytics platforms include Amazon’s Elastic MapReduce, Apache Hive, Presto, Apache Spark and business intelligence software from Tableau Software Inc., Birst Inc. and Qlik Inc.
The platform is similar to a data catalog in that it enables data to be registered and governed according to an assigned set of metadata. However, “Most catalogs focus on business metadata. We are the technical and operational metadata,” Khurana said. “With schema ingest, we’re making life easier for the data producer who’s on-boarding the data set.”
Data lakes have been plagued by a lack of tools to provide structure and access control, both of which are essential to performing reliable analysis without risking inadvertent disclosure.
Okera says its platform enables administrators not only to keep track of all their data in one place but also to enforce access rules down to the field level. The company says it can automate these administrative procedures at scale and that it is already managing multipetabyte data lakes for customers.
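Field-level access rules can be illustrated with a short Python sketch. This is an assumption-laden example, not Okera's product behavior: the role names, the policy table, the mask value and the function apply_field_policy are all hypothetical. It shows the general idea of checking each field of a record against the set of fields a role is permitted to read.

```python
from typing import Any, Dict, Set

# Hypothetical policy: each role maps to the set of fields it may read
POLICY: Dict[str, Set[str]] = {
    "analyst": {"user_id", "spend"},
    "admin": {"user_id", "email", "spend"},
}


def apply_field_policy(
    record: Dict[str, Any],
    role: str,
    policy: Dict[str, Set[str]] = POLICY,
    mask: str = "***",
) -> Dict[str, Any]:
    """Return a copy of the record with disallowed fields masked.

    Roles that are not in the policy are allowed no fields at all.
    """
    allowed = policy.get(role, set())
    return {
        field: (value if field in allowed else mask)
        for field, value in record.items()
    }


row = {"user_id": 7, "email": "b@example.com", "spend": 42.0}
print(apply_field_policy(row, "analyst"))
print(apply_field_policy(row, "admin"))
```

Here an analyst sees the user ID and spend but a masked email address, while an admin sees the full record. Denying everything to unknown roles is a deliberate default in this sketch, since the risk the article describes is inadvertent disclosure.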
Pricing is based on usage, but Okera didn’t provide details.
Photo: Unsplash