UPDATED 09:00 EDT / APRIL 25 2019

BIG DATA

Okera ups its game in data lake governance

Data governance startup Okera Inc. today is adding attribute-based access control and automated business metadata tagging to the policy enforcement capabilities of its software for managing, securing and governing data access on data lakes at large scale.

The company, which launched from stealth mode last spring, is tackling the problem of data governance in data lakes, which are repositories of data stored in their native formats. Data lakes are popular back-ends for analytics and data mining projects because they bypass the need for complex and lengthy cleansing and normalization procedures.

The downside is that they can become messy as more data is added to them. Okera said its platform not only enables data lake administrators to keep track of all their data in one place but also enforces access rules down to the field level. The new features automate many of the manual processes associated with enforcing access controls.

“Typically, access control is done by mapping users to data sets based on roles,” said Amandeep Khurana, Okera’s co-founder and chief executive. “As the number of data sets increases, and as more complex policies are put in place, managing policies based on data set names becomes operationally complex very quickly.”

The problem is that access controls are usually defined at the data set level rather than for individual data elements, Khurana said. Okera instead enables those controls to be enforced at the property level.

For example, policies may be set for Social Security numbers across the entire data lake, regardless of how many data sets are involved. Without such fine-grained controls, policies would have to be applied individually to each data set, of which there can be thousands in large data lakes.

Administrators can tag a table as “sales data” or tag columns as “personally identifiable information” and then automatically apply access policies based on those tags across multiple data sets and users, the company said. This has the collateral benefit of simplifying documentation because controls are defined by tags rather than column labels or data set names.
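To make the idea concrete, here is a minimal sketch of tag-based access control in Python. It is not Okera's API: the catalog, tag names, user attributes and the `can_read` and `visible_columns` helpers are all hypothetical, and the default-deny rule for untagged columns is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    dataset: str
    name: str
    tags: frozenset


# Hypothetical catalog: columns from different data sets share the same tags.
CATALOG = [
    Column("crm.customers", "c1", frozenset({"pii"})),
    Column("crm.customers", "region", frozenset({"sales_data"})),
    Column("billing.invoices", "ssn", frozenset({"pii"})),
    Column("billing.invoices", "amount", frozenset({"sales_data"})),
]

# Hypothetical policies: tag -> user attributes allowed to read data with that tag.
# One entry covers every column carrying the tag, in any data set.
POLICIES = {
    "pii": {"compliance"},
    "sales_data": {"compliance", "sales_analyst"},
}


def can_read(user_attrs, column, policies=POLICIES):
    """Allow a read only if every tag on the column is permitted for the user.

    Untagged columns are denied (an assumption of this sketch).
    """
    if not column.tags:
        return False
    return all(user_attrs & policies.get(tag, set()) for tag in column.tags)


def visible_columns(user_attrs, catalog=CATALOG):
    """Return (dataset, column) pairs the user may read, in catalog order."""
    return [(c.dataset, c.name) for c in catalog if can_read(user_attrs, c)]


if __name__ == "__main__":
    print(visible_columns({"sales_analyst"}))
    print(visible_columns({"compliance"}))
```

In this sketch, adding a new data set requires only tagging its columns; the policy table does not change.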

“To say a user has access to column C1 isn’t intuitive,” Khurana said. “This is defining the system for the user rather than the user to the system.”

Okera said its software can also now automatically discover and tag sensitive data by scanning for properties such as field lengths and document formats. A set of default rules is supplied and users can define their own.
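The article does not describe how Okera's scanner works, so the following Python sketch shows only one plausible approach: match sampled column values against format rules and suggest a tag when most values fit. The rule names, the regular expressions, the 0.8 threshold and the `suggest_tags` function are assumptions for illustration, not Okera's defaults.

```python
import re

# Hypothetical default rules: tag -> pattern a value must fully match.
DEFAULT_RULES = {
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def suggest_tags(values, rules=DEFAULT_RULES, threshold=0.8):
    """Suggest tags for a column from a sample of its string values.

    A tag is suggested when at least `threshold` of the non-empty
    sampled values match that tag's pattern.
    """
    non_empty = [v for v in values if v]
    if not non_empty:
        return set()
    tags = set()
    for tag, pattern in rules.items():
        hits = sum(1 for v in non_empty if pattern.fullmatch(v))
        if hits / len(non_empty) >= threshold:
            tags.add(tag)
    return tags


if __name__ == "__main__":
    print(suggest_tags(["123-45-6789", "987-65-4321"]))

    # User-defined rules extend the defaults; "us_zip" is a made-up example.
    custom = {**DEFAULT_RULES, "us_zip": re.compile(r"\d{5}")}
    print(suggest_tags(["94105", "10001"], rules=custom))
```

A threshold below 1.0 tolerates a few malformed or placeholder values in an otherwise sensitive column.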

The new release also features a native client interface for Java Database Connectivity, a specification for connecting programs written in Java to back-end databases. This enables users of popular analytics tools like Tableau Software Inc.’s Tableau and Microsoft Corp.’s Excel to access data in the data lake from within those applications with consistent governance rules preserved.

Founded by two former Cloudera Inc. executives, Okera has raised $14.6 million in funding. The new capabilities are being added to the core platform at no extra charge. Pricing is based on usage, but Okera didn’t provide details.

Photo: Unsplash
