Ely Kahn, Co-founder and Vice President of Business Development at Sqrrl, discussed data quality and data security with theCUBE co-hosts Dave Vellante and Jeff Kelly, live at the MIT CDQIQ Symposium.
Data quality had been a big issue and led to the development of the Accumulo database, Kahn explained. Accumulo was initially an NSA project and Sqrrl is an NSA spin-off. When it comes to data quality, one aspect is “simply getting access to the data you need,” and the second is making sure the data is clean.
Intelligence analysts in the NSA and CIA had great difficulty getting to the information they needed in siloed databases, Kahn explained. Different types of classified material, held in different databases, all needed to be queried. “Accumulo allowed folks to securely query across all the data and provided a single platform storing petabytes of information.”
Storing & Securing All Data on a Single Platform Requires a Huge Cultural Transition
Asked to explain the organizational considerations around data, Kahn said that “inside the Big Data space, you often don’t hear about the change management and cultural transition that need to happen to go to a platform where all your data sits.” Formal organizational policies that enable this change need to be created. Such a platform mixes data with different security requirements, so organizations need to be comfortable with the security changes first, and only then implement the actual platform.
Building awareness through examples of excellence
Sqrrl needs to help its customers communicate about the new technology inside their organizations when they implement its solutions. “The most effective way of tackling that problem is through examples of excellence. We started with small pilots, put the pilot out there as an example of excellence,” and built momentum around it. The National Institutes of Health and National Cancer Institute will implement the technology next, according to Kahn. One of the projects it would be used for involves extracting all cancer information and putting it into the same cloud-based platform. “What they are looking to do is bring the data into a single cloud, refine the data into a standardized format that can be accessed by institutions, and make sure that more researchers can access Big Data assets.”
Data-level security in Hadoop
Asked about the uptake of the recently launched Sqrrl enterprise solution, Kahn said it was already “in production with a good number of clients now. A lot of these clients have been utilizing Hadoop in a sandbox environment.” They now use the concept of data-centric security. “A lot of folks are starting to talk about security in Hadoop,” he said, but a big part of that conversation focuses on how to properly authenticate people into your Hadoop cluster. “Bringing security to the data, itself, is the most effective way to secure data.”
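To make the idea of data-centric security concrete, here is a minimal Python sketch of label-based filtering, where each stored cell carries its own security labels and a reader sees only the cells their authorizations cover. This is an illustration under stated assumptions, not Sqrrl's or Accumulo's actual API: the `Cell` class, the `visible_cells` function, the label names, and the sample records are all hypothetical, and the check is simplified to "the reader must hold every label on the cell."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One stored value plus the labels a reader must hold to see it (hypothetical model)."""
    row: str
    column: str
    value: str
    required_labels: frozenset


def visible_cells(cells, authorizations):
    """Return only the cells whose required labels are all held by the reader."""
    auths = frozenset(authorizations)
    return [c for c in cells if c.required_labels <= auths]


# Illustrative sample data: the security label travels with each cell.
cells = [
    Cell("patient-1", "name", "A. Example", frozenset({"pii"})),
    Cell("patient-1", "diagnosis", "C50.9", frozenset({"clinical"})),
    Cell("patient-1", "notes", "follow-up due", frozenset({"pii", "clinical"})),
    Cell("patient-1", "site", "Boston", frozenset()),
]

# A reader authorized only for clinical data never sees the PII cells,
# even though everything sits in the same table.
researcher_view = visible_cells(cells, {"clinical"})
for cell in researcher_view:
    print(cell.column, "=", cell.value)
```

The point of the design is that access is decided per cell at read time, so data with different security requirements can share one platform without relying solely on who was authenticated into the cluster.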
Exploring the company’s most important vertical markets, Kahn mentioned government, healthcare, and telecommunication companies. Banks are also a big part of its target market. “It’s less of a collaborative community, but because they place such a premium on security” and how data can flow through their institutions, “a data-centric security approach will be very important in financial services.”
“Some of our secret sauce is in what we call our policy engine. We will look into their security policies and translate them into machine policies” that can be applied to data, said Kahn. The company is also hiring right now. It currently has about 20 employees and is “like a real company. We plan to expand in our key verticals – healthcare, government, telecom.” To support its strategy, Sqrrl is promoting use cases in cybersecurity and in healthcare analytics, where it aims at “looking for patterns into data and predict medical diagnoses.”