UPDATED 14:19 EDT / JUNE 28 2016

NEWS

Hortonworks tightens Hadoop security, intros Spark-based notebook for data scientists

As Hadoop Summit opens today in San Jose, CA, Hortonworks Inc. has some new goodies for the 4,000-plus people who are expected to attend.

Version 2.5 of the Hortonworks Data Platform (HDP) boosts security with improved dynamic classification capabilities that can be managed by policies. Customers can use Apache Atlas to classify data and assign metadata tags, which are then enforced through Apache Ranger policies. Atlas also now provides cross-component lineage for better documentation of data dependencies.

Tagging “helps organizations link security policies against different tags or categories of data, so I can tag certain data as personally identifiable and enable access policies against those tags,” said Matt Morgan, vice president of product and alliance marketing at Hortonworks. The same tactic can be used for combinations of data, such as names and Social Security numbers, which are innocuous on their own but, when combined, present a serious security issue.
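The tag-and-policy idea Morgan describes can be illustrated with a minimal Python sketch. It is not the Apache Atlas or Apache Ranger API: the `TagPolicy` class, the `is_allowed` function, and the tag, column and group names are all hypothetical, and the rule that a policy applies only when all of its tags appear together in one request is an assumption chosen to show the names-plus-Social-Security-numbers case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagPolicy:
    """Hypothetical policy: applies when ALL of `tags` appear in one request."""
    tags: frozenset
    allowed_groups: frozenset


def is_allowed(user_groups, requested_columns, column_tags, policies):
    """Return True unless a triggered policy excludes every group the user is in."""
    # Collect every tag attached to the columns in this request.
    present = set()
    for column in requested_columns:
        present |= column_tags.get(column, set())

    # A policy is triggered only if all of its tags are present together.
    for policy in policies:
        if policy.tags <= present and not (set(user_groups) & policy.allowed_groups):
            return False
    return True


# Illustrative data: tags attached to columns (names are made up).
column_tags = {"name": {"NAME"}, "ssn": {"SSN"}, "city": set()}

# Names and Social Security numbers together are restricted to one group.
policies = [TagPolicy(frozenset({"NAME", "SSN"}), frozenset({"compliance"}))]

analyst_name_only = is_allowed({"analyst"}, ["name", "city"], column_tags, policies)
analyst_combined = is_allowed({"analyst"}, ["name", "ssn"], column_tags, policies)
compliance_combined = is_allowed({"compliance"}, ["name", "ssn"], column_tags, policies)
```

In this sketch an analyst can read names on their own, but a request that combines names with Social Security numbers is refused unless the user belongs to the permitted group.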

Hortonworks is hoping that this year’s Summit – which features speakers from companies like Macy’s Inc., Capital One Financial Corp., Progressive Corp. and ConocoPhillips Co. – will showcase Hadoop’s breakout into the commercial mainstream. “We’re just reaching the point where organizations are driving these transformational case studies,” Morgan said.

Hortonworks will deliver the enhancements under a new release schedule announced in March. Under that schedule, the parts of the core platform, which include the HDFS file system, the YARN resource manager and the MapReduce programming model, are released together as part of the Open Data Platform (ODP) initiative. Other projects that form part of the Hadoop ecosystem – such as Apache Spark, Apache Hive and Apache Ambari – are released according to a community schedule. “We were under enormous pressure to get innovations out because they change so quickly,” Morgan said. However, customers are less eager to update their foundation platform so frequently.

Also announced today is general availability of Apache Zeppelin, a Spark-based notebook for data scientists that Hortonworks President Herb Cunitz has likened to “Tableau for Spark” in a reference to the popular visualization engine from Tableau Software Inc. Zeppelin is a graphical environment that lets scientists create and share visualizations of data. “This is built on Spark, so you’re getting all the velocity that Spark provides as an in-memory system,” Morgan said. The project was announced in March.

Other new releases include the following:

  • A new version of Apache Ambari can be used to plan, install and securely configure HDP, and it simplifies ongoing maintenance and management. An integrated log search and access capability enables operators to search, browse and filter their cluster operational logs. Also, a new role-based access control model enables administrators to give different users a controlled set of functional access to the cluster.
  • Streamlined backup and restore capabilities have been added to Apache HBase, allowing operators to perform incremental backups. In addition, multi-tenancy enables soft partitioning of data and nodes on the cluster and allocates specific data storage and data processing resources to specific internal or external tenants, such as departments within an enterprise.
  • A new version of Apache Storm adds stream-processing features such as sliding and tumbling windows, which enable developers to take snapshots of data in a stream and use them to create additional analytics. New connectors have also been added for search and NoSQL database management systems.
  • Hive 2.0 with LLAP (Live Long and Process) improves interactive query capabilities, making Hive SQL queries faster. “SQL queries against terabytes of data are now realistic,” Morgan said. “Hive has [improved] query response by 90 percent against 10 terabytes of data.”
  • Hortonworks also announced an agreement to resell AtScale’s self-service BI platform for Hadoop and enhanced its Partnerworks program to better support managed service providers.
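For readers unfamiliar with the sliding and tumbling windows mentioned in the Storm item above, the difference can be shown with a short Python sketch. It does not use Storm's API, which is Java-based; the function names and the count-based windowing over an in-memory list are assumptions made purely for illustration.

```python
def tumbling_windows(events, size):
    """Split events into consecutive, non-overlapping windows of `size` items.

    Each event lands in exactly one window; the last window may be shorter.
    """
    return [events[i:i + size] for i in range(0, len(events), size)]


def sliding_windows(events, size, slide):
    """Return full windows of `size` items, advancing `slide` items each step.

    When `slide` is smaller than `size`, consecutive windows overlap.
    """
    return [events[i:i + size] for i in range(0, len(events) - size + 1, slide)]


# Illustrative stream of six readings.
readings = [1, 2, 3, 4, 5, 6]

tumbling = tumbling_windows(readings, size=3)
sliding = sliding_windows(readings, size=3, slide=1)

# A simple per-window aggregate: the sum of each snapshot.
tumbling_sums = [sum(window) for window in tumbling]
sliding_sums = [sum(window) for window in sliding]
```

Here the tumbling version yields two separate snapshots of three readings each, while the sliding version yields four overlapping snapshots, so every reading after the first contributes to more than one aggregate.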
