

As Hadoop Summit opens today in San Jose, CA, Hortonworks Inc. has some new goodies for the 4,000-plus people who are expected to attend.
Version 2.5 of the Hortonworks Data Platform (HDP) boosts security with improved dynamic classification capabilities that can be managed by policies. Customers can use Apache Atlas to classify data and assign metadata tags, which are then enforced through Apache Ranger policies. Atlas also now provides cross-component lineage for better documentation of data dependencies.
Tagging “helps organizations link security policies against different tags or categories of data, so I can tag certain data as personally identifiable and enable access policies against those tags,” said Matt Morgan, vice president of product and alliance marketing at Hortonworks. The same tactic can be applied to combinations of data, such as names and Social Security numbers, which are innocuous on their own but present a serious security issue when combined.
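To make the idea concrete, here is a minimal conceptual sketch of tag-based access control in Python. It does not use the Atlas or Ranger APIs; the `TagPolicy` class, the `can_read` function, the tag names and the group names are all hypothetical, and the rule that a tag with no policy denies access is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagPolicy:
    """Hypothetical policy: groups allowed to read data carrying a tag."""
    tag: str
    allowed_groups: frozenset


def can_read(column_tags, user_groups, policies):
    """Return True only if every tag on the column admits one of the user's groups.

    Assumption for this sketch: a tag with no matching policy denies access,
    and a column with no tags is readable by anyone.
    """
    by_tag = {p.tag: p for p in policies}
    for tag in column_tags:
        policy = by_tag.get(tag)
        if policy is None or not (policy.allowed_groups & set(user_groups)):
            return False
    return True


# Hypothetical catalog: tags are attached to data once, not to each policy.
catalog = {
    "customers.name": {"PII"},
    "customers.ssn": {"PII", "SENSITIVE"},
    "orders.total": set(),
}

policies = [
    TagPolicy("PII", frozenset({"compliance", "support"})),
    TagPolicy("SENSITIVE", frozenset({"compliance"})),
]

support_can_read_name = can_read(catalog["customers.name"], {"support"}, policies)
support_can_read_ssn = can_read(catalog["customers.ssn"], {"support"}, policies)
```

In this sketch a support user can read the name column but not the Social Security number column, because the second tag carries a stricter policy. The point is that access follows the tag rather than the individual table or column, so newly tagged data picks up the existing policy automatically.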
Hortonworks is hoping that this year’s Summit – which features speakers from companies like Macy’s Inc., Capital One Financial Corp., Progressive Corp. and ConocoPhillips Co. – will showcase Hadoop’s breakout into the commercial mainstream. “We’re just reaching the point where organizations are driving these transformational case studies,” Morgan said.
Hortonworks will deliver the enhancements under a new release schedule announced in March. Under that schedule, the components of the core platform, which include the HDFS file system, the YARN resource manager and the MapReduce programming model, are released together as part of the Open Data Platform (ODP) initiative. Other projects that form part of the Hadoop ecosystem – such as Apache Spark, Apache Hive and Apache Ambari – are released according to a community schedule. “We were under enormous pressure to get innovations out because they change so quickly,” Morgan said. However, customers are less eager to update their foundation platform so frequently.
Also announced today is general availability of Apache Zeppelin, a Spark-based notebook for data scientists that Hortonworks President Herb Cunitz has likened to “Tableau for Spark,” a reference to the popular visualization engine from Tableau Software Inc. Zeppelin is a graphical environment that lets data scientists create and share visualizations of data. “This is built on Spark, so you’re getting all the velocity that Spark provides as an in-memory system,” Morgan said. The project was announced in March.
Other new releases include the following: