UPDATED 14:19 EDT / JUNE 28 2016

NEWS

Hortonworks tightens Hadoop security, intros Spark-based notebook for data scientists

As Hadoop Summit opens today in San Jose, CA, Hortonworks Inc. has some new goodies for the 4,000-plus people who are expected to attend.

Version 2.5 of the Hortonworks Data Platform (HDP) boosts security with improved dynamic classification capabilities that can be managed by policies. Customers can use Apache Atlas to classify data and assign metadata tags, which are then enforced through Apache Ranger policies. Atlas now also provides cross-component lineage for better documentation of data dependencies.

Tagging “helps organizations link security policies against different tags or categories of data, so I can tag certain data as personally identifiable and enable access policies against those tags,” said Matt Morgan, vice president of product and alliance marketing at Hortonworks. The same tactic can be applied to combinations of data, such as names and Social Security numbers, which are innocuous on their own but present a serious security issue when combined.
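The idea Morgan describes, tagging data once and attaching access policies to the tags rather than to individual tables, can be illustrated with a short sketch. This is not the Atlas or Ranger API: the catalog, tag names (NAME, SSN), roles and the `can_read` helper below are all hypothetical, chosen only to show how a per-tag rule and a combination rule interact.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical metadata catalog (Atlas-style classification): column -> tags.
CATALOG: Dict[str, Set[str]] = {
    "customers.name": {"NAME"},
    "customers.ssn": {"SSN"},
    "customers.city": set(),
}

# Hypothetical tag-based policies (Ranger-style): tag -> roles allowed to read it.
TAG_POLICIES: Dict[str, Set[str]] = {
    "NAME": {"analyst", "compliance"},
    "SSN": {"analyst", "compliance"},
}

# Hypothetical combination policies: tags that are sensitive only when they
# appear together in one request -> roles allowed to see the combination.
COMBINATION_POLICIES: Dict[FrozenSet[str], Set[str]] = {
    frozenset({"NAME", "SSN"}): {"compliance"},
}


def can_read(role: str, columns: Iterable[str]) -> bool:
    """Return True if `role` may read all requested columns together."""
    # Collect every tag attached to the requested columns.
    tags: Set[str] = set()
    for col in columns:
        tags |= CATALOG.get(col, set())
    # Each individual tag that has a policy must allow this role.
    for tag in tags:
        allowed = TAG_POLICIES.get(tag)
        if allowed is not None and role not in allowed:
            return False
    # Any sensitive combination fully present in the request must allow it too.
    for combo, allowed in COMBINATION_POLICIES.items():
        if combo <= tags and role not in allowed:
            return False
    return True


print(can_read("analyst", ["customers.name"]))                    # True
print(can_read("analyst", ["customers.name", "customers.ssn"]))   # False
print(can_read("compliance", ["customers.name", "customers.ssn"]))  # True
```

Because the policy is keyed on tags, newly classified columns pick up the right access rules without any policy being rewritten.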

Hortonworks is hoping that this year’s Summit – which features speakers from companies like Macy’s Inc., Capital One Financial Corp., Progressive Corp. and ConocoPhillips Co. – will showcase Hadoop’s breakout into the commercial mainstream. “We’re just reaching the point where organizations are driving these transformational case studies,” Morgan said.

Hortonworks will deliver the enhancements under a new release schedule, announced in March, in which core platform components, including the HDFS file system, the YARN resource manager and the MapReduce programming model, are released together as part of the Open Data Platform (ODP) initiative. Other projects in the Hadoop ecosystem – such as Apache Spark, Apache Hive and Apache Ambari – are released according to a community schedule. “We were under enormous pressure to get innovations out because they change so quickly,” Morgan said. Customers, however, are less eager to update their foundation platform so frequently.

Also announced today is general availability of Apache Zeppelin, a Spark-based notebook for data scientists that Hortonworks President Herb Cunitz has likened to “Tableau for Spark” in a reference to the popular visualization engine from Tableau Software Inc. Zeppelin is a graphical environment that lets scientists create and share visualizations of data. “This is built on Spark, so you’re getting all the velocity that Spark provides as an in-memory system,” Morgan said. The project was announced in March.

Other new releases include the following:

  • A new version of Apache Ambari can be used to plan, install and securely configure HDP, as well as to simplify ongoing maintenance and management. An integrated log search and access capability lets operators search, browse and filter their cluster operational logs. Also, a new role-based access control model enables administrators to give different users a controlled set of functional access to the cluster.
  • Streamlined backup and restore capabilities have been added to Apache HBase, allowing operators to perform incremental backups. In addition, multi-tenancy enables soft partitioning of data and nodes on the cluster and allocates specific data storage and data processing resources to specific internal or external tenants, such as departments within an enterprise.
  • A new version of Apache Storm adds stream-processing features such as sliding and tumbling windows, which let developers take snapshots of data in a stream and use them to create additional analytics. New connectors have also been added for search and NoSQL database management systems.
  • Hive 2.0 with LLAP (Live Long and Process) improves interactive query capabilities, making Hive SQL queries faster. “SQL queries against terabytes of data are now realistic,” Morgan said. “Hive has [improved] query response by 90 percent against 10 terabytes of data.”
  • Hortonworks also announced an agreement to resell AtScale’s self-service BI platform for Hadoop and enhanced its Partnerworks program to better support managed service providers.
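The difference between the sliding and tumbling windows mentioned in the Apache Storm item can be shown with a minimal sketch. It is not Storm's own API; the helper names are hypothetical, the windows are count-based, and the input is a finite list standing in for an unbounded stream. A tumbling window splits the stream into non-overlapping chunks, while a sliding window overlaps and advances by a smaller step.

```python
from typing import List, Sequence


def tumbling_windows(stream: Sequence[int], size: int) -> List[List[int]]:
    """Non-overlapping, full windows: each event lands in at most one window."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(stream[i:i + size]) for i in range(0, len(stream) - size + 1, size)]


def sliding_windows(stream: Sequence[int], size: int, slide: int) -> List[List[int]]:
    """Overlapping, full windows of `size` events that advance by `slide` events."""
    if size <= 0 or slide <= 0:
        raise ValueError("size and slide must be positive")
    return [list(stream[i:i + size]) for i in range(0, len(stream) - size + 1, slide)]


events = [5, 1, 4, 2, 8, 3, 7]

# Per-window aggregate ("snapshot" analytics): the sum of each window.
print([sum(w) for w in tumbling_windows(events, 3)])     # [10, 13]
print([sum(w) for w in sliding_windows(events, 3, 1)])   # [10, 7, 14, 13, 18]
```

A tumbling window is simply a sliding window whose step equals its size; time-based windows follow the same idea with timestamps in place of event counts.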
