UPDATED 22:29 EST / JUNE 05 2017

Apache Hadoop’s first update in two years focuses on cloud and security

The Apache Software Foundation has released the first update in two years to its open-source big-data software, Apache Hadoop.

Apache Hadoop 2.8, the popular software framework for scalable, distributed computing, boasts a number of improvements focused on cloud and security. “Apache Hadoop 2.8 maintains the project’s momentum in its stable release series,” said Chris Douglas, vice president of Apache Hadoop. “Our community of users, operators, testers, and developers continue to evolve the thriving Big Data ecosystem at the ASF.”

One of the best-known open-source projects around, Apache Hadoop is a software framework that supports processing and storage of extremely large data sets in distributed computing environments. Hadoop is widely regarded as instrumental in driving market transformation for multiple enterprises, with Forrester Research Inc. forecasting companies will spend $800 million on the software and related services in 2017.

Although most Hadoop users continue to run the framework on physical clusters of computers and storage devices in their own data centers, the new release gives a nod to the growing number of users that choose to run it on cloud infrastructures. Public cloud companies, including Microsoft Corp. and Amazon Web Services Inc., made significant contributions to the release. One of the main new features is support for Microsoft Azure Data Lake as both a source and a destination of data, which should benefit anyone running Hadoop on Microsoft’s cloud.

Meanwhile, AWS has helped to improve the “S3A” client for users working with data stored in Amazon’s S3 storage service. The new client boasts enhanced scalability, performance and security, and the community claims that Hadoop can now process columnar data stored in S3 even faster than Amazon’s own closed-source EMR connector.

“My colleagues and I are happy that tests of Apache Hive and Hadoop 2.8 show that we are able to provide a similar experience reading data in from S3 as Amazon EMR, with its closed-source fork/rewrite of S3,” said Steve Loughran, a member of the Apache Hadoop Project Management Committee.
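For readers curious what tuning the S3A client looks like in practice, here is a minimal sketch in Python that renders a Hadoop-style configuration snippet. The article does not say which settings were used in the tests Loughran describes; the example assumes the `fs.s3a.experimental.input.fadvise` property, which, as we understand the S3A documentation, switches the client to random-access reads that suit columnar formats such as ORC and Parquet. The helper name `render_hadoop_config` is our own invention for illustration.

```python
import xml.etree.ElementTree as ET


def render_hadoop_config(properties):
    """Render a dict of property names and values as Hadoop-style configuration XML.

    Hypothetical helper for illustration; it is not part of Hadoop.
    """
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Assumption: random-access reads are the relevant tuning for columnar data.
s3a_settings = {
    "fs.s3a.experimental.input.fadvise": "random",
}

config_xml = render_hadoop_config(s3a_settings)
print(config_xml)
```

The resulting `<property>` element would normally be placed in a cluster's `core-site.xml` or passed as a job-level setting; whether it helps depends on the workload, so treat it as a starting point rather than a recommendation.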

Developers have also reconfigured the YARN cluster management tool to create a more flexible resource model for cloud deployments. This allows operators to adapt to demand by scaling cloud-based Hadoop clusters up or down as necessary.

The updated framework also comes with the obligatory security improvements, including protection against cross-frame scripting attacks for the Hadoop user interface and protection against cross-site request forgery attacks for the Hadoop REST API. The full list of features, improvements and bug fixes can be found in the Apache Hadoop 2.8 release notes.
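To illustrate what the request forgery protection means for client code, the sketch below builds (but does not send) a WebHDFS request in Python. It assumes the cluster has enabled CSRF prevention and kept what we understand to be the default custom header name, `X-XSRF-HEADER`; with the filter on, state-changing calls such as `PUT` are expected to carry that header. The host name, path and user are hypothetical, and the function `build_mkdirs_request` is ours, not part of any Hadoop client library.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumption: the cluster uses the default custom header name for CSRF prevention.
CSRF_HEADER = "X-XSRF-HEADER"


def build_mkdirs_request(namenode, path, user):
    """Build a WebHDFS MKDIRS request that carries the CSRF custom header.

    The request is only constructed here, never sent over the network.
    """
    query = urlencode({"op": "MKDIRS", "user.name": user})
    url = f"http://{namenode}/webhdfs/v1{quote(path)}?{query}"
    # The filter checks that the header is present; its value is arbitrary here.
    return Request(url, method="PUT", headers={CSRF_HEADER: "true"})


# Hypothetical host, path and user, for illustration only.
request = build_mkdirs_request("namenode.example.com:50070", "/tmp/demo", "alice")
print(request.get_method(), request.full_url)
```

A browser tricked into issuing a cross-site request cannot easily attach a custom header like this one, which is the idea behind the check.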

Image: Apache Software Foundation
