Microsoft upgrades Azure HDInsight, its Hadoop Big Data offering


Microsoft has beefed up its cloud-based Azure HDInsight Hadoop offering with new security enhancements and a performance boost that the company claims will speed up Big Data queries by a factor of 25.

Azure HDInsight is a managed Hadoop service that lets users deploy and manage clusters on the Azure cloud. It's offered in partnership with Hortonworks Inc. and is based on that company's Hortonworks Data Platform.

The extra security comes in the form of enhanced authentication and identity management features, Microsoft revealed in a blog post. The performance gains come from a feature for Apache Hive called Live Long And Process (LLAP), which is now available in preview.

According to Microsoft, LLAP allows data to stay in a compressed format while held in memory, helping to deliver a 25-fold performance boost for Big Data queries. Microsoft says further gains come from updating the platform to Spark 2.0, which overhauls the core query engine and lets it perform cache-efficient vectorized computations, making processing roughly 10 times faster.

On the security side, new features in Azure HDInsight include integration with Azure Active Directory, Microsoft's cloud-hosted directory and identity management service. Also new is support for Apache Ranger within Azure HDInsight, which provides centralized policy control for Hadoop clusters.

In addition, data processed in Azure HDInsight can now be secured at rest via server-side encryption in Azure Storage or Azure Data Lake Store. Users can also opt to manage their own encryption keys for this service by storing them in Azure Key Vault.

Microsoft also welcomed several new third-party vendors into its HDInsight partner program. They include Cask Data Inc., which offers a self-service, extensible open source framework for visually developing, running, automating and operating data pipelines. Another is StreamSets Inc., whose Dataflow Performance Manager software provides a single pane of glass for managing Big Data flows, so enterprises can map and measure all their data in motion.
