UPDATED 11:37 EDT / OCTOBER 22 2020

BIG DATA

Alluxio expands virtual file system to support billions of files

Alluxio Inc., maker of a virtual distributed file system for data science and analytics workloads, Wednesday released a new version that expands its metadata service and enables unified management across hybrid and multiple clouds.

Users can now manage namespaces with billions of files without the need for third-party tools, and a new management console makes it easier to connect an analytics cluster to multiple data sources both in the cloud and on premises.

Alluxio specifically targets data science and analytics users and has landed seven of the top 10 internet companies as customers, the company said. Its technology abstracts and virtualizes data for delivery to popular open-source analytics engines such as Apache Spark, Presto, Flink and Hive. It uses a global namespace, caching and in-memory metadata to track the location of data at its source and any changes to it, thereby avoiding the need to replicate that data.
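
To make the idea of a global namespace with caching concrete, here is a minimal Python sketch. It is not Alluxio's API or implementation: the class `VirtualNamespace`, its methods, and the `fetch` callback are hypothetical names chosen for illustration, and the backing stores are faked with an in-memory dictionary. It only shows the general pattern of mapping one virtual path tree onto several storage systems and serving repeat reads from a cache instead of a second copy of the data.

```python
from typing import Callable, Dict


class VirtualNamespace:
    """Toy global namespace (hypothetical, not Alluxio's API)."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._mounts: Dict[str, str] = {}   # virtual prefix -> backing URI prefix
        self._cache: Dict[str, bytes] = {}  # virtual path -> cached contents
        self._fetch = fetch                 # assumed reader for a backing URI
        self.backend_reads = 0              # how often the source was hit

    def mount(self, virtual_prefix: str, backing_uri: str) -> None:
        """Attach a backing store under a virtual directory."""
        self._mounts[virtual_prefix.rstrip("/")] = backing_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        """Translate a virtual path into the URI where the data lives."""
        # Longest prefix first, so nested mounts win over their parents.
        for prefix in sorted(self._mounts, key=len, reverse=True):
            if path == prefix or path.startswith(prefix + "/"):
                return self._mounts[prefix] + path[len(prefix):]
        raise FileNotFoundError(path)

    def read(self, path: str) -> bytes:
        """Read through the cache; only a miss touches the source."""
        if path not in self._cache:
            self._cache[path] = self._fetch(self.resolve(path))
            self.backend_reads += 1
        return self._cache[path]


# Stand-in for real object stores and file systems (illustrative data only).
fake_sources = {
    "s3://example-bucket/logs/a.csv": b"x,y\n1,2\n",
    "hdfs://namenode/warehouse/t.parquet": b"PAR1",
}

ns = VirtualNamespace(fake_sources.__getitem__)
ns.mount("/logs", "s3://example-bucket/logs")
ns.mount("/warehouse", "hdfs://namenode/warehouse")

ns.read("/logs/a.csv")   # first read goes to the fake S3 source
ns.read("/logs/a.csv")   # second read is served from the cache
print(ns.resolve("/warehouse/t.parquet"), ns.backend_reads)
```

In this sketch an analytics job would address everything through paths such as `/logs/a.csv`, regardless of which system holds the bytes. A real system also has to handle cache eviction and changes at the source, which this toy ignores.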

Using Alluxio can improve the productivity of data modelers fourfold, said Chief Executive Haoyuan Li, who co-created the technology while a graduate student at the University of California at Berkeley. “The cost of training the model goes from $1 million to $200,000 and the time required from one year to three months,” he said.

The expanded metadata service moves the product further away from its Hadoop roots and improves support for cloud-native and container-based deployment. “We started in the Hadoop world and so required users to have that dependency,” Li said. “Now it’s completely removed.”

The management hub provides a wizard-based approach to connecting data sources across multiple locations as well as configuration and monitoring of Alluxio clusters. That permits data from sources such as Hadoop HDFS, Amazon Web Services Inc.’s S3 and Google LLC’s Cloud Storage to be combined.

In an effort to reduce barriers to adoption, the console also simplifies the process of configuring and launching a cluster and improves monitoring to reduce operational costs. Alluxio previously shipped with an open-source console that had only basic monitoring features and no configuration options, Li said.

New support for Terraform, an open-source toolset for managing infrastructure as code, makes it easier to launch preconfigured clusters programmatically with a single command. This version also integrates with Vault to provide secure, centralized management of sensitive information across clouds and data centers. Other enhancements include simpler cluster management and support for Java 11.

