UPDATED 08:00 EDT / MARCH 11 2020

BIG DATA

Alluxio adds cataloging and data transformation to its data orchestration platform

Alluxio Inc., developer of open-source cloud-based software that orchestrates and harmonizes data from multiple sources into a common format, today announced new features that simplify the task of transforming and storing data.

The new features also make the data available more quickly to data scientists for analytics and machine learning applications.

The company’s software is an in-memory virtual storage layer that interfaces with multiple back-end data stores to feed data to performance-dependent open-source computing frameworks like Apache Spark, Apache HBase and Presto. Its approach eliminates copies and uses intelligent caching to predict requests from the frameworks and pre-load data accordingly. Over the past year, Alluxio has been expanding into global namespaces and data management for moving data between stores.

“What Kubernetes does for compute Alluxio does for data,” said Chief Executive Steven Mih, referring to the popular platform for orchestrating the self-contained software environments called containers.

Alluxio is tackling the performance problems inherent in transforming and loading data from diverse sources, such as Amazon Web Services Inc.’s S3, the Hadoop file system, the Ceph free software storage platform and Dell Technologies Inc.’s Dell EMC Elastic Cloud Storage, into formats such as Parquet and JavaScript Object Notation. Both are open formats widely supported by analytic frameworks.

The company today is adding data catalog and transformation services to its platform. The catalog service manages the metadata of structured data in a system, keeping track of all the database, table and schema information, as well as the location of all stored data. That eliminates the need to change table locations in a metastore based on the Apache Hive data warehouse software or to restart or reconfigure Hive services.

The catalog service enables schema-aware optimizations for any type of structured data. For example, once a Hive metastore is attached to the Alluxio Catalog Service, the service will automatically mount the appropriate table locations and automatically serve the table metadata with the Alluxio locations, the company said.
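As a rough illustration of what serving table metadata "with the Alluxio locations" could look like, the Python sketch below rewrites a table's storage location from an under-store prefix to a path in a caching layer. It is not Alluxio's implementation or API: the `TableMeta` class, the `to_cache_location` function, the mount table and the `alluxio://master:19998` address are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TableMeta:
    """Hypothetical stand-in for one table entry in a metastore."""
    database: str
    name: str
    location: str  # where the table's files live


def to_cache_location(table, mounts):
    """Return a copy of `table` whose location points at the caching layer.

    `mounts` maps an under-store prefix to a path prefix in the caching
    layer. The longest matching prefix wins, and a table that matches no
    prefix is returned unchanged.
    """
    for prefix in sorted(mounts, key=len, reverse=True):
        root = prefix.rstrip("/")
        # Match only on a path boundary, so ".../warehouse" does not
        # accidentally capture ".../warehouse2".
        if table.location == root or table.location.startswith(root + "/"):
            suffix = table.location[len(root):]
            return replace(table, location=mounts[prefix].rstrip("/") + suffix)
    return table


# Hypothetical mount table: under-store prefix -> caching-layer prefix.
MOUNTS = {"s3://bucket/warehouse": "alluxio://master:19998/catalog/warehouse"}

orders = TableMeta("sales", "orders", "s3://bucket/warehouse/sales.db/orders")
served = to_cache_location(orders, MOUNTS)
```

The point of the sketch is that the original metastore entry is never modified: the query engine simply receives a copy of the metadata with a different location, which matches the article's claim that table locations in Hive do not have to be changed.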

“The benefits are that Alluxio can do schema-aware optimizations to deliver data in a particular schema,” Mih said. “That makes it simpler for data engineers who used to have to connect to multiple data silos. They need metadata to understand what kind of data they have, how big it is and how to access it.”

The transformation service converts data into a compute-optimized representation that’s independent of the source storage format. While results depend on the specific formats and workloads, Alluxio said internal tests have shown up to fivefold improvements in query performance.

“SQL frameworks think of the world in tables, schemas, rows and columns while storage systems think of files, objects, directories and raw bytes,” Mih said. “We can transform data to be compute-optimized regardless of the format.” The service coalesces a large number of small files into a small number of large files, translates comma-separated value format into Parquet and does in-line sorting, he said.
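The coalescing and sorting steps Mih describes can be pictured with a small Python sketch. It is an in-memory illustration under simplifying assumptions, not Alluxio's transformation service: the `coalesce_csv` function and its parameters are hypothetical, the inputs are CSV text rather than real files, and the output stays in CSV because writing Parquet would require a third-party library that is not shown here.

```python
import csv
import io


def coalesce_csv(small_files, rows_per_output, sort_column):
    """Merge many small CSV texts into fewer, larger, sorted CSV texts.

    Every input must carry the same header row. Rows are sorted by
    `sort_column` (compared as strings) before being split into outputs
    of at most `rows_per_output` rows each.
    """
    header = None
    rows = []
    for text in small_files:
        reader = csv.reader(io.StringIO(text))
        file_header = next(reader, None)
        if file_header is None:
            continue  # skip completely empty inputs
        if header is None:
            header = file_header
        elif file_header != header:
            raise ValueError("all inputs must share one header")
        rows.extend(row for row in reader if row)  # drop blank lines

    if header is None:
        return []

    key = header.index(sort_column)
    rows.sort(key=lambda row: row[key])

    outputs = []
    for start in range(0, len(rows), rows_per_output):
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows[start:start + rows_per_output])
        outputs.append(buf.getvalue())
    return outputs


# Three one-row "files" become two outputs, sorted by id.
small = ["id,city\n3,Oslo\n", "id,city\n1,Rome\n", "id,city\n2,Lima\n"]
merged = coalesce_csv(small, rows_per_output=2, sort_column="id")
```

Fewer, larger, pre-sorted files generally mean fewer open and seek operations per query and let an engine skip data it does not need, which is the kind of gain the transformation service is aiming for.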

The software is available in a free community edition under an Apache 2.0 license and an enterprise version with enhanced features such as security and sorting. The company publishes limited pricing information on its website.

