Alluxio’s data orchestration platform now spans multiple clouds
Alluxio Inc., developer of a virtual distributed file system aimed specifically at data science and analytics workloads, today released what it says is the most significant enhancement to its platform since its initial release more than three years ago.
The company sells a commercial version of its namesake open-source data orchestration technology, formerly known as Tachyon, which provides a consistent layer between storage and compute resources to enable analytics applications to rapidly access data regardless of location.
Rather than relying upon bandwidth-intensive replication, Alluxio uses a global namespace along with intelligent caching and in-memory metadata to track the location of and changes to data at its source. Alluxio said the open-source version of its technology is used by seven of the world’s top 10 internet companies.
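To make the idea concrete, here is a minimal sketch of a global namespace with read-through caching. It is not Alluxio's API: the `UnifiedNamespace` class, its `mount` and `read` methods, and the dictionaries standing in for storage systems are all hypothetical, chosen only to illustrate how one logical path space can front several sources without replicating them up front.

```python
class UnifiedNamespace:
    """Hypothetical illustration: one logical path space over several stores."""

    def __init__(self):
        self._mounts = {}       # logical prefix -> function that fetches from the source
        self._cache = {}        # logical path -> bytes already fetched
        self.source_reads = 0   # how many times a backing store was actually hit

    def mount(self, prefix, fetch):
        # Normalize so "/s3" and "/s3/" are treated as the same mount point.
        self._mounts[prefix.rstrip("/") + "/"] = fetch

    def read(self, path):
        # Serve from cache when possible; data is only pulled on first access.
        if path in self._cache:
            return self._cache[path]
        # Longest matching prefix wins, so nested mounts resolve predictably.
        prefix = max(
            (p for p in self._mounts if path.startswith(p)), key=len, default=None
        )
        if prefix is None:
            raise FileNotFoundError(path)
        data = self._mounts[prefix](path[len(prefix):])
        self.source_reads += 1
        self._cache[path] = data
        return data


# Plain dictionaries stand in for two separate storage systems (an assumption).
object_store = {"events/day1.csv": b"a,b\n1,2\n"}
file_store = {"logs/app.log": b"started\n"}

ns = UnifiedNamespace()
ns.mount("/s3", object_store.__getitem__)
ns.mount("/hdfs", file_store.__getitem__)
first = ns.read("/s3/events/day1.csv")   # fetched from the source
second = ns.read("/s3/events/day1.csv")  # served from the cache
```

In this sketch the second read never touches the backing store, which is the property that lets a compute framework work against remote data without a bulk copy. A real system would also need invalidation when the source changes, which is omitted here.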
In version 2.0, this process can now be managed by user-defined policies that automate data movement across storage systems on an ongoing basis. The new release also supports data movement across multiple clouds, greater scalability, cluster partitioning and integration with external data sources through a Representational State Transfer or REST interface.
Alluxio said it brings a unique approach to the problem of data silos that afflict most large organizations. Data scientists attempting to build analytics programs that work across an enterprise must contend with multiple data sources that have sprouted up through departmental initiatives, acquisitions and legacy applications. Data virtualization is a relatively recent approach to this problem that attempts to harmonize disparate sources without making copies, a resource-intensive process that can also introduce dangerous data quality issues.
However, many solutions are built with the intention of optimizing storage rather than compute, said founder Haoyuan Li, who co-created the technology while working on his Ph.D. at the University of California at Berkeley’s AMPLab. Although those approaches may reduce copies and optimize storage efficiency, they don’t help analytics routines run any faster.
Silo acceptance
“Silos are inevitable,” Li said. Rather than addressing the problem by creating copies, “we logically integrate the data so you can access it via a software layer as a folder.” The software is bound to the analytics application, such as Apache Spark or Presto, to optimize performance at the application layer.
The new policy features in Alluxio 2.0 provide for automatic tiering of hot, warm and cold data across any number of storage systems both on-premises and in multiple clouds. Users can configure policies at any directory or folder level to customize and streamline data access, and definitions for individual datasets can cover functions such as writing data or syncing data with storage systems.
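As a rough illustration of directory-level tiering policies, the sketch below classifies data as hot, warm or cold by how recently it was accessed. Everything in it is an assumption made for the example: the `TieringPolicy` fields, the day-based thresholds and the most-specific-directory rule are hypothetical and do not reflect how Alluxio 2.0 actually expresses or evaluates its policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TieringPolicy:
    """Hypothetical per-directory policy; thresholds are days since last access."""
    directory: str
    hot_max_age_days: float
    warm_max_age_days: float


def _covers(policy, path):
    # A policy covers its own directory and anything beneath it,
    # but "/data" must not match an unrelated "/database".
    root = policy.directory.rstrip("/")
    return path == root or path.startswith(root + "/")


def resolve_policy(policies, path):
    """Return the most specific policy covering path, or None if none applies."""
    matches = [p for p in policies if _covers(p, path)]
    return max(matches, key=lambda p: len(p.directory), default=None)


def pick_tier(policy, age_days):
    """Map days since last access to a tier name under the given policy."""
    if age_days <= policy.hot_max_age_days:
        return "hot"
    if age_days <= policy.warm_max_age_days:
        return "warm"
    return "cold"


policies = [
    TieringPolicy("/data", hot_max_age_days=7, warm_max_age_days=90),
    TieringPolicy("/data/archive", hot_max_age_days=0, warm_max_age_days=1),
]
chosen = resolve_policy(policies, "/data/archive/2018.parquet")
tier = pick_tier(chosen, age_days=30)
```

Here the narrower `/data/archive` policy overrides the broader `/data` one, so a file untouched for 30 days lands in the cold tier even though the parent directory would still consider it warm. A scheduler that periodically re-evaluates each file and moves it between storage systems would sit on top of this and is left out.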
When using cloud-based data sources, users can now partition Alluxio layers so datasets being used by different analytics frameworks, for example, can’t contaminate each other. Data from external sources can also be aggregated via a RESTful interface by pointing the source files to Alluxio for access as needed.
With a recent shift to gRPC, Google LLC’s open-source remote procedure call framework, Alluxio can now scale to 5,000 nodes in a single cluster, said Dipti Borkar, vice president of product management and marketing. Support for the open-source RocksDB persistent key-value store improves performance and enables tiered metadata management to scale to billions of files.
Alluxio provides a free community edition and an enterprise version with enhanced security, additional orchestration features and technical support. The software is delivered for on-premises deployment in a Docker container. Pricing wasn’t specified.
Photo: David Kornfeld/Flickr CC