UPDATED 09:00 EDT / JUNE 02 2016

IBM launches system for building and managing data lakes

The growing interest in so-called data center operating systems such as OpenStack and Mesosphere Inc.’s DCOS has spurred IBM Corp. to join the fray today with its own automation platform. Dubbed Spectrum Conductor, the software promises to do away with much of the duplicate componentry and expenses that burden IT departments.

According to the vendor, this feat is made possible by a homegrown access mechanism that lets applications share the information they need instead of each keeping a separate copy. Spectrum Conductor thus reduces storage requirements and eliminates the infrastructure necessary to support duplicate records, which can save a lot of resources in a large company. Yet as appealing as it is, IBM’s value proposition will likely meet some skepticism due to the challenges that plagued past attempts to pull off such an arrangement.

In fact, creating a data lake, as the model is often called, has proven so difficult that Gartner Inc. all but deemed it impractical two years ago. However, two years is a long time in the technology world. Spectrum Conductor comes with automated configuration tools that IBM says can ease the task of setting up applications to exploit its data access mechanism. The software also simplifies day-to-day management from there on with a policy-based provisioning feature borrowed from the company’s storage systems.

The functionality makes it possible to ensure that every workload runs on the infrastructure best suited to meet its requirements. For instance, an administrator can have Spectrum Conductor store an application’s most frequently used records on flash drives while sending everything else to a cheaper disk system. IBM sees the capability coming in particularly handy for analytic workloads, which is why it’s pairing the platform with an optional extension designed to ease the deployment of Spark clusters. The combination provides a throughput improvement of up to 58 percent over vanilla implementations of the engine, according to the company.

Much of the credit goes to File Placement Optimizer, a set of low-level data management features included in Spectrum Conductor that accelerate read and write operations. IBM says that the benefits become especially pronounced in environments with multiple Spark instances, where its software can move infrastructure resources around as usage patterns change. When one cluster is inactive, the hardware allocated to it is made available for the others to help speed their work. And important data can be shared as well to save analysts the delay of recalculating results that have already been readied by a colleague.

IBM plans to contribute key parts of the technology to the upstream Spark community as part of its $300 million effort to foster adoption of the engine. Spectrum Conductor, meanwhile, will be made available commercially as an on-premises offering and in the public cloud.

Image via Geralt
