UPDATED 08:00 EDT / JUNE 26 2018

BIG DATA

MapR updates its data platform for better support of AI and analytics processing

MapR Technologies Inc. today is rolling out major updates to its core data platform.

Together, the updates are aimed at speeding up the development and deployment of automated analytics, improving developer and data scientist productivity, boosting security and improving storage options.

MapR’s stock in trade is what it calls a “converged data platform,” which supports access to multiple data stores via a single interface based upon the Posix file standard. The company said its platform is optimized for use with analytics and machine learning frameworks, most of which require Posix.

Among the major new features is support for cloud storage through object tiering, a policy-based, automated technique for moving noncritical or inactive data to a cloud object store. That’s combined with a native Amazon Web Services Inc. S3 interface that supports direct analytics processing on operational data, with transparent application portability across on-premises and multicloud environments. The new release also supports erasure coding, a low-overhead approach to long-term data protection that stores parity information, conceptually similar to checksums, rather than full copies of the data, which reduces capacity requirements.
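To make the capacity argument concrete, here is a minimal sketch of erasure coding using a single XOR parity block. It is an illustration only: the article does not describe the scheme MapR uses, production systems typically rely on more general codes such as Reed-Solomon, and the function names and the 3+1 layout are assumptions chosen for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, lost_index):
    """Rebuild the single block at lost_index by XOR-ing all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


def storage_overhead(data_count, redundant_count):
    """Raw capacity consumed per unit of user data."""
    return (data_count + redundant_count) / data_count


# Hypothetical 3+1 layout: three data blocks protected by one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = encode(data)

# Pretend the second block was lost, then rebuild it from the other three.
rebuilt = recover(stripe, lost_index=1)

erasure_overhead = storage_overhead(3, 1)      # data plus one parity block
replication_overhead = storage_overhead(1, 2)  # one copy plus two replicas
```

In this sketch the 3+1 stripe consumes about 1.33 times the size of the user data while still surviving the loss of any one block, whereas keeping three full copies consumes 3 times the size. That difference is the capacity saving the paragraph above refers to.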

Single point of access

Support for S3, a low-cost cloud-based object store, should be of particular value to data scientists who want to use machine learning frameworks across on-premises and cloud storage, said Anoop Dawar, a MapR senior vice president. “Many companies have lots of images in Posix stores,” he said. “They want to use S3 for analysis, but they can’t because they’re in a Posix store. The beauty of this is that you can move everything into the MapR file system and access it with S3.”

The MapR platform can also detect the location of data across on-premises and cloud storage locations and adjust accordingly. Less-used data can be moved to S3 to save costs and Apache Spark applications “will detect that the data is in that data store and automatically recall it,” Dawar said. “You can set up a tiny cluster in the cloud and compress and encrypt your production data into an object store. Only that data that’s used by those jobs is pushed back into the cluster.”
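The tiering and recall behavior Dawar describes can be sketched in a few lines. This is a toy in-memory model under stated assumptions, not MapR's implementation or API: the `TieredStore` class, the `cold_after_days` policy parameter and the file names are all hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass, field

SECONDS_PER_DAY = 86400


@dataclass
class TieredStore:
    """Toy two-tier store: a 'hot' cluster tier and a 'cold' object tier."""

    cold_after_days: float                    # policy: idle time before offload
    hot: dict = field(default_factory=dict)   # name -> (data, last_access_time)
    cold: dict = field(default_factory=dict)  # name -> data

    def put(self, name, data, now):
        """Write new data to the hot tier."""
        self.hot[name] = (data, now)

    def apply_policy(self, now):
        """Move every object idle longer than the policy allows to the cold tier."""
        cutoff = now - self.cold_after_days * SECONDS_PER_DAY
        moved = [name for name, (_, last) in self.hot.items() if last < cutoff]
        for name in moved:
            data, _ = self.hot.pop(name)
            self.cold[name] = data
        return moved

    def read(self, name, now):
        """Read an object, recalling it from the cold tier first if needed."""
        if name in self.cold:
            self.hot[name] = (self.cold.pop(name), now)
        data, _ = self.hot[name]
        self.hot[name] = (data, now)  # refresh the last-access time
        return data


store = TieredStore(cold_after_days=30)
store.put("logs-2017.csv", b"old", now=0)
store.put("orders.csv", b"new", now=40 * SECONDS_PER_DAY)

# On day 45, only the file untouched for more than 30 days is offloaded.
moved = store.apply_policy(now=45 * SECONDS_PER_DAY)

# A later job reads the offloaded file; the caller never names the tier.
recalled = store.read("logs-2017.csv", now=46 * SECONDS_PER_DAY)
```

The point of the design is that the caller of `read` uses the same name regardless of where the data lives, which mirrors the claim that applications detect where data resides and recall it automatically.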

Security policies are now enabled by default on all data, and volume-based data encryption has been added for data at rest, with encryption keys managed automatically by the platform. Previously, security policies had to be enabled by administrators and encryption was applied on a drive-by-drive basis.

“AI and analytics databases are a honeypot for hackers,” Dawar said. “We’re now configured by default for security so users don’t have to go through administrative headaches.” All data can be stored in an encrypted state, and all network connections are now encrypted with authentication enabled.

For developers, MapR has added support for Apache Spark 2.3 for structured streaming and machine learning, along with analytics toolkit support for Apache Hive 2.3, in which more than 800 issues have been resolved. Nonprogrammers can now create streaming applications with KSQL, an open-source streaming SQL engine that enables real-time data processing against Apache Kafka streams. “Something that would have required extensive developer intervention is now a single SQL statement,” Dawar said.

Audit logs are now also available as a stream using a publish-and-subscribe model, a useful feature for analyzing storage needs and allocating tiers accordingly. The upgraded platform will be available in the third quarter of 2018 as an in-line enhancement for existing customers.

MapR storage tiering, June, 2018

Image: Unsplash
