UPDATED 12:00 EDT / SEPTEMBER 26 2016


MemSQL calls its new streaming technology a ‘game-changer’

Bolstering its claim to be the fastest SQL database platform for analytics, MemSQL Inc. today is announcing version 5.5 of its namesake product, which adds new features for ingesting streaming data from external sources such as Apache Kafka.

The new feature, called MemSQL Pipelines, introduces a SQL “CREATE PIPELINE” command that lets users build real-time streaming pipelines from the command line, giving them immediate visibility into live data streamed by message brokers such as Kafka.

The technology supports “exactly-once” semantics for data ingested from Kafka, an important distinction that addresses a problem many IT organizations encounter in handling streaming data from sources such as sensors and Internet of Things devices, according to Gary Orenstein, MemSQL’s chief marketing officer.

Kafka consumers typically operate under one of three delivery guarantees. “At-most-once” delivers each message no more than once, so a message can be lost if a failure occurs. “At-least-once” guarantees delivery by retrying, but may result in duplicates. “Exactly-once” guarantees delivery without duplication.
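One common way the first two guarantees arise is the order in which a consumer processes a message and records (“commits”) its position, or offset, in the stream. The sketch below is a simplified in-memory simulation, not Kafka’s client API; the function `run_consumer` and its parameters are hypothetical. It assumes a single consumer that crashes once between the two steps and then restarts from its last committed offset.

```python
from typing import List


def run_consumer(messages: List[str], commit_first: bool, crash_at: int) -> List[str]:
    """Simulate one consumer that crashes once while handling message `crash_at`.

    commit_first=True  -> commit the offset, then process (at-most-once)
    commit_first=False -> process, then commit the offset (at-least-once)
    Returns every message the consumer processed, in order.
    """
    processed: List[str] = []
    committed = 0  # offset the consumer resumes from after a restart
    offset = 0
    crashed = False
    while offset < len(messages):
        # First step: commit (at-most-once) or process (at-least-once).
        if commit_first:
            committed = offset + 1
        else:
            processed.append(messages[offset])
        # Simulated crash between the two steps, followed by a restart.
        if offset == crash_at and not crashed:
            crashed = True
            offset = committed
            continue
        # Second step: process (at-most-once) or commit (at-least-once).
        if commit_first:
            processed.append(messages[offset])
        else:
            committed = offset + 1
        offset += 1
    return processed


log = ["m0", "m1", "m2"]
print(run_consumer(log, commit_first=True, crash_at=1))   # ['m0', 'm2']: m1 lost
print(run_consumer(log, commit_first=False, crash_at=1))  # ['m0', 'm1', 'm1', 'm2']: m1 duplicated
```

Committing first means a crash can skip a message; processing first means a crash can replay one. Neither ordering alone yields exactly-once delivery.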

Orenstein said exactly-once semantics is a “game-changer” because it removes a significant impediment to deploying streaming applications. “‘Exactly-once’ has been challenging because you have to handle all kinds of error conditions as well as starting, stopping and restarting pipelines,” he said.

MemSQL gets around the problem by storing the message queue’s offsets, the record of which messages have already been consumed, in the same data store as the data itself. “There’s a huge opportunity to move from batch to real-time workflows, but the semantics have to be more robust,” Orenstein said. “Think of this as an enabler for that move.”

The new release also includes profiling and concurrency improvements that boost query performance an estimated five-fold on the star schema structures commonly used in data warehouses and data marts, MemSQL said. The addition of Bloom filters, compact probabilistic structures that can quickly rule out whether a data element is present in a set, improves performance and memory efficiency by letting the engine skip data that cannot match before doing more expensive query work. Query profiling provides a view of a query while it executes. “It’s kind of like a live x-ray of your query,” Orenstein said. The feature is useful for understanding and fine-tuning query performance. The concurrency improvements help schedule resources so the system can handle an optimal number of simultaneous users and queries.
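A Bloom filter answers “definitely not in the set” or “possibly in the set,” never producing a false negative. The sketch below is a generic textbook Bloom filter, not MemSQL’s implementation; the `BloomFilter` class, its sizes, and the star-schema usage (pre-filtering fact-table rows against dimension-table keys before a join) are illustrative assumptions.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Bit array plus k hash positions: no false negatives, some false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd stride
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Hypothetical star-schema use: build the filter from dimension keys, then
# discard fact rows whose key cannot match before running the real join.
dim_keys = ["store_1", "store_7", "store_42"]
bf = BloomFilter(num_bits=1024, num_hashes=3)
for key in dim_keys:
    bf.add(key)

fact_rows = [("store_1", 10.0), ("store_99", 5.0), ("store_42", 2.5)]
candidates = [row for row in fact_rows if bf.might_contain(row[0])]
```

Rows with matching keys always survive the filter; a non-matching row such as `store_99` is usually dropped, but may occasionally pass as a false positive and is then eliminated by the join itself.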

MemSQL previously integrated with Kafka through a tool called Streamliner, which is tightly integrated with Apache Spark. Streamliner isn’t going away, but it’s being repositioned as a tool primarily for heavy Spark users. “If you’re starting anew, go with Pipelines. If you’re already a big Spark user, then choose Streamliner,” Orenstein said.

Founded in 2011, MemSQL has raised $78 million in financing from blue-chip venture capitalists, including an oversubscribed Series C round of $36 million in April.
