Sqrrl CTO Explains the Secret Behind Accumulo’s Near Real-Time Performance

Adam Fuchs, the co-creator of Accumulo and the founding CTO of Sqrrl, gave an exclusive whiteboard explanation of his company’s technology on SiliconANGLE’s theCube.

Accumulo is a highly secure, disk-based key-value store that builds on the design of Google’s BigTable storage system and adds innovations Fuchs and his colleagues developed as part of their work for the NSA. It uses a data structure known as the log-structured merge tree to rapidly sort randomly ordered key-value pairs while keeping random disk I/O to a minimum.

Fuchs says that the platform partitions incoming data into containers called tablets. Within each tablet, data is written to an in-memory map and persisted to HDFS, which replicates it to maximize availability. Writes are buffered into sequential streams that are flushed to disk as soon as the buffer “fills up.”
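As a rough illustration of the write path described above, the following Python sketch models a single tablet as an in-memory map that is written out as a sorted run once it reaches a size limit. It is not Accumulo’s API: the class name `TabletSketch`, the `max_entries` threshold and the use of plain lists in place of files on HDFS are all assumptions made for this example.

```python
class TabletSketch:
    """Toy model of one tablet: an in-memory map plus flushed sorted runs.

    Hypothetical names throughout; real tablets write files to HDFS.
    """

    def __init__(self, max_entries=4):
        self.max_entries = max_entries  # assumed flush threshold
        self.memory = {}                # in-memory map: key -> value
        self.flushed_runs = []          # each run is a sorted list of (key, value)

    def write(self, key, value):
        """Accept a write in any key order and flush once the buffer is full."""
        self.memory[key] = value
        if len(self.memory) >= self.max_entries:
            self.flush()

    def flush(self):
        """Sort the buffered entries and append them as one sequential run."""
        if not self.memory:
            return
        self.flushed_runs.append(sorted(self.memory.items()))
        self.memory = {}


tablet = TabletSketch(max_entries=3)
for key in ["m", "c", "x", "a", "q"]:
    tablet.write(key, key.upper())

print(tablet.flushed_runs)  # one sorted run holding c, m, x
print(tablet.memory)        # a and q are still buffered in memory
```

The point of the design is that keys arrive in random order but each flush is a single sequential write of already sorted data.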

To serve reads, Accumulo merges a tablet’s in-memory map and its sorted files into a unified stream of key-value pairs. Read latency grows with the number of files that have to be merged, Fuchs adds, but it’s greatly reduced by the major compaction that the platform carries out in the background. This operation rewrites the data into a single globally sorted file that is ready to be read through iterators.

Iterators are operators that can perform versioning, filtering, aggregation and function application. Since these tasks are carried out in the stream, applications don’t have to request datasets multiple times and random I/O is further reduced.
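The merge and the in-stream operators described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Accumulo’s actual iterator interface: entries are plain `(key, timestamp, value)` tuples, each run is already sorted by key with the newest timestamp first, and the function names `merged_stream`, `versioning`, `filtering` and `compact` are hypothetical.

```python
import heapq
from itertools import groupby, islice


def merged_stream(runs):
    """Merge sorted runs into one stream ordered by key, newest timestamp first."""
    return heapq.merge(*runs, key=lambda e: (e[0], -e[1]))


def versioning(stream, max_versions=1):
    """Keep only the newest max_versions entries for each key."""
    for _, group in groupby(stream, key=lambda e: e[0]):
        yield from islice(group, max_versions)


def filtering(stream, predicate):
    """Drop entries that do not satisfy the predicate."""
    return (entry for entry in stream if predicate(entry))


def compact(runs):
    """Toy major compaction: rewrite all runs as a single sorted run."""
    return list(versioning(merged_stream(runs)))


run_a = [("apple", 1, "red"), ("cherry", 1, "dark")]
run_b = [("apple", 2, "green"), ("banana", 1, "yellow")]

# Operators are chained on the stream, so the data is read only once.
stream = filtering(versioning(merged_stream([run_a, run_b])),
                   lambda e: e[2] != "yellow")
print(list(stream))  # [('apple', 2, 'green'), ('cherry', 1, 'dark')]
```

Because every stage is a generator, versioning and filtering happen while the merged stream is being read, and the cost of the merge depends on how many runs feed into it. Compacting the runs into one, as `compact` does here, leaves a single sorted input for later reads.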

Fuchs concludes the session by highlighting that Accumulo lets users tune their deployments according to how write- or read-intensive their workloads are.

Maria Deutscher
