Sqrrl CTO Explains the Secret Behind Accumulo’s Near Real-Time Performance

Adam Fuchs, the co-creator of Accumulo and the founding CTO of Sqrrl, gave an exclusive whiteboard explanation of his company’s technology on SiliconANGLE’s theCube.

Accumulo is a highly secure, disk-based key-value store that builds on the design of Google’s Bigtable storage system and adds innovations Fuchs and his colleagues developed as part of their work for the NSA. It uses a data structure known as the log-structured merge tree to rapidly sort randomly ordered key-value pairs while keeping random disk I/O to a minimum.

Fuchs says that the platform partitions incoming data into containers called tablets. Inside these tablets, data is fed into an in-memory map and then replicated onto HDFS to maximize availability. The latter process involves buffering information into sequential streams that are flushed to disk as soon as they “fill up.”
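The write path Fuchs describes can be illustrated with a minimal Python sketch. It is a toy model rather than Accumulo's actual code: the `Tablet` class, its `max_entries` threshold and the `sorted_runs` list (a stand-in for sorted files on HDFS) are all hypothetical names chosen for illustration.

```python
class Tablet:
    """Toy model of one tablet: an in-memory map plus flushed sorted runs."""

    def __init__(self, max_entries=3):
        self.max_entries = max_entries  # hypothetical "fill up" threshold
        self.memory_map = {}            # recent writes, in arrival order
        self.sorted_runs = []           # stand-ins for sorted files on HDFS

    def write(self, key, value):
        # Writes land in memory first; no disk seek is needed per write.
        self.memory_map[key] = value
        if len(self.memory_map) >= self.max_entries:
            self.flush()

    def flush(self):
        # Sort the buffered entries once and emit them as one sequential run.
        if self.memory_map:
            self.sorted_runs.append(sorted(self.memory_map.items()))
            self.memory_map = {}


tablet = Tablet(max_entries=3)
for key, value in [("row9", "a"), ("row2", "b"), ("row5", "c"), ("row1", "d")]:
    tablet.write(key, value)

print(tablet.sorted_runs)  # one sorted run, flushed after the third write
print(tablet.memory_map)   # the fourth write is still buffered in memory
```

The point of the sketch is that keys arrive in random order but only ever reach "disk" as sorted, sequentially written runs.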

Accumulo merges tablets into a unified stream of key-value pairs in order to make data easily accessible for users. Latency is proportional to the number of tablets, Fuchs adds, but it’s greatly reduced by the major compaction that the platform carries out in the background. This operation integrates data into a globally sorted file that is ready to be processed by iterators.

Iterators are operators that can perform versioning, filtering, aggregation and function application. Since these tasks are carried out in the stream, applications don’t have to request datasets multiple times, which further reduces random I/O.
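The merge-and-iterate idea can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not Accumulo's API: entries are simplified to `(key, timestamp, value)` tuples, each run is assumed to be sorted by key with the newest timestamp first, and the function names `merged_stream`, `versioning`, `filtering` and `major_compaction` are hypothetical.

```python
import heapq
from itertools import groupby


def merged_stream(runs):
    # Lazily merge sorted runs into one stream: key ascending, newest first.
    return heapq.merge(*runs, key=lambda e: (e[0], -e[1]))


def versioning(stream):
    # Keep only the newest version of each key as the stream flows past.
    for _, versions in groupby(stream, key=lambda e: e[0]):
        yield next(versions)


def filtering(stream, predicate):
    # Drop entries that fail the predicate without a second pass over the data.
    return (entry for entry in stream if predicate(entry))


def major_compaction(runs):
    # Rewrite many runs as a single sorted run holding the newest versions.
    return [list(versioning(merged_stream(runs)))]


runs = [
    [("apple", 1, "red"), ("cherry", 1, "dark")],
    [("apple", 2, "green"), ("banana", 2, "yellow")],
]

result = list(filtering(versioning(merged_stream(runs)), lambda e: e[1] >= 2))
print(result)  # [('apple', 2, 'green'), ('banana', 2, 'yellow')]

compacted = major_compaction(runs)
print(len(compacted))  # 1
```

Because the operators are chained generators, each entry is read once and passed through versioning and filtering in the same pass, which mirrors the in-stream processing described above. After the compaction step, a read only has to consult one run instead of merging several.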

Fuchs concludes the session by highlighting that Accumulo enables users to optimize their deployments depending on how write- and read-intensive their workloads happen to be.
