In the fourth episode of theCube’s special series on Accumulo, Sqrrl founding CTO Adam Fuchs discusses how his company’s platform helps enterprises organize their NoSQL environments.
Fuchs opens the session with an overview of the system. Accumulo limits querying to a range within a keyspace, he explains, and that range represents a hierarchical structure that follows a row, column, and timestamp format. The row determines how the data is partitioned in the database, the column defines vertical partitioning within the row, and the qualifier denotes the uniqueness of the value stored in the key-value pair.
A user can query a specific row, a row in a particular column family, and any value or set of values that may be associated with it.
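The key layout and range-limited querying described above can be sketched with a small in-memory table. This is only an illustration, not the Accumulo API: the `SortedTable` class and its `put` and `scan` methods are hypothetical names, and the sketch sorts timestamps in ascending order for simplicity, whereas Accumulo orders them newest first.

```python
import bisect


class SortedTable:
    """Hypothetical in-memory stand-in for a sorted key-value table."""

    def __init__(self):
        # Keys are (row, family, qualifier, timestamp) tuples kept in sorted order.
        self._keys = []
        self._values = {}

    def put(self, row, family, qualifier, timestamp, value):
        key = (row, family, qualifier, timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, row, family=None):
        """Return all entries for one row, optionally limited to one column family."""
        prefix = (row,) if family is None else (row, family)
        # A shorter tuple sorts before any longer tuple that starts with it,
        # so bisect_left lands on the first key in the requested range.
        start = bisect.bisect_left(self._keys, prefix)
        results = []
        for key in self._keys[start:]:
            if key[:len(prefix)] != prefix:
                break  # left the contiguous range for this prefix
            results.append((key, self._values[key]))
        return results


table = SortedTable()
table.put("user1", "profile", "name", 1, "Ada")
table.put("user1", "orders", "o1", 1, "book")
table.put("user2", "profile", "name", 1, "Bob")

whole_row = table.scan("user1")                # every entry in row "user1"
one_family = table.scan("user1", "profile")    # only the "profile" column family
```

Because the keys are sorted, each query touches one contiguous slice of the table, which is the property that makes range scans cheap.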
Fuchs says that one of the simplest ways to optimize NoSQL databases is pairing a document table with an inverted index. The document is organized using universally unique identifiers (UUIDs) that represent fields, which in turn contain values that can be retrieved by querying the IDs. This table design enables users to perform queries based on the characteristics of a document (that is, the value or parts of the value they’re looking for) rather than its identifier.
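A minimal sketch of the document table paired with an inverted index might look like the following. It assumes that each document is stored under a UUID and that values are split into lowercase words for indexing; the names `documents`, `inverted`, `add_document`, and `search` are hypothetical and do not come from Sqrrl or Accumulo.

```python
import uuid
from collections import defaultdict

# Document table: document ID -> {field: value}
documents = {}
# Inverted index: (field, term) -> set of document IDs containing that term
inverted = defaultdict(set)


def add_document(fields):
    """Store a document under a new UUID and index every term in its values."""
    doc_id = str(uuid.uuid4())
    documents[doc_id] = dict(fields)
    for field, value in fields.items():
        for term in str(value).lower().split():
            inverted[(field, term)].add(doc_id)
    return doc_id


def search(field, term):
    """Look up documents by part of a value instead of by identifier."""
    ids = inverted.get((field, term.lower()), set())
    return [documents[doc_id] for doc_id in sorted(ids)]


first = add_document({"title": "Accumulo range scans"})
second = add_document({"title": "Graph search basics"})

by_id = documents[first]                 # lookup by identifier
by_term = search("title", "accumulo")    # lookup by content
```

The index costs extra writes on every insert, but it turns a content query into a direct lookup rather than a scan over every document.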
The approach described above is known as term-distributed information retrieval, which differs from document-distributed information retrieval. Fuchs points out that the latter method involves grouping documents and index entries into partitions, or shards, that can be queried in parallel.
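The document-distributed variant can be sketched as a set of shards, each holding some documents together with the index entries for just those documents. The shard count, the CRC32-based placement, and the names `Shard`, `shard_for`, `add_document`, and `search_all` are assumptions made for illustration only.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # assumed value for the sketch


class Shard:
    """One partition holding documents plus the index entries for them."""

    def __init__(self):
        self.docs = {}
        self.index = defaultdict(set)

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.index[term].add(doc_id)

    def search(self, term):
        return sorted(self.index.get(term.lower(), ()))


shards = [Shard() for _ in range(NUM_SHARDS)]


def shard_for(doc_id):
    # Deterministic hash so a document always lands in the same shard.
    return shards[zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS]


def add_document(doc_id, text):
    shard_for(doc_id).add(doc_id, text)


def search_all(term):
    """Query every shard in parallel and merge the partial results."""
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        partials = list(pool.map(lambda shard: shard.search(term), shards))
    return sorted(doc_id for part in partials for doc_id in part)


add_document("d1", "sharded index")
add_document("d2", "term index")
add_document("d3", "graph data")

matches = search_all("index")
```

Every query fans out to all shards, but each shard can answer from its own local index without consulting the others.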
The CTO wraps up the episode by highlighting that Sqrrl implemented both concepts in Accumulo to simplify location data analysis, graph organization, 3D modeling and other data-driven workloads.