UPDATED 15:01 EDT / OCTOBER 01 2013

Choosing (the Right Kind of) NoSQL

What is NoSQL?

The NoSQL arena can be tough to get your arms around. Not only is the field developing rapidly, but it doesn’t center on a single technology. NoSQL is better understood as an approach that selects and recombines some of the major components of traditional database technology.

The motivation for re-thinking database technology is to achieve a better combination of scalability, fault tolerance, and performance. Reduced to its essence, traditional database technology uses a single machine to host a single database that provides the strong data guarantees associated with the ACID properties. The drawback of this single-machine approach was always that scaling was difficult, usually requiring the addition of expensive hardware.

Of course, you can also scale with a sharding approach, running multiple instances of your database on multiple machines. Sharding requires you to partition your data manually, often resulting in a fragile architecture. To make matters worse, traditional transaction processing algorithms don’t scale well in a distributed setting, forcing a choice between decreased performance and weakened data guarantees for operations that span multiple machines. If you take the latter route and sacrifice ACID properties, you effectively have multiple databases, exposing your application to “split-brain syndrome.”
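
To make the fragility of manual sharding concrete, here is a minimal Python sketch. It assumes the simplest possible scheme, hashing each key and taking the result modulo the number of shards. The ManualShardRouter class, the moved_keys helper, and the db0 through db3 shard names are hypothetical, invented for this illustration only.

```python
import hashlib


class ManualShardRouter:
    """Hypothetical router that maps each key to one of a fixed list of shards."""

    def __init__(self, shard_names):
        self.shard_names = list(shard_names)

    def shard_for(self, key):
        # Hash the key and take it modulo the shard count (naive partitioning).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shard_names)
        return self.shard_names[index]


def moved_keys(keys, old_router, new_router):
    """Return the keys whose shard assignment differs between two routers."""
    return [k for k in keys if old_router.shard_for(k) != new_router.shard_for(k)]


keys = [f"user:{i}" for i in range(1000)]
three = ManualShardRouter(["db0", "db1", "db2"])
four = ManualShardRouter(["db0", "db1", "db2", "db3"])

# Adding a single machine reassigns most keys under this naive scheme.
moved = moved_keys(keys, three, four)
print(f"{len(moved)} of {len(keys)} keys change shards")
```

Under this scheme, growing from three shards to four reassigns about three quarters of the keys in expectation, and the application has to migrate every one of them itself. That migration burden is what automated partitioning is meant to remove.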

The first generation of NoSQL systems responded to the difficulties of manual sharding by automating the partitioning of data over a large cluster of machines. Running on clusters of commodity hardware, NoSQL systems provide both horizontal scalability and fault tolerance. Their popularity was initially driven by the growth of web applications, but the same advantages have also led a broad range of enterprise applications to adopt NoSQL.

ACID and NoSQL

On the other hand, most NoSQL databases don’t offer ACID guarantees. These guarantees were abandoned with sharding, and NoSQL initially made no attempt to restore them. On the contrary, these systems typically lack ACID properties for operations spanning multiple rows even on the same machine.

Although ACID transactions were first commercialized in relational databases, there’s no engineering necessity for that linkage. In fact, the combination of ACID transactions and NoSQL is ideal for applications with multiple, concurrent clients.

Multiple Data Models

Relational databases, by definition, support the relational data model, almost always with SQL as a query and data definition language. Likewise, each first-generation NoSQL database supports a single data model, such as a columnar, document, or graph model. This single-model limitation has a negative effect on application development: different data models are better suited to different use cases, and applications often require more than one data model for different data types.

As a result, engineers turn to “polyglot persistence,” integrating multiple databases into their application’s backend. However, integrating multiple databases imposes a significant engineering cost, not to mention an operational nightmare. Many engineers have come to realize that polyglot persistence is a workaround for their problem, not a solution to it.

Rather than integrating multiple databases, it’s usually better to build multiple data models directly on a single storage substrate. The challenge is that concurrent operations on data models require guaranteed coordination of different data elements. Without ACID transactions, application developers have no good way to build new data models on top of a data store. With such transactions, a database can be architected around a storage substrate augmented by layers, allowing applications to select the models best suited to their use cases. The result is data-model flexibility in a single, integrated database.
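
A small Python sketch can illustrate the layering idea. It assumes a toy in-memory key-value store in which a single lock stands in for real transaction machinery. ToyTransactionalStore, UserLayer, and the key layout are hypothetical names made up for this example and do not describe any particular product's API.

```python
import threading


class ToyTransactionalStore:
    """Hypothetical in-memory key-value substrate with all-or-nothing commits."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def transact(self, fn):
        # Run fn against a private copy; publish the copy only if fn succeeds.
        with self._lock:
            working = dict(self._data)
            result = fn(working)
            self._data = working
            return result

    def snapshot(self):
        with self._lock:
            return dict(self._data)


class UserLayer:
    """Hypothetical document-style layer with a secondary index on city."""

    def __init__(self, store):
        self.store = store

    def put_user(self, user_id, city):
        def txn(kv):
            # The record and its index entry change in the same transaction,
            # so a reader never sees one without the other.
            old_city = kv.get(("user", user_id))
            if old_city is not None:
                kv.pop(("by_city", old_city, user_id), None)
            kv[("user", user_id)] = city
            kv[("by_city", city, user_id)] = True

        self.store.transact(txn)

    def users_in(self, city):
        snap = self.store.snapshot()
        return sorted(k[2] for k in snap if k[0] == "by_city" and k[1] == city)


store = ToyTransactionalStore()
users = UserLayer(store)
users.put_user("alice", "Vienna")
users.put_user("bob", "Vienna")
users.put_user("alice", "Boston")  # moves alice and updates the index together

print(users.users_in("Vienna"))
print(users.users_in("Boston"))
```

The layer itself contains no locking or recovery logic. It relies entirely on the substrate's transact call to keep the record and the index consistent, which is the kind of division of labor the layered architecture depends on.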

ACID for Engineering

To enable the implementation of strong abstractions, ACID transactions must be global. Many NoSQL systems claim support for transactions of some sort, but they are almost always referring to local transactions, i.e., transactions limited to a single row, document, or adjacent graph elements. Global transactions allow operations on arbitrary data elements and can be used to enforce relationships and constraints among them.
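
As an illustration of a constraint that spans more than one data element, consider the following Python sketch. It assumes a toy single-process store whose transact method applies a group of writes entirely or not at all. ToyStore, transfer, ConstraintViolation, and the account keys are hypothetical, and the sketch models only atomicity and consistency, not isolation between clients or durability.

```python
class ConstraintViolation(Exception):
    """Raised when a transaction would break an application-level constraint."""


class ToyStore:
    """Hypothetical store: a transaction may touch any keys, all-or-nothing."""

    def __init__(self, initial):
        self._data = dict(initial)

    def get(self, key):
        return self._data[key]

    def transact(self, fn):
        working = dict(self._data)  # private copy for this transaction
        fn(working)                 # may raise, leaving self._data untouched
        self._data = working        # commit every write at once


def transfer(store, src, dst, amount):
    def txn(kv):
        kv[src] -= amount
        kv[dst] += amount
        # The constraint relates two separate keys, so a transaction
        # limited to a single key could not enforce it.
        if kv[src] < 0:
            raise ConstraintViolation(f"{src} would go negative")

    store.transact(txn)


store = ToyStore({"acct:a": 100, "acct:b": 20})
transfer(store, "acct:a", "acct:b", 30)       # commits
try:
    transfer(store, "acct:a", "acct:b", 500)  # aborts, no partial write
except ConstraintViolation:
    pass

total = store.get("acct:a") + store.get("acct:b")
print(store.get("acct:a"), store.get("acct:b"), total)
```

The rejected transfer leaves both balances exactly as they were, so the combined total is preserved without any cleanup code in the application.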

Global ACID transactions are a fundamental tool of good engineering, allowing you to build on a solid foundation and preserve strong data guarantees at each level of your application. Sacrificing ACID has negative engineering consequences for any application that depends on the correctness of its data.

Transactions Are the Future of NoSQL

As NoSQL databases become more broadly used for a wide variety of purposes, more applications built on them employ non-trivial concurrency from multiple clients. Without adequate concurrency control, all the traditional problems of concurrency re-emerge and create a significant burden for application developers. ACID transactions simplify concurrency for developers by providing serializable operations that can be composed to properly engineer application software. If you’re building an application that needs to be scalable and you don’t have transactions, you will eventually be burned. Fortunately, the scalability, fault tolerance, and performance of NoSQL databases are still achievable with transactions. The choice to use transactions is ultimately not a matter of fundamental tradeoffs but of sound engineering. As the technology matures, transactions will form a foundational capability for future NoSQL databases.
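
The classic concurrency problem in question is the lost update: two clients read the same value, and the second write silently overwrites the first. The Python sketch below shows one way a store could detect this, using a simple form of optimistic concurrency control with retry. That technique is chosen here purely for illustration. VersionedStore, Transaction, run_transaction, and ConflictError are hypothetical names, and the interleaving of the two clients is simulated by hand in a single thread.

```python
class ConflictError(Exception):
    """Raised at commit when a key this transaction read has since changed."""


class VersionedStore:
    def __init__(self):
        self._values = {}
        self._versions = {}       # key -> commit number of its last write
        self._commit_counter = 0

    def begin(self):
        return Transaction(self, self._commit_counter)


class Transaction:
    def __init__(self, store, start_version):
        self.store = store
        self.start_version = start_version
        self.reads = set()
        self.writes = {}

    def get(self, key, default=0):
        self.reads.add(key)
        if key in self.writes:
            return self.writes[key]
        return self.store._values.get(key, default)

    def set(self, key, value):
        self.writes[key] = value  # buffered until commit

    def commit(self):
        store = self.store
        # Abort if any key we read was written after this transaction began.
        for key in self.reads:
            if store._versions.get(key, 0) > self.start_version:
                raise ConflictError(key)
        store._commit_counter += 1
        for key, value in self.writes.items():
            store._values[key] = value
            store._versions[key] = store._commit_counter


def run_transaction(store, fn, max_retries=10):
    """Retry fn until it commits without a conflict."""
    for _ in range(max_retries):
        txn = store.begin()
        result = fn(txn)
        try:
            txn.commit()
            return result
        except ConflictError:
            continue
    raise RuntimeError("too many conflicts")


store = VersionedStore()
t1 = store.begin()
t2 = store.begin()
t1.set("counter", t1.get("counter") + 1)  # both clients read 0
t2.set("counter", t2.get("counter") + 1)
t1.commit()
try:
    t2.commit()                # would overwrite t1's increment
    conflict_detected = False
except ConflictError:
    conflict_detected = True   # caught instead of silently lost

# The second client retries its increment inside a retry loop.
run_transaction(store, lambda txn: txn.set("counter", txn.get("counter") + 1))
final = store.begin().get("counter")
print(conflict_detected, final)
```

Because conflicts are detected and retried by the store, the increment can be written as an ordinary function and composed with other operations, with no application-level locking.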

About the Author

Stephen Pimentel is the Director of Developer Evangelism at FoundationDB in Vienna, VA, where he promotes a new generation of distributed database technology. In previous positions, he successfully applied data science and analytics to large data sets for a variety of customers in the federal government. He holds an M.S. in Electrical Engineering and Computer Science from the Johns Hopkins University.

