Misconceptions about NoSQL

By Dwight Merriman, CEO and co-founder of 10gen

We’re several years into the rise of NoSQL, and yet common misconceptions persist about the movement. As spotlighted by 451 Research, NoSQL job demand is booming, and ranks second only to HTML5 among Indeed.com’s hottest job trends. Yet some continue to think of NoSQL as a niche phenomenon that will fade away.

Nothing could be further from the truth.

Relational database technology has dominated the industry for over 30 years, but it lacks the flexibility and scalability needed to support modern applications, leading RedMonk analyst James Governor to declare, “The idea that everything is relational? Those days are gone.” NoSQL will come to be the preferred database technology for the vast majority of use cases, with document databases increasingly capable of handling as much as 80 percent of database workloads.

Given this, it’s not surprising that I get frustrated when I hear the following misconceptions about NoSQL:

Only Web Companies Use NoSQL Databases

The easy way to counter this is to point to examples of customer and production deployments. There are literally thousands, spread across a variety of vendors and open-source projects. And while it’s true that NoSQL started with the web companies as they sought to solve their data scalability problems, the “NoSQL is only for the web” canard almost immediately lost touch with reality.

Why? Because nearly every company has a “Big Data” problem, though the relative size of “big” differs markedly.

It’s also the case that most companies aggressively use the web. Enterprise applications are increasingly reliant upon web-based architectures. While “web” within the enterprise used to mean the company’s public-facing website, this stopped being true long ago.

In short, the distinction between “web company” and “traditional enterprise” hardly makes sense anymore. As the enterprise has become more porous to inbound data, the need to manage that data in a flexible, scalable fashion has led traditional enterprises to follow the lead of Google, Facebook, and others in embracing NoSQL.

NoSQL Is Only Used for Caching, Not Systems of Record

“But wait!” the NoSQL Luddites exclaim. “NoSQL may be used by traditional enterprises, but not for serious applications. It’s just a caching technology.”

Again, there are numerous examples of production deployments that indicate this isn’t even remotely true. If anything, at 10gen we see the opposite: NoSQL is being used to drive the real application, and then the data may be moved to a relational database for time-insensitive, offline analysis. The “hot” data, which is relevant and changing right now, is kept in NoSQL databases to inform the application, as The Guardian newspaper does to drive greater user engagement with its content.

At a higher level, NoSQL is used where a) your “records” are loosely and/or dynamically structured, b) your system requires such scale that a relational database is incapable of being the system of record, or c) your systems of record are evolving quickly and need agility.

NoSQL Is All About Offline

This misconception is related to the preceding one, and is just as flawed. NoSQL has a great many online operational use cases, where the application is constantly both reading and writing.

Applications are getting away from their “passive” status, where the application only does something when I ask it a question. Enabled by NoSQL, applications are more and more active, where the application yields information that I didn’t know, without me even having to ask. Maybe this is Google Maps routing me around traffic that I didn’t know about, or it’s Foursquare notifying me of a merchant-sponsored deal based on my physical location. Such applications require constant updating and reading of data in order to notify the user when something changes.

Indeed, in many ways, “operational” is the future of business intelligence. Traditional BI requires the user to ask the system questions to get answers. Going forward, NoSQL enables a much more active relationship with our data, revealing interesting things without having to ask.

Hadoop and NoSQL Are Different

Since we’re talking about BI, “Big Data” is not far away. Hadoop is a computing framework whereas NoSQL refers to a class of databases. However, you can do many of the same things with both technologies, the primary one being data analysis. Ultimately, the biggest difference between Hadoop and NoSQL is the speed at which one can get answers to questions, or process and analyze data. Hadoop, of course, is batch-oriented, letting data analysts run queries against historical data, while NoSQL gives users the chance to operate in real-time.

NoSQL, in other words, is about interacting with data right now, whereas Hadoop is about working with that data later.

In some ways, batch processing is easier because it’s less complicated: you collect data, you store it, you process it, and then you analyze it after it’s safely packaged up. Batch processing is good for experiments with data to prove that it’s useful. But batch-oriented systems can’t power the actual applications. For this, we need to operationalize Big Data, which requires real-time interfaces. Like NoSQL.

Users wondering whether to use Hadoop or NoSQL should focus on how they intend to use data, and the velocity, or rate of access, of that data. If the velocity is fast, or real-time, NoSQL is required. How the data is to be stored or queried is a secondary question.

Big Data Is All About Analytics

If you look at the traditional database market, roughly two-thirds consists of operational use cases (OLTP, more or less), with analytical use cases (OLAP) taking up the remainder. This is true in Big Data as well. Real-time analytics is harder than offline or batch-oriented analytics, so it has taken more time to mature. But it’s increasingly clear that real-time data analytics is the near-term future of Big Data, and that’s NoSQL.

Developers Should Default to Batch Processing

Have you not been listening? While some applications are fine using old data, the trend is toward active applications that require real-time reading and writing of data. This is where NoSQL is strong.

It’s also a better fit for developers. Most developers are comfortable with the request/response mode of programming. They think in terms of “I receive a request, I do these steps, then send a response.” Batch systems do not fit that model, so they are harder to program, which limits the number of developers who can write for them. Batch processing is also asynchronous to one’s applications and hence harder to integrate and manage. Real-time systems like NoSQL databases give developers the best chance of actually having their data be leveraged by applications, rather than just perused by analysts.
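To make the contrast concrete, here is a minimal sketch of the two programming modes. It uses plain in-memory Python structures as a stand-in for a data store; the names handle_request, nightly_batch, page_views and event_log are hypothetical and do not come from any particular product.

```python
from collections import Counter

# Hypothetical in-memory stores standing in for real systems.
page_views = Counter()   # operational store, read and written on every request
event_log = []           # raw events set aside for a later batch job

def handle_request(page):
    """Request/response: receive a request, do the steps, send a response."""
    page_views[page] += 1          # write
    event_log.append(page)         # also keep the raw event for later analysis
    return {"page": page, "views": page_views[page]}  # read and respond at once

def nightly_batch(events):
    """Batch: runs later, over everything collected so far."""
    return Counter(events)

r1 = handle_request("/home")
r2 = handle_request("/home")
r3 = handle_request("/about")

# The application already has an up-to-date answer in each response;
# the batch report only exists once the job has been run.
report = nightly_batch(event_log)
```

In the request/response path the answer is available to the application as part of the same call, whereas the batch result has to be scheduled, produced, and then fed back into the application separately.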

Again, batch processing tools like Hadoop have their place, but real-time tools like NoSQL databases arguably should be the first consideration for developers.

Organizations Should Store Data First; Ask Questions Later

In the relational database world, a developer spends time designing schemas and architectures and then loads clean data into them. But if the developer didn’t forecast correctly, data gets missed, so the incentive is to store everything.

Let’s be clear: it’s very hard (impossible?) to get the data schema correct out of the gate. One simply doesn’t know which direction the business will take, and changes to the model become necessary even if the original design was a good fit. It’s therefore important that developers design for failure and iteration. A document database affords the flexibility to continuously tweak one’s data model, as necessary, giving the developer a flexible data infrastructure with which to work. Up-front schema design is generally premature optimization.
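As a rough illustration of what tweaking the data model can look like, the sketch below uses a plain Python list of dictionaries as a stand-in for a document collection. It is not a real database API, and the field names (loyalty_tier, addresses) are invented for the example.

```python
# Hypothetical in-memory stand-in for a document collection.
customers = []

# First iteration of the data model: name and email only.
customers.append({"name": "Ada", "email": "ada@example.com"})

# A later iteration adds fields. Older documents are left as they are;
# no up-front schema change is applied to them.
customers.append({
    "name": "Grace",
    "email": "grace@example.com",
    "loyalty_tier": "gold",
    "addresses": [{"city": "New York", "zip": "10001"}],
})

def loyalty_tier(doc):
    """Read a field that older documents may lack, falling back to a default."""
    return doc.get("loyalty_tier", "none")

tiers = [loyalty_tier(doc) for doc in customers]
print(tiers)
```

The trade-off in this style is that the application code, rather than the database, takes responsibility for handling documents written under earlier versions of the model.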

In Summary

NoSQL has only been around a couple of years, but already it is dramatically changing the way applications are built, and the way we store and analyze data. Given how important a role it plays in the industry today, it’s important that we think about it correctly.

It’s also important to understand that “NoSQL” is not a monolithic category, which perhaps contributes to some confusion over what it means and what NoSQL databases are meant to do. Within the NoSQL camp there are document databases with broad applicability to a range of use cases. Then there are other NoSQL databases that solve narrower problems, but very well.

Which kind of NoSQL database you need will depend upon your application. But the first step is to recognize that more often than not, you are going to need a NoSQL database for your next application.