NewSQL Will Prevail in 2012, Says MIT’s Michael Stonebraker
This week’s Snapshot interview was a timely one, as we learn about Professor Michael Stonebraker’s vision of 2012 for big data and relational databases. Stonebraker has been an influential figure in database research, and he offers insight on big data’s future along with lessons learned from years of prototypes and startups spanning academia and the commercial space. Now at VoltDB, Stonebraker discusses the hindrances facing big data today, as well as some insider tips on this emerging industry. Stonebraker is a regular in SiliconANGLE’s studio, as he’s seen here discussing the impact of open source and data analytics.
What is your definition of big data?
Technically speaking, big data is best defined as “the 3 V’s”. It means an application has at least one of the following:
- Big volume. The application consumes terabytes (TB) of data, even petabytes (PB). For example, website visitor traffic, which can quickly grow to petabyte scale, is increasingly analyzed by website owners to help them learn about visitor patterns, reactions to promotional offers, and seasonal behaviors.
- Big velocity. An application has so much data, moving so fast, that it’s like drinking from a firehose. For example, an internet service provider that samples hundreds of thousands of messages from its routers, in real time, to recognize and mitigate large-scale denial-of-service (DoS) attacks (a toy sketch of this kind of rate monitoring follows this list).
- Big variety. The application needs to integrate data from a large variety of data sources. For example, a social networking website that needs to efficiently store and retrieve social graphs for its members, each of which can have thousands of endpoints.
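To make the velocity point concrete, here is a minimal Python sketch of the sliding-window rate monitoring the ISP example describes. The window length, alert threshold, and source address are all made-up illustrative values, not anything from a real system; a real detector would be far more sophisticated.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10     # sliding-window length; illustrative only
ALERT_THRESHOLD = 1000  # messages per window that look suspicious; made up

# One queue of message timestamps per source address.
windows = defaultdict(deque)

def observe(source_ip, now=None):
    """Record one sampled router message; return True if the source is hot."""
    now = time.time() if now is None else now
    q = windows[source_ip]
    q.append(now)
    # Evict timestamps that have slid out of the window.
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()
    return len(q) > ALERT_THRESHOLD

# Simulated burst: one chatty source crosses the threshold.
for _ in range(1500):
    if observe("198.51.100.7"):
        print("possible DoS traffic from 198.51.100.7")
        break
```

A production version of this would shard the per-source state across machines and update it transactionally at very high rates, which is exactly the kind of workload the high-velocity category describes.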
What is the biggest hindrance in making use of Big Data?
With few exceptions, the databases in use today were architected over 30 years ago. The data we used back then had a certain profile, and we built computing systems around it. Think airline reservation systems, back when travel agents were needed as intermediaries to book a flight for you. Or customer relationship management (CRM) systems designed to store a few types of operational data and to let you query it in a limited set of ways. Most data was manually entered into a computer by database administrators. Because main memory 30 years ago was very expensive, we architected databases to store records on cheap disk drives and conserve that costly memory inside the computer.
Today, of course, business operates very differently and, indeed, the very nature of data has changed fundamentally. We are moving away from manually entered data to a self-service world where massive numbers of people access a database over the Internet. Think about today’s airline ticket purchases by many millions of people worldwide. Think massively multiplayer gaming, where thousands of users point and shoot simultaneously each second, and each movement represents a database transaction. Think sensor-driven data, where our cars or GPS units send data about how we drive or where we are several times a second, far faster than a human could ever type. These types of applications generate real-time data at breathtaking velocities and historical data at phenomenal volumes. However, unlike 30 years ago, 1 TB of main memory now costs roughly 1/10,000th of what it once did in relative terms, while the time for data to go out to disk and come back has become the rate-limiting constraint on a system’s ability to answer a query. Independent testing has shown that in-memory databases can deliver a factor-of-60 improvement in processing speed. In other words, a query that takes 1 minute in a legacy RDBMS would take 1 second in an in-memory database like VoltDB.
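One rough way to feel the disk-versus-memory gap described above is to time the same work in SQLite, which ships with Python and can run either fully in memory or against a file. This is only a back-of-the-envelope sketch, not VoltDB; the row count and file name are arbitrary, and the measured gap depends heavily on operating-system caching, so don’t expect it to match the 60x figure.

```python
import os
import sqlite3
import time

ROWS = 100_000  # arbitrary row count for the toy comparison

def load_and_query(conn):
    """Load ROWS rows, run one aggregate query, return (load_s, query_s)."""
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val INTEGER)")
    t0 = time.perf_counter()
    conn.executemany("INSERT INTO t VALUES (?, ?)",
                     ((i, i % 97) for i in range(ROWS)))
    conn.commit()  # on a file-backed database this waits for the disk
    t1 = time.perf_counter()
    conn.execute("SELECT val, COUNT(*) FROM t GROUP BY val").fetchall()
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1

# Same work twice: once purely in memory, once against a throwaway file.
mem_load, mem_query = load_and_query(sqlite3.connect(":memory:"))

path = "demo_ondisk.db"  # hypothetical scratch file
if os.path.exists(path):
    os.remove(path)
disk_load, disk_query = load_and_query(sqlite3.connect(path))

print(f"in-memory: load {mem_load:.3f}s, query {mem_query:.3f}s")
print(f"on-disk:   load {disk_load:.3f}s, query {disk_query:.3f}s")
```

The commit on the file-backed run is where the disk round-trips show up; keeping the working set in main memory removes them from the critical path, which is the architectural bet in-memory engines make.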
In short, this is not your grandfather’s online transaction processing (OLTP) database business anymore. The so-called elephants, or legacy database vendors, will still have their place in processing operational business data (e.g., inventory levels or bookings). However, they are not where the growth is, or really where the future lies.
What’s been your most satisfying project?
Most of us want to make a lasting impact in our professions. My goal is to make a difference by building innovative database systems, seeing them adopted and ultimately generating value for customers. Thirty years ago I developed Ingres, and later Postgres, Vertica and VoltDB. All of them have been very successful at improving the way we relate to data in one way or another. I would have a hard time picking one of these four systems as the most satisfying; it’s like asking which of my children I love the most.
Can you share some insider tips on learning about big data?
- Find out how to get access to the engineers who wrote any given product of interest, and ask them questions directly. Sales staff usually aren’t the best source of the technical information that really illuminates a purchasing decision. Ask to talk to customers who are running a particular product; they are, by far, the best source of straight information.
- Technical conferences such as SIGMOD, VLDB, ICDE and CIDR are a great way to follow research directions and, from that, gain insight into whether your own IT architecture is keeping pace.
- Look for vendors who deliver their software under open source licenses, so you can get your hands on it for free and even inspect the source code if you’d like. Vendors are often more misleading than helpful, with confusing marketing materials, products preannounced far ahead of delivery, or poor quality control on software releases. If you take a deep dive into how a database vendor handles certain types of big data applications, you gain a level of transparency that’s very healthy when deciding whether the implementation fits your needs.
You’ve seen most of big data’s past, but what does balancing the future mean to you?
Balancing the future means that one size database no longer fits all – and in the future there will be a half-dozen types of relational databases, each optimized for a particular workload. IT organizations have already begun to embrace this reality, as demonstrated by the broad acceptance of analytic DBMS products such as IBM/Netezza, EMC/Greenplum and Teradata/Aster.
Making the best database choices means paying close attention to your data. How is your data changing? Is it becoming faster? Is it becoming deeper? Is it becoming more diverse? The answer to all of these questions is almost certainly “yes.” Listen to what the data tells you, then find the right tools for the work that needs to be done and the right people to do the work, and greatly filter what you hear from legacy database sales teams. You’ll thank yourself this time next year.
What will 2012 bring?
2012 will be the year that NewSQL engines make a big splash. NewSQL keeps what’s good about SQL, notably the relational model and ACID transactions, without the legacy baggage.
2012 will mark the year that Hadoop enters the “trough of disillusionment” on the Gartner Hype Cycle. We like and support Hadoop, but it’s not a panacea.
2012 will mark the year that most of the NoSQL camp starts shipping high-level languages that look exactly like SQL.
2012 will be the year of “inside the firewall” private clouds.
2012 will continue the tradition of some new company making a big splash that nobody would have foreseen.