Digging under the covers of AWS’ Timestream database with Amazon CTO Werner Vogels
Amazon Web Services Inc. has become famous, or perhaps infamous, for the constant introduction of new cloud computing services, so many that customers often have trouble keeping track of them all — and databases are no exception.
One of the newest among the at least 15 databases available from AWS is Amazon Timestream, which became generally available in September. Aimed chiefly at “internet of things” applications, it’s designed to store and retrieve data records that are part of a so-called time series, a set of data points that are associated with timestamps.
Like other time series databases, Timestream is built for applications that generate a continuous flow of data, such as measurements from IoT sensors, in a format that allows fast insertion and retrieval of massive amounts of time series events to support complex analyses — trillions of events per day, in fact, and at speeds up to 1,000 times faster than standard relational databases.
Time series data can be as simple as tracking the number of steps you take each day and relating that to body weight over time, but it becomes essential for enterprises that need to track and make sense of fast-changing data such as stock prices, video streams or the temperature of an operating machine. These databases are becoming increasingly important as IoT devices proliferate, evidenced partly by the rise of Timestream competitors such as the open-source Prometheus, InfluxData Inc.’s InfluxDB and Timescale Inc.’s TimescaleDB.
Werner Vogels, Amazon.com Inc.’s chief technology officer, last week wrote in a blog post about what’s behind Timestream. In an exclusive interview with SiliconANGLE, he explained how it illustrates both Amazon’s philosophy of building specific cloud tools for various tasks rather than trying to put them all into a single platform, and the company’s underlying architectural choices. The conversation was lightly edited for clarity.
You seem to have a broader goal with the blog post on Timestream than just showing how it works.
I’m trying to provide a little bit of background on the technology that sits behind the databases, or any of the other systems we’re building. There are some unique architecture decisions that we’ve made with Timestream that may be applicable to general architectures as well. For example, the fact that we use cell-based architectures to reduce blast radius.
Another point I make is: Don’t optimize systems in isolation. The idea that you just measure messaging, or just measure storage or just measure querying — that doesn’t give you an overall system that is actually optimized to what you would like to do.
How do Timestream and the thinking behind it fit into AWS’ overall approach to cloud computing?
It’s part of our quest to build purpose-built databases. For a lot of engineers, the relational database is the hammer they use for everything. It’s the tool they know how to use best. But doing time series on a relational database is a major headache. And it’s very hard to do that in real time. After all, in most of the time series applications we’ve been seeing, real time plays an important role.
So we’ve been on this quest, something that Jeff Bezos said many years ago, that really drove my architectural principles: We’re building tools, not platforms. The definition of a platform in the old style is that software companies would do the next release and give you everything and the kitchen sink and tell you how you should develop software.
What we’ve learned over time is that if you build small tools that are really purpose-built for one particular target, you can optimize that, and make it better-performing or more reliable or actually cheaper, or make sure you improve developer productivity.
What is the purpose of time series databases?
Time series is just a sequence of data points over time. If you look at the world around us, there are tons of examples where it’s applicable. There’s a whole range of applications where time is the dominant factor. Take for example clickstream analysis. You may want a dashboard that tells you in real time what is going on. Time plays an important role in that. A purpose-built database actually allows you to have real-time access to current, timely data in memory.
How does Timestream work?
Under the covers, there are two different storage engines. One of them is in-memory, holding basically the most current records. Most records in a time series database are in current time; that’s where many of the current use cases are. Then the historical data goes into magnetic storage or SSD. There’s a policy you can set for when data needs to move out of memory into the historical engine.
The in-memory store is targeted toward very fast queries: basically, you analyze tens of gigabytes in literally milliseconds. The historical store is much more about terabytes and petabytes, the types of queries that take seconds. The good thing for customers is they don’t have to think about that. They don’t have to think about what’s in memory or what’s on disk. It’s transparent for them.
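The tiering he describes — fast in-memory storage for recent records, a historical tier for older ones, with a retention policy deciding when data migrates — can be sketched in a few lines. This is a toy illustration of the idea under stated assumptions, not Timestream’s actual storage engine:

```python
class TieredStore:
    """Toy two-tier time series store: recent records stay in a fast
    in-memory list, older ones migrate to a 'historical' tier.
    Illustrative only; not Timestream's design."""

    def __init__(self, memory_retention_seconds):
        self.memory_retention = memory_retention_seconds
        self.memory_tier = []      # (timestamp, value) pairs, most recent data
        self.historical_tier = []  # older data; would live on disk in practice

    def insert(self, timestamp, value):
        self.memory_tier.append((timestamp, value))

    def migrate(self, now):
        """Apply the retention policy: move expired records to the historical tier."""
        cutoff = now - self.memory_retention
        expired = [r for r in self.memory_tier if r[0] < cutoff]
        self.memory_tier = [r for r in self.memory_tier if r[0] >= cutoff]
        self.historical_tier.extend(expired)

    def query(self, start, end):
        """Queries span both tiers, mirroring Timestream's single query surface."""
        rows = [r for r in self.memory_tier + self.historical_tier
                if start <= r[0] <= end]
        return sorted(rows)

store = TieredStore(memory_retention_seconds=300)
store.insert(0, 20.5)     # old sensor reading
store.insert(600, 21.0)   # recent sensor reading
store.migrate(now=600)    # the old reading falls outside the 300s window
```

The key point the sketch captures is the last method: the caller queries one store and never needs to know which tier holds the data.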
Most of the queries happen on the current, timely data, and you want the results back in milliseconds. If this is a predictive engine that looks for failures in your refrigeration equipment for liquefied natural gas, you don’t want to do that tomorrow, you want to do that right now. There’s also dashboarding, alarming, clickstream analysis, those kinds of things that are crucial.
How does Timestream differ from other time series databases?
At Amazon, we’ve had this principle for a very long time that we’re not really interested in the average latency for our customers, because it means 50% of our customers are getting a worse experience, and you don’t know how much worse. The 90th, 99th percentile are the measures we really care about. You want to measure that in real time. It’s easy to create a query over Timestream that extracts the P90 or P99 out of the last five minutes or the last 45 minutes or the last hour, or the last 45 seconds.
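The P90/P99 idea he mentions is straightforward to illustrate: sort the samples in a trailing window and pick the value below which 90% (or 99%) of them fall. The nearest-rank method below is a generic sketch of the concept, not the percentile approximation Timestream uses internally:

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Synthetic latency samples (ms) with timestamps in seconds.
now = 1_000
samples = [(now - i, 10 + (i % 50)) for i in range(600)]

# Keep only the last five minutes, then extract the tail percentiles.
window = [v for (t, v) in samples if t >= now - 300]
p90 = percentile(window, 90)
p99 = percentile(window, 99)
```

In Timestream itself this would be a single SQL query over the table’s time column; the point of the sketch is why the tail percentiles say more than the average: half your customers are slower than the mean, and the P99 tells you how much slower.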
It’s a serverless database after all, so we take care of performance, reliability and security. We automatically replicate over multiple availability zones and scale up and down under the covers. By default, it’s encrypted, but you can also bring your own keys. Customers don’t have to think about it. This is a unique database that plays into everything that includes time but is not relational.
Why are time series databases and their capabilities becoming more important, or at least more possible to do now?
Customers are building quite complex architectures, especially in the areas of IoT and DevOps management. IoT is the big use case where time was always important. But imagine if you have a fleet of trucks, you want to keep track of their load, their speed, their fuel consumption. If I want in real time to see which of my trucks are consuming more gas than the other trucks, then Timestream becomes really important there. Timestream makes it really simple to build something like that.
What are some other examples of how companies are using time series databases?
If you look at our customers, the big part is in those who need to manage devices or customer engagement. One of the big customers of Timestream is Disney Plus. It records billions and billions of data points each day on the quality of video, the buffering, the customer experience. Many of our industrial customers, in trucking, natural gas production and construction, for example, use this as well.
The dominant attribute in all of these is time, especially if you want to correlate things. Timestream comes with a standard correlation function. Take the clickstream of yesterday and the clickstream you’re seeing now: is there a major variation? Or you may predict how your customers are going to come today, then put your real metrics against it: how far is it from your prediction?
Clean Air tracks air quality around the world. You may be interested in the air quality yesterday, but that’s what I would call reporting. It’s not just one data point. For example, there is a function in time series databases called a window, basically a sliding window. You can compute, let’s say, average air quality … of the last five minutes or five seconds. Quite a few of the functions that we’ve added to normal SQL give us these time series capabilities: to really slide over time, or to find correlations over a certain time period.
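The sliding-window aggregation he describes can be sketched generically: at each reading, average everything that falls inside a trailing window of fixed duration. This is an illustration of the concept, not Timestream’s SQL window functions:

```python
from collections import deque

def sliding_average(readings, window_seconds):
    """For each (timestamp, value) reading, return the average of all
    readings in the trailing window of the given duration.
    `readings` must be sorted by timestamp."""
    window = deque()
    total = 0.0
    averages = []
    for t, v in readings:
        window.append((t, v))
        total += v
        # Evict readings that have fallen out of the trailing window.
        while window and window[0][0] <= t - window_seconds:
            _, old_v = window.popleft()
            total -= old_v
        averages.append((t, total / len(window)))
    return averages

# Hypothetical air-quality readings, one per minute (timestamps in seconds).
aqi = [(0, 40), (60, 44), (120, 48), (180, 60), (240, 58), (300, 62)]
trailing = sliding_average(aqi, window_seconds=300)
```

Each output point answers “what was the average over the last five minutes?”, which is exactly the shape of query a reporting dashboard would issue repeatedly.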
How does Timestream get used with other Amazon databases?
There are three parts to a time series database. One is ingesting data into a write-optimized memory store; then you move that over time to a magnetic store; and then you have the query engine. But you need to get the data in there. Kinesis and our managed Kafka service are two ways to move data into Timestream. There’s also Apache Flink, which is the right sort of connector between Timestream and other databases.
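On the ingestion side, each record headed for Timestream carries dimensions that identify the series, a measure, and a timestamp. The sketch below builds records in the shape Timestream’s write API expects, using the truck-fleet example from earlier; the database and table names in the comment, and the `truck_id` dimension, are hypothetical:

```python
import time

def build_record(truck_id, measure_name, value, timestamp_ms=None):
    """Build one record in the shape Timestream's write_records API expects:
    dimensions identify the series, measure name/value/type carry the data."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    return {
        "Dimensions": [{"Name": "truck_id", "Value": truck_id}],
        "MeasureName": measure_name,
        "MeasureValue": str(value),
        "MeasureValueType": "DOUBLE",
        "Time": str(timestamp_ms),
        "TimeUnit": "MILLISECONDS",
    }

record = build_record("truck-17", "fuel_consumption", 8.4,
                      timestamp_ms=1700000000000)

# In a real pipeline, a Kinesis consumer or Flink job would batch such
# records and submit them with the AWS SDK, e.g.:
#   boto3.client("timestream-write").write_records(
#       DatabaseName="fleet", TableName="telemetry", Records=[record])
```

The batching matters in practice: write calls accept multiple records at once, which is how a high-volume stream from Kinesis or Kafka keeps up.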
What about users — OK, Oracle users — who don’t want to use so many different kinds of databases?
I think you need to use the right tool for the job. Many of our professionals have a range of tools that do exactly what they want for that purpose. Surgeons don’t just have a knife. They have a whole range of scalpels. They don’t get confused about it, they just use the one that’s exactly right for the job. If they come with a saw, you’re not going to be very happy.
That is where we as software engineers are getting to as well. Now there’s a palette of eight or 10 purpose-built databases at AWS. It allows customers to pick exactly the right tool. It saves them money, and most importantly, it saves them developer time. If I need to build a time series operation on top of a relational database, it’s going to cost me a lot of effort, a lot of time. And then you wouldn’t even have the right observability tools to see what’s going on.
How did Amazon come to the idea that so many different databases were needed?
The first new database we built was a key value store. Why? Back in 2004, on Dec. 12, all of Amazon was running on relational databases. We had a failure in our massive rack cluster that took the whole site down over the busiest day of the year. It made us think that maybe relational databases are not the right tool for everything.
So we did a deep dive on everything, and it turns out 70% of the storage at Amazon was key-value. It was just, Give me my shopping cart, give me this, give me that — one attribute and give me the result of that. So we thought, Hey, wait, we can build a key-value store service for ourselves that actually supports all this and makes it higher-scale, more reliable, than you would ever be able to do with a relational database. Plus, keeping relational databases alive over three different availability zones or data centers was a nightmare.
We once pulled the plug on one of the data centers to see what happens. It turns out that all the failures were with relational databases. Then inserting the data center back into live operations was a nightmare.
What did Amazon do about that?
We built DynamoDB. It was the first service where everything was just managed. Internal customers didn’t have to think about this. The first service we launched with that was a shopping cart.
Take another example, our quantum ledger database service QLDB. The core piece of all bitcoin operations is basically an immutable ledger, a distributed ledger. But many of our customers want it in one place. If you want to do this in a relational database, that’s a nightmare.
Development becomes so much simpler if you have these purpose-built databases. It’s all about improving the speed to development by giving you the right tools that you need.
Photo: Robert Hof/SiliconANGLE