UPDATED 14:00 EDT / FEBRUARY 23 2015

Designing data stores for the Internet of Things

The Internet of Things (IoT) has spurred a new world of connected devices. The most substantial impact of the data produced by this sea of connected devices will be the need to drive difference-making decisions for people and businesses.

Computer systems will be pressured to analyze this data quickly and effectively enough to support timely decisions. “Perhaps the biggest data problem the IoT faces is correlating the data it collects with actions you can take,” said Alistair Croll in O’Reilly Radar.

We tend to expect the time between sensing information and acting on it to be near instant. But what if data from hundreds or thousands of sensors needs to be combined with data from hours, days or even months ago to make the right call?

To help businesses overcome this challenge, we suggest four key principles in designing the right data systems for the new world of IoT.

1. Capture Everything

The IoT landscape stretches far and wide, incorporating a host of different end points. In some cases, that will mean millions and perhaps billions of devices or applications. In others, it will mean a constant stream of detailed information from a smaller group of devices. In either case, the datastore must keep up with a constant stream without discarding data.

Systems that cannot keep up with data ingest end up discarding data, and discarded data is lost. To guarantee no data loss, ingest at this scale requires using memory. The capability simply could not exist without it. One streaming option is Apache Spark, a memory-optimized distributed processing framework that can stream and structure data on the fly. Spark can also be coupled with an in-memory database for longer-term persistence.
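To make the idea of memory-based ingest concrete, here is a minimal Python sketch of a buffer that accepts records into memory immediately and hands them to a slower persistent store in batches. It is an illustration only, not how Spark or any particular database works: the `IngestBuffer` class, the `persist_batch` callback, and the sample sensor records are all hypothetical names introduced for this example.

```python
import queue
import threading


class IngestBuffer:
    """In-memory buffer that decouples fast producers from a slower store."""

    def __init__(self, persist_batch, batch_size=100):
        # Unbounded queue: ingest() never rejects or drops a record.
        self._queue = queue.Queue()
        # Hypothetical callback that writes a list of records to durable storage.
        self._persist_batch = persist_batch
        self._batch_size = batch_size
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def ingest(self, record):
        """Accept a record into memory and return immediately."""
        self._queue.put(record)

    def _drain(self):
        """Background loop: group queued records into batches and persist them."""
        while True:
            batch = [self._queue.get()]  # block until at least one record arrives
            while len(batch) < self._batch_size:
                try:
                    batch.append(self._queue.get_nowait())
                except queue.Empty:
                    break
            self._persist_batch(batch)
            for _ in batch:
                self._queue.task_done()

    def flush(self):
        """Block until every ingested record has been persisted."""
        self._queue.join()


# A plain list stands in for the persistent store in this sketch.
stored = []
ingest_buffer = IngestBuffer(persist_batch=stored.extend, batch_size=50)
for i in range(1000):
    ingest_buffer.ingest({"sensor_id": i % 10, "reading": i})
ingest_buffer.flush()
```

The producer loop never waits on the store, which is the property the paragraph above describes. A real deployment would also need to bound memory use and handle failures in the persistence step, both of which this sketch leaves out.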

2. Save Data While Serving Data

Assuming a constant influx of records, an effective IoT datastore must save data and serve data at the same time. Serving data includes responding directly to end user requests or end device requirements, as well as providing a comprehensive analytics view for teams of analysts. Larger teams can be difficult to support, because a constant barrage of impromptu queries can overload traditional systems.

To guarantee the integrity of the database, traditional solutions implement all types of locking mechanisms that essentially cause the database to pause while completing a prior request. A world of interconnected devices cannot afford to wait for inflexible systems. Today, companies sometimes use Apache Kafka to support real-time data distribution to multiple data stores, with each store performing a specialized task. It is also possible to use a memory-optimized database to collapse that functionality into a single system.
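The distribution pattern described above can be sketched in a few lines of Python: one append-only log receives every record, and each downstream store reads from it at its own pace using its own offset. This is a toy model of the idea only and does not use or imitate Kafka's actual API. The `FanOutLog` class, the consumer names, and the sample records are hypothetical.

```python
import threading


class FanOutLog:
    """Append-only in-memory log; each consumer tracks its own read offset."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer name -> index of next record to read
        # Held only for a brief append or slice, never while a consumer
        # is processing the records it has read.
        self._lock = threading.Lock()

    def append(self, record):
        """Save a record; writers do not wait for any consumer to catch up."""
        with self._lock:
            self._records.append(record)

    def poll(self, consumer, max_records=100):
        """Return the next unread records for this consumer and advance its offset."""
        with self._lock:
            start = self._offsets.get(consumer, 0)
            batch = self._records[start:start + max_records]
            self._offsets[consumer] = start + len(batch)
        return batch


log = FanOutLog()
for i in range(5):
    log.append({"device": "d1", "temp": 20 + i})

# Two specialized stores consume the same stream independently.
latest_by_device = {}  # serves point lookups for end users or devices
all_temps = []         # serves analytics over the full history

for rec in log.poll("lookup_store"):
    latest_by_device[rec["device"]] = rec["temp"]
for rec in log.poll("analytics_store", max_records=3):
    all_temps.append(rec["temp"])

# New data keeps arriving while the analytics store is still behind.
log.append({"device": "d1", "temp": 30})
for rec in log.poll("analytics_store"):
    all_temps.append(rec["temp"])
```

Because each consumer has its own offset, a slow analytics reader does not hold up ingest or the lookup store. Collapsing both roles into a single memory-optimized database, as the paragraph above suggests, removes the need for this fan-out step.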

3. Fit the Ecosystem

Today’s data challenges typically stretch beyond a single system or data store. Therefore, data stores must play well within complete IoT data pipelines, for example by importing data quickly from other sources or by connecting to other data stores for complementary functionality.

Many data stores can load data directly from particular sources, such as Amazon Web Services’ S3 or the Hadoop Distributed File System (HDFS). New computational processing frameworks like Spark also nicely complement persistent data stores.

4. Online All the Time

The application infrastructure supporting IoT devices must remain online all the time. This includes high availability and disaster recovery capabilities, but also the option to make changes to the data store without halting incoming data. It also means being able to expand capacity or processing throughput by growing the data store without taking it offline. While easy to understand, the intricacies of these operations can vary widely across solutions, and architects need to pay close attention to the details. One option is to use Apache ZooKeeper in conjunction with other tools, where ZooKeeper coordinates the distributed processes that keep the system available. Select in-memory databases also provide online operations directly for automatic scaling and high availability, thereby simplifying deployment.
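The article does not name a specific technique for growing a data store online. One widely used approach is consistent hashing, sketched below in Python as an illustration: when a node is added, only the keys that now belong to the new node move, so the rest of the cluster can keep serving. The `HashRing` class, the node names, and the sensor keys are hypothetical, and this is not a description of how ZooKeeper or any particular database handles scaling.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with virtual nodes for smoother key distribution."""

    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Place several virtual points for the node on the ring."""
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        """Return the node owning the first ring point at or after the key's hash."""
        idx = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"sensor-{i}" for i in range(10000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # grow the cluster while it stays in service
after = {k: ring.node_for(k) for k in keys}

# Keys whose owner changed; each of them now lives on the new node.
moved = [k for k in keys if before[k] != after[k]]
```

With four nodes, roughly a quarter of the keys should move to `node-d` and the remainder stay where they were, which is what lets capacity grow without a full reshuffle or downtime.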

 

Meeting IoT Data Requirements

As we venture further into our interconnected and interactive world, the Internet of Things is certain to showcase new applications of capturing, processing, and acting on unique and timely data. By adhering to the guiding principles outlined above, companies will be able to provide the capabilities, flexibility, unification, and resiliency to architect the right data stores for the Internet of Things.

About the Author

Eric Frenkiel, Chief Executive Officer & Co-Founder of MemSQL

Eric Frenkiel co-founded MemSQL and has served as CEO since inception. Before MemSQL, Eric worked at Facebook on partnership development. He has worked in various engineering and sales engineering capacities at both consumer and enterprise startups. Eric is a graduate of Stanford University’s School of Engineering. In 2011 and 2012, Eric was named to Forbes’ 30 under 30 list of technology innovators.

photo credit: epredator via photopin cc
