When it comes to monetizing the Internet of Things’ vast network of sensors and activity logs, the assumption is that there’s value in the data. Value, however, is highly subjective in an IoT ecosystem bound to overlap with nearly every industry imaginable. One company wants to make data more available within organizations, with the goal of real-time accessibility for building business applications and making faster decisions.
MemSQL, Inc., with a speedy database platform for real-time analytics, promises to help companies adapt and learn from their data in real time by using in-memory performance, horizontal scalability and advanced SQL analytics.
The company is among the growing crowd of Apache Spark supporters, developing tools atop the open source platform in support of its real-time analytics offerings. Addressing demand for machine learning archives, graph analytics and stream processing at scale, MemSQL has released the Spark Connector, which it calls “the easiest, most performant path for operationalizing Spark in the enterprise.”
SiliconANGLE Media recently spoke with Gary Orenstein, chief marketing officer at MemSQL, to get an inside look at what it takes to effectively interpret real-time data and unlock the Internet of Things’ (IoT’s) market potential, where Apache Spark fits in, and the most promising use cases for real-time data in the business world.
Effectively interpreting real-time data
Q: To unlock IoT’s market potential, why is it better to analyze real-time and historic data together?
Orenstein: IoT represents real-world happenings in the moment. This can be everything from updates on industrial equipment health or data on the status of manufacturing processes to sensor inputs from remote monitoring. Because the end-point devices are interconnected, applications have an opportunity to see real-time data. But real-time data alone is just one piece of the picture. Without historical data incorporated into analysis, interpretations can be misleading.
Let’s look at the case of monitoring wind turbines around the globe. Real-time data alone will tell us how a turbine is functioning in the moment, but it gives us no way to correlate that performance with a historical pattern. Understandably, wind turbine performance is highly correlated with weather and the shifts between high- and low-pressure zones that generate wind. Only with access to historical data that can be correlated with current weather do we get a sense of wind turbine efficiency in context.
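As a rough illustration of the turbine example, the sketch below compares a live power reading against historical readings taken at similar wind speeds. The field names, the 1 m/s wind-speed tolerance and the sample numbers are all assumptions made for illustration; they do not come from MemSQL or from any real turbine dataset.

```python
from statistics import mean

# Hypothetical historical readings: (wind_speed_m_s, power_kw).
HISTORY = [
    (6.0, 400.0),
    (6.5, 480.0),
    (7.0, 560.0),
    (7.5, 640.0),
    (12.0, 1500.0),
]


def efficiency_in_context(wind_speed, power_kw, history, tolerance=1.0):
    """Return live power as a fraction of the historical average
    at similar wind speeds, or None if no comparable history exists."""
    similar = [p for w, p in history if abs(w - wind_speed) <= tolerance]
    if not similar:
        return None
    return power_kw / mean(similar)


# A live reading of 416 kW at 7 m/s, judged against comparable history.
live = efficiency_in_context(7.0, 416.0, HISTORY)
```

Taken alone, the live reading of 416 kW says little. Set against the historical average for comparable wind, it shows a turbine running below its usual output, which is the kind of context the interview describes.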
Q: What are the top challenges in monetizing IoT data, and how do MemSQL innovations like distributed architecture and in-memory capabilities address these obstacles?
Orenstein: The top challenges for monetizing IoT data are capturing data quickly and making data accessible to members of your organization for building actionable applications.
In order to achieve these objectives, MemSQL has taken a unique architectural approach that combines in-memory capabilities with distributed systems, remaining fully compatible with ANSI SQL. With MemSQL in place, companies can handle IoT workloads and build applications with the following capabilities:
- Support massive data ingest across millions of devices and connections. Systems must keep up with incoming data. The use of in-memory and distributed systems enables this performance.
- Serve as the system of record while simultaneously providing real-time analytics. MemSQL combines transactional and analytical workloads, saving time by avoiding batch ETL processes.
- Respond to and integrate well with familiar ecosystems. By retaining the data model of SQL, the lingua franca for data processing, MemSQL can connect to existing data workflows, including importing data from other systems, and easily connect to business intelligence tools like Tableau or Zoomdata.
- Allow for online scaling and online operations. The world stops for no one, and successful services will be judged by their ability to grow and provide enterprise-level service quality. By maintaining a flexible software footprint that works across servers, virtual machines, and containers on-premises or in the cloud, MemSQL allows for easy scalability.
The Apache Spark advantage
Q: What are the most notable ways Apache Spark has catalyzed real-time data analysis, and what impact do you expect this platform to have on enterprises’ ability to unlock IoT’s market potential?
Orenstein: Apache Spark has helped spur real-time analysis by providing a fast transformation engine. More specifically, Spark can assist in areas like streaming and data enrichment before data pipelines are persisted to a durable data store. It is important to note that Spark itself does not store data and fits best when coupled with a fast durable datastore like an in-memory database.
Spark has allowed customers to add real-time processing to workflows formerly restricted to batch updates. For example, cases where data was collected and then scored overnight can now be transitioned to real time with Spark. In particular, customers are often able to load existing machine-learning models into Spark and then have sensor data and scoring data persisted to a database.
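The pattern described here, scoring each reading as it arrives and persisting both the reading and its score, can be sketched in plain Python. This is not Spark or MemSQL code: the standard-library sqlite3 module stands in for the durable datastore, a generator stands in for the stream, and score() is a hypothetical placeholder for an existing machine-learning model.

```python
import sqlite3


def score(temperature_c):
    """Hypothetical stand-in for a pre-trained model: maps a
    temperature reading to a risk score between 0 and 1."""
    return min(max((temperature_c - 60.0) / 40.0, 0.0), 1.0)


def sensor_stream():
    """Simulated stream of (sensor_id, temperature_c) readings."""
    yield from [("s1", 55.0), ("s2", 80.0), ("s1", 100.0)]


def run_pipeline(conn, stream):
    """Score each reading on arrival and persist it with its score."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(sensor_id TEXT, temperature_c REAL, risk REAL)"
    )
    for sensor_id, temperature_c in stream:
        conn.execute(
            "INSERT INTO readings VALUES (?, ?, ?)",
            (sensor_id, temperature_c, score(temperature_c)),
        )
    conn.commit()


# An in-memory SQLite database plays the role of the durable store.
conn = sqlite3.connect(":memory:")
run_pipeline(conn, sensor_stream())
rows = conn.execute(
    "SELECT sensor_id, risk FROM readings ORDER BY rowid"
).fetchall()
```

Because each reading is scored and written as it arrives, the scores are available to SQL queries right away instead of after an overnight batch job.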
Q: What are some of the most promising use cases you’ve seen for real-time data in the business world?
Orenstein: Three notable use cases for real-time data include:
- Network monitoring and proactive provisioning. By monitoring set-top boxes, large cable companies can observe their entire network to gauge the video viewing experience and proactively allocate additional bandwidth as needed. If they track latency in the network, they can increase capacity to ensure viewers’ favorite shows are not interrupted by jitter.
- Mobile application statistics and real-time analytics. By capturing mobile application data in real time, web properties and social networks can understand user activity in the moment. This is especially critical for exchanging information with advertising partners. In areas like fashion, music and online videos, brands must remain relevant in the present.
- Predictive analytics in the energy sector. Energy companies have massive investments in drilling rigs and bits, so they need to keep track of things like drill bit health. If they do not push hard enough and retire the drill bit early, they are essentially leaving money on the table. If they push too hard and the drill bit breaks, it becomes a costly repair. By capturing drill bit data in a real-time data pipeline, energy companies use machine-learning models to score incoming data and determine action plans instantly.
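The drill bit trade-off in the last use case comes down to turning a model score into an action. A minimal sketch follows, assuming a wear score between 0 and 1 and two thresholds; the thresholds, action names and sample scores are invented for illustration and are not taken from any energy company's practice.

```python
def action_for(wear_score, retire_at=0.9, ease_off_at=0.7):
    """Map a model's wear score (0 = new, 1 = failure imminent)
    to an action. Both thresholds are illustrative assumptions."""
    if wear_score >= retire_at:
        return "retire bit"
    if wear_score >= ease_off_at:
        return "reduce load"
    return "continue drilling"


# Hypothetical scores as they might arrive from a scoring model.
incoming = [0.35, 0.72, 0.93]
plan = [action_for(s) for s in incoming]
```

Retiring too early wastes usable bit life and retiring too late risks a break, so in practice the thresholds would be tuned against the cost of each outcome.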