UPDATED 11:28 EDT / JULY 18 2012

NEWS

Nodeable Shift Highlights Growing Interest in (and Need For) Streaming Big Data Analytics

by Jeffrey Kelly

Nodeable made what you might call a major shift in its business model today. The company, which began life as a cloud monitoring service provider for systems administrators, debuted a streaming Big Data Analytics service that is based on open source Storm and can be applied to numerous use cases beyond application monitoring.

The new cloud-based service is called StreamReduce and, according to the company, can help clients make sense of (and take action based on) multiple flows of multi-structured data as it hits the system. StreamReduce, and streaming Big Data analytics generally, is a complement to other Big Data approaches, particularly Hadoop, that are not optimized to make sense of data in real-time.

Why Do We Need A Dedicated Streaming Big Data Service When We Have Hadoop?

Here’s the crux of the problem. Hadoop has proven itself a reliable platform for crunching and analyzing large volumes of historical, multi-structured data. But the open source Big Data framework was not designed to process and analyze data in real-time and nobody has yet figured out a way to add such capabilities to Hadoop.

Why? Because at its core Hadoop is a batch-and-load-oriented framework. That is, to get data into Hadoop, you gather up all the data you want to crunch and perform a large data dump into HDFS, without the need to normalize or otherwise add a common structure to the data first. You can automate and schedule this process, but at this point in time there’s no practical way to perform streaming analysis via Hadoop.

The two approaches – historical, batch-and-load Big Data analytics and streaming Big Data analytics – have two different goals. The former’s aim is to discover historical patterns and trends in large volumes of multi-structured data that could date back days, weeks or even years. The latter’s goal is to derive as much value from data as it is created, then send it on its way.
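To make the contrast concrete, here is a minimal Python sketch. It is not taken from Hadoop, Storm or StreamReduce; the function names and sample readings are hypothetical and chosen only for illustration. The batch function needs the whole data set before it can answer, while the streaming function updates its answer as each value arrives and keeps no history.

```python
from typing import Iterable, Iterator, Sequence


def batch_average(values: Sequence[float]) -> float:
    """Batch style: all the data is gathered first, then analyzed in one pass."""
    return sum(values) / len(values)


def streaming_average(values: Iterable[float]) -> Iterator[float]:
    """Streaming style: update the answer per event, storing only a count and a total."""
    count = 0
    total = 0.0
    for value in values:
        count += 1
        total += value
        yield total / count


# Hypothetical sample readings, used only to exercise both functions.
readings = [10.0, 20.0, 60.0]
batch_result = batch_average(readings)              # one answer, after everything is loaded
streaming_results = list(streaming_average(readings))  # a fresh answer after every event
print(batch_result)        # 30.0
print(streaming_results)   # [10.0, 15.0, 30.0]
```

Both end at the same figure, but only the streaming version has something to act on before the last reading arrives.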

Though their goals differ, the two approaches can and should work hand-in-hand. In an ideal scenario, as data is created it passes through a platform like StreamReduce, which performs analytics to detect anomalous or otherwise important events and triggers responsive actions. Once the data passes through the streaming Big Data analytics service, it is sent to a Big Data platform like Hadoop where Data Scientists can pore over the data to uncover further insights.

Tag Team: Streaming and Historical Big Data Analytics

As a simple example, consider the Twitter fire hose. The fire hose hits the streaming Big Data analytics service in real-time, where the data is mined to identify unhappy customers, all within sub-second time frames. This kicks off a series of responses, such as emails offering free services to those ticked-off customers. Once the fire hose passes through the streaming service, some or all of the data is then sent into a queue to be loaded into Hadoop, where at a later date Data Scientists might combine it with other data sources to perform social graph analysis or to correlate social media activity to buying behaviors.
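The flow described above might be sketched roughly as follows in plain Python. This is an illustration under stated assumptions, not how StreamReduce, Storm or Twitter implement it: the keyword list, the message format, the `respond` callback and the in-memory queue standing in for the Hadoop load queue are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable

# Hypothetical keyword list; a real service would use far richer analysis.
NEGATIVE_WORDS = {"terrible", "awful", "refund", "worst"}


def is_unhappy(text: str) -> bool:
    """Flag a message if it contains any of the illustrative negative keywords."""
    words = {word.strip(".,!?").lower() for word in text.split()}
    return bool(words & NEGATIVE_WORDS)


def process_stream(
    messages: Iterable[Dict[str, str]],
    respond: Callable[[str], None],
    batch_queue: Deque[Dict[str, str]],
) -> None:
    """Handle each message as it arrives, then pass it on for later batch analysis."""
    for message in messages:
        if is_unhappy(message["text"]):
            respond(message["user"])   # immediate action, e.g. send an offer email
        batch_queue.append(message)    # stand-in for the queue that feeds Hadoop


# Hypothetical sample messages standing in for the fire hose.
fire_hose = [
    {"user": "alice", "text": "Loving the new release"},
    {"user": "bob", "text": "Worst service ever!"},
]
contacted = []                 # users who would receive a response
hadoop_queue = deque()         # messages waiting to be batch-loaded
process_stream(fire_hose, contacted.append, hadoop_queue)
print(contacted)               # ['bob']
print(len(hadoop_queue))       # 2
```

The point of the design is that the response fires while the message is still in flight, and every message, flagged or not, still lands in the queue for historical analysis later.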

Streaming Big Data Analytics is particularly relevant to industries where the ability to respond faster than the competition – even just a split second faster – can mean the difference between success and failure. Think the financial services, energy and utilities, and consumer-facing retail industries. Streaming Big Data Analytics is also applicable across a number of horizontal use cases, including clickstream analysis, log file analysis and real-time advertising optimization.

Of course, financial services and trading firms have been using complex event processing engines to perform streaming analytics on high velocity but relatively structured data for years. But new approaches were needed to expand streaming analytics to even more data sources and to multi-structured data, which traditional CEP engines are not equipped to handle.

Marz and Storm

That’s why Nathan Marz invented Storm for streaming Big Data Analytics when he worked at BackType. The company was later acquired by Twitter, where Marz put the finishing touches on Storm and open sourced the project. Marz talked about Storm and its evolution on theCUBE at Strata 2012.

Of course, the ideal scenario is one platform that can perform both streaming and historical Big Data Analytics. But we’re not there yet. With the right expertise, however, engineers can architect the two systems – Storm and Hadoop – to work tightly together, as Twitter does. I also expect streaming Big Data analytics vendors like Nodeable and HStreaming to partner closely with Hadoop providers to deliver the two services combined as seamlessly as possible.
