The challenge of handling the growing amounts of real-time data entering the corporate network returned to the center of the analytics discussion last week after IBM Corp. added four new stream processing services to its public cloud. The star of the lineup is a machine learning engine that allows organizations to create pre-programmed algorithms for picking out useful information from the transmissions generated by the growing number of connected devices deployed in the workplace.
An electronics manufacturer, for instance, could use the service to immediately detect when a sensor embedded in an expensive piece of equipment signals a malfunction and automatically alert the nearest technician. IBM is touting the functionality as a way to cut through the massive volume of machine-generated signals produced every second in such environments, which can overburden not only analysts but also the technology infrastructure that supports their work. The latter problem has proven a particular hindrance to the internal business intelligence efforts of Yahoo! Inc., which last week open-sourced its engineers’ solution.
The Data Sketches library is a collection of stream processing algorithms that avoid the overhead of exactly counting every new item in a real-time feed by substituting direct counting with estimation, thereby sacrificing some accuracy for a significant speed boost. A proof-of-concept implementation Yahoo! showcased to the press took just under three seconds to blaze through a sample set of 100 million values that required two and a half minutes to analyze by regular means. That technology is meant mainly to help provide high-level measurements like visitor statistics, which content aggregation startup Flipboard Inc. is also tackling.
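To make the estimation trade-off concrete, here is a minimal Python sketch of one well-known technique of this kind, a K-Minimum-Values distinct counter. It is an illustration only and not Yahoo!'s implementation: the class name `KMVSketch`, the parameter `k` and the choice of SHA-1 as the hash are all assumptions made for the example. The idea is to keep only the `k` smallest hash values seen so far and infer the number of unique items from how tightly those values are packed.

```python
import hashlib
import heapq


class KMVSketch:
    """Hypothetical K-Minimum-Values sketch for estimating unique items."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []        # negated hashes, so the largest retained hash is on top
        self._members = set()  # hashes currently retained, used to skip duplicates

    @staticmethod
    def _hash(item):
        # Map any item to a pseudo-uniform value in [0, 1).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def add(self, item):
        h = self._hash(item)
        if h in self._members:
            return  # already retained, nothing to update
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest retained one: swap them.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            # Fewer than k unique hashes seen, so the count is exact.
            return float(len(self._heap))
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


if __name__ == "__main__":
    sketch = KMVSketch(k=1024)
    for i in range(200_000):
        sketch.add(f"visitor-{i % 50_000}")  # 50,000 unique visitors, each seen 4 times
    print(f"estimated unique visitors: {sketch.estimate():.0f}")
```

The memory footprint stays fixed at `k` values no matter how long the stream runs, and a larger `k` buys a tighter estimate at the cost of more memory. Each item is still hashed once as it arrives; the savings come from not having to store and compare every distinct value.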
Following the release of Data Sketches, the startup introduced an embedded Google Analytics dashboard for its namesake content aggregation portal, aimed at helping publishers gain a better understanding of their audiences. The integration makes it possible to track visitors in real time, with visibility into all the usual metrics, including engagement and churn rate.