

Logistics have been the Achilles’ heel of many grand endeavors, and large-scale analytics projects are no exception. Moving data from different sources to a central location for processing is a challenge so monumental that several different vendors have emerged to try to tackle the problem, the newest being StreamSets Inc., which raised $12.5 million this morning to help make its pitch stand out.
Co-founders Girish Pancha, the former chief product officer of data integration kingpin Informatica Corp., and early Cloudera Inc. employee Arvind Prabhakar established the startup last year to distill the lessons they learned in their work into a new product. The result is the StreamSets Data Collector, which offers to radically simplify the consumption of information in the enterprise.
The open-source pipelining tool makes it possible to aggregate unstructured data with only a fraction of the work that legacy alternatives such as Informatica’s require to define ingestion parameters, according to the startup. That’s achieved using built-in monitoring capabilities that can be customized to identify the structure of the incoming information and watch for unexpected deviations.
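To make the idea of watching for structural deviations concrete, here is a minimal Python sketch. It is not StreamSets code and does not use the Data Collector's API; the function names `infer_schema` and `find_deviations` and the sample records are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, List


def infer_schema(record: Dict[str, Any]) -> Dict[str, str]:
    """Map each field name to the name of its Python type."""
    return {key: type(value).__name__ for key, value in record.items()}


def find_deviations(expected: Dict[str, str], record: Dict[str, Any]) -> List[str]:
    """Describe how a record departs from the expected schema (hypothetical helper)."""
    actual = infer_schema(record)
    problems = []
    # Fields the baseline promised: check that they exist and kept their type.
    for field, type_name in expected.items():
        if field not in actual:
            problems.append(f"missing field: {field}")
        elif actual[field] != type_name:
            problems.append(f"type change: {field} {type_name} -> {actual[field]}")
    # Fields the baseline never saw.
    for field in actual:
        if field not in expected:
            problems.append(f"new field: {field}")
    return problems


# Baseline learned from a first, trusted record (made-up sample data).
baseline = infer_schema({"user_id": 17, "amount": 9.99})
```

A record that matches the baseline produces an empty list, while one with a renamed, retyped or extra field produces one message per deviation, which a pipeline could log or route to an alert.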
Such visibility is especially important for machine learning and predictive analytics projects, where concept drift, the natural tendency of certain data to change over time, can severely undermine the accuracy of reports. It’s an all-too-common occurrence in industries such as retail, which see seasonal shifts in consumer behavior that often require major adjustments to their models.
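A rough way to picture how such drift might be caught is to compare recent values of a metric against a reference period. The Python sketch below is an illustrative assumption, not the method StreamSets uses: the `has_drifted` function, its threshold of three standard deviations and the sales figures are all hypothetical. Strictly speaking, it checks for a shift in a single input's distribution, which is only a simple proxy for concept drift.

```python
from statistics import mean, stdev
from typing import Sequence


def has_drifted(reference: Sequence[float], recent: Sequence[float],
                threshold: float = 3.0) -> bool:
    """Flag drift when the recent mean lies more than `threshold`
    reference standard deviations away from the reference mean."""
    spread = stdev(reference)  # sample standard deviation; needs at least 2 points
    if spread == 0:
        # A perfectly flat reference: any change in the mean counts as drift.
        return mean(recent) != mean(reference)
    return abs(mean(recent) - mean(reference)) / spread > threshold


# Made-up daily sales figures for a retailer.
ordinary_week = [100, 102, 98, 101, 99]
quiet_days = [100, 101, 99]
holiday_rush = [140, 150, 145]
```

Here `has_drifted(ordinary_week, quiet_days)` stays false, while `has_drifted(ordinary_week, holiday_rush)` signals that a model trained on the ordinary week may need adjusting.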
Reducing the amount of work involved in dealing with concept drift frees up data scientists to focus on the next step of the analytics cycle, which is readying their information for processing. StreamSets promises to simplify that part, too, with a collection of pre-implemented operations for cleansing, modifying and integrating input from different sources.
The startup sees the capabilities of its tool coming in handy in a wide range of use cases, including pulling various kinds of data into Hadoop for historical analysis, performing stream processing and populating data into internal search engines. The new funding round, which was led by Battery Ventures and New Enterprise Associates, will be used to broaden the appeal of StreamSets Data Collector even further.