UPDATED 14:57 EDT / SEPTEMBER 24 2015

NEWS

StreamSets bags $12.5M from big-name VCs to simplify data transportation

Logistics have been the Achilles’ heel of many grand endeavors, and large-scale analytics projects are no exception. Moving data from different sources to a central location for processing is a challenge so monumental that several different vendors have emerged to try and tackle the problem, the newest being StreamSets Inc., which raised $12.5 million this morning to help make its pitch stand out.

Co-founders Girish Pancha, the former chief product officer of data integration kingpin Informatica Corp., and early Cloudera Inc. employee Arvind Prabhakar established the startup last year to distill the lessons they learned in their work into a new product. The result is the StreamSets Data Collector, which offers to radically simplify the consumption of information in the enterprise.

The open-source pipelining tool makes it possible to aggregate unstructured data with only a fraction of the work that legacy alternatives such as Informatica’s require to define ingestion parameters, according to the startup. That’s achieved using built-in monitoring capabilities that can be customized to identify the structure of the incoming information and watch for unexpected deviations.
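To make the idea concrete, here is a minimal Python sketch of what watching for structural deviations can look like in general. It is not StreamSets code and does not use the Data Collector's API; the function names (`infer_structure`, `find_deviations`) and the sample order records are hypothetical, chosen only for illustration.

```python
from typing import Any


def infer_structure(record: dict[str, Any]) -> dict[str, str]:
    """Map each field name to the name of its Python type."""
    return {field: type(value).__name__ for field, value in record.items()}


def find_deviations(expected: dict[str, str], record: dict[str, Any]) -> list[str]:
    """Describe how a record departs from the expected structure."""
    actual = infer_structure(record)
    deviations = []
    # Fields the baseline has but the new record lacks.
    for field in sorted(expected.keys() - actual.keys()):
        deviations.append(f"missing field: {field}")
    # Fields that appear in the new record but not in the baseline.
    for field in sorted(actual.keys() - expected.keys()):
        deviations.append(f"unexpected field: {field}")
    # Fields present in both whose type has changed.
    for field in sorted(expected.keys() & actual.keys()):
        if expected[field] != actual[field]:
            deviations.append(
                f"type changed: {field} ({expected[field]} -> {actual[field]})"
            )
    return deviations


# Hypothetical sample data: learn the structure from one record,
# then check a later record against it.
baseline = infer_structure({"order_id": 1001, "amount": 19.99, "store": "NYC-04"})
drifted = {"order_id": "1002", "amount": 5.0, "channel": "web"}
report = find_deviations(baseline, drifted)
```

In this sketch the later record drops one field, adds another and changes the type of a third, and each change is reported as a separate line. A production pipeline would presumably track far more than field names and types, but the principle of comparing incoming records against a learned baseline is the same.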

Such visibility is especially important for machine learning and predictive analytics projects, where concept drift, the natural tendency of certain data to change over time, can severely undermine the accuracy of reports. It’s an all too common occurrence in industries such as retail, which see seasonal shifts in consumer behavior that often require major adjustments to their models.

Reducing the amount of work involved in dealing with concept drift frees up data scientists to focus on the next step of the analytics cycle, which is readying their information for processing. StreamSets promises to simplify that part, too, with a collection of pre-implemented operations for cleansing, modifying and integrating input from different sources.

The startup sees the capabilities of its tool coming in handy in a wide range of use cases, including pulling various kinds of data into Hadoop for historical analysis, performing stream processing and populating internal search engines. The new funding round, which was led by Battery Ventures and New Enterprise Associates, will be used to broaden the appeal of StreamSets Data Collector even further.

Image via StreamSets

 

