UPDATED 14:57 EDT / SEPTEMBER 24 2015

NEWS

StreamSets bags $12.5M from big-name VCs to simplify data transportation

Logistics have been the Achilles’ heel of many grand endeavors, and large-scale analytics projects are no exception. Moving data from different sources to a central location for processing is a challenge so monumental that several vendors have emerged to try to tackle it, the newest being StreamSets Inc., which raised $12.5 million this morning to help make its pitch stand out.

Co-founders Girish Pancha, the former chief product officer of data integration kingpin Informatica Corp., and early Cloudera Inc. employee Arvind Prabhakar established the startup last year to distill the lessons they learned in their work into a new product. The result is the StreamSets Data Collector, which offers to radically simplify the consumption of information in the enterprise.

The open-source pipelining tool makes it possible to aggregate unstructured data with only a fraction of the work that legacy alternatives such as Informatica’s require to define ingestion parameters, according to the startup. That’s achieved using built-in monitoring capabilities that can be customized to identify the structure of the incoming information and watch for unexpected deviations.
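As a rough illustration of what identifying the structure of incoming data and watching for deviations can look like in general, here is a minimal Python sketch. It is not StreamSets code and does not use the Data Collector API; the function names `infer_schema` and `find_deviations`, the field names and the sample records are all hypothetical, chosen only to show the idea.

```python
from typing import Any


def infer_schema(records: list[dict[str, Any]]) -> dict[str, str]:
    """Map each field seen in the sample to the type name of its first value."""
    schema: dict[str, str] = {}
    for record in records:
        for field, value in record.items():
            # Keep the first type observed for a field as the expected one.
            schema.setdefault(field, type(value).__name__)
    return schema


def find_deviations(schema: dict[str, str], record: dict[str, Any]) -> list[str]:
    """List the ways a new record departs from the expected structure."""
    deviations = []
    for field, expected in schema.items():
        if field not in record:
            deviations.append(f"missing field: {field}")
        else:
            actual = type(record[field]).__name__
            if actual != expected:
                deviations.append(
                    f"type change: {field} expected {expected}, got {actual}"
                )
    for field in record:
        if field not in schema:
            deviations.append(f"unexpected field: {field}")
    return deviations


# Hypothetical sample data: learn the structure, then check a new record.
sample = [{"user_id": 1, "amount": 9.99}, {"user_id": 2, "amount": 4.5}]
schema = infer_schema(sample)
incoming = {"user_id": "3", "amount": 7.25, "coupon": "FALL15"}
report = find_deviations(schema, incoming)
for line in report:
    print(line)
```

In this toy case the new record is flagged because `user_id` arrives as a string rather than an integer and because a `coupon` field appears that the sample never contained. A production tool would presumably handle nested data, nulls and tolerance rules that this sketch ignores.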

Such visibility is especially important for machine learning and predictive analytics projects, where concept drift, the natural tendency of certain data to change over time, can severely undermine the accuracy of reports. It’s an all too common occurrence in industries such as retail, where seasonal shifts in consumer behavior often force companies to make major adjustments to their models.

Reducing the amount of work involved in dealing with concept drift frees up data scientists to focus on the next step of the analytics cycle, which is readying their information for processing. StreamSets promises to simplify that part, too, with a collection of pre-implemented operations for cleansing, modifying and integrating input from different sources.

The startup sees the capabilities of its tool coming in handy in a wide range of use cases, including pulling various kinds of data into Hadoop for historical analysis, performing stream processing and feeding data into internal search engines. The new funding round, which was led by Battery Ventures and New Enterprise Associates, will be used to broaden the appeal of StreamSets Data Collector even further.

Image via StreamSets

 

