UPDATED 09:48 EDT / JUNE 04 2014

Hadoop vendors filling holes throughout entire analytics lifecycle | #HadoopSummit recap

The third annual Hadoop Summit in San Jose is off to a flying start with thousands in attendance and a record number of sponsors and exhibitors representing the full industry gamut, from incumbent vendors fighting to hold onto their turf to the ambitious startups looking to take their place. It’s the latter group that stole the spotlight on the first day of the event with a bevy of product announcements spanning the entire analytics lifecycle, beginning at the very start: preparing data for processing.

Checking all the boxes on information quality

Data scientists today spend as much as 80 percent of their time filtering out errors and inconsistencies and working around compatibility issues, according to Pentaho. The Hadoop business intelligence (BI) firm promises to help customers flip that number on its head with a new toolkit aimed at streamlining the process of readying information for analysis.

Included in the Data Science Pack are three utilities designed to simplify life for users working with Pentaho’s Weka open source data mining project and the R statistical language, two of the most widely used analytic technologies in the industry. Among the tools is a script execution engine that offloads all the messy details of the data transformation process to the company’s software, a scoring engine that rates datasets based on accuracy and an automated forecasting solution that generates predictions on incoming information.

Pentaho says that the bundle can not only make it easier for users to whip their information into analyzable shape but also take the hassle out of blending multiple sources, a challenge Talend is also addressing with the latest edition of its namesake platform. The release brings with it the ability to import multi-gigabyte documents into Hadoop and provides a visual environment for integrating different streams with response times up to 45 percent faster than the previous version, according to the company.

Cutting out the middle-man

While some vendors are focusing on helping data scientists be more productive, others are working to eliminate the need for specialized talent altogether. Actian is firmly in the latter camp. It too made headlines at the summit today after joining the ranks of the dozens of companies offering structured query capabilities for Hadoop with the introduction of a new SQL feature for its flagship analytics platform. The value proposition is a familiar one: the company claims that business users can leverage its software to access data stored in HDFS directly instead of going through a data scientist.

Altiscale has begun offering similar functionality to users of its Hadoop cloud, which supports the latest stable release of Apache Hive as of this morning. The open source data warehouse was originally developed by Facebook to save its developers the trouble of familiarizing themselves with MapReduce or the slightly less complex but still unwieldy Pig platform and let them simply use familiar SQL syntax instead.

Being able to access and manipulate data in Hadoop without getting bogged down in the inherent complexity of the batch processing framework is vital to enabling the kind of velocity business users have come to expect from their applications, but using a structured query tool is not the only way to accomplish that. MetaScale, an analytics firm owned by Sears, says its newly launched “Ready-to-Go Reports” service can achieve the same results at a fraction of the cost of on-premises alternatives by eliminating the need for both data scientists and costly in-house infrastructure.

