UPDATED 10:50 EDT / FEBRUARY 08 2017

BIG DATA

WATCH LIVE: Spark Summit explores growth and challenges in open source tech | #SparkSummit

Kicking off today is our live broadcast from Spark Summit East, an event dedicated to the open-source community as Apache Spark tackles the biggest data-wrangling challenges in enterprise information technology. With ongoing enhancements for intelligent automation and code consolidation, Spark is expected to grow its influence in enterprise environments, continuing to close complexity gaps in the Hadoop open-source ecosystem.

Ahead of the conference we heard from George Gilbert, analyst with Wikibon (owned by the same company as SiliconANGLE), on Spark’s market opportunities and remaining obstacles. Below is a recap of Gilbert’s commentary, along with resources for viewing our live broadcast and archived interviews with some of the industry’s leading experts. (* Disclosure below.)

As Internet powerhouses such as Yahoo, eBay and Netflix deploy Spark at massive scale, Spark has seen rapid adoption by enterprises across a wide range of industries. It also claims to be the largest open-source community in big data, with more than 1,000 contributors from more than 250 organizations. The most recent contribution comes from Intel: The chipmaker boosted Spark’s deep-learning capabilities by open-sourcing BigDL, a distributed library that uses existing Spark clusters to run deep-learning computations while simplifying data loading from large datasets stored in Hadoop.

This dedication to simplifying processes, including advanced support for speed and security from Apache Software Foundation’s latest top-tier additions Apache Beam and Apache Eagle, is what makes Spark so appealing, according to Gilbert. Spark’s deep integration among its libraries of code means it can “minimize the number of building blocks needed for developing machine-learning pipelines that would otherwise have to come from multiple vendors,” he explained.

Dual support for transforming data using either batch processing or streaming with SQL libraries gives Spark a leg up in readying data for machine-learning programs, he added. So even as Hadoop complexities continue to fragment its ecosystem, Spark’s ability to unify processes makes it the choice for engineering at scale.

Calling all third parties

Nevertheless, Spark’s rapid growth brings its own set of challenges if it is to avoid the pitfalls of Hadoop, which grew so quickly that it lacked much top-down guiding architecture, leaving cracks in the ecosystem. Despite a commitment to speed and fresh efforts supporting machine-learning methods, Spark isn’t yet fast enough to do hyper-scale predictions, leaving “developers to convert Spark’s machine-learning models into a language that’s faster, such as C++ or Java,” Gilbert noted. Without a native database to call its own, Spark also leaves gaps in its ecosystem, requiring ongoing integration with third-party services.

Gilbert called out two other areas where Spark will need third-party help. One is the process of ingesting data. With most usage scenarios defaulting to the open-source stream processing platform Apache Kafka, he said, “Spark still needs a fair amount of work under the covers to be able to handle this step.”

The other challenge is real-time data analytics in the Internet of Things era, where edge devices may need a single event analyzed and acted upon immediately. “As Spark’s stream processing can’t analyze one event at a time, Spark could make it difficult to support IoT workloads at the edge,” Gilbert explained.
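To make that distinction concrete, here is a toy Python sketch. It is not Spark code, and it assumes Gilbert’s remark refers to micro-batching, in which incoming events are grouped into fixed time windows and processed together. The function names, the one-second window and the arrival times are all made up for illustration. The sketch compares how long each event waits before it is handled under the two models.

```python
import math


def per_event_latencies(arrival_times):
    """Hypothetical per-event engine: each event is handled on arrival,
    so the added waiting time is zero for every event."""
    return [0.0 for _ in arrival_times]


def micro_batch_latencies(arrival_times, batch_interval):
    """Hypothetical micro-batch engine: an event waits until the window
    it arrived in closes, and only then is the whole batch processed."""
    latencies = []
    for t in arrival_times:
        # Find the window containing t, then the time that window closes.
        window_close = (math.floor(t / batch_interval) + 1) * batch_interval
        latencies.append(window_close - t)
    return latencies


# Made-up arrival times, in seconds, for four sensor readings.
arrivals = [0.25, 0.5, 1.75, 2.0]

print(per_event_latencies(arrivals))
print(micro_batch_latencies(arrivals, batch_interval=1.0))
```

In this simplified model an event that lands just after a window opens waits almost the full interval before anything acts on it, which is the kind of delay an edge device reacting to a single reading may not be able to tolerate.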

How will Apache Spark face these challenges? SiliconANGLE will get the inside scoop at Spark Summit East, broadcasting live from the roving news desk, theCUBE. During the event, theCUBE hosts Dave Vellante and George Gilbert will talk with industry experts about the future of Apache Spark, how to use the Spark stack in a variety of applications, the best practices for deploying Spark at scale, and use cases from leading organizations solving big data problems. TheCUBE guests are set to include:

Ziya Ma, vice president of big data at Intel
Mike Gualtieri, lead big data analyst at Forrester Research
Alfred Essa, vice president of analytics and R&D at McGraw-Hill Education
John Landry, distinguished technologist for HP Personal Systems Data Science
Many more to come

Where to watch

Watch theCUBE live during Spark Summit East 2017 on SiliconANGLE TV.

Also, make sure to check in during the event with theCUBE hosts Dave Vellante (@dvellante) and George Gilbert (@ggilbert41) via Twitter.

Watch live

From Feb. 8–9, join theCUBE live during Spark Summit East 2017 by viewing the real-time video stream here. Or watch live below:

Contributors: Cheryl Knight

(*Disclosure: TheCUBE is a media partner at the conference. Neither Databricks nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

