UPDATED 20:37 EDT / JUNE 07 2016

NEWS

Spark Summit keynote explores structured streaming, innovation in deep learning | #SparkSummit

Spark Summit 2016 opened today at the Hilton San Francisco Union Square with Matei Zaharia, chief technology officer at Databricks Inc. and creator of Spark, revealing Spark 2.0, the project's largest release to date, which is due out this month. Developers who want a sneak peek can take an unstable preview release for a test drive at spark.apache.org. Zaharia explained that the new release remains highly compatible with Apache Spark 1.x, fixes dependency issues and includes more than 2,000 patches from 280 contributors.

The Spark 2.0 upgrade was guided by three key ideas: a unified engine that supports end-to-end applications; high-level APIs that are easy to use and enable rich optimizations; and broad integration.

“It’s agnostic to the storage system so you can run it on data you have anywhere … and integrates with many libraries,” said Zaharia.

New and improved Spark 2.0

The new version supports structured API improvements around DataFrame, Dataset and SparkSession, along with structured streaming that will allow users to query data in real time. Michael Armbrust, a software engineer at Databricks, later demonstrated these capabilities, showing how easy it is to start with unstructured JSON (JavaScript Object Notation) data, ETL (extract, transform and load) it into a structured table, and then apply the same code to a stream.
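
For readers who want a feel for the kind of workflow Armbrust described, here is a minimal PySpark 2.0 sketch of a batch ETL job built around the new SparkSession entry point; the file path and column names are illustrative assumptions, not code from his demo.

```python
from pyspark.sql import SparkSession

# SparkSession is the single entry point introduced in Spark 2.0,
# subsuming the older SQLContext and HiveContext.
spark = SparkSession.builder.appName("etl-demo").getOrCreate()

# Read semi-structured JSON; Spark infers a schema and returns a DataFrame.
events = spark.read.json("events.json")  # hypothetical input file

# A simple filter-and-aggregate step over the structured table.
counts = (events
          .filter(events.signal > 10)
          .groupBy("device")
          .count())

counts.show()
```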

For the broader community, Zaharia explained that Spark 2.0 will also facilitate deep learning libraries, GraphFrames, PyData integration, reactive streams, C# bindings and JavaScript bindings. He then took a deeper dive into the structured APIs, whose optimizer compiles query plans into specialized code and which give Datasets static typing. Also new in Spark 2.0 is whole-stage code generation, which fuses code across multiple operators, along with optimized input/output for Apache Parquet and the built-in cache.
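
The effect of whole-stage code generation can be seen from the query planner itself. Below is a hedged sketch, again with an illustrative path and columns: in Spark 2.0, operators that have been fused into generated code show up with a leading asterisk in the explain() output.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("codegen-demo").getOrCreate()

# Parquet reads benefit from the optimized, vectorized I/O path in 2.0.
df = spark.read.parquet("signals.parquet")  # hypothetical Parquet file

# explain() prints the physical plan; stages fused by whole-stage code
# generation appear as *Project, *Filter, *HashAggregate and so on.
df.groupBy("device").sum("count").explain()
```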

High-level improvements

Structured streaming is the newest feature in this version of Spark. “Structured streaming is still very new and experimental,” said Zaharia. Built on the structured engine (DataFrames), the high-level streaming APIs support not only streaming but also interactive and batch queries, enabling continuous applications. In this model a stream is simply an infinite DataFrame, so the same code applies.
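
As a rough illustration of that idea, the batch query sketched earlier can be rewritten as a streaming one with only the input and output changed; the schema, directory and query name below are illustrative assumptions, not code from the keynote.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("streaming-demo").getOrCreate()

# Streaming JSON sources require an explicit schema up front.
schema = StructType([StructField("device", StringType()),
                     StructField("signal", LongType())])

# readStream treats a directory of JSON files as an unbounded DataFrame.
events = spark.readStream.schema(schema).json("events/")

counts = events.filter(events.signal > 10).groupBy("device").count()

# The same aggregation now runs continuously, maintaining its results
# in an in-memory table that can be queried interactively.
query = (counts.writeStream
         .outputMode("complete")
         .format("memory")
         .queryName("device_counts")
         .start())
```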

Keeping pace with the industry, Spark 2.0 has also been upgraded to aid machine learning by allowing users to export models, load them in another program and move them to production. These enhancements span SparkR, MLlib 2.0 and new algorithms aimed at deep learning.
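
A minimal sketch of that export-and-reload workflow in PySpark 2.0 might look like the following; the tiny training set and the save path are invented for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("ml-export-demo").getOrCreate()

# Tiny illustrative training set: two numeric features and a binary label.
train = spark.createDataFrame(
    [(0.0, 1.1, 0.1), (1.0, 1.0, -1.0), (1.0, 1.3, 1.0), (0.0, 1.2, -0.5)],
    ["label", "f1", "f2"])

assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
lr = LogisticRegression(maxIter=10)
model = Pipeline(stages=[assembler, lr]).fit(train)

# New in Spark 2.0: the fitted pipeline can be saved and reloaded later,
# for example by a separate production scoring job.
model.save("/tmp/lr_pipeline")
reloaded = PipelineModel.load("/tmp/lr_pipeline")
```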

Growing the Apache Spark community

Zaharia identified the skills gap as the biggest challenge in applying Big Data. To help close it, Databricks is offering Community Edition, where developers can find interactive tutorials, Apache Spark and popular data science libraries, plus visualization and debugging tools.

Additionally, in conjunction with the University of California, Berkeley; UCLA; and edX, there is a free five-course series that includes Introduction to Apache Spark, Distributed Machine Learning, Big Data Analysis, Advanced Apache Spark for Data Science and Data Engineering, and Advanced Machine Learning. The company completed the beta in February, and the courses are now available at d.bricks.com/mooc16.

Deep learning with Google

Jeff Dean, senior fellow at Google Inc., spoke next about deep learning using data. He demonstrated how a machine can be taught to learn things never thought possible before, using examples of perceptual data that facilitated image recognition and eliminated the need to tag photos. Deep learning is a powerful class of machine learning, a modern reincarnation of artificial neural networks that uses collections of simple, trainable mathematical functions.

These systems build up layers of abstraction. The concept is loosely based on the human brain: based on what the network sees, it decides what it wants to say. Ultimately, much like neurons in our brains, the simple functions learn to cooperate to accomplish the task. According to Dean, results get better with more data, bigger models and more computation. “Better algorithms, new insights and improved techniques always help too,” he said.

Dean also showcased TensorFlow, an open source software library for numerical computation using data flow graphs. He demonstrated the flexible architecture, which allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. According to Dean, TensorFlow is improving voice recognition and photo searches. To learn more about this open source software, visit http://tensorflow.org.
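
To give a concrete, if toy-sized, sense of what a data flow graph is, here is a hedged sketch using the 2016-era TensorFlow Python API; the matrices and device string are illustrative.

```python
import tensorflow as tf

with tf.device("/cpu:0"):             # swap in "/gpu:0" to target a GPU
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0], [2.0]])
    product = tf.matmul(a, b)          # a node in the data flow graph

# Nothing executes until the graph is run in a session.
with tf.Session() as sess:
    print(sess.run(product))           # [[ 5.], [11.]]
```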

Deep learning trends

Andrew Ng, chief scientist at Baidu Inc., chairman and co-founder of Coursera and associate professor (research) at Stanford University, took to the stage to talk about deep learning trends and how AI will impact teams and industries. He compared large neural networks to the engine driving the trend, with data as the fuel. He spoke about how speech recognition has changed as end-to-end learning replaces the complex hand-built features of earlier speech systems.

His key takeaways were that scale drives AI progress and that learning complex outputs enables end-to-end learning. According to Ng, AI is the new electricity. He also predicted future trends in AI: in the short term, companies will build a centralized AI function and sprinkle it on the existing business; in the long term, AI will be deeply incorporated into the business, and novel business strategies will be built on AI.

The possibilities of production

Lastly, Marvin Theimer, distinguished engineer at Amazon Web Services Inc., delivered his talk “From Prototype to Production,” in which he covered how to bring ideas to market. He listed several qualities that need to be part of any solution: scalability, high availability, maintainability and evolvability. He spoke to the challenges and offered advice on making a prototype efficient and user-friendly.

Visit the Spark Summit 2016 event page for more information, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of Spark Summit 2016.

Photo by HPE
