Igniting chat app capabilities with Spark | #SparkSummit


Texting is so 2009. By 2010 all the cool kids switched to using the mobile chat app Kik to communicate with each other through their smartphones. The Waterloo, Ontario startup hit the market at a time when mobile companies were charging for text messages.

“Kik really took off in 2010 when it got 2 million users in the first 22 days of existence. It was insanely popular specifically with U.S. youth,” stated Joel Cumming (pictured), head of data at Kik Interactive Inc.

Dave Vellante (@dvellante) and George Gilbert (@ggilbert41), co-hosts of theCUBE, SiliconANGLE’s mobile live-streaming studio, interviewed Cumming at Spark Summit East 2017 in Boston, MA, to uncover the role data plays in a chat app.

What Spark can do for one startup

At BlackBerry, Cumming worked on BlackBerry Messenger (BBM), a mobile chat app, and led a team of 40 data scientists and engineers. He moved to Kik because it was small and fast-moving. When he arrived, he found that the company was not leveraging the data it had stored on Amazon S3, and that he was now a team of one.

“On the first day our CEO said, ‘You’re a data guy. I want you to tell me in a week why people leave Kik,’” stated Cumming. “There wasn’t even a database yet.” He turned to Amazon Redshift and other Amazon Web Services tools, transforming the data with [Amazon] EMR and Apache Pig, a programming framework for analyzing and transforming large data sets. By the end of the week, he could not answer the question, but he was able to provide ideas and opportunities based on preliminary explorations.

Initially, his job was to understand behaviors across the app. Cumming delved into the data that he had and instrumented new events, and in the past year he built out an A/B testing framework, which helps Kik make better use of its data. He has grown his team and implemented new policies for using data in a more personalized way.

Currently, Kik mostly uses batch processing. Cumming views Spark as a tool to get to true personalization and receive immediate recommendations. What does he want in the future?

“More real-time data coming in from Spark Streaming, with more real-time model scoring and the ability to push that over into some capability that can be surfaced up through an API,” Cumming said, with the expectation that Spark will provide his data team the capacity to be flexible and fast.

“Surfacing things that can be personalized to the end user as opposed to what we have now, which is all this batch processing and loading once a day, knowing we can’t react on the fly,” Cumming disclosed.

Watch the complete video interview below, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of the Spark Summit East 2017 Boston. (*Disclosure: TheCUBE is a media partner at the conference. Neither Databricks nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo by SiliconANGLE