UPDATED 10:31 EDT / JUNE 28 2013

NEWS

eHarmony Refines the Science of Love : Hadoop + Machine Learning | #hadoopsummit

For our flagship broadcast program theCUBE, Jeff Kelly interviewed Vaclav Petricek, Principal Data Specialist with eHarmony, live from Hadoop Summit 2013. They talked about long-term compatibility and the underlying architecture that makes it possible.

Petricek runs machine learning applications at eHarmony that decide who should be introduced to whom, and when. For that, they use Hadoop and logical machine learning. “eHarmony is a bit different than your typical dating site,” brags Petricek. Typical dating sites are search-based, with results generated by the search criteria a user enters. The founder of eHarmony is Neil Clark Warren, a marriage counselor. After years of counseling couples in failing marriages, he wanted to help people meet not only the people they would be attracted to, but also the people they are compatible with.

As for the underlying technology that makes this possible, Petricek explains: “To match people effectively, you need to solve three separate problems. The first one is long-term compatibility, then there’s the affinity matches (based on age and location), and finally, distribution (who to introduce to whom and when).”

An affinity for Hadoop

Hadoop and large-scale machine learning are used for the affinity part. To predict whether or not two people would be interested in talking to each other, eHarmony uses the historical data generated by its 10 years of operations. As for the data itself, Petricek clarifies: “Over the years the questionnaires have evolved, but certain questions have survived. It used to be 500 questions and now it’s down to 150, which is a lot of data, enough to ‘know’ someone. That’s how you can still make recommendations to people who joined the site recently.” The questionnaire is not the only tool. eHarmony also collects behavioral data: when users log in and how often, and what kinds of devices they use.
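To make the affinity idea concrete, here is a minimal sketch of predicting whether two people will talk to each other from past outcomes. Everything in it is an assumption for illustration: the profile fields (`age`, `city`), the two features, the toy history, and the choice of logistic regression trained by stochastic gradient descent. The interview does not describe eHarmony's actual features or model.

```python
import math

def pair_features(a, b):
    """Turn two user profiles into features for the pair (hypothetical fields)."""
    return {
        "bias": 1.0,
        "age_gap": abs(a["age"] - b["age"]) / 10.0,          # gap in decades
        "same_city": 1.0 if a["city"] == b["city"] else 0.0,
    }

def predict(weights, feats):
    """Estimated probability that the pair communicates (logistic model)."""
    z = sum(weights.get(name, 0.0) * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=200, lr=0.1):
    """Fit weights with plain stochastic gradient descent on the log loss."""
    weights = {}
    for _ in range(epochs):
        for feats, label in examples:
            error = label - predict(weights, feats)
            for name, value in feats.items():
                weights[name] = weights.get(name, 0.0) + lr * error * value
    return weights

# Made-up toy history: label 1 means the two people exchanged messages.
history = [
    (pair_features({"age": 30, "city": "LA"}, {"age": 31, "city": "LA"}), 1),
    (pair_features({"age": 28, "city": "NYC"}, {"age": 29, "city": "NYC"}), 1),
    (pair_features({"age": 30, "city": "LA"}, {"age": 50, "city": "NYC"}), 0),
    (pair_features({"age": 25, "city": "NYC"}, {"age": 45, "city": "LA"}), 0),
]
weights = train(history)

p_close = predict(weights, pair_features({"age": 32, "city": "LA"}, {"age": 33, "city": "LA"}))
p_far = predict(weights, pair_features({"age": 30, "city": "LA"}, {"age": 52, "city": "NYC"}))
```

On this toy data the model scores the nearby, similar-age pair above the distant one. A production system would draw on far more signals, such as the questionnaire answers and behavioral data mentioned above.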

Jeff Kelly then asked how eHarmony deals with people who do not answer the 150 questions truthfully. “You cannot force someone to answer truthfully, but we offer incentives to do so, in order to get the right matches,” Petricek said. “It’s a science in itself to design the questions in such a way to get the underlying psychological traits, and not what the person would like to be.”

Talking about the technology itself, Petricek explained: “We store all of our data in-house, on a Hadoop cluster, in HDFS, and on top of that we run Hive, which provides the SQL interface, and then we do the machine learning modeling. We use a lot of Vowpal Wabbit, a large-scale machine learning open source project written by John Langford, that can scale on the Hadoop cluster. And lastly, we use some genetic algorithms.”
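As a rough illustration of how rows pulled from a Hive table might be handed to Vowpal Wabbit, the sketch below formats one labeled example in Vowpal Wabbit's plain-text input format (`label |namespace feature:value ...`). The helper `to_vw_line`, the namespace names and the feature names are hypothetical. The interview does not say how eHarmony actually moves data between Hive and Vowpal Wabbit.

```python
def to_vw_line(label, namespaces):
    """Format one example as a Vowpal Wabbit text line.

    label      -- True/False outcome; written as 1 / -1, the labels
                  Vowpal Wabbit expects for logistic loss
    namespaces -- dict mapping a namespace name to {feature_name: value};
                  names must not contain spaces, ':' or '|'
    """
    parts = ["1" if label else "-1"]
    for namespace, feats in namespaces.items():
        tokens = " ".join(
            f"{name}:{value:g}" for name, value in sorted(feats.items())
        )
        parts.append(f"|{namespace} {tokens}")
    return " ".join(parts)

# Hypothetical pair of users who did not end up communicating.
line = to_vw_line(
    False,
    {
        "pair": {"age_gap": 2.0, "same_city": 0.0},
        "behavior": {"logins_per_week": 3.5},
    },
)
```

Here `line` is `-1 |pair age_gap:2 same_city:0 |behavior logins_per_week:3.5`. A file of such lines could then be used to train a model with the `vw` command-line tool and its `--loss_function logistic` option.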
