Trulia’s new data story: Image search and beyond
Zillow, Inc. completed its integration of the Trulia platform in Q4 2015, and the results exceeded the company’s expectations. Zillow’s user base broadened, and its up-sells to advertisers increased significantly.
Trulia, which maintains its own web destination, is an online marketplace focused on providing unique insights about properties, neighborhoods, commute times and school districts for home buyers, sellers and renters. The company believes its core strength is its data, of which it collects close to 1.5TB daily. Its data science, engineering and data mining teams transform this data to improve the consumer experience.
SiliconANGLE recently spoke to Deep Varma, vice president of data engineering at Trulia, Inc., about the company’s evolving data science, how it uses data science to improve the user experience, and how it can benefit from the Open Data movement.
Creating an ‘emotional’ user experience
Q: How have your data science roles evolved in recent years?
Varma: We started as a traditional search engine 10 years ago, but have added more valuable insights over time. Where we’ve been investing a lot in the past two years is our Discovery homepage and search. We’ve created collections for our consumers, such as “new listings” and “homes with pools.” We’re moving toward a more personalized experience. So, say you visit our site and come back two weeks later: we’re going to have a collection for you based on your search from two weeks ago, and you can see what you’ve missed since you last visited. We feel the traditional search will disappear.
We’re also investing in visual browsing. Think about when agents provide listings. They have images of kitchens, baths, etc., and we’re looking at how we can use object recognition to personalize searches (think white marble floors in a kitchen). The bottom line is, we are taking this very emotional experience of buying a home and applying it to computer science.
Q: How are you using data science today to improve the user experience?
Varma: Data science — this is where we truly differentiate. The question is, how can we build more consumer engagement with predictive tech, click-thru rates, scoring technology, and pricing models? Data science is pervasive across all Trulia product development. We use Hadoop as our underlying platform and MapReduce for file systems. We use MapReduce for batch models we can wait for.
We’ve recently transferred to Spark because it offers more flexibility. We can do micro-batching in Spark, Elasticsearch. We are also using some Elastic MapReduce (EMR) solutions from Amazon, but we mostly have our own homegrown servers.
We are also investing in the consumer digital footprint to build a cross-device understanding of anonymous consumers. Our team looks into consumer searches (for instance, two bedrooms/two bathrooms under $500K) and builds recommendations from there. Those recommendations are applied at the city/state/zip code level and in email and push programs. We tell a story to our users. We don’t want to come across as badgering users.
Our last area of investment for data science is neighborhood content. What’s unique about a given neighborhood? It’s like a natural language process, looking into neighborhoods. We have so much info about a neighborhood. We have images, listing info (number of bedrooms, how many times it’s been sold, year built, neighborhood churn, school districts, public info, etc.). We take all this content and our user-contributed content (questions from users), and we can predict things like the safety of a neighborhood.
Closing the gap on real-time data
Q: What are the challenges Trulia faces in light of industry-operated sites like har.com and realtor.com?
Varma: This is one area where the quality of our listings is better than ever before. We’re working closely with MLS resources to build direct relationships and improve accuracy.
Q: How has the merger with Zillow, Inc. benefitted Trulia’s data sources, analytics and delivery?
Varma: These two forces coming together have allowed us to unlock innovation and to align time and resources. With this, relationships with the industry improve, which helps close the gap toward real-time data.
Q: How exactly can Trulia benefit from the Open Data movement in California and other states?
Varma: When we collect crime scores, it’s mostly machine learning, but as police departments disclose that info (protecting privacy), how can we build on our crime score? Another example is public transit systems. These play a crucial role in daily consumer behavior. This data is still lacking in open data terms. We’d like to incorporate it in real time into our heat maps.
About Deep Varma, vice president of data engineering, Trulia, Inc.
Varma manages data engineering functions across the Trulia business, including the vital acquisition of listings and public records, the consumer search experience and API, email/push, efforts to enhance personalization, industry-leading location services such as geocoding, as well as data science, data warehouse and reporting. During his 17 years of Silicon Valley experience, he has focused on building large-scale distributed data platforms with IBM, ABB, Yahoo! and other startups. Varma is a graduate of the Haas School of Business at the University of California, Berkeley.
photo credit: Newtown grafitti via photopin cc