UPDATED 11:46 EDT / AUGUST 13 2014

The biggest surprise Etsy encountered when applying HP Vertica to search | #HPBigData2014


In a time when Big Data forces companies to re-architect hardware, software and hierarchies, Chris Bohn, Senior Database Engineer with Etsy, Inc., shares the biggest surprises and the true cost of implementing Hewlett-Packard Co.’s HP Vertica products in Etsy’s ecommerce platform.

“Interface is everything,” disclosed Bohn to theCUBE co-hosts John Furrier and Dave Vellante, recognizing the interface’s importance in giving end users access to Hadoop’s capabilities. During a live interview at this week’s HP Vertica conference, Bohn noted that Etsy’s 30 terabytes of data are analyzed quickly and efficiently thanks to its adoption of Vertica. “Vertica has given us accessibility and speed.”

Talking in depth about the rise of Etsy and how his team had to scale the company’s infrastructure, Bohn mentioned the initial monolithic Postgres database which accommodated all transactions, forums and conversations. After a year they sharded data vertically, enabling certain sections to have their own dedicated databases, and after another year they sharded horizontally on MySQL databases, developing a taste for analytics and data aggregation.

“Once we’ve got Vertica in house, it’s amazing what difference it has made. We were able to take queries that our analysts were running on business intelligence machines in four days, which suddenly were running in minutes,” explained Bohn.

Etsy uses Hadoop to improve its search algorithms. It also uses MapReduce and Mathematica, and for a while it ran these workloads on Amazon’s cloud. The $80,000 monthly bill drove Bohn and his team to look for an alternative, so they bought a 200-node cluster and acknowledged a new problem: “Hadoop is hard. Give me your best Hadoop engineer and I give him a 1/10 shot of getting a MapReduce right the first time,” dared Bohn, noting that MapReduce is a very iterative, specialized and complex process. “We actually use Vertica as a front-end to Hadoop,” admitted Bohn.


Biggest surprise with Vertica


“We thought only our analysts would use [Vertica]: then we had to up our licence and get more nodes because everybody was jumping in the Vertica action,” Bohn revealed. “It’s so fast we use it to do so many things: we power all our internal dashboards on it, we use it to get a tight loop on our A/B testing, we run our financial reports on it,” added Bohn.

Total cost of ownership

With Vertica, “we didn’t have to hire any new people because we found out that our DBA could administer this just fine; it shares a lot of its DNA with Postgres, which is very familiar,” explained Bohn. “We didn’t have to change any of our queries. We’ve invested many hours into hundreds of reporting queries running on that Postgres business intelligence server, and we were able to bring those over and run them unchanged in Vertica, having a kick up in speed,” Bohn went on. “Rewriting all these queries would have been a big expense.”

