UPDATED 13:00 EDT / APRIL 08 2016

NEWS

LinkedIn releases its internal Hadoop optimization tool

Less than two weeks after open-sourcing its internal application testing system, LinkedIn Inc. is expanding its open-source software portfolio once again with the release of Dr. Elephant, a performance optimization engine designed to help speed up Hadoop queries. The software was developed to spare the social networking giant’s data science team from having to manually instruct analysts on how to fine-tune their workflows.

The chore took too much time away from the unit’s other activities because many of the employees who rely on LinkedIn’s internal analytics environment, which includes both Hadoop and Spark, aren’t particularly familiar with the cluster’s inner workings. Getting a query to execute at optimal speed is difficult even with a thorough understanding of the frameworks, since performance is influenced by numerous configuration settings that must each be tweaked individually. To make matters more complicated, many of those settings are interdependent, which means that setting a single parameter to the wrong value can send a user straight back to square one.

Dr. Elephant promises to do away with much of that hassle by automatically analyzing the operations logs from an analytics cluster to identify why queries aren’t running as fast as they should. Its findings are displayed in a visual dashboard that lets users see how performance fluctuates over the course of a given workflow’s execution and compare its speed with previous runs. According to LinkedIn, the functionality allows analysts to quickly tweak their jobs until they find the right fix.

The company claims that Dr. Elephant can thereby help resolve 80 percent of the optimization issues that crop up during day-to-day analytics work. That adds up to a lot of saved time for LinkedIn’s data science team across the roughly 10,000 Hadoop and Spark jobs that employees run every day, a benefit other organizations are now able to exploit as well. The engine’s new open-source status also means its functionality can be customized according to the specific requirements of the analytics cluster in which it’s deployed, a major boon for potential adopters.

Image via Geralt
