UPDATED 13:00 EDT / APRIL 08 2016

LinkedIn releases its internal Hadoop optimization tool

Less than two weeks after open-sourcing its internal application testing system, LinkedIn Inc. is expanding its open-source software portfolio once again with the release of Dr. Elephant, a performance optimization engine designed to help speed up Hadoop queries. The software was developed to spare the social networking giant’s data science team from having to manually instruct analysts on how to fine-tune their workflows.

The chore took too much time away from the unit’s other activities because many of the employees who rely on LinkedIn’s internal analytics environment, which includes both Hadoop and Spark, aren’t particularly familiar with the cluster’s inner workings. Getting a query to execute at optimal speed is difficult even with a thorough understanding of the frameworks, since performance is influenced by numerous configuration settings that must each be tweaked individually. To complicate matters further, many of those settings are interdependent, which means that setting one parameter to the wrong value can send a user straight back to square one.

Dr. Elephant promises to do away with much of that hassle by automatically analyzing the operations logs from an analytics cluster to identify why queries aren’t running as fast as they should. Its findings are displayed in a visual dashboard that lets users see how performance fluctuates over the course of a given workflow’s execution and compare its speed with that of previous runs. According to LinkedIn, the functionality allows analysts to quickly tweak their jobs until they find the right fix.

The company claims that Dr. Elephant can thereby help resolve 80 percent of the optimization issues that crop up during day-to-day analytics work. That adds up to a lot of saved time for LinkedIn’s data science team across the roughly 10,000 Hadoop and Spark jobs that employees run every day, a benefit other organizations are now able to exploit as well. The engine’s new open-source status means its functionality can be customized according to the specific requirements of the analytics cluster in which it’s deployed, a major boon for potential adopters.

Image via Geralt
