UPDATED 12:35 EDT / MARCH 29 2016


LinkedIn’s newest open-source tool will crash your application to test its resilience

Practically every important application in the enterprise today comes with safeguards designed to protect against common technical problems like server outages. But there’s often a big difference between how a service is expected to handle an issue and how it does so in practice, which requires organizations to painstakingly test for any weak points that may have slipped through their quality controls. LinkedIn Inc. moved to ease the chore this week by open-sourcing the homegrown system that its engineers use internally to assess the resilience of its infrastructure.

The company developed Simoorg, as the software is called, after finding older failure-induction tools such as Chaos Monkey (the brainchild of fellow web giant Netflix Inc.) inadequate for its purposes. LinkedIn needed a tool that could not only check how well workloads deal with technical trouble in general, but also simulate the specific operating conditions under which its internal processes are likely to run into problems. That extends down to details such as how much traffic an application is handling and how much latency it is experiencing.

Simoorg also lets users customize how a test is carried out so that it reflects what a real-life outage would look like. An engineer could point the system at a certain group of servers, set how long each machine will be taken offline and then specify the precise sequence in which the failures should be executed. LinkedIn even included the option to disable hardware components in a random order, which makes it possible to check how an application performs in situations that the IT department can’t necessarily anticipate.
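To make the idea of a fixed versus a randomized failure sequence concrete, here is a minimal, hypothetical Python sketch. It does not use Simoorg’s actual configuration format or API; the names `FailureStep`, `build_schedule` and `dry_run`, the host names and the one-host-at-a-time behavior are all illustrative assumptions. It only shows how a target group, a per-machine downtime and an ordered or shuffled sequence might combine into a schedule that can be dry-run without touching real servers.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FailureStep:
    """Hypothetical step: take `host` offline for `downtime_s` seconds."""
    host: str
    downtime_s: int


def build_schedule(
    hosts: List[str],
    downtime_s: int,
    randomize: bool = False,
    seed: Optional[int] = None,
) -> List[FailureStep]:
    """Order the target hosts either as given (a fixed sequence) or shuffled."""
    order = list(hosts)  # copy so the caller's list is left untouched
    if randomize:
        # A seeded RNG keeps a "random" run reproducible when debugging.
        random.Random(seed).shuffle(order)
    return [FailureStep(host, downtime_s) for host in order]


def dry_run(schedule: List[FailureStep]) -> List[Tuple[int, str, str]]:
    """Simulate the schedule without touching real machines.

    Returns (elapsed_seconds, event, host) tuples; in this sketch each host
    is brought back online before the next one is taken down.
    """
    elapsed = 0
    events = []
    for step in schedule:
        events.append((elapsed, "offline", step.host))
        elapsed += step.downtime_s
        events.append((elapsed, "online", step.host))
    return events


if __name__ == "__main__":
    hosts = ["app-1", "app-2", "app-3"]  # hypothetical server group
    for event in dry_run(build_schedule(hosts, downtime_s=60)):
        print(event)
    print([s.host for s in build_schedule(hosts, 60, randomize=True, seed=7)])
```

Taking machines down one at a time keeps the sketch simple; a real failure-induction tool would also need to overlap failures, run health checks while a machine is down and actually wait for the downtime to elapse.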

The versatility of Simoorg enables organizations to simulate everything from the effects of a bad patch to severe hardware failures spread across an entire data center. Because tests are easy to tweak, users can also explore more nuanced questions, such as whether a service becomes more susceptible to hardware outages above a certain traffic threshold. The knowledge gleaned from the system is useful both to developers looking to improve the resilience of their applications and to operations professionals charged with troubleshooting problems in their organizations’ infrastructure.

