

Content streaming and movie rental service Netflix relies on cloud computing to power its core operations. In fact, the bulk of Netflix’s infrastructure is cloud-based, and it is one of Amazon Web Services’ (AWS) largest customers. Netflix has developed an entire arsenal of tools that help it manage its massive cloud environment and handle outages and technical issues more efficiently.
Netflix refers to these tools as the Simian Army. The software includes colorfully named items like Latency Monkey, Chaos Gorilla and Chaos Monkey. If you couldn’t tell by the name, Chaos Monkey is a scaled-down version of Chaos Gorilla. (Who says developers don’t have a sense of humor?)
Chaos Monkey is a service that runs on AWS and improves application resiliency by helping ensure an application can remain running if an instance unexpectedly shuts down – a universally helpful capability for any cloud-based application. Chaos Monkey works by randomly killing instances. If an application is well designed, the outage of a single node shouldn’t impact it. Developers can use the service to identify unnecessary dependencies and weed out architectural problems. Chaos Monkey was developed for AWS, but according to Netflix it is flexible enough to work with other cloud providers.
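The idea of randomly killing instances can be illustrated with a small simulation. This is only a minimal sketch and not Netflix's code: it makes no AWS calls, and the `Instance` and `Service` classes and the `kill_random_instance` function are hypothetical names invented for illustration. It assumes a service counts as available while at least one of its instances is still running.

```python
import random


class Instance:
    """Hypothetical stand-in for a cloud instance (not an AWS object)."""

    def __init__(self, instance_id):
        self.instance_id = instance_id
        self.running = True


class Service:
    """Hypothetical service; assumed available while any instance runs."""

    def __init__(self, instance_ids):
        self.instances = [Instance(i) for i in instance_ids]

    def running_instances(self):
        return [inst for inst in self.instances if inst.running]

    def is_available(self):
        return len(self.running_instances()) > 0


def kill_random_instance(service, rng):
    """Stop one randomly chosen running instance and return its id."""
    candidates = service.running_instances()
    if not candidates:
        return None  # nothing left to kill
    victim = rng.choice(candidates)
    victim.running = False
    return victim.instance_id


rng = random.Random(42)  # seeded so the run is repeatable

redundant = Service(["i-a", "i-b", "i-c"])  # three interchangeable nodes
single = Service(["i-x"])                   # one node, no redundancy

killed = kill_random_instance(redundant, rng)
kill_random_instance(single, rng)

print("redundant service available:", redundant.is_available())
print("single-node service available:", single.is_available())
```

In this toy model the three-node service survives the loss of one node while the single-node service does not, which is the kind of weakness the random terminations are meant to surface.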
As promised in April, Netflix has made the code publicly available as open source. The company announced Chaos Monkey’s open source launch in an official blog post. According to the post, developers who use the service can be confident the tool has already been field tested. The announcement explained:
“Chaos Monkey has terminated over 65,000 instances running in our production and testing environments. Most of the time nobody notices, but we continue to find surprises caused by Chaos Monkey which allows us to isolate and resolve them so they don’t happen again.”
The code for Chaos Monkey is available on GitHub. Janitor Monkey, a tool similar to Cloudability that tracks down unused resources, might be the next candidate for an open source release.
Incidents like the recent Amazon outage and Azure’s Western European blackout show the importance of such solutions. Despite Netflix’s preparation, the AWS failure still took the service down, although Netflix’s availability architecture did reduce the impact of the damage.