As Promised, Netflix Open Sources Chaos Monkey

Netflix, the content streaming and movie rental service, relies on cloud computing to power its core operations. In fact, the bulk of Netflix's infrastructure is cloud-based, and it is one of Amazon Web Services' (AWS) largest customers. Netflix has developed an entire arsenal of tools that help it manage its massive cloud environment and handle outages and technical issues more efficiently.

Netflix refers to these tools as the Simian Army. The suite includes colorfully named members like Latency Monkey, Chaos Gorilla and Chaos Monkey. If you couldn't tell by the name, Chaos Monkey is a scaled-down version of Chaos Gorilla. (Who says developers don't have a sense of humor?)

Chaos Monkey is a service that runs on AWS and improves application resiliency by helping ensure an application can keep running if an instance unexpectedly shuts down, a useful capability for any cloud-based application. Chaos Monkey works by randomly killing instances. If an application is well designed, the loss of a single node shouldn't affect it. Developers can use the service to identify unnecessary dependencies and weed out architectural problems. Chaos Monkey was developed for AWS, but according to Netflix it is flexible enough to work with other cloud providers.
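To make the idea concrete, here is a minimal Python sketch of the failure-injection principle described above. It is a toy simulation under stated assumptions, not Netflix's code and not the AWS API: `Instance`, `ServiceGroup` and `chaos_monkey` are hypothetical names, and "terminating" an instance simply flips a flag. A group with spare capacity keeps serving after one random termination, while a single-instance group is left with nothing, which is the kind of weakness the tool is meant to surface.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Instance:
    """Hypothetical stand-in for a cloud VM (not an AWS API object)."""
    instance_id: str
    running: bool = True


@dataclass
class ServiceGroup:
    """Hypothetical group of identical instances behind a load balancer."""
    name: str
    instances: list = field(default_factory=list)

    def healthy(self) -> list:
        # Instances that can still take traffic.
        return [i for i in self.instances if i.running]

    def serve(self, request: str) -> str:
        # Route the request to any surviving instance; fail only if none remain.
        alive = self.healthy()
        if not alive:
            raise RuntimeError(f"{self.name}: no instances left to serve {request!r}")
        return f"{request} handled by {alive[0].instance_id}"


def chaos_monkey(group: ServiceGroup, rng: random.Random) -> Optional[str]:
    """Terminate one randomly chosen running instance; return its ID, or None if none are running."""
    candidates = group.healthy()
    if not candidates:
        return None
    victim = rng.choice(candidates)
    victim.running = False
    return victim.instance_id


rng = random.Random(42)  # fixed seed so the run is repeatable

# Redundant service: three instances, so losing one should go unnoticed.
api = ServiceGroup("api", [Instance(f"i-{n:04d}") for n in range(3)])
killed = chaos_monkey(api, rng)
print("terminated", killed)
print(api.serve("GET /titles"))  # answered by a surviving instance

# Fragile service: a single instance, so one termination causes an outage.
fragile = ServiceGroup("fragile", [Instance("i-solo")])
chaos_monkey(fragile, rng)
try:
    fragile.serve("GET /titles")
except RuntimeError as err:
    print("outage:", err)
```

The random choice is the point of the design: because nobody knows in advance which node will disappear, every service has to tolerate losing any one of its instances, rather than only the ones an engineer thought to test.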

As promised in April, Netflix has made the code publicly available as open source. The company announced Chaos Monkey's open source launch in an official blog post. According to the post, developers who use the service can be confident the tool has already been field tested. The announcement explained:

 “Chaos Monkey has terminated over 65,000 instances running in our production and testing environments. Most of the time nobody notices, but we continue to find surprises caused by Chaos Monkey which allows us to isolate and resolve them so they don’t happen again.”

The code for Chaos Monkey is available on GitHub. Next up for open sourcing might be Janitor Monkey, a tool that, much like Cloudability, tracks down unused resources.


Incidents like the recent Amazon outage and Azure's Western European blackout show the importance of such tools. Despite Netflix's preparation, the AWS failure still managed to take the service down, although Netflix's availability architecture did reduce the damage.

Maria Deutscher

Maria Deutscher is a staff writer for SiliconANGLE covering all things enterprise and fresh. Her work takes her from the bowels of the corporate network up to the great free ranges of the open-source ecosystem and back on a daily basis, with the occasional pit stop in the world of end-users. She is especially passionate about cloud computing and data analytics, although she also has a soft spot for stories that diverge from the beaten track to provide a more unique perspective on the complexities of the industry.