UPDATED 06:00 EST / AUGUST 01 2012

As Promised, Netflix Open Sources Chaos Monkey

Content streaming and movie rental service Netflix uses cloud computing to power its core operations. In fact, the bulk of Netflix’s infrastructure is cloud-based, and it is one of Amazon Web Services’ (AWS) largest customers. Netflix has developed an entire arsenal of tools that help it manage its massive cloud environment and handle outages and technical issues more efficiently.

Netflix refers to these tools as the Simian Army. The software includes colorfully named items like Latency Monkey, Chaos Gorilla and Chaos Monkey. If you couldn’t tell by the name, Chaos Monkey is a scaled-down version of Chaos Gorilla. (Who says developers don’t have a sense of humor?)

Chaos Monkey is a service that runs on AWS and improves application resiliency by helping ensure an application can remain running if an instance unexpectedly shuts down – a universally helpful capability for any cloud-based application. Chaos Monkey works by randomly killing instances. If an application is well designed, the outage of a single node shouldn’t impact it. Developers can use the service to identify unnecessary dependencies and weed out architectural problems. Chaos Monkey was developed for AWS, but according to Netflix it is flexible enough to work with other cloud providers.
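To make the idea concrete, here is a minimal sketch of the random-termination approach described above. It is not Netflix's code and makes no AWS calls: `InstanceGroup`, `unleash_monkey`, the `min_healthy` threshold and the instance ids are all hypothetical names chosen for illustration, and the group is just an in-memory set.

```python
import random
from dataclasses import dataclass, field


@dataclass
class InstanceGroup:
    """Hypothetical in-memory stand-in for a group of identical instances."""
    name: str
    instances: set = field(default_factory=set)
    min_healthy: int = 1  # assumed threshold for "the application still serves"

    def terminate(self, instance_id):
        # Simulates an instance shutting down unexpectedly.
        self.instances.discard(instance_id)

    def is_available(self):
        return len(self.instances) >= self.min_healthy


def unleash_monkey(group, rng, probability=1.0):
    """Terminate at most one randomly chosen instance; return its id or None."""
    if not group.instances or rng.random() >= probability:
        return None
    # Sort first so the choice depends only on the seed, not on set ordering.
    victim = rng.choice(sorted(group.instances))
    group.terminate(victim)
    return victim


# Illustrative run with made-up instance ids.
rng = random.Random(42)
group = InstanceGroup("web", {"i-a", "i-b", "i-c"})
victim = unleash_monkey(group, rng)
print(f"terminated {victim}; still available: {group.is_available()}")
```

In this sketch a three-instance group survives the loss of one node, while a group with a single instance would not. That is the kind of hidden single point of failure the article says the tool is meant to expose.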

As promised in April, Netflix has made the code publicly available as open source. The company announced Chaos Monkey’s open source launch in an official blog post. According to the post, developers who use the service can be confident the tool has already been field tested. The announcement explained:

“Chaos Monkey has terminated over 65,000 instances running in our production and testing environments. Most of the time nobody notices, but we continue to find surprises caused by Chaos Monkey which allows us to isolate and resolve them so they don’t happen again.”

The code for Chaos Monkey is available on GitHub. Janitor Monkey, a tool similar to Cloudability that tracks down unused resources, might be the next candidate for open sourcing.

Incidents like the recent Amazon outage and Azure’s Western European blackout show the importance of such solutions. In spite of Netflix’s preparation, the AWS failure still managed to take the service down. Netflix’s availability architecture did, however, reduce the impact of the outage.
