One of the biggest factors holding some companies back from migrating to the cloud is reliability. The average uptime of cloud deployments has been observed to be higher than that of traditional ones, but that doesn’t mean there isn’t any margin for error.
Last night Amazon EC2 and the Relational Database Service were disrupted for about 40 minutes in an East Coast availability zone, temporarily taking down dozens of services, including Netflix, Foursquare, Reddit and Quora.
“Amazon’s US-EAST-1 region experienced connectivity issues according to its status page and big name startups including Rapportive, Reddit, Foursquare, Hootsuite and Heroku were temporarily unavailable.”
This is not the first time Amazon Web Services has gone down in recent months. Back in April, AWS was down for maintenance for two full days. This naturally raised some concerns about how Amazon assures uptime for its cloud users, but surprisingly, this latest incident will likely help restore some of Amazon’s lost reputation in this area.
CRN reports that human error caused the outage: instead of scaling the system to handle more demand and shifting traffic to another router, engineers routed it to a lower-capacity network that couldn’t handle the workload. The very same mistake was behind the previous 48-hour shutdown, which suggests that Amazon learned its lesson and implemented measures to limit the damage should such an error repeat itself.
Reliability in the cloud extends beyond uptime to security, and Amazon’s reputation has taken a blow on that front as well. A source claimed that the hackers who caused the PlayStation Network (PSN) outage launched their attack from a rented EC2 instance.
Uptime and security are concerns that every cloud service provider has to face, not just Amazon. Yahoo is one of them, especially now that its email service went down for two and a half hours last week.