Recent AWS Outage Was Caused By a Typing Error

Posted at 12:11 pm on March 3, 2017

A typographical error made while reconfiguring Amazon Web Services caused an outage that took down millions of Internet sites for hours this week.

The Amazon Web Services Inc. (AWS) Simple Storage Service (S3) team was debugging a problem that was causing the S3 billing system to run more slowly than expected, according to a statement Amazon released Thursday. At 9:37 am PT, an S3 team member executed a command intended to remove a small number of servers from one of the S3 subsystems used for billing. The team member mistyped the command and removed a larger number of servers than intended. The outage lasted until 1:54 pm PT, but full recovery took longer because some services had to work through a backlog, Amazon says.

Affected sites included Netflix, Airbnb, and Slack. Amazon apologized for the outage and says it is changing its procedures to prevent a recurrence. Among the changes: reducing the AWS Service Health Dashboard's dependency on Amazon S3. During the outage, Amazon had difficulty communicating the status of the problem to the public because the dashboard itself depended on the failing S3 service.

The outage cost S&P 500 companies $150 million and US financial services companies using the affected S3 infrastructure $160 million, according to Cyence, a firm that works with insurance companies to estimate cyber risk.


Copyright © 2008 - 2020 · StreetCorner Media, LLC · All Rights Reserved