Saturday, March 21, 2026

On the AWS Outage – O’Reilly

Everyone notices when something big fails, like AWS’s US-EAST-1 region. And fail it did. All kinds of companies and websites became inaccessible, and everyone knew it was Amazon’s fault. A week later, when I run into a website that’s down, I still say, “Must be some hangover from the AWS outage. Some cache that didn’t get refreshed.” Amazon gets blamed (maybe even rightly) even when it’s not their fault.

I’m not writing about fault, though, and I’m also not writing a technical analysis of what happened. There are good places for that online, including AWS’s own summary. What I’m writing about is a reaction to the outage that I’ve seen all too often: “This proves we can’t trust AWS. We need to build our own infrastructure.”

Building your own infrastructure is fine. But I’m also reminded of the wisest comment I heard after the 2012 US-EAST outage. I asked JD Long about his reaction to the outage. He said, “I’m really glad it wasn’t my guys trying to fix the problem.” [1] JD wasn’t disparaging his team; he was saying that Amazon has a lot of expertise in running, maintaining, and troubleshooting really big systems that can fail suddenly in unpredictable ways, when just the right conditions happen to tickle a bug that had been latent in the system for years. That expertise is hard to find and expensive when you find it. And no matter how expert “your guys” are, all complex systems fail. After last month’s AWS failure, Microsoft’s Azure obligingly failed about 10 days later.

I’m not really an Amazon fan or, more specifically, an AWS fan. But outages like this should force us to remember what they do right. AWS outages also warn us that we need to learn how to “craft ways of undoing this concentration and creating real choice,” as Signal CEO Meredith Whittaker points out. But Meredith understands how difficult it will be to build this infrastructure and that, for the present, there’s no viable alternative to AWS or one of the other hyperscalers.

Running and troubleshooting large systems is difficult and requires very specialized skills. If you decide to build your own infrastructure, you will need those skills. And you may end up wishing that it weren’t your guys trying to fix the problem.


Footnote

  1. In 2012, I happened to be flying out of DC just as the storm that took US-EAST down was rolling in. My flight made it out, but it was dramatic.
