Today's Massive AWS Outage That Took Down Your Favorite Sites Is Still Going On
That Massive AWS Outage Explained: Failures and Fixes Tripping Over Themselves

your favorite sites
looks at list
nope
I get what you're saying, and I agree, but there were many more: Ancestry.com and findagrave.com were also down, among others (while I'm in the middle of an ancestry fact-finding trip). It really was massive.
Ancestry.com and findagrave.com are kinda the funniest examples that could be picked from the sites affected today. Obviously there's the parallel of AWS being dead today, but I also can't imagine those sites get so many updates that being off them for a while means missing out on something timely. Don't get me wrong, I totally hate being in the groove when something out of my control impedes my workflow, and I can totally see how the outages would be annoying.
Only site that got me was Riverside ☹️
I literally noticed zero difference. But it sounds bad. Have they tried shoving more AI in there to fix the problem?
My DNS is a Lightsail instance out west, no issue.
75%? Those are rookie numbers. Gotta get that up around 200, 250%, then we'll really get things started.
You're like one of those movie guys who gets days into a zombie apocalypse before realizing anything's wrong
I can hear the smug grins on homelabbers'/self-hosters' faces from here.
Oh look, fediverse is still working.
You can share in the smug grin
double grins in self-hosted Fedi instances
Not like their systems never have downtime.
They'd type up a really angry reply, but they need to fix their router config real quick
lol yeah I definitely have more downtime than AWS.
The main difference is that I can do something about my downtime and fix it.
I had surgery today and couldn't pick up my meds at the pharmacy because my insurance uses AWS somewhere in the billing process. We had to pay out of pocket and pray we get reimbursed, because they're expensive. It took six phone calls to figure that out, and overall it sucked. I didn't think AWS going down could affect my damn insurance.
Jeez, that's messed up. I'm so sorry! I hope you're able to get reimbursed without much more trouble, and I hope your recovery goes well!
In any sane place, with sane laws, they would just have to approve everything until they fix their own shit.
As someone who works at another AWS-dependent org, it took us out too. It was awful. Nothing I could do on my end. Why the fuck didn’t it get rolled back immediately? Why did it go to a second region? Fucking idiots on the big team’s side.
I got paged 140 times between 12 and 4 am PDT. Then there was another incident I had to hand off at 7am because I needed fucking sleep, and they handled it until 1pm. I love my team, but it’s so awful that this was even able to happen. All of our fuck-ups take 5-30 minutes to roll back or manually intervene on. This took them 2+ hours, and it was painful. Then it HAPPENED AGAIN! Like, what the fuck.
This is a good reason to start investing in multi-region architecture at some point.
Not trying to be smug here or anything, but we updated a single config value, put up a PR, merged it, and we were switched over to a different region within a few minutes. Smooth sailing after that.
(This is still dependent to some degree on AWS in order to actually execute the failover, something we're mulling over how to solve)
Now our work demands that we invest in such things; we're even investing in multi-cloud (an actual nightmare). Not everyone can do this, and some systems just aren't built to be able to, but if it's within reach it's probably worth it.
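For anyone wondering what that kind of single-value failover can look like, here's a minimal sketch, not the commenter's actual setup: a hypothetical app_config.yaml holds the active region, the application builds its AWS clients from that value, and the PR that flips the value is the failover. The file name, keys, and services are illustrative assumptions.

```python
# Minimal sketch of config-driven region failover (illustrative assumptions only).
# Hypothetical app_config.yaml:
#   aws:
#     active_region: us-west-2   # flip this value in a PR to fail over
import boto3
import yaml  # PyYAML


def load_active_region(path: str = "app_config.yaml") -> str:
    """Read the currently active AWS region from the app config."""
    with open(path) as f:
        config = yaml.safe_load(f)
    return config["aws"]["active_region"]


def build_clients(region: str) -> dict:
    """Construct region-pinned AWS clients so every call follows the config."""
    return {
        "s3": boto3.client("s3", region_name=region),
        "dynamodb": boto3.client("dynamodb", region_name=region),
    }


if __name__ == "__main__":
    region = load_active_region()
    clients = build_clients(region)
    print(f"Serving out of {region}")
```

The catch the parenthetical points at is that the pipeline deploying a change like this (and whatever replicates the data) may itself depend on AWS.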
Last night from 12 to 4am almost every region was impacted, so it didn't help that much.
We do have failovers that customers need to activate to start working in another region.
But our canaries and infrastructure alarms can't follow, since they exist to alert within their own region.
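To illustrate why those alarms stay put (a hedged sketch with hypothetical names, not this team's actual monitoring): a CloudWatch alarm is a per-region resource created through a client pinned to one region, so it only evaluates metrics and pages people there; failing application traffic over to another region doesn't move the monitoring with it.

```python
# Sketch only: CloudWatch alarms are per-region resources.
# This client is pinned to us-east-1, so the alarm evaluates and fires there
# and nowhere else; a failover of the application does not carry it along.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="orders-api-errors-high",   # hypothetical alarm name
    Namespace="OrdersService",            # hypothetical custom metric namespace
    MetricName="Error5xxCount",
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=5,
    Threshold=50,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
)
```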
Oof.
Worst part for me was when the rewards app at the tea shop wouldn't work.
I hated it when I was trying to brake to avoid hitting that pedestrian at the crosswalk, and the brake pedal input does a roundtrip to AWS before activating the wheel brakes. For user statistics, for my safety. Not at all for AI training, we swear.
Oh well, had no choice but to drive-by that old hag.
I keep seeing all this stuff all over Lemmy and FB about the AWS outage, and being in the northeast I don't understand how it hasn't affected me in the slightest, but I'll take the W, I guess!
I'm in the northeast as well, I couldn't watch TV last night thanks to this. Count your blessings I guess.
Lol no kidding. All my friends complaining about the outage are in RI/MA; I'm in VT. Not sure if I'm on the same servers as the rest of the East Coast? Honestly, I don't even know how to check.
Can't even do any of my work for college until the outage is over
Same. I’m sitting in my college’s library right now trying to work and the outage threw a wrench in all of my plans. I’m thankful I downloaded the files to my hard drive, though, so I can do most of the work on pen and paper.
Yeah. I'm really glad that I already finished all my work that was due today. Otherwise I'd be screwed. I was trying to get a head start on some stuff due Wednesday, but I guess I can't until AWS is back up.
I don't even want to hear an argument for moving back on prem with how badly Broadcom/VMware ripped our eyes out this year. A 350% increase over two years ago, and I still have to buy the hardware, secure it in a room, power it, buy redundant internet and networking equipment, get a backup facility, buy and test a generator/UPS, and condition the damn air. Oh, and then every few years we have to swap out all the hardware when it stops getting vendor support.
At least everyone was all in the same boat today, and we all know what was broken.
Moving to Nutanix soon. Love their product. Proxmox looks good on paper too, just not mature enough in the enterprise to bet my paycheck on it.
Cloud infrastructure is expensive.
We're a year or so into our AWS migration, but will have a presence on prem for years to come. Broadcom has seen our last dollar, so for what remains on prem, we will be moving to a different hypervisor. I was kinda hoping that Proxmox could be the one, but it may not shake out that way.
Lucky me, I only need Proxmox for self-hosting, and I love it :)
Okay, who messed with the Network Switches?
It's always DNS™
AWS salespeople, meeting customers today.
Today should have been easy for them tbh. "See? That's why you should pay us more money to have active infra on our other regions to failover to!"
I hardly noticed. When I look at the shit companies that are affected I'm happy I'm not their customer.
"Sir. Sir? Sir. (sigh) have you tried turning the internet off and on again?"
I have ZERO sympathy for companies whose services are affected by this. Because seriously, fuck Amazon.
The laundry room monitor in my building went down lol... (The monitor being the thing that lets you check a website to see if any of the machines are free or when they're done, etc.)
It's almost like we shouldn't rely on just a few central sites. And that everything should be democratized and federated.
I2P
https://geti2p.net/en/
Basically, what if the entire internet was torrents, everyone seeding/routing to everyone else, oh, and also it's more private/secure than Tor or VPNs?
The downside is it's quite slow... but if it caught on more widely, that could be alleviated somewhat.
I think I’d really need to know how “somewhat” alleviated it is to have any interest, given the status quo is like, a second.