Today's Massive AWS Outage That Took Down Your Favorite Sites Is Still Going On
That Massive AWS Outage Explained: Failures and Fixes Tripping Over Themselves

your favorite sites
looks at list
nope
I get what you're saying, and I agree, but there were many more: Ancestry.com and findagrave.com were also down, among others (while I'm in the middle of an ancestry fact-finding trip). It really was massive.
Ancestry.com and findagrave.com are kinda the funniest examples that could be picked from the sites affected today. Obviously there's the parallel of AWS being dead today, but I also can't imagine those sites get so many updates that being off them for a while means missing out on something timely. Don't get me wrong, I totally hate being in the groove when something out of my control impedes my workflow, and I can totally see how the outages would be annoying.
Only site that got me was Riverside ☹️
I literally noticed zero difference. But it sounds bad. Have they tried shoving more AI in there to fix the problem?
My DNS is a Lightsail instance out west, no issue.
75%? Those are rookie numbers. Gotta get that up around 200, 250%, then we'll really get things started.
You're like one of those movie guys who gets days into a zombie apocalypse before realizing anything's wrong
I can hear the smug grins on homelabbers'/self-hosters' faces from here.
Oh look, fediverse is still working.
You can share in the smug grin
double grins in self-hosted Fedi instances
Not like their systems never have downtime.
They'd type up a really angry reply, but they need to fix their router config real quick
lol yeah I definitely have more downtime than AWS.
The main difference is that I can do something about my downtime and fix it.
I had surgery today and couldn't pick up my meds at the pharmacy because my insurance uses AWS somewhere in the billing process. We had to pay out of pocket and pray we get reimbursed, because they're expensive. It took six phone calls to figure that out, and overall it sucked. I didn't think AWS going down could affect my damn insurance.
Jeez, that's messed up. I'm so sorry! I hope you're able to get reimbursed without much more trouble, and I hope your recovery goes well!
In any sane place, with sane laws, they would just have to approve everything until they fix their own shit.
As someone who works at another AWS-dependent org, it took us out too. It was awful. Nothing I could do on my end. Why the fuck didn’t it get rolled back immediately? Why did it go to a second region? Fucking idiots on the big team’s side.
I got paged 140 times between 12 and 4 am PDT. Then there was another incident I had to hand off at 7am because I needed fucking sleep, and they handled it until 1pm. I love my team, but it’s so awful that this was even able to happen. All of our fuck-ups take 5-30 minutes to roll back or manually intervene on. This took them 2+ hours, and it was painful. Then it HAPPENED AGAIN! Like, what the fuck.
This is a good reason to start investing in multi-region architecture at some point.
Not trying to be smug here or anything, but we updated a single config value, put up a PR, merged it, and we were switched over to a different region within a few minutes. Smooth sailing after that.
(This is still dependent to some degree on AWS in order to actually execute the failover, something we're mulling over how to solve)
Now our work demands that we invest in such things; we're even investing in multi-cloud (an actual nightmare). Not everyone can do this, and some systems just aren't built to be able to, but if it's within reach it's probably worth it.
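For anyone wondering what that kind of single-value failover can look like, here's a minimal sketch, not the commenter's actual setup: a hypothetical app_config.yaml holds the active region, the application builds its AWS clients from that value, and the PR that flips the value is the failover. The file name, keys, and services are illustrative assumptions.

```python
# Minimal sketch of config-driven region failover (illustrative assumptions only).
# Hypothetical app_config.yaml:
#   aws:
#     active_region: us-west-2   # flip this value in a PR to fail over
import boto3
import yaml  # PyYAML


def load_active_region(path: str = "app_config.yaml") -> str:
    """Read the currently active AWS region from the app config."""
    with open(path) as f:
        config = yaml.safe_load(f)
    return config["aws"]["active_region"]


def build_clients(region: str) -> dict:
    """Construct region-pinned AWS clients so every call follows the config."""
    return {
        "s3": boto3.client("s3", region_name=region),
        "dynamodb": boto3.client("dynamodb", region_name=region),
    }


if __name__ == "__main__":
    region = load_active_region()
    clients = build_clients(region)
    print(f"Serving out of {region}")
```

The catch the parenthetical points at is that the pipeline deploying a change like this (and whatever replicates the data) may itself depend on AWS.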
Last night from 12 to 4am almost every region was impacted, so it didn't help that much.
We do have failovers that customers need to activate to start working in another region.
But our canaries and infrastructure alarms can't follow, since they exist to alert within their own region.
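To illustrate why those alarms stay put (a hedged sketch with hypothetical names, not this team's actual monitoring): a CloudWatch alarm is a per-region resource created through a client pinned to one region, so it only evaluates metrics and pages people there; failing application traffic over to another region doesn't move the monitoring with it.

```python
# Sketch only: CloudWatch alarms are per-region resources.
# This client is pinned to us-east-1, so the alarm evaluates and fires there
# and nowhere else; a failover of the application does not carry it along.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="orders-api-errors-high",   # hypothetical alarm name
    Namespace="OrdersService",            # hypothetical custom metric namespace
    MetricName="Error5xxCount",
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=5,
    Threshold=50,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
)
```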
Oof.
Worst part for me was when the rewards app at the tea shop wouldn't work.
I hated it when I was trying to brake to avoid hitting that pedestrian at the crosswalk, and the brake pedal input does a roundtrip to AWS before activating the wheel brakes. For user statistics, for my safety. Not at all for AI training, we swear.
Oh well, had no choice but to drive-by that old hag.
I keep seeing all this stuff all over Lemmy and FB about the AWS outage, and being in the northeast I don't understand how it hasn't affected me in the slightest, but I'll take the W, I guess!
I'm in the northeast as well, I couldn't watch TV last night thanks to this. Count your blessings I guess.
Lol no kidding. All my friends complaining about the outage are in RI/MA; I'm in VT. Not sure if I'm on the same servers as the rest of the East Coast? Honestly, I don't even know how to check.
Can't even do any of my work for college until the outage is over
Same. I’m sitting in my college’s library right now trying to work and the outage threw a wrench in all of my plans. I’m thankful I downloaded the files to my hard drive, though, so I can do most of the work on pen and paper.
Yeah. I'm really glad that I already finished all my work that was due today. Otherwise I'd be screwed. I was trying to get a head start on some stuff due Wednesday, but I guess I can't until AWS is back up.
I don't even want to hear an argument for moving back on prem with how badly Broadcom/VMware ripped our eyes out this year. A 350% increase over two years ago, and I still have to buy the hardware, secure it in a room, power it, buy redundant internet and networking equipment, get a backup facility, buy and test a generator/UPS, and condition the damn air. Oh, and then every few years we have to swap out all the hardware when it stops getting vendor support.
At least everyone was all in the same boat today, and we all know what was broken.
Moving to Nutanix soon. Love their product. Proxmox looks good on paper too, just not mature enough in the enterprise to bet my paycheck on it.
Cloud infrastructure is expensive.
We're a year or so into our AWS migration, but will have a presence on prem for years to come. Broadcom has seen our last dollar, so for what remains on prem, we will be moving to a different hypervisor. I was kinda hoping that Proxmox could be the one, but it may not shake out that way.
Lucky me, I only need Proxmox for self-hosting, and I love it :)
Okay, who messed with the Network Switches?
It's always DNS™
AWS salespeople, meeting customers today.
Today should have been easy for them tbh. "See? That's why you should pay us more money to have active infra on our other regions to failover to!"
I hardly noticed. When I look at the shit companies that are affected I'm happy I'm not their customer.
"Sir. Sir? Sir. (sigh) have you tried turning the internet off and on again?"
I have ZERO sympathy for companies whose services are affected by this. Because seriously, fuck Amazon.
The laundry room monitor in my building went down lol... (The monitor being the thing that lets you check a website to see if any of the machines are free or when they're done, etc.)
It's almost like we shouldn't rely on just a few central sites. And that everything should be democratized and federated.
I2P
https://geti2p.net/en/
Basically, what if the entire internet was torrents, everyone seeding/routing to everyone else, oh, and also it's more private/secure than Tor or VPNs?
The downside is it's quite slow... but if it caught on more widely, that could be alleviated somewhat.
I think I’d really need to know how “somewhat” alleviated it is to have any interest, given the status quo is like, a second.