AWS is having a bad day
I hate how Signal went down because of this... Wish it wasn't so centralised.
My friend messaged me on Signal asking if Instructure (which runs on AWS) was down. I got the message. That being said, it's scary that Signal's backbone depends on AWS.
Why is this scary? That's what e2ee is for, so that no one besides your recipient can view the contents of a message. It does not matter which server is used. If anything, for a service like Signal, you want a server with high availability like AWS, Azure, Google Cloud, or Cloudflare.
I hope they consider other ways of doing things after this incident.
I have been able to use Signal like any other day. I haven’t seen any disruption in sending or receiving.
For me it was not possible to send or receive messages for a couple of hours.
Started moving to Element/Matrix this weekend when I attended a protest and wanted to have some kind of communication, but also wanted to leave my primary phone at home. I was using a de-googled Android fork and an e-sim, but being a data-only e-sim, I couldn't use Signal due to the phone number requirement.
Annoying to have to try to get contacts onto another app, but at least it's decentralized and comes with the option of being self-hosted once I'm ready to tackle that.
Hey, note that you can use mautrix-signal to access your Signal account within Element on this phone.
@MrMcGasion @Sunny Come to the dark side (XMPP and jmp.chat) and get decentralized messaging and SMS support with that data-only SIM!
Signal's love affair with big tech is deeply disturbing.
This gif is audible
Why do we place so much reliance on one mega company? This level of importance. It should be seized by the government.
It should be seized by the people, not the government, and mercilessly decentralized.
Agree
Agreed, same for Facebook. Then call it Readabook.
AWS aggressively pursues high-priced, years-long spending commitments with large customers, and incentivizes them with huge discounts for doing so.
And when AWS does this, it intentionally incentivizes these large customers to migrate existing workloads away from other cloud providers as well, going so far as to offer assistance in doing so.
> Why do we place so much reliance on one mega company? This level of importance.
Because it's cheaper and (in broad terms) more reliable than everybody having a data centre.
> It should be seized by the government.
Oh yeah, what could possibly go wrong if the US government owned Amazon!
Let's give it to Trump and Elon Musk, they will take good care of it... Lol.
Trump will isolate AWS to America only, claiming other countries are ripping him off.
AWS becomes American Web Services.
The best alternative is making Amazon something owned by the people and not any corporation/government, but who knows if that would ever happen.
Do you really want someone like the MAGA hats having control over something like that?
God no, not the government!
They couldn't organise a paper bag party
Large corporations and oligarchs are better? I’ll take the government. At least we can vote on them.
That’s largely because one half of the elected officials are dedicated to defunding and deconstructing government organizations, so they can then point at those same organizations and go “look, the government doesn’t work! We should stop funding it!” The government is actually great at organizing a lot of things. But they’re all so engrained in society that you don’t even think about them as being organized by the government. Systems that just work, reliably, all the time.
The government’s job is stability and reliability, not being as efficient as possible. Where a corporation may only have one person doing a job, the government will have four or five. Those people aren’t bloat; they’re on the payroll because the government is expected to keep functioning during emergencies. People would lose their minds if the streets department (responsible for clearing downed trees out of public roads) shut down after a bad storm rolled through, just because a few government employees had a tree branch fall on their house. What if firefighters stopped working because a local wildfire burnt a few firefighters’ houses? What if the city water department shut down because three or four city employees’ water supply was affected? What if the health department shut down during a pandemic?
The people who work in government also live in the same areas they serve. Which means that they are affected by the same emergencies. The government needs enough redundancy to be able to continue functioning, even after those employees are affected by the same emergencies as the general public. If some emergency affects 75% of the public in a given area, then 75% of the local government employees are likely going to be affected. So if the government doesn’t have enough redundancy to be able to redistribute the work, people will see their government shutting down in the wake of the emergency. And to make matters even worse, during (and in the wake of) those emergencies, people look to the government for help. Which means that’s the most critical time for the government to continue functioning.
I say all of this because the same is true for the infrastructure that runs critical government systems. The government expands and implements things slowly by design, because everything critical has to go through multiple levels of design approval, and have multiple redundancies built in. If the government has updated a critical system, I can guarantee that new system has been in the works for the past two years at least. That process is designed to ensure everything works as intended. I wouldn’t want my city traffic lights managed by a private company, because they’d try to cut costs and avoid building in redundant systems.
When was the last time you heard about a large government computer outage? (I don’t count the VA because that’s broken on purpose.)
The one that hits us self-hosters is https://auth.docker.io/
You guys don't selfhost a registry?
How does using Podman help when the registry is down?
Yeah I ran into this as well. Wondered why it needs a call to auth for public container images in the first place.
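For anyone wondering about that auth call: as far as I understand it, even anonymous pulls of public images first fetch a short-lived bearer token from auth.docker.io and then present it to the registry proper. A minimal Python sketch of that flow (the repo and tag are just example values):

```python
import requests

repo, tag = "library/alpine", "latest"  # example public image

# Step 1: ask auth.docker.io for an anonymous token scoped to pulling this repo.
token = requests.get(
    "https://auth.docker.io/token",
    params={"service": "registry.docker.io", "scope": f"repository:{repo}:pull"},
    timeout=10,
).json()["token"]

# Step 2: present the token to the actual registry to fetch the manifest.
resp = requests.get(
    f"https://registry-1.docker.io/v2/{repo}/manifests/{tag}",
    headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.docker.distribution.manifest.v2+json",
    },
    timeout=10,
)
print(resp.status_code)  # if step 1 is down, you never get this far
```

Which is also why a pull-through mirror or a self-hosted registry sidesteps the whole dance when auth.docker.io is having a bad day.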
mirror.gcr.io is Google's public mirror of Docker Hub.
Oh god, that just 404s for me
A bad day for Jeff Bezos is a good day for all of us
Who wants to bet Amazon gave AI full access to their prod config and it screwed it up?
Or some engineer decided today would be a great day to play with BGP.
That's a good theory haha
Yeah, was reading about it here too
Ring doorbells, Alexa, ahh... the joys of selfhosting.
Is there no way to check the doorbell video locally?
An Amazon employee misconfigures something and now your doorbell doesn't work
Obligatory
Oh wow their front page doesn't mention at all that their products run locally and don't require subscriptions.
I don't have one (because of that point), so I don't know...
Presumably the app and doorbell are hardcoded to go to an AWS URL (so it's "easier" for consumers), but in theory the data's all on your wifi.
I would be very surprised if there was
And I'm having a very good day now :3
Are you an IT contractor or something?
In some ways I am, but mainly I feel like my need to only use self-hostable stuff, and to self-host 90% of those services, has been confirmed.
According to that page, the issue stemmed from an underlying system responsible for health checks in the load-balancing servers.
How the hell do you fuck up a health check config that badly? That's like messing up smartd.conf and somehow taking your system offline.
Well, you see, the mistake you are making is believing a single thing the stupid AWS status board says. It is always fucking lying, sometimes in new and creative ways.
I mean, if your OS was "smart" enough not to send I/O to devices that indicate critical failure (e.g. by marking them read-only in the array?), and it then thinks all devices have failed critically, wouldn't the same thing happen in that kind of system as well?
If your health check is broken, then you might not notice that a service is down and you'll fail to deploy a replacement. Or the opposite, and you end up constantly replacing it, creating a "flapping" service.
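To make those two failure modes concrete, here's a toy Python sketch (purely illustrative, nothing to do with AWS's actual control plane; the probe and instance names are made up):

```python
import random

def broken_probe(instance: str) -> bool:
    """A misconfigured health check: its answer has nothing to do with the
    instance's real state (wrong port, wrong path, too-aggressive timeout)."""
    return random.random() > 0.5  # a coin flip instead of a real signal

def reconcile(fleet: dict) -> None:
    """Toy control loop: replace anything the probe reports as unhealthy."""
    for name, actually_healthy in fleet.items():
        if not broken_probe(name):
            # Failure mode 1: healthy instances get churned over and over
            # ("flapping") because the probe falsely reports them as down.
            print(f"replacing {name} (actually healthy: {actually_healthy})")
        # Failure mode 2 is the inverse: a probe that always says "healthy"
        # means a genuinely dead instance never gets replaced at all.

reconcile({"web-1": True, "web-2": True, "web-3": False})
```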
That explains why my Matrix <-> Signal bridge was complaining about being disconnected.
Does this break all the various security features that are the reason to use Signal in the first place?
The Matrix server is a normal Signal client that can encrypt/decrypt messages from your account.
Assuming you trust your server, no. I would not use it on a third party Matrix server.
that is an understatement 😂
This kind of shit will only increase as more of these companies believe they can vibe-code their way out of paying software devs what they are worth.
For some reason I hear Gilfoyle pontificating about what he does
It takes 5-10 reloads to get a page from IMDB lol
themoviedb.org unfazed
OMG, IMDB too
They are an Amazon company, so it makes sense they'd be using AWS.
A fun game to play right now is to try to hit any of your regularly visited sites and see which ones are down. 😂
Good
It makes me wish I was selfhosting more services, music & chat in particular. It wasn't important enough to set up yet
Can recommend Jellyfin, I use it for both music and TV/movies. Not sure on the chat bit, there are so many options it could become a long list.
> important enough to set up yet
FWIW, for music, LMS is one container command, with your music directory (all your files) mounted read-only, and that's it. So... if you are used to self-hosting (e.g. you already have a reverse proxy and a container setup), that's maybe 1h tops.
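For the curious, here's roughly what that looks like via the Python Docker SDK instead of a shell one-liner. The image name, port, and container paths below are placeholders to check against whichever LMS image you actually use; the point is just the read-only mount of the music directory:

```python
import docker  # the docker-py SDK: pip install docker

client = docker.from_env()

# Placeholder image/port/paths -- adjust to the LMS image you run.
client.containers.run(
    "epoupon/lms",
    name="lms",
    detach=True,
    ports={"5082/tcp": 5082},                                 # web UI
    volumes={
        "/srv/music": {"bind": "/music", "mode": "ro"},       # music, read-only
        "/srv/lms-data": {"bind": "/var/lms", "mode": "rw"},  # app state
    },
    restart_policy={"Name": "unless-stopped"},
)
```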
I'm not an AWS customer. What's their usual SLA?
It's wild that these cloud providers were seen as a surefire way to ensure reliability, only to become a universal single point of failure.
But if everyone else is down too, you don't look so bad 🧠
No one ever got fired for buying IBM.
One of our client support people told an angry client to open a Jira with urgent priority and we'd get right on it.
... the client support person knew full well that Jira was down too : D
At least, I think they knew. Either way, there was not shit we could do about it for that particular region until AWS fixed things.
It's mostly a skill issue for services that go down when us-east-1 has issues in AWS - if you actually know your shit, then you don't get these kinds of issues.
Case in point: Netflix runs on AWS and experienced no issues during this thing.
And yes, it's scary that so many high-profile companies are this bad at the thing they spend all day doing
Yeah, if you're a major business and don't have geographic redundancy for your service, you need to rework your BCDR plan.
But Netflix did encounter issues. For example, the account cancellation page did not work.
What's the general plan of action when a company's base region shits the bed?
Keep dormant mirrored resources in other regions?
I presumed the draw of us-east-1 was its lower cost, so if any solutions involve spending slightly more money, I'm not surprised high profile companies put all their eggs in one basket.
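FWIW, the usual version of "keep dormant mirrored resources in other regions" is DNS failover: a health-checked primary record pointing at the main region and a secondary pointing at the standby. A hedged boto3 sketch below; the zone ID, record name, IPs, and health check ID are all placeholders, and the genuinely expensive part (keeping data replicated to the standby region) isn't shown:

```python
import boto3

route53 = boto3.client("route53")

# Placeholder identifiers -- substitute your own zone, records, and health check.
ZONE_ID = "ZEXAMPLE123"
NAME = "app.example.com."

route53.change_resource_record_sets(
    HostedZoneId=ZONE_ID,
    ChangeBatch={
        "Changes": [
            {   # Primary: served while its health check passes.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": NAME, "Type": "A", "TTL": 60,
                    "SetIdentifier": "primary-us-east-1",
                    "Failover": "PRIMARY",
                    "HealthCheckId": "00000000-primary-placeholder",
                    "ResourceRecords": [{"Value": "203.0.113.10"}],
                },
            },
            {   # Secondary: only served when the primary is unhealthy.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": NAME, "Type": "A", "TTL": 60,
                    "SetIdentifier": "secondary-eu-west-1",
                    "Failover": "SECONDARY",
                    "ResourceRecords": [{"Value": "198.51.100.20"}],
                },
            },
        ]
    },
)
```

Of course, if the failover mechanism itself lives with the provider that's down, you've mostly just moved the problem.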
I love the "git gud" response. Sacred cashcows?
Netflix did encounter issues. I couldn't access it yesterday at noon EST. And I wasn't alone, judging by Downdetector.ca
It is still a logical argument, especially for smaller shops. I mean, you can (as self-hosters know) set up automatic backups, failover systems, and all that, but it takes significant time & resources. Redundant internet connectivity? Redundant power delivery? Spare capacity to handle a 10x demand spike? Those are big expenses for small, even mid-sized businesses. No one really cares if your dentist's office is offline for a day, even if they have to cancel appointments because they can't process payments or records.
Meanwhile, theoretically, reliability is such a core function of cloud providers that they should pay for the experts' experts and platinum-standard infrastructure. It makes any problem they do have newsworthy.
I mean, it seems silly for orgs as big and internet-centric as Fortnite, Zoom, or a Fortune 500 bank to outsource their internet infrastructure, and maybe this will be a lesson for them.
It's also silly for the orgs to not have geographic redundancy.
They zigged when we all zagged.
Decentralisation has always been the answer.
If it's not a region failure, it's someone pushing untested slop into the devops pipeline and vaporizing a network config. So very fired.
Apparently it was DNS. It’s always DNS…
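And from a client's point of view, "it was DNS" looks like this (Python, with a made-up endpoint name): once the record stops resolving, every retry and SDK call below it fails no matter how healthy the servers behind it are.

```python
import socket

endpoint = "api.example-service.com"  # hypothetical endpoint, purely for illustration

try:
    addrs = {info[4][0] for info in socket.getaddrinfo(endpoint, 443)}
    print(f"{endpoint} -> {sorted(addrs)}")
except socket.gaierror as err:
    # Name resolution failed: to the client, the service is simply gone,
    # regardless of how healthy the actual servers are.
    print(f"{endpoint}: {err}")
```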
yeah, so many things now use AWS in some way. So when AWS has a cold, the internet shivers
Well, companies use it not for reliability but to outsource responsibility. Even a medium-sized company treated Windows like a subscription for many, many years. People have been emailing files to themselves since the start of email.
For companies, moving everything to MSA or AWS was just the next step and didn't change day-to-day operations.
People also tend to forget all the compliance issues that can come with hosting content, and using someone with expertise in that can reduce a very large burden. It's not something that would hit every industry, but it does hit many.
Sidekicks in '09. Had so many users here affected.
Never again.
A single point of failure you pay them for.