AWS is having a bad day
I hate how Signal went down because of this... Wish it wasn't so centralised.
My friend messaged me on Signal asking if Instructure (which runs on AWS) was down. I got the message. That being said, it's scary that Signal's backbone depends on AWS.
Why is this scary? That's what e2ee is for, so that no one besides your recipient can view the contents of a message. It does not matter which server is used. If anything, for a service like Signal, you want a server with high availability like AWS, Azure, Google Cloud, or Cloudflare.
I hope they consider other ways of doing things after this incident.
I have been able to use Signal like any other day. I haven’t seen any disruption in sending or receiving.
For me it was not possible to send or receive messages for a couple of hours.
Started moving to Element/Matrix this weekend when I attended a protest and wanted to have some kind of communication, but also wanted to leave my primary phone at home. I was using a de-googled Android fork and an e-sim, but being a data-only e-sim, I couldn't use Signal due to the phone number requirement.
Annoying to have to try to get contacts onto another app, but at least it's decentralized and comes with the option of being self-hosted once I'm ready to tackle that.
Hey, note that you can use mautrix-signal to access your Signal account within Element on this phone.
@MrMcGasion @Sunny Come to the dark side (XMPP and jmp.chat) and get decentralized messaging and SMS support with that data-only SIM!
Signal's love affair with big tech is deeply disturbing.
This gif is audible
Why do we place so much reliance on one mega company? This level of importance. It should be seized by the government.
It should be seized by the people, not the government, and mercilessly decentralized.
Agree
Agreed, same for Facebook. Then call it Readabook.
AWS aggressively pursues high-priced, years-long spending commitments with large customers, and incentivizes them with huge discounts for doing so.
And when AWS does this, it intentionally incentivizes these large customers to migrate existing workloads away from other cloud providers as well, going so far as to offer assistance in doing so.
> Why do we place so much reliance on one mega company? This level of importance.
Because it's cheaper and (in broad terms) more reliable than everybody having a data centre.
> It should be seized by the government.
Oh yeah, what could possibly go wrong if the US government owned Amazon!
Let's give it to Trump and Elon Musk, they will take good care of it... Lol.
Trump will isolate AWS to America only, claiming other countries are ripping him off.
AWS becomes American Web Services.
The best alternative is making Amazon something owned by the people and not any corporation/government, but who knows if that would ever happen.
Do you really want someone like the MAGA hats having control over something like that?
God no, not the government!
They couldn't organise a paper bag party
Large corporations and oligarchs are better? I’ll take the government. At least we can vote on them.
That’s largely because one half of the elected officials are dedicated to defunding and deconstructing government organizations, so they can then point at those same organizations and go “look, the government doesn’t work! We should stop funding it!” The government is actually great at organizing a lot of things. But they’re all so engrained in society that you don’t even think about them as being organized by the government. Systems that just work, reliably, all the time.
The government’s job is stability and reliability, not being as efficient as possible. Where a corporation may only have one person doing a job, the government will have four or five. Those people aren’t bloat; they’re on the payroll because the government is expected to keep functioning during emergencies. People would lose their minds if the streets department (responsible for clearing downed trees out of public roads) shut down after a bad storm rolled through, just because a few government employees had a tree branch fall on their house. What if firefighters stopped working because a local wildfire burnt a few firefighters’ houses? What if the city water department shut down because three or four city employees’ water supply was affected? What if the health department shut down during a pandemic?
The people who work in government also live in the same areas they serve. Which means that they are affected by the same emergencies. The government needs enough redundancy to be able to continue functioning, even after those employees are affected by the same emergencies as the general public. If some emergency affects 75% of the public in a given area, then 75% of the local government employees are likely going to be affected. So if the government doesn’t have enough redundancy to be able to redistribute the work, people will see their government shutting down in the wake of the emergency. And to make matters even worse, during (and in the wake of) those emergencies, people look to the government for help. Which means that’s the most critical time for the government to continue functioning.
I say all of this because the same is true for the infrastructure that runs critical government systems. The government expands and implements things slowly by design, because everything critical has to go through multiple levels of design approval, and have multiple redundancies built in. If the government has updated a critical system, I can guarantee that new system has been in the works for the past two years at least. That process is designed to ensure everything works as intended. I wouldn’t want my city traffic lights managed by a private company, because they’d try to cut costs and avoid building in redundant systems.
When was the last time you heard about a large government computer outage? (I don’t count the VA because that’s broken on purpose.)
The one that hits us self-hosters is https://auth.docker.io/
You guys don't selfhost a registry?
How does using Podman help when the registry is down?
Yeah I ran into this as well. Wondered why it needs a call to auth for public container images in the first place.
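For anyone wondering about that auth call: as far as I understand it, even anonymous pulls of public images first fetch a short-lived bearer token from auth.docker.io and then present it to the registry proper. A minimal Python sketch of that flow (the repo and tag are just example values):

```python
import requests

repo, tag = "library/alpine", "latest"  # example public image

# Step 1: ask auth.docker.io for an anonymous token scoped to pulling this repo.
token = requests.get(
    "https://auth.docker.io/token",
    params={"service": "registry.docker.io", "scope": f"repository:{repo}:pull"},
    timeout=10,
).json()["token"]

# Step 2: present the token to the actual registry to fetch the manifest.
resp = requests.get(
    f"https://registry-1.docker.io/v2/{repo}/manifests/{tag}",
    headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.docker.distribution.manifest.v2+json",
    },
    timeout=10,
)
print(resp.status_code)  # if step 1 is down, you never get this far
```

Which is also why a pull-through mirror or a self-hosted registry sidesteps the whole dance when auth.docker.io is having a bad day.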
mirror.gcr.io is Google's public mirror of Docker Hub.
Oh god, that just 404s for me
A bad day for Jeff Bezos is a good day for all of us
Who wants to bet Amazon gave AI full access to their prod config and it screwed it up?
Or some engineer decided today would be a great day to play with BGP.
That's a good theory haha
Yeah, was reading about it here too
Ring doorbells, Alexa, ahh... the joys of selfhosting.
Is there no way to check the doorbell video locally?
An Amazon employee misconfigures something and now your doorbell doesn't work
Obligatory
Oh wow their front page doesn't mention at all that their products run locally and don't require subscriptions.
I don't have one (because of that point), so I don't know...
Presumably the app and doorbell are hardcoded to go to an AWS URL (so it's "easier" for consumers), but in theory the data's all on your wifi.
I would be very surprised if there was
And I'm having a very good day now :3
Are you an IT contractor or something?
In some ways I am, but mainly I feel like my need to only use self-hostable stuff, and to self-host 90% of those services, has been confirmed.
According to that page, the issue stemmed from an underlying system responsible for health checks in the load-balancing servers.
How the hell do you fuck up a health check config that badly? That's like messing up smartd.conf and somehow taking your system offline.
Well, you see, the mistake you are making is believing a single thing the stupid AWS status board says. It is always fucking lying, sometimes in new and creative ways.
I mean, if your OS was "smart" enough not to send I/O to devices that indicate critical failure (e.g. by marking them read-only in the array?), and it then thinks all devices have failed critically, wouldn't the same thing happen in that kind of system as well?
If your health check is broken, then you might not notice that a service is down and you'll fail to deploy a replacement. Or the opposite, and you end up constantly replacing it, creating a "flapping" service.
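To make those two failure modes concrete, here's a toy Python sketch (purely illustrative, nothing to do with AWS's actual control plane; the probe and instance names are made up):

```python
import random

def broken_probe(instance: str) -> bool:
    """A misconfigured health check: its answer has nothing to do with the
    instance's real state (wrong port, wrong path, too-aggressive timeout)."""
    return random.random() > 0.5  # a coin flip instead of a real signal

def reconcile(fleet: dict) -> None:
    """Toy control loop: replace anything the probe reports as unhealthy."""
    for name, actually_healthy in fleet.items():
        if not broken_probe(name):
            # Failure mode 1: healthy instances get churned over and over
            # ("flapping") because the probe falsely reports them as down.
            print(f"replacing {name} (actually healthy: {actually_healthy})")
        # Failure mode 2 is the inverse: a probe that always says "healthy"
        # means a genuinely dead instance never gets replaced at all.

reconcile({"web-1": True, "web-2": True, "web-3": False})
```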
That explains why my Matrix <-> Signal bridge was complaining about being disconnected.
Does this break all the various security features that are the reason to use Signal in the first place?
The Matrix server is a normal Signal client that can encrypt/decrypt messages from your account.
Assuming you trust your server, no. I would not use it on a third party Matrix server.
that is an understatement 😂
This kind of shit will only increase as more of these companies believe they can vibe-code their way out of paying software devs what they are worth.
For some reason I hear Gilfoyle pontificating about what he does
It takes 5-10 reloads to get a page from IMDB lol
themoviedb.org unfazed
OMG, IMDB too
They are an Amazon company, so it makes sense they'd be using AWS.
A fun game to play right now is to try to hit any of your regularly visited sites and see which ones are down. 😂
Good
It makes me wish I was selfhosting more services, music & chat in particular. It wasn't important enough to set up yet
Can recommend Jellyfin, I use it for both music and TV/movies. Not sure on the chat bit, there are so many options it could become a long list.
> important enough to set up yet
FWIW, for music, LMS is one container command, with your music directory (all your files) mounted read-only, and that's it. So... if you are used to self-hosting (e.g. you already have a reverse proxy and a container setup), that's maybe 1h tops.
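For the curious, here's roughly what that looks like via the Python Docker SDK instead of a shell one-liner. The image name, port, and container paths below are placeholders to check against whichever LMS image you actually use; the point is just the read-only mount of the music directory:

```python
import docker  # the docker-py SDK: pip install docker

client = docker.from_env()

# Placeholder image/port/paths -- adjust to the LMS image you run.
client.containers.run(
    "epoupon/lms",
    name="lms",
    detach=True,
    ports={"5082/tcp": 5082},                                 # web UI
    volumes={
        "/srv/music": {"bind": "/music", "mode": "ro"},       # music, read-only
        "/srv/lms-data": {"bind": "/var/lms", "mode": "rw"},  # app state
    },
    restart_policy={"Name": "unless-stopped"},
)
```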
I'm not an AWS customer. What's their usual SLA?
It's wild that these cloud providers were seen as a surefire way to ensure reliability, only to become a universal single point of failure.
But if everyone else is down too, you don't look so bad 🧠
No one ever got fired for buying IBM.
One of our client support people told an angry client to open a Jira with urgent priority and we'd get right on it.
... the client support person knew full well that Jira was down too : D
At least, I think they knew. Either way, there was not shit we could do about it for that particular region until AWS fixed things.
It's mostly a skill issue for services that go down when us-east-1 has issues in AWS - if you actually know your shit, then you don't get these kinds of issues.
Case in point: Netflix runs on AWS and experienced no issues during this thing.
And yes, it's scary that so many high-profile companies are this bad at the thing they spend all day doing
Yeah, if you're a major business and don't have geographic redundancy for your service, you need to rework your BCDR plan.
But Netflix did encounter issues. For example, the account cancellation page did not work.
What's the general plan of action when a company's base region shits the bed?
Keep dormant mirrored resources in other regions?
I presumed the draw of us-east-1 was its lower cost, so if any solutions involve spending slightly more money, I'm not surprised high profile companies put all their eggs in one basket.
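FWIW, the usual version of "keep dormant mirrored resources in other regions" is DNS failover: a health-checked primary record pointing at the main region and a secondary pointing at the standby. A hedged boto3 sketch below; the zone ID, record name, IPs, and health check ID are all placeholders, and the genuinely expensive part (keeping data replicated to the standby region) isn't shown:

```python
import boto3

route53 = boto3.client("route53")

# Placeholder identifiers -- substitute your own zone, records, and health check.
ZONE_ID = "ZEXAMPLE123"
NAME = "app.example.com."

route53.change_resource_record_sets(
    HostedZoneId=ZONE_ID,
    ChangeBatch={
        "Changes": [
            {   # Primary: served while its health check passes.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": NAME, "Type": "A", "TTL": 60,
                    "SetIdentifier": "primary-us-east-1",
                    "Failover": "PRIMARY",
                    "HealthCheckId": "00000000-primary-placeholder",
                    "ResourceRecords": [{"Value": "203.0.113.10"}],
                },
            },
            {   # Secondary: only served when the primary is unhealthy.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": NAME, "Type": "A", "TTL": 60,
                    "SetIdentifier": "secondary-eu-west-1",
                    "Failover": "SECONDARY",
                    "ResourceRecords": [{"Value": "198.51.100.20"}],
                },
            },
        ]
    },
)
```

Of course, if the failover mechanism itself lives with the provider that's down, you've mostly just moved the problem.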
I love the "git gud" response. Sacred cashcows?
Netflix did encounter issues. I couldn't access it yesterday at noon EST. And I wasn't alone, judging by Downdetector.ca
It is still a logical argument, especially for smaller shops. I mean, you can (as self-hosters know) set up automatic backups, failover systems, and all that, but it takes significant time & resources. Redundant internet connectivity? Redundant power delivery? Spare capacity to handle a 10x demand spike? Those are big expenses for small, even mid-sized businesses. No one really cares if your dentist's office is offline for a day, even if they have to cancel appointments because they can't process payments or records.
Meanwhile, theoretically, reliability is such a core function of cloud providers that they should pay for the experts' experts and platinum-standard infrastructure. It makes any problem they do have newsworthy.
I mean, it seems silly for orgs as big and internet-centric as Fortnite, Zoom, or a Fortune 500 bank to outsource their internet infrastructure, and maybe this will be a lesson for them.
It's also silly for the orgs to not have geographic redundancy.
They zigged when we all zagged.
Decentralisation has always been the answer.
If it's not a region failure, it's someone pushing untested slop into the devops pipeline and vaporizing a network config. So very fired.
Apparently it was DNS. It’s always DNS…
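And from a client's point of view, "it was DNS" looks like this (Python, with a made-up endpoint name): once the record stops resolving, every retry and SDK call below it fails no matter how healthy the servers behind it are.

```python
import socket

endpoint = "api.example-service.com"  # hypothetical endpoint, purely for illustration

try:
    addrs = {info[4][0] for info in socket.getaddrinfo(endpoint, 443)}
    print(f"{endpoint} -> {sorted(addrs)}")
except socket.gaierror as err:
    # Name resolution failed: to the client, the service is simply gone,
    # regardless of how healthy the actual servers are.
    print(f"{endpoint}: {err}")
```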
yeah, so many things now use AWS in some way. So when AWS has a cold, the internet shivers
Well, companies use it not for reliability but to outsource responsibility. Even a medium-sized company treated Windows like a subscription for many, many years. People have been emailing files to themselves since the start of email.
For companies, moving everything to MSA or AWS was just the next step and didn't change day-to-day operations.
People also tend to forget all the compliance issues that can come with hosting content, and using someone with expertise in that can reduce a very large burden. It's not something that would hit every industry, but it does hit many.
Sidekicks in '09. Had so many users here affected.
Never again.
A single point of failure you pay them for.