sh.itjust.works down?

Going to post this here. Assuming their server doesn't have to work to post this to the federated section on this server...

I know the .ml instances are having issues because of the government taking over the .ml domain. But what is going on with sh.itjust.works?

17 comments
  • Hey all,

    There have been a few issues over the last couple of weeks. Here's a rundown of some of the issues we've had.

    • Broken Images: A few weeks ago we had issues with broken images. This was due to me migrating our local image storage to object storage (like AWS S3). When I made the switch, it broke all old images, as they needed to be migrated. Pictrs at that time did not support concurrent uploads, which meant the migration would have taken days or weeks to complete, during which time the image service would be offline. Instead I waited for a newer version that supported concurrent connections and did the migration in about 30-40 minutes one evening.
    • 5xx server errors: Some of you may have experienced a Lemmy page with an error code on it. This was due to me trying to implement an additional proxy to shield us and mitigate future risks. While rolling this out I hit a few blockers that caused downtime as we worked to rectify them. I'm glad to say that as of today this has been implemented.
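
To illustrate why concurrent uploads matter for a migration like the one described above, here is a minimal Python sketch. It is not pict-rs code: `upload_one`, `migrate`, the file names and the 10 ms simulated latency are all hypothetical stand-ins. It only shows that keeping several uploads in flight cuts wall-clock time when each upload mostly waits on the network.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def upload_one(name: str) -> str:
    """Hypothetical stand-in for copying one image to object storage."""
    time.sleep(0.01)  # simulated per-upload network latency (assumed value)
    return name


def migrate(names: list[str], workers: int) -> list[str]:
    """Upload every image with at most `workers` uploads in flight at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input names
        return list(pool.map(upload_one, names))


if __name__ == "__main__":
    images = [f"img-{i}.png" for i in range(50)]
    for workers in (1, 10):
        start = time.perf_counter()
        done = migrate(images, workers)
        elapsed = time.perf_counter() - start
        print(f"workers={workers} uploaded={len(done)} seconds={elapsed:.2f}")
```

With one worker the uploads run back to back; with more workers the waiting overlaps, which is the same reason a migration tool with concurrent connections can finish far sooner than a sequential one.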

    In addition to the above, the lemmyverse, and especially this instance, has been under bot attack almost daily. These bot attacks are eating resources and causing query floods.

    Lemmy is still very young and in its early phases. In time these issues will slowly go away.

    P.S. You don't have to worry about me leaving you guys hanging.

  • They were down but aren’t. This is going to happen from time to time for reasons, but most importantly (and this is not an advert or endorsement for centralized services like reddit):

    • These instances are run by small teams, maybe even one person per instance. By “run by” I mean the admins who can actually host and support the hosting environment of the instance, not moderators, though that’s an important task too.
    • At reddit or other for-profit companies, multiple teams of people monitor multiple data centers’ worth of servers. They have 24/7 tech support crews, dashboards, alarms, alerts, escalation procedures drafted by other teams, and people they can escalate problems to, usually including a decent-sized team at the physical datacenter given how many servers they buy. They can afford all this from advertising income because the site is popular enough, which is why it’s much rarer to see these services go down.

    But so many things can and do fail, including:

    • updates (dependencies, breaking updates, “this should just have worked but it didn’t, why?!”)
    • server issues (too many memes and now the disk has runeth over)
    • one server that gets overloaded or is in a data center that has a network failure, or a hardware failure on the server where the virtual server is hosted
    • account got hacked
    • 0 day exploit targeted directly at this server
    • DoS or DDoS attack
    • Admin has a day job that they need to keep the lights on at home and at the lemmy instance, and they have to do that work.

    Speaking from experience, but not with lemmy in particular.

  • Cool it's back up again and my post from lemm.ee is here.

    I didn't expect that a message posted to a local instance on another server, with the main server being down, would make it back to the target server when it came back up. I guess it just shows that this system does work as intended.
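
For anyone curious how a post can survive the target server being down, one common pattern is store-and-forward: the sending side keeps undelivered activities in a queue and retries later. The Python sketch below is a toy illustration of that general idea, not Lemmy's actual federation code; `Outbox`, `FakeRemote` and their methods are hypothetical names made up for the example.

```python
from collections import deque


class FakeRemote:
    """Hypothetical remote instance that can be switched up or down."""

    def __init__(self):
        self.up = False
        self.inbox = []

    def receive(self, activity) -> bool:
        """Accept the activity only while the instance is up."""
        if not self.up:
            return False
        self.inbox.append(activity)
        return True


class Outbox:
    """Holds activities for one remote instance until delivery succeeds."""

    def __init__(self, send):
        self.send = send  # callable that returns True once delivered
        self.pending = deque()

    def publish(self, activity) -> None:
        """Queue the activity, then attempt delivery right away."""
        self.pending.append(activity)
        self.flush()

    def flush(self) -> bool:
        """Deliver in order; stop at the first failure and keep the rest queued."""
        while self.pending:
            if not self.send(self.pending[0]):
                return False
            self.pending.popleft()
        return True


if __name__ == "__main__":
    remote = FakeRemote()
    outbox = Outbox(remote.receive)
    outbox.publish("post written while the remote is down")
    print("queued:", len(outbox.pending), "delivered:", len(remote.inbox))
    remote.up = True
    outbox.flush()  # a real system would retry on a timer instead
    print("queued:", len(outbox.pending), "delivered:", len(remote.inbox))
```

The post stays in the sender's queue while the remote is unreachable and goes out on a later retry, which matches what happened here.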
