Lemmy.world status update 2023-07-05

Another day, another update.

More troubleshooting was done today. What did we do:

Yesterday evening @phiresky@lemmy.world did some SQL troubleshooting with some of the lemmy.world admins. After that, phiresky submitted some PRs to GitHub.
@cetra3@lemmy.ml created a docker image containing 3 PRs: Disable retry queue, Get follower Inbox Fix, Admin Index Fix
We started using this image, and saw a big drop in CPU usage and disk load.
We saw thousands of errors per minute in the nginx log for old clients trying to access the websockets (which were removed in 0.18), so we added a return 404 in nginx conf for /api/v3/ws.
We updated lemmy-ui from RC7 to RC10, which fixed a lot, including the issue with replying to DMs.
We found that the many 502-errors were caused by an issue in Lemmy/markdown-it.actix or whatever, causing nginx to temporarily mark an upstream as dead. As a workaround we can either 1.) Only use 1 container or 2.) set ~~proxy_next_upstream timeout;~~ max_fails=5 in nginx.

Currently we're running with 1 lemmy container, so the 502-errors are completely gone so far, and because of the fixes in the Lemmy code everything seems to be running smooth. If needed we could spin up a second lemmy container using the ~~proxy_next_upstream timeout;~~ max_fails=5 workaround but for now it seems to hold with 1.

Thanks to @phiresky@lemmy.world , @cetra3@lemmy.ml , @stanford@discuss.as200950.com, @db0@lemmy.dbzer0.com , @jelloeater85@lemmy.world , @TragicNotCute@lemmy.world for their help!

And not to forget, thanks to @nutomic@lemmy.ml and @dessalines@lemmy.ml for their continuing hard work on Lemmy!

And thank you all for your patience, we'll keep working on it!

Oh, and as bonus, an image (thanks Phiresky!) of the change in bandwidth after implementing the new Lemmy docker image with the PRs.

Edit: So as soon as the US folks wake up (hi!) we seem to need the second Lemmy container for performance. So that's now started, and I noticed the proxy_next_upstream timeout setting didn't work (or I didn't set it properly), so I used max_fails=5 for each upstream, which does actually work.
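
For anyone wanting to try the same two nginx changes, a minimal sketch could look like the fragment below. It is not the actual lemmy.world config: the upstream name, the container hostnames, the port, and the server name are all placeholders chosen for illustration, and the blocks are assumed to sit inside the `http` context.

```nginx
# Sketch only. "lemmy", "lemmy-1", "lemmy-2", port 8536 and example.org
# are assumptions, not values taken from the lemmy.world setup.
upstream lemmy {
    # max_fails=5: only mark a container as unavailable after 5 failed
    # attempts within fail_timeout, instead of nginx's default of 1.
    server lemmy-1:8536 max_fails=5;
    server lemmy-2:8536 max_fails=5;
}

server {
    listen 80;
    server_name example.org;

    # The websocket API was removed in Lemmy 0.18, so answer old clients
    # directly in nginx rather than passing the request to Lemmy.
    location /api/v3/ws {
        return 404;
    }

    location / {
        proxy_pass http://lemmy;
        proxy_set_header Host $host;
    }
}
```

With a single container the `max_fails` setting matters less, since there is no other upstream to fail over to; it becomes relevant once the second container is started.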

327 comments

server load is too low, everyone upvote more stuff so i can optimize more
edit: guess there is some more work to be done 😁
- Upvote causes an endless spinner on Liftoff. 😁
  
  I'm getting 504 gateway time outs when I try to upvote
- I don't understand your graph. It says you are measuring gigabit/sec but shouldn't the true performance rating be gigabeans/sec for a Lemmy instance?
  
  And where's the statistics for days between each core dump? A healthy instance should have at least three days between each one
  
  gigabean/s
  heh
- aye aye sir, to the upvote machine!
- Double the image upload size and you will see more shitposts
  
  I was gonna argue that you'd see more bean posts, but at this point they're the same thing, both in the pun sense and the literal sense
- Web-ui is very smooth rn.. is this .world?
  😅
  Joke aside, the improvement is like heaven and earth. Love it! Good work, teams!
- I'm on another instance, but here's some federated activity for you.
- All hail @phiresky@lemmy.world! Today is your day. You have made the single most valuable contribution and you must be celebrated! Bravo! Hurrah!
- I was just going to post a meme about choosing either creating activity or spare the server from overloading. Now the joke won't stick.

Test:
Upvote if you can see this comment. 👍
- Looking good from here.
  Edit: And comment rapidly going through. :)
- Can't see anything...
- Shame you're getting no karma. Take my upvote.
  
  Oh, I thought that was the joke 😅
  Like a tongue-in-cheek, remember our silly old Reddit ways kinda thing.
  
  I think I was on r for a year or two before I learned what karma was. I still don't understand its value.
- I can see and upvote this comment. 👍
  
  I can also see and updoot yours. 👍
- 502, sorry mate.
  
  Damn, better luck next time.

The change is noticeable. Good job guys.
Thanks for the updates.
- I agree. Felt it immediately when I started browsing. Everything is faster and more responsive, on top of the error messages disappearing
  
  Yup I can even post comments first try, without getting an error! Things are working well!
- Really noticeable. Cool update. Thank you, guys! ❤️

This is why having a big popular instance isn't all bad. It helps detect and fix the scaling problems and inefficiencies for all the other 1000s of instances out there!
- This, if everyone kept just spreading out to smaller instances as suggested in the beginning, while still a sensible thing to do, no one would have noticed these performance issues. We need to think a few years out, assuming Lemmy succeeds and Reddit dies, and expect that "small instance" will mean 50k users.
  
  I sincerely doubt reddit will die anytime soon, it'll just exist as its own thing that its new target audience gets bored with and moves on from in a few years when something new and flashy catches their eye in the app store. Just like they do with all the other apps designed in exactly the same fashion that reddit is currently morphing into.
  Meanwhile Lemmy will be slowly building its communities up to be what reddit used to be.
- I'm actually kinda waiting a few releases to start promoting my instance anywhere, letting some other brave instance admins work the kinks out a bit first.
- Agreed. I decided to keep my community on lemmy.world specifically because of the community investment I see being put into it.
- If this project is to stay for the long haul, we gotta load test it and stabilize it. These folks are doing the important work here. Large instances are more or less inevitable if Lemmy sticks.

Wow. So much smoother today.
Great work.
You dropped this 👑
- Damn tootin.
  
  You're gonna be when you see how fast I can bean post now.

You guys had better quit it with all this amazing transparency or it's going to completely ruin every other service for me. Seriously though amazing work and amazing communication.

My upvote can go through fast now
Good work

I love the smell of updates in the morning.

Thank you guys for your awesome work!
Also to other people: DONATE TO FOSS PROJECTS. If 50.000 people donate only 0.5€, we have 25.000€ for funding the servers, coding, motivating people etc. Just skip a cup of coffee for 1 day. We are already 2 million in Lemmy instances. We can build a decentralized world together!!
- You can pry my cup of coffee from my cold, dead hands.
  Will donate anyway, I really want this project to keep going.
- euro contributed 🫡
- Is there a good link? One for USD?
  
  If there's a thing you really need to pay in a foreign currency, look into Revolut or Wise. Since I occasionally have to pay for stuff in Turkish lira and GBP, and have donated to USD-only recipients, I like to keep Revolut as my secondary bank account, since exchanging one currency for another is completely free!
- you inspired me to serve the greater good!
  
  For example, if you speak a second language, you can even help with translation in projects. It's very easy. E.g. I translated Jerboa (Lemmy client for Android) into Greek 2-3 days ago. I needed only 1 hour to finish, plus an extra 15-20 minutes yesterday for fixes that I missed.

Boy does it feel good to have those reports and understand the work you guys do. It's really inspiring! Thanks for your hard work, everything has been silk smooth! This instance is really great, Lemmy and its devs are really amazing and I feel at home in a nice, cozy community.

I'm not sure wtf you just said, but lemmy.world feels very smooth today, so thank you for your continued hard work!

Am I getting this correct: the whole lemmy.world instance runs in one single container on one single host?
- You'd be surprised at how much performance this kind of setup can squeeze out. Often the limitation is more on the DB/storage than network handling and processing power.
  
  This. Most of the time, the bottleneck will be the database backend.
  Curious if lemmy.world uses separate reader/writer instances.
- Impressive if true!
- if it runs this well on a single container, considering the amount of users it has, I fear the power it'd hold with more
- Lemmy is built on an async work stealing runtime and you can get very large instances from Amazon and the like.
  I'm sure the instance isn't massive but number of containers and a single "host" are not great indicators of efficiency.

So that's why it was so smooth today... Great work!

Submitting PRs is literally the most effective response that helps everyone who uses Lemmy. Thanks to you all.

This is better optimization than most enterprise devs will see in their lifetimes.
- Some managers of the devs are not that interested in significant optimizations... Depends on what incentives and company culture drives them
- Some companies would rather throw more hardware at the problem and make the devs work on another useless feature no one uses
  
  Yes! And that's a very short-sighted solution. And it feels so good to improve performance in code! That's extra performance for "free" 😎

upvoting posts is so much more stable now, we might actually see more bean posts as a result

Upvotes are still getting rejected. Replies hang so I cancel out and it turns out they did post.
That said, browsing is pretty snappy and smooth. I know the kinks will get worked out eventually. Thanks for the update.
Edit: This now appears resolved minutes later. All smooth on my end.
- Hmm. Seems to work for me.. (Yes this is a test reply)
  
  Everyone it’s a test reply, deploy the upvotes
  
  I’ve since successfully upvoted some comments and made replies without it hanging with the spinning circle. Not sure what the issue was but it all seems to be running smoothly now. Thanks.
- I’ve also had errors posting but the reply still goes through.
- Test comment …
  
  Not getting the upvote or comment hang anymore so far.

This is why I love open source. The fact that a community can directly debug the code it's being hosted on and directly contribute the improvements back is just wild. Thanks for all the hard work @ruud@lemmy.world and the rest of the lemmy.world team! The site already feels much more responsive.

The server is absofuckinglutely flying today! It feels smooth and bug free!!! You guys are legends.

As a data engineer, I'd be interested in hearing more about the SQL troubleshooting.
EDIT: It looks like !lemmyperformance@lemmy.ml is a good place to subscribe to for more technical info on some of these performance improvements.
Also the Lemmy GitHub of course contains more information on bugs/enhancements/etc.
- Yes please
- Same, my job is like 80% SQL, so it'd be cool to see what is used in the background and maybe help improve things.
- Me too!
- Same here! If you find a way to contribute please give us an update!
  
  Just edited my comment!

Good to see a heavy production server taking on the scaling issues. Thank you! To discuss Lemmy performance issues, there is a dedicated community: !lemmyperformance@lemmy.ml

Appreciate that these updates use the yyyy-mm-dd format :D

It now feels pretty good to browse and it now makes the experience of using Lemmy much more enjoyable. Having to spam the vote buttons was really annoying.
- The vote button change was also the one I immediately noticed!

Thanks to all involved across the board. Great work all around 👏👏

Even though i'm not from this instance, this is such a nice way of keeping the users posted about changes. I wish more companies (I know this is not a company) went straight to the point, instead of using vague terms like "improved stability, fixed few issues with an update" when things are changed. I hope all instance owners follow this trend.

Can we have an update on the status of Lemmy.world and how close our ties with Meta's Threads are going to be? Threads is going to support ActivityPub, but time has shown that this is an attempt to try to kill this open platform and eventually replace it with theirs once they get everyone in their ecosystem. (Embrace, Extend...extinguish) Mastodon has said today that they don't mind sleeping with vipers when their demise / dissolution is in Meta's best interest.
Please tell me we are defederating from Meta....or let us know what to expect
EDIT: I originally stated that Mastodon told them to fuck off, but I got confused with Fosstodon (who did that). Mastodon doesn't mind being in bed with Meta
- Where have you seen Mastodon formally state they have no interest in working with them?
  I'm genuinely asking because I'm relatively new to Mastodon and Lemmy and want to be as informed as possible with this whole Meta situation. And just to be transparent, I'm not a Meta fan at all, to the extent I've never had an account with any of their products
  I did read this official Mastodon blog post today...
  https://blog.joinmastodon.org/2023/07/what-to-know-about-threads/
  and my question to you is because I'm not seeing as quite aggressive a stance as you mentioned.
  
  I realized I made a mistake and got Mastodon and Fosstodon confused.
- You may be right and as sad as it is...Meta wanting to work with the federated network should be seen as two things... an attack on the ActivityPub standard via infiltration...and that Meta sees the Fediverse as a threat to its own closed ecosystem.
  We need to be on the defense and protect our platform and standard from corporate meddling and fuckery which Meta will absolutely not hesitate to do.

It's so smooth now; the speed difference is insane! You all are doing excellent work!

Thanks for this very nice report.

Huge props to everyone working on the project. It's awesome seeing everyone work together and resolving issues so quickly!

Thanks for the detailed update and all the hard work you guys are doing!

Gadzooks! These are huge fixes. Compliments to the team, you guys pulled off a small miracle today.

Lemmy's devs and the .world admins have done in a month what Reddit hasn't done in its whole existence: having a smooth and almost bug-free experience.
Jerboa feels so damn FRESH to use now!
- Not to undervalue the efforts going into this, because I appreciate the new community and the transparency, but I believe we have wildly different definitions of 'almost bug free'
  Which, is also something to consider about user experience consistency. Will be a challenge with growth. Fortunately, plugged in admins and devs will help.
  
  By almost bug free I was speaking only about my own experience. There are probably loads of things under the hood I'm not noticing, but it's been hours since I last noticed any issue.
  I agree that we've still got a long way to go though: both Lemmy and Jerboa are far from their 1.0 release yet.

I don't understand anything other than you worked diligently to make things smoother. Thanks to everyone for their wonderful work!
- Same! My first thought was “that’s an impressive-looking graph. I have absolutely no idea what it means.” The proof is in the pudding though - lemmy.world is much improved!

The site is running so much better now, thanks to all. BTW: Love these updates!

I'm very curious: does a single Lemmy instance have the ability to horizontally scale to multiple machines? You can only get so big of a machine. You did mention a second container, so that would suggest that the Lemmy software is able to do so, but I'm curious if I'm reading that right.
- A single instance, no. You run multiple instances on multiple machines, then put a frontend (nginx in this case) to distribute the traffic among them.
  
  Interesting, how does that actually work then? Are they just sharing the same database? Is that a supported configuration?

Shouldn't the correct HTTP status code for a removed API be 410? 404 indicates the resource wasn't found, while 410 indicates a resource has been permanently removed
- Or 418 for the wrong API being used :)
  
  Unless one is attempting to brew tea.
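
If an admin did want to signal "Gone" rather than "Not Found", it would be a one-line change in nginx. A sketch, assuming the same `/api/v3/ws` location named in the post and that it sits inside the existing `server` block:

```nginx
# Sketch only: same location as in the post, but answering 410 Gone
# to signal that the websocket endpoint was removed on purpose.
location /api/v3/ws {
    return 410;
}
```

Whether old clients treat 410 any differently from 404 depends on the client, so this is more about correctness than about reducing the request volume.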

Awesome work - things seem to be running much more smoothly today.
Do you have anything behind CDN by chance? Looking at the lemmy.world IPs, the server appears to be hosted in Europe and web traffic goes directly there? IPv4 apparently seems to be resolving to a Finland-based address, and IPv6 apparently seems to be resolving to a Germany-based address.
If you put the site behind a CDN, it should significantly reduce your bandwidth requirements and greatly drop the number of requests that need to hit the origin server. CDNs would also make content load faster for people in other parts of the world. I'm in New Zealand, for example, and I'm seeing 300-350 ms latency to lemmy.world currently. If static content such as images could be served via CDN, that would make for a much snappier browsing experience.
- Yes that's one of the things on our To Do list
  
  We use Cloudflare at work. It’s been invaluable so far.
  
  Excellent! Thank you for the hard work and transparency. It's great to see.

How great is it to be a part of history in the making -
This is Web 3 in its fomenting -
Headlines ~5yrs:
The ending of Web 2 was unceremonious and just ugly. u/spez and moron@musk watched as their social media networks signaled the end of Web 2 and slowly dissolved. Blu bird’s value disintegrated and Reddit’s hopes for IPO did likewise. Twitter and Reddit dissolved into odorous flatulence as centralization fell apart to the world’s benefit. Decentralized/federated social media such as Mastodon and Lemmy made their convoluted progress and led Web 3’s development and growth…
This is how history is made, it’s ugly and convoluted but comes out sweeet…

Whilst I'm aware that too many users on one instance can be a bad thing for the wider Fediverse, I think it is a great thing at the moment in terms of how well people are banding together to fix the issues being encountered from such a surge in users.
The issues being found on lemmy.world results in better lemmy instances for everyone and improves the whole Fediverse of lemmy instances.
I'm very impressed with how well things are being debugged under pressure, well done to all those involved 👏
- I agree, it's great to see Lemmy being battle-tested in large instance scenarios.

Is it safe to use 2FA yet?

That's so awesome! Look at that GRAPH!
I'd volunteer to be a technical troubleshooter - very familiar with docker/javascript/SQL, not super familiar with rust - but I'm sure yall also have an abundance of nerds to lend a hand.
- You should try to contact one of the admins of this server (Ruud is very busy tho, lots of mentions) and see if you could be of any help. I am sure they would appreciate even just the offer 😄

You guys are absolute legends, thanks for the update!

Hey I can upvote now!

You guys are absolutely amazing. So many thanks to you @Ruud and the entire admin/troubleshooting team! Thank you.

It blows my mind with the amount of traffic you guys must be getting that you are only running one container and not running in a k8s cluster with multiple pods (or similar container orchestration system)
Edit: misread that a second was coming up, but still crazy that this doesn’t take some multi node cluster with multiple pods. Fucking awesome

smoooooooooth! Keep up the good work!

Is it weird that I’m always excited to read the update posts?
- Not at all. The admins here are doing great work and their updates are often informative and helpful, it makes sense you'd look forward to them.

Wow it is smooth as butter now. Great job ruud and team!

Installed Jerboa again and it feels smoother than Reddit itself, great job!
- It somehow reminds me of the old Google+ in the mid-2010s

Things have been super smooth lately, thanks for all the work!
- Yeah this morning everything has loaded so much quicker, I’ve been able to post and vote on comments no problem! Lemmy is really starting to take form. I fucking love this whole thing.

@ruud crazy impressive
- Agreed

Really great job, guys! I know from my experience in SRE that this kind of debugging, monitoring and fixing can be a real pain, so you have all my appreciation. I'm even determined to donate on Patreon if it's available
