Update: Pushing back against the wave of bot accounts on Lemmy

This is an update to my previous post about suspicious inactive accounts on a handful of instances: (https://sh.itjust.works/post/998307).

I ended up messaging the admins of the 16 instances shown in the attached image. I pointed out their wild user numbers and referenced the lemmy.ninja post detailing how that instance scrubbed suspicious accounts from its user database.

6 admins responded. They had all noticed the odd accounts and either thought the numbers were wrong or weren't sure how to purge the suspicious accounts without nuking their databases. In the end, they deleted a combined total of about 338k dormant accounts from their instances. (One of these instances seems to have gone down since then.)

I never received a reply from the other 10 instance admins, though 8 of those 10 instances appear to be down (as of 27 July 2023). 2 instances are still up and unchanged.

Between the actively removed accounts and the downed instances, this represents a loss of 930,004 inactive Lemmy accounts!

You can see the drop in the graphs on The Federation. The total number of Lemmy accounts has been cut in half over the past 3 weeks, from a peak of 2.18M to today's 1.09M. The change is mostly from these 16 instances.

I have to admit, I did not expect such a large change when I started this! Hopefully this bodes well for Lemmy's future as a place where actual humans interact, rather than a cesspool of automated comments and upvote/downvote brigading.

That's all I have for now. Keep your stick on the ice; we're all in this together.

36 comments
  • Fantastic work.

    Do you think the bot numbers for Reddit will be as bad or worse? Or is there better protection over there?

    • No major social media site publishes estimates on bot activity, so unless someone is citing a research paper with a reasonable bot-id technique, they're speculating. That said, there are a few useful things we can say with only modest speculation:

      1. No commercial social media site has as trivial a signup process as these instances. They had no email verification, no captcha, and no validation or gating process of any kind. Scripts created these users with a single API call, hitting it as fast as the server would respond. So on the account validation front, Reddit does better than these instances at keeping bots out.
      2. Every commercial social media site has a security team that attempts to monitor bots and has the capability to remove them. Some of these admins were aware of the signups, and others didn't know how to respond. So on the monitoring and response front, reddit is more sophisticated at detecting and responding to bots.
      3. I believe these instances had zero or one active users vs 100k+ bot accounts. It's hard to say what the bot rates are on commercial social media sites, but I think we can confidently bound them below 100k bots to 1 human.
      4. The aggregate number of bots represented about half of the total lemmyverse. I'm sure someone will disagree with me, but I would be pretty surprised if half the signups at commercial sites were malicious. Still, that's more plausible than 100k to 1.
      5. On the other hand, the activity of these bots is public, and they demonstrably didn't do anything. At least some of the malicious/clandestine bot accounts on commercial social media sites are active... so maybe here Lemmy gets the win, since this massive wave of bots went unused. Now, that doesn't mean that OTHER, more sophisticated and undetected bots aren't active on Lemmy just as they are on other social sites. But my bet is that there are few to none, because Lemmy doesn't matter enough to be worth attacking for the people who are able to run sophisticated bots. But this is hard to prove one way or the other.

      TLDR: This signup wave was so unsophisticated it would never have been possible on a major social site with a security team. But it also didn't do any tangible damage, unlike clandestine bot activity on major social sites. Depending on which metrics you use to compare (and how made-up those metrics are, since this is all about activity that tries to stay hidden), either side can come out on top.

    • I can't say. I don't know of a good way to tell an authentic human-driven account from a bot account, either on Lemmy or Reddit. Here on Lemmy we can at least get aggregate user data and point to suspicious trends, which is all I have done. Reddit, on the other hand, is a completely closed box.

  • You say there's something to worry about because you're narrowing your focus to one specific week of high activity. If this is the week the Reddit API changes took effect, that is the week I migrated over, and I haven't logged into Reddit since. However, there have been some growing pains, and with intermittent issues on different instances, I ended up creating accounts on multiple servers that week. I've only commented with this one so far, but I appreciate having the others logged in on Jerboa so I can jump between them. Sometimes this is necessary when servers go down temporarily. So far, these other accounts are just for lurking, but they can clearly be caught in your net of possible "bots". I think you are greatly underestimating the number of legit accounts you are sweeping up. One thing I have not seen in your posts is any evidence of malicious activity on these accounts. Therefore, I don't agree with pressuring admins to terminate these accounts in bulk. I would prefer to see action based on evidence rather than baseless speculation. What's wrong with removing accounts only if they ever become a problem?