
10mo ago

Better Lemmy Through Automated Moderation

Santa is a robot moderator. Santa will decide if you're naughty or nice. Santa has no chill.

Hi everyone!

The slrpnk admins were nice enough to let me try a little moderation experiment. I made a moderation bot called Santa, which tries to ease the amount of busywork for moderators, and reduce the level of unpleasantness in conversations.

If someone's interactions are attracting a lot of downvotes compared to their upvotes, they are probably not contributing to the community, even if they are not technically breaking any rules. That's the simple core of it. Then, on top of that, the bot gives more weight to users that other people upvote frequently, so it is much more accurate than simply adding up the up and down vote totals. In testing, it seemed to do a pretty good job figuring out who was productive and not.
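To make the difference between raw totals and weighted totals concrete, here is a minimal sketch. It is not Santa's actual code (that lives in the repository). The `Vote` type, the function names, and the trust weights are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    voter: str
    value: int  # +1 for an upvote, -1 for a downvote


def raw_score(votes):
    """Plain up-minus-down total: every voter counts the same."""
    return sum(v.value for v in votes)


def weighted_score(votes, voter_weight):
    """Scale each vote by how much the community trusts the voter.

    voter_weight is a hypothetical mapping of voter name -> trust weight.
    Unknown voters get weight 0.0 in this sketch.
    """
    return sum(voter_weight.get(v.voter, 0.0) * v.value for v in votes)


votes = [
    Vote("trusted_a", -1),
    Vote("trusted_b", -1),
    Vote("throwaway_1", +1),
    Vote("throwaway_2", +1),
    Vote("throwaway_3", +1),
]
# Hypothetical weights: well-regarded users count for much more.
voter_weight = {
    "trusted_a": 5.0,
    "trusted_b": 4.0,
    "throwaway_1": 0.1,
    "throwaway_2": 0.1,
    "throwaway_3": 0.1,
}

print(raw_score(votes))                     # positive: three ups, two downs
print(weighted_score(votes, voter_weight))  # negative: trusted users disapprove
```

Three throwaway upvotes beat two downvotes in the raw count, but they lose once the downvotes come from users the community trusts.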

Most people upvote more than they downvote. To accumulate a largely negative opinion from the community, your content has to be very dislikable. The current configuration bans less than 3% of the users that it evaluates, but there are some vocal posters in that 3%, which is the whole point.

It is currently live and moderating !pleasantpolitics@slrpnk.net. It is experimental. Please don't test it by posting bad content there. If you have a generally good posting history, it will probably let you get away with being obnoxious, and it won't be a good test. Test it by posting good things that you think will attract real-life jerks, and let it test its banhammer against them instead of you.

FAQ

Q: I just saw content that wasn't pleasant!

A: "Pleasant" was the wrong word for the test community. People will sometimes say things you find unpleasant, potentially more so, since the human moderation is lighter. That's by design. Many Lemmy communities contain a large amount of content which is "polite" or "civil" but which in total is detracting significantly from the experience. I do plan to allow content which is offensive, up to a certain point, as long as it doesn't become a dominant force.

The theory is that we're all adults, and we can handle an occasional rude comment or viewpoint we don't like. If someone is a habitual line-stepper, then they will get shown the door, but part of the whole point is that the good actors can be free of a moderator looking over their shoulder on every comment deciding whether or not they're allowed to say it.

That's not to mean this is a "free speech" community. If content that's offensive for the sake of offensiveness starts to proliferate, then I'll probably put rules into place to address it. But you will find content that is not "pleasant."

Q: Why was my comment deleted?

A: Sorry. If you haven't posted a lot in the recent past, but you've been getting some downvotes, the bot will err on the side of caution and not let you post. This isn't a perfect solution, since it starts to verge on removing unpopular viewpoints, but it's necessary to protect the community from malicious content from throwaway accounts.

If you don't have a lot of recent activity in your account, but you've posted some unpopular things, Santa may come after you. It may not be fair. The best thing to do is to post productively and actively outside of controversial topics, wait a few days, and try again.

Q: Why was I banned?

A: You may be a jerk. Sorry you had to find out this way.

It's not hard to accumulate more weighted upvotes than downvotes. In the current configuration, 99% of the users on Lemmy manage it. If you are one of the 1%, it's because you have enough posting history that the bot has observed a firm community consensus that your contributions are more negative than positive.

The bot is not making a decision about you. The community is. If you are banned, it's because you are being downvoted overwhelmingly. The viewpoint you are expressing is probably not the issue. The Lemmy community is very tolerant of a wide variety of views. Some people may disagree with you and you may find that oppressive, but the bot will not ban you simply because some users argue with you when you say certain things. Those users are allowed to have their view, just like you have yours.

If you find you are banned and you're willing to hear suggestions about how to present your argument without everyone downvoting you, leave a comment. Reducing your downvotes will help the bot recognize you as reasonable, but it will also probably help you get your point across more successfully. In order for the bot to ban you, you have to be received overwhelmingly negatively by the community, which probably means you're not convincing very many people of what you're saying.

If you're not willing to hear those suggestions and simply want to insist that it's everyone else that is the problem, the bot is being evil to you, your free speech is being infringed, and I am a tyrant if I don't let you into the community to annoy everybody, I would respectfully request that you take it somewhere else.

Q: How long do bans last?

A: Bans are transient and based on user sentiment going back one month from the present day. If you have not posted much in the last month, even a single downvoted comment could result in a ban. If that happened to you, it should be easy to reverse the ban in a few days by engaging and posting outside of the moderated community, showing good faith and engagement, and bringing your average back up.

If you are at all a frequent poster on Lemmy and received a ban, you might have some negative rank in your average, and your ban may be indefinite until your habitual type of postings and interactions changes, and your previous interactions age past the one month limit.

Q: How can I avoid getting banned?

A: Engage positively with the community, respect others’ opinions, and contribute constructively. Santabot’s algorithm values the sentiment of trusted community members, so positive interactions are key.

If you want to hear examples of positive and negative content from your history, let me know and I can help. Pure voting totals are not always a good guideline to what the bot is reacting to.

Q: How does it work?

A: The code is in a Codeberg repository. There's a more detailed description of the algorithm there, or you can look at the code.

Q: Won't this create an echo chamber?

A: It might. I looked at its moderation decisions a lot and it's surprisingly tolerant of unpopular opinions as long as they're accompanied by substantial posting outside of the unpopular opinion. More accurately, the Lemmy community is surprisingly tolerant of a wide range of opinions, and that consensus is reflected when the bot parses the global voting record.

If you're only posting your unpopular opinion, or you tend to get in arguments about it, then that's going to be a problem, much more than someone who expresses an unusual opinion but still in a productive fashion or alongside a lot of normal interactions.

If you feel strongly that some particular viewpoint, or some particular person's ability to stand up for it, is going to be censored, post a comment below with your concerns, and we can talk. It's a fair concern, and there might be cases where it's justified, and the bot's behavior needs to be adjusted. Without some particular case to reference, though, it's impossible to address the concern, so please be specific if you want to do this.

Q: Won't people learn to fake upvotes for themselves and trick the bot?

A: They might. The algorithm is resistant to it but not perfectly. I am worried about that, to be honest, much more than about the bot's decisions about aboveboard users being wrong all that often.

Q: Why doesn't the bot notify for bans?

A: There are a few users who get banned or unbanned every day, as the pattern of user comments and votes changes over time. It's important that bans be "lightweight," and always reversible for anyone who is banned. It's not a heavy thing like most Lemmy moderation. It already bothers me that the flow of bans creates spam in the modlog. I don't want to amplify that to DM spam across all of Lemmy.

I did have functionality at one point to notify for certain situations, and it triggered once, and that user complained to me that my bot was notifying them about a ban in a community they had never heard of and didn't care about at all. I think they were right to complain. I don't want to send out spam. Multiplying that interaction by 100 user actions per month isn't something I want to do.

I do want to make sure it's transparent to people why they are banned, and what they can do to get unbanned, if it comes up. If anyone has any ideas about how I can make it more clear to people who do try to post and find they are banned, that they are banned and why, I'm open to the suggestion. I've tried to do this, but I found that the people who are banned aren't interested in any reasonable conversation about any of that, so I doubt that anything I could do on my end would make it work any better. You have to be very unreasonable for the bot to blacklist you outright.

What do you think?

It may sound like I've got it all figured out, but I don't think I do. Please let me know what you think. The bot is live on !pleasantpolitics@slrpnk.net so come along and give it a try. Post controversial topics and see if the jerks arrive and overwhelm the bot. Or, just let me know in the comments. I'm curious what the community thinks.

Thank you!


29 comments

There are so many problems with this.
It would be extraordinarily easy to bot it and just silence anyone you want.
I agree, moderation is absolutely necessary to maintain civil discussion, but silencing people because they have unpopular opinions is a really bad idea.
I love lemmy because it is the ultimate embodiment of decentralised free speech. This destroys that.
If I were a bad actor, hypothetically, let's just say lemmy.ml or hexbear, and I decided I wanted to silence anyone who disagrees with what I have to say, then I could just make a fork of this project that only values my instance's votes and censors anyone who doesn't agree with what my community thinks.
This tool simply acts as a force multiplier for those who want to use censorship as a tool for mass silencing of dissent.
- Oh no! It hadn't occurred to me that excluding unpopular opinions might be a problem. If only I'd thought of that, I might have looped in some other people, talked extensively about the problem and carefully watched how it was working in practice and tweaked it until it seemed like it was striking the right balance. I might have erred heavily on the side of allowing people to speak to the point that I was constantly fielding complaints from people wanting me to remove something they said shouldn't be allowed.
  And furthermore, you're right. If this catches on then lemmy.ml might be able to silence dissenting views. That would be terrible.
  
  So I just noticed that your reply has more downvotes than upvotes.
  And also, your tone seems to be sarcastic and goes straight against what I thought you were actually advocating for, which is positive communication.
  I like the idea/theory of your bot, but the tone of your response to that person totally caught me off-guard.
  If the Santa bot were modding this very community, with all the downvotes your posts have gotten, wouldn't you be banned according to its programming?
  
  Being friendly at a surface level with other people is not at all the same as bringing pleasant interactions and patterns to the conversation as a whole.
  You, of all people, should reflect on that. Making sure to smile while you're shitting on the carpet doesn't mean you're welcome at the party.
  For transparency's sake, I didn't intend to ban anyone from the Santabot meta community, but you and your alts obviously deserve an exception. I'm banning you and I'll make a post recommending that the admins do something about your other ban-evading accounts.
  
  So the talk of some of the more eccentric parts of the fedi got me thinking here. I run a currently single user instance largely because the state of mod tools is scary (the inability to easily look over the activities on my instance for example) but would potentially like to open the doors at some point. Tools like this could help that.
  A couple edge situations that I wonder how it would respond.
  I've a time or two relocated the instance in my lab just by rebuilding it because migrating DBs is a pain and I'm the only one here anyhow but used the same user and domain names, would the bot recognize those recreated users as separate entities or would any actions be based purely on the name?
  In a couple cases I've run one of those subscriber bots and as a result found some communities in circle-jerk parts of the fedi. Posting in them with anything dissenting from their views ends up all kinds of negative. Does the bot take into consideration scoring based on the user profile including actions outside the moderated community, or just within its own territory?
  
  > I’ve a time or two relocated the instance in my lab just by rebuilding it because migrating DBs is a pain and I’m the only one here anyhow but used the same user and domain names, would the bot recognize those recreated users as separate entities or would any actions be based purely on the name?
  If it's the same user and domain name, but something's been rebuilt behind the scene, it'll identify it as the same user. The same user name on a different instance, as identified by hostname, shows up as a different user.
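In other words, identity is the pair of user name and instance hostname. A tiny sketch of that idea follows. The `UserKey` type and `user_key` helper are hypothetical names, and lowercasing the hostname is my own assumption (DNS names are case-insensitive), not something confirmed about Santa.

```python
from typing import NamedTuple


class UserKey(NamedTuple):
    name: str
    hostname: str


def user_key(name, hostname):
    """Build an identity from user name plus instance hostname.

    Assumption: hostnames are compared case-insensitively.
    """
    return UserKey(name, hostname.lower())


before_rebuild = user_key("alice", "lab.example")
after_rebuild = user_key("alice", "lab.example")     # same name, same host
elsewhere = user_key("alice", "other.example")       # same name, other host

print(before_rebuild == after_rebuild)  # treated as the same user
print(before_rebuild == elsewhere)      # treated as a different user
```

Nothing about the database behind the instance enters the key, which is why a rebuilt instance with the same names looks like the same users.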
  > In a couple cases I’ve run one of those subscriber bots and as a result found some communities in circle-jerk parts of the fedi. Posting in them with anything dissenting from their views ends up all kinds of negative. Does the bot take into consideration scoring based on the user profile including actions outside the moderated community, or just within its own territory?
  It's an interesting question. It's going to be different case by case, but I think most of the time, it shouldn't affect you too much. Most of the users in those circle-jerk communities have very small rank values, or negative. I think any participation you're doing in those communities won't matter one way or another.
  A while ago, the algorithm was so simplistic that a user with heavily negative rank would have their votes count backwards. If a bunch of the circle-jerk people were downvoting you, that would actually raise your rank, and upvotes would lower it. I left it that way for quite a while, both because it's funny and because it does have a certain logic. If someone's wrong more often than they're right, and they're upvoting you, that probably means you're doing something wrong.
  After a while, I got rid of it. On top of the obvious possibility it creates for abuse, it turns out that a lot of the low-rank people still give out some sensible votes sometimes, and those people would get penalized unreasonably. So now, anything that someone does in a community that the wider community doesn't respect, just gets ignored.
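The before-and-after can be sketched like this. It is a simplification under assumptions: scaling a vote linearly by the voter's rank is my own guess at the shape of it, and both function names are made up.

```python
def old_vote_effect(voter_rank, vote):
    """Earlier behavior as described: the vote is scaled by the voter's rank,
    so a vote from a negative-rank voter counts backwards.

    vote is +1 for an upvote and -1 for a downvote.
    """
    return voter_rank * vote


def new_vote_effect(voter_rank, vote):
    """Later behavior as described: votes from voters without positive rank
    are ignored instead of being flipped."""
    return max(voter_rank, 0.0) * vote


# A downvote from a voter with rank -2.0:
print(old_vote_effect(-2.0, -1))  # positive: the downvote used to help you
# A downvote from a voter with rank 3.0 is unchanged between versions:
print(old_vote_effect(3.0, -1), new_vote_effect(3.0, -1))
```

Under the newer rule the same downvote from the rank -2.0 voter contributes nothing either way, so occasional sensible votes from low-rank users no longer penalize anyone.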
  
  So that second part makes for some considerations. I can see merit in both taking consideration of outside actions and to keeping it limited in scope.
  Using the user profile as a source rather than the local community you can get an instant status on the person even if they've never been in the moderated space. If they exclusively or even mostly posted in 'those' places though then their extremely popular views in that space may well not be suited for the wider fedi but still leave them with enough rank as I'm interpreting it to be allowed room to put out their rhetoric.
  While there's certainly something to be said for engaging with dissenting views, there are also a fair number of spaces out there full of people that are just a general net negative to civil conversation, but very strongly agree with each other, usually they're found under a bridge harassing goats. Wondering here if there's a way to create a weighting to discount actions in specific communities without creating the echo chamber effect as a result.
  
  We talked about that issue early on. As a matter of fact, Santa initially had some options to give different types of weight to different instances, so that the "wrong" people wouldn't wind up controlling its output, but in the end it turned out that simply by giving it a good amount of data about the holistic picture of all the communities, it was able to figure out who the jerks were without needing guidance about it.
  The thing is that "rank" circulates within the community and feeds on itself. It's not just that upvotes give you positive rank. It's contingent on the rank of the person voting for you, in a huge recursive expression, so trusted users have more weight than untrusted users. I can show you the math, it's really neat.
  In the PageRank version of the algorithm, you could model PageRank as being a finite physical substance flowing through a network between the different web pages. That means that if you can create fake nodes adding up to 1% of the network, you'll have 1% of the rank at your disposal to be able to use for gaming the system. My version of the algorithm isn't like that. Because of the way I decided to change things to add downvotes into it, rank can build up and multiply within feedback loops. So if you create a little cycle of 100 users that all upvote each other, they'll have some rank, but it's not going to be able to outweigh tens of thousands of users that all upvote each other and multiply their rank on each other. So if the two networks are coming into communication, any downvotes flowing from the big network to the small network are going to have a huge weight that'll overwhelm the small network's ability to game the system. As far as I can tell, adding downvotes to the math actually made it more resistant to some failure modes than PageRank was.
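Here is a toy version of that effect, scaled way down (a ring of 3 against a network of 12, instead of 100 against tens of thousands). It is a sketch under stated assumptions, not Santa's real math: the baseline of 1.0 per round, the `damping` factor, the fixed number of iterations, and all the names are invented for the example. The one property it tries to mirror is that rank is not divided up among a voter's outgoing votes, so it is not conserved the way it is in PageRank.

```python
def compute_ranks(users, votes, damping=0.5, iterations=5):
    """Propagate rank along votes for a fixed number of rounds.

    votes: list of (voter, target, value) with value +1 or -1.
    Every user starts at 1.0 and receives a baseline of 1.0 each round
    (both assumptions). A voter passes on damping * its own rank, and a
    voter with non-positive rank passes on nothing.
    """
    ranks = {u: 1.0 for u in users}
    for _ in range(iterations):
        new_ranks = {u: 1.0 for u in users}
        for voter, target, value in votes:
            new_ranks[target] += damping * max(ranks[voter], 0.0) * value
        ranks = new_ranks
    return ranks


def clique(prefix, size):
    """Build a group of users who all upvote each other."""
    members = [f"{prefix}_{i}" for i in range(size)]
    votes = [(a, b, +1) for a in members for b in members if a != b]
    return members, votes


small, small_votes = clique("small", 3)
big, big_votes = clique("big", 12)
# One member of the big network downvotes one member of the small ring.
cross_votes = [("big_0", "small_0", -1)]

ranks = compute_ranks(small + big, small_votes + big_votes + cross_votes)
print(ranks["small_0"] < 0)                 # the downvoted ring member goes negative
print(ranks["small_1"] > 0)                 # its ring-mates stay positive
print(ranks["big_0"] > ranks["small_1"])    # the big network holds far more rank
```

A single downvote from the larger network outweighs everything the small ring can give its own member, because the larger network's feedback loop has multiplied its rank much further. Without a fixed iteration count or some normalization this toy would grow without bound, so treat the numbers as illustrative only.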
  It's hard to say, of course, without seeing specifics. And it's a tricky balancing act. You don't want a minority community to be subject to censorship, but you also don't want an obnoxiously vocal group of trolls to be able to overcome the community's disapproval because they all agree with each other in their obnoxiousness.
  I can talk in specifics, if that makes it easier. If you point out somebody from one of those minority communities that you think is "escaping" into the wider community and causing problems, I can do some introspection and let you know how Santa views their user and why. I think the way I have the algorithm tuned now is pretty good, but it's always good to check, because I keep finding stuff that I missed.
  Getting back to what you were saying initially, if you want to try it out on some of your communities to ease the moderation load for you, I think it's ready. I've been running !pleasantpolitics@slrpnk.net and doing almost no moderation of my own, and things have been working fine. It's been a little too eager to delete comments from people it doesn't have a clear picture of, and I think I fixed that now, but the fix hasn't been in action for long enough to be confident. But I do think that it's realistic that this could be in alpha release as a hands-off moderation tool for a real sizable community. There are some new features I wanted to add to make it useful for real moderation, but if you want my help to set it up to be useful for running on a whole instance and vetting the new users and controlling the communities and things, I think it's ready to do that.
  
  Very interesting, it sounds similar to the web-of-trust scoring system implemented on Freenet some while ago I'd done some reading on. For right now most of my curiosity is in an academic sense. My node is functionally a client at this point largely set up because I can and enjoy the technical aspects rather than creating an account elsewhere. If I did start actually having local communities and users though I expect something like this would be useful, often times I would trust the impartiality of an algorithm over the more emotional response of people for consistency.
  
  I agree. It helps if you model the thing as implementing the will of the community, outsourcing moderation to the consensus instead of having one person making the best decisions they can come up with.
