The best part of the fediverse is that anyone can run their own server. The downside of this is that anyone can easily create hordes of fake accounts, as I will now demonstrate.
Fighting fake accounts is hard and most implementations do not currently have an effective way of filtering out fake accounts. I'm sure that the developers will step in if this becomes a bigger problem. Until then, remember that votes are just a number.
This was a problem on reddit too. Anyone could create accounts - heck, I had 8 accounts:
one main, one alt, one "professional" (linked publicly on my website), and five for my bots (whose accounts were optimistically created, but were never properly run). I had all 8 accounts signed in on my third-party app and I could easily manipulate votes on the posts I posted.
I feel like this is what happened when you'd see posts with hundreds or thousands of upvotes but only 20-ish comments.
There needs to be a better way to solve this, but I'm unsure if we truly can solve this. Botnets are a problem across all social media (my undergrad thesis many years ago was detecting botnets on Reddit using Graph Neural Networks).
Maybe you're right, but it just felt uncanny to see thousands of upvotes on a post with only a handful of comments. Maybe someone who's active in the bot-detection subreddits can pitch in.
Reddit had ways to automatically catch people trying to manipulate votes though, at least the obvious ones. A friend of mine posted a reddit link for everyone to upvote on our group and got temporarily suspended for vote manipulation like an hour later. I don't know if something like that can be implemented in the Fediverse but some people on github suggested a way for instances to share to other instances how trusted/distrusted a user or instance is.
An automated trust rating will be critical for Lemmy, longer term. It's the same arms race as email has to fight. There should be a linked trust system of both instances and users. The instance 'vouches' for the users trust score. However, if other instances collectively disagree, then the trust score of the instance is also hit. Other instances can then use this information to judge how much to allow from users in that instance.
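A rough sketch of how that vouching could look (hypothetical Python; nothing like this exists in Lemmy today, and the names and numbers are made up): an instance's trust caps its users' effective trust, and collective disagreement from peer instances dings the instance itself.

```python
# Hypothetical "instance vouches for its users" logic; not a real Lemmy feature.

instance_trust = {"lemmy.example": 0.8, "spam.example": 0.8}

def user_trust(instance: str, local_score: float) -> float:
    """A user's effective trust is their own score, capped by their instance's trust."""
    return min(local_score, instance_trust.get(instance, 0.5))

def report_disagreement(instance: str, reporters: list[str]) -> None:
    """If several other instances flag users from `instance`, the instance itself takes a hit."""
    penalty = 0.05 * len(set(reporters))
    instance_trust[instance] = max(0.0, instance_trust.get(instance, 0.5) - penalty)
```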
I got suspended multiple times because my partner and daughter were also in our city's sub, and sometimes one of them would upvote my comments without realizing it was me. It got really fucking annoying, and of course there's no way to talk to a real person at reddit to prove we're different people. I'd appeal every time and they'd deny it every time. How reddit could have gotten so huge without realizing that multiple people can live in the same household is beyond me. In the end they both just stopped upvoting anything in the sub because it was too risky (for me).
Yes, I feel like this is a moot point. If you want it to be "one human, one vote" then you need to use some form of government login (like id.me, which I've never gotten to work). Otherwise people will make alts and inflate/deflate the "real" count. I'm less concerned about "accurate points" and more concerned about stability, participation, and making this platform as inclusive as possible.
In my opinion, the biggest (and quite possibly most dangerous) problem is someone artificially pumping up their ideas. To all the users who sort by active / hot, this would be quite problematic.
I'd love to see some social media research groups actually consider how to detect and potentially eliminate this issue on Lemmy, considering Lemmy is quite new and is malleable at this point (compared to other social media). For example, if they think metric X may be a good idea to include in all metadata to increase the chances of detection, then it may be possible to include this in the source code of posts / comments / activities.
I know a few professors and researchers who do research on social media and associated technologies; I'll go talk to them when they're in their offices on Monday.
On Reddit there were literally bot armies that could cast thousands of votes instantly. It will become a problem if votes have any actual effect.
It's fine if they're only there as an indicator, but if the votes are what determine popularity and prioritize visibility, it will become a total shitshow at some point. And it will be rapid. So yeah, better to have a defense system in place asap.
I always had 3 or 4 reddit accounts in use at once: one for commenting, one for porn, one for discussing drugs, and one for pics that could be linked back to me (of my car, for example). I also made a new commenting account about once a year so that if someone recognized me they wouldn't be able to find every comment I've ever written.
On lemmy I have just two now (other is for porn) but I'm probably going to make one or two more at some point
If you and several other accounts all upvoted each other from the same IP address, you'd get a warning from reddit. If my wife ever found any of my comments in the wild, she would upvote them. The third time she did it, we both got a warning about manipulating votes. They threatened to ban both of our accounts if we did it again.
I'd just make new usernames whenever I thought of one I thought was funny. I've only used this one on Lemmy (so far) but eventually I'll probably make a new one when I have one of those "Oh shit, that'd be a good username" moments.
I would think that they need to set a somewhat permissive threshold to avoid too many false positives due to people sharing a network. For example, a professor may share a reddit post in a class with 600 students with their laptops connected to the same WiFi. Or several people sharing an airport's WiFi could be looking at /r/all and upvoting the top posts.
I think 8 accounts liking the same post every few days wouldn't be enough to trigger an alarm. But maybe it is, I haven't tried this.
I had one main account but also a couple for when I didn't want to mix my "private" life up with other things. I don't even know whether that's against the TOS?
Anyway, I stupidly made a Valmond account on several Lemmy instances before I got the hang of it, and when (if!) my server one day functions I'll make an account there too, so ...
I guess it might be like in the old forum days: you'd have a respectable account and another if you wanted to ask a stupid question, etc. The admins could see it was the same person (if they cared), but ordinary users couldn't.
I think the best solution there is so far is to require captcha for every upvote but that’d lead to poor user experience. I guess it’s the cost benefit of user experience degrading through fake upvotes vs through requiring captcha.
IMO the best way to solve it is to 'lower the stakes' - spread out between instances, avoid behaviors like buying any highly upvoted recommendation without due diligence etc. Basically, become 'un-advertiseable', or at least less so
I don't know how you got away with that to be honest. Reddit has fairly good protection from that behaviour. If you upvote something from the same IP with different accounts reasonably close together, there's a warning. Do it again and there's a ban.
I did it two or three times with 3-5 accounts (never all 8). I also used to ask my friends (N=~8) to upvote stuff too (yes, I was pathetic) and I wasn't warned/banned. This was five-six years ago.
The latter. I was making bots to collect data (for the previously-mentioned thesis) and to make some form of utility bots whenever I had ideas.
I once had an idea to make a community-driven tagging bot to tag images (like hashtags). This would have been useful for graph building and just general information-lookup. Sadly, the idea never came to fruition.
The lack of karma helps some. There's no point in trying to rack up the most points for your account(s), which is a good thing. Why waste time on the lamest internet game when you can engage in conversation with folks on Lemmy instead?
I don't think other people can see it though. On Reddit bot accounts would rack up karma so that when they switch to posting spam it looks like they have a lot of karma and are someone who posts worthwhile things.
EDIT I was wrong! Lemmy does have karma, even listed in the API, though for some reason it doesn't show this to you itself. So, those of us just using Lemmy directly have been under the mistaken idea that it didn't do it, and those using third party apps are seeing it: https://lemmy.world/post/1250922?scrollToComments=true
~~That's interesting, because on the Lemmy website, there is no total upvotes number visible. It only shows the total number of posts and total number of comments. It then shows the list of posts and comments, and you can see the scores for each, but there's no total. Memmy must be calculating this itself. This seems to be something third party app developers are adding which is not present in actual Lemmy itself, in order to try to replicate Reddit Karma somewhat.
As Lemmy works itself: On Reddit, in addition to your posts and comments having visible scores, your username also has an aggregate score, which Lemmy does not have. At least, when I go to your profile, I can see the scores for your posts and comments, but I cannot see any aggregate score for you as a user. That's what Reddit Karma is. I don't know what black magic formula Reddit calculates it from, as old Reddit and new Reddit show different Karma numbers for the same user, but whatever algorithm they use, it's an overall user score that Lemmy does not have (so far, at least). ~~
The lack of karma also makes it worse. Usually if I saw a discussion that felt kinda off I'd check the account's age and karma. Made it easier to sniff out bots.
In case anyone's wondering this is what we instance admins can see in the database.
In this case it's an obvious example, but this can be used to detect patterns of vote manipulation.
I appreciate it, good for demonstration and just tickles my funny bone for some reason. I will be delighted if this user gets to 100,000 upvotes—one for every possible iteration of shill#####.
If you're an indie dev marketing a game, it's cheap as shit. Shoving your post into the faces of thousands would very easily get you more than that back in sales.
Web of trust is the solution. Show me vote totals that only count people I trust, 90% of people they trust, 81% of people they trust, etc. (0.9 multiplier should be configurable if possible!)
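Something like this rough sketch, assuming we had an explicit trust graph to walk (the `trusts` map and the 0.9 decay are made up; this isn't anything Lemmy implements):

```python
# Web-of-trust vote counting sketch: each hop away from me multiplies trust by `decay`.

def trust_weights(me: str, trusts: dict[str, set[str]], decay: float = 0.9) -> dict[str, float]:
    """Breadth-first walk outward from `me`; weight = decay ** (shortest trust distance)."""
    weights = {me: 1.0}
    frontier = [me]
    while frontier:
        nxt = []
        for user in frontier:
            for friend in trusts.get(user, set()):
                w = weights[user] * decay
                if w > weights.get(friend, 0.0):
                    weights[friend] = w
                    nxt.append(friend)
        frontier = nxt
    return weights

def weighted_total(voters: list[str], weights: dict[str, float]) -> float:
    # Votes from accounts outside my web of trust count for nothing here.
    return sum(weights.get(v, 0.0) for v in voters)
```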
I've been thinking about having an admin vote on example posts to define the policy, scoring users against those votes, treating the high scorers as copies of the admin's spirit of moderation, and then building systems that use that for automoderation.
e.g. I vote yes, no, yes. I then run a script that checks which of my users voted on all three, and the ones whose votes best match mine (must be 100% matching to my votes) get counted as "matching my spirit of moderation". If a spirit-of-moderation user downvotes or reports something, it can be auto-flagged into an admin console for me to review rapidly instead of sifting through user complaints. And if things get critically spicy I can promote them to emergency mods, or automate their reports so that if a spirit user and a random user both report something, it gets auto-removed.
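Roughly, in script form (all data structures made up; Lemmy has no such feature today):

```python
# "Spirit of moderation" matching sketch.

admin_votes = {"post1": 1, "post2": -1, "post3": 1}   # my yes/no/yes policy votes

def matches_my_spirit(user_votes: dict[str, int]) -> bool:
    """A user qualifies only if they voted on all my example posts and matched 100%."""
    return all(post in user_votes and user_votes[post] == vote
               for post, vote in admin_votes.items())

def should_auto_flag(reporters: set[str], spirit_users: set[str]) -> bool:
    # Flag (or auto-remove) when a spirit user and at least one other user both report.
    return bool(reporters & spirit_users) and len(reporters) > 1
```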
For each vote, read the user's post content, vote history, and account age.
This should happen in the client and be easily controllable by the user, as should the ability to investigate why one particular post or comment was selected by the local content discovery algorithm. So you can quickly find fraudulent accounts and block them.
And this public, user led moderation actions then go on to inform the content discovery algorithm of other users until we have consensus user led content discovery and moderation.
And just like that we eliminate the need for shadowy humans of the moderator priesthood to play human spamfilter / human thought manipulator
The nice things about the Federated universe is that, yes, you can bulk create user accounts on your own instance - and that server can then be defederated by other servers when it becomes obvious that it's going to create problems.
It's not a perfect fix and, as this post demonstrated, it's only really effective after a problem has been identified. At least for vote manipulation from across servers, if the software detects that, say, 99% of new upvotes are coming from a server created yesterday with 1 post, it could at least flag it for a human to review.
It actually seems like an interesting problem to solve. Instance runners have the SQL database with all the voting records; finding manipulative instances seems a bit like a machine learning problem to me.
One other thing is that you can bulk create your own instances, and that's a lot more effort to defederate. People could be creating those instances right now and just start using them after a year; at least they'd have incurred some costs during that time.
I believe abuse management in openly federated systems (e.g. Lemmy, Mastodon, Matrix) is still an unsolved problem. I doubt good solutions will arrive before they become popular enough to attract commercial spammers.
This is really important to call out. Also, though, the bots have gotten so good it would be hard to tell the difference. To be honest, I'm pretty sure reddit was teeming with them and it didn't really bother me. lol
Votes were just a number on reddit too... There was no magic behind them, and as Spez showed us multiple times, even reddit modified counts to make some posts say something different.
And remember: reddit used to have a horde of bots just to become popular.
Honestly, thank you for demonstrating a clear limitation of how things currently work. Lemmy (and Kbin) probably should look into internal rate limiting on posts to avoid this.
I'm a bit naive on the subject, but perhaps there's a way to detect "over x amount of votes from over x amount of users from this instance"? and basically invalidate them?
How do you differentiate between a small instance where 10 votes would already be suspicious vs a large instance such as lemmy.world, where 10 would be normal?
I don't think instances publish how many users they have and it's not reliable anyway, since you can easily fudge those numbers.
10 votes within a minute of each other is probably normal. 10 votes all at once, or microseconds of each other, is statistically less likely to happen.
I won't pretend to be an expert on the subject, but it seems like it's mathematically possible to set some kind of threshold? If a set percent of users from an instance are all interacting microseconds from each other on one post locally, that ought to trigger a flag.
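Something like this could work as a first-pass flag, with completely made-up thresholds (a sketch, not anything Lemmy ships):

```python
# Burst detection on vote timestamps from a single instance.
from datetime import datetime, timedelta

def looks_like_a_burst(vote_times: list[datetime],
                       window: timedelta = timedelta(milliseconds=500),
                       min_votes: int = 10) -> bool:
    """Flag when `min_votes` or more votes from one instance land within `window`."""
    times = sorted(vote_times)
    for i in range(len(times) - min_votes + 1):
        if times[i + min_votes - 1] - times[i] <= window:
            return True
    return False
```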
Not all instances advertise their user counts accurately, but they're nevertheless reflected through a NodeInfo endpoint.
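For reference, NodeInfo is just a well-known HTTP endpoint, so a check could be as simple as this (the counts are self-reported, so treat them as a hint rather than ground truth):

```python
# Read an instance's self-reported user count via its NodeInfo endpoint.
import requests

def reported_user_count(host: str) -> int | None:
    links = requests.get(f"https://{host}/.well-known/nodeinfo", timeout=10).json()
    for link in links.get("links", []):
        doc = requests.get(link["href"], timeout=10).json()
        total = doc.get("usage", {}).get("users", {}).get("total")
        if total is not None:
            return total
    return None
```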
Federated actions are never truly private, including votes. While it's inevitable that some people will abuse the vote viewing function to harass people who downvoted them, public votes are useful to identify bot swarms manipulating discussions.
This. It's only a matter of time until we can automatically detect vote manipulation. Furthermore, there's a possibility that in future versions we can decrease the weight of votes coming from certain instances that might be suspicious.
And it’s only a matter of time until that detection can be evaded. The knife cuts both ways. Automation and the availability of internet resources makes this back and forth inevitable and unending. The devs, instance admins and users that coalesce to make the “Lemmy” have to be dedicated to that. Everyone else will just kind of fade away as edge cases or slow death.
How would that work? How will new instances/servers ever get a chance to grow if the fediverse only allowed those who are already whitelisted? Sorry for my limited knowledge about fediverse but it sounds like that goes directly against the base principle of a federated space?
I think people often forget that federation is not a new thing; it was one of the first designs for internet communication services. Email, which predates the Internet, is also a federated network, and the most widely adopted mode of Internet communication of them all. It also had spam issues, and there were many proposed solutions for that.
The one I liked the most was hashcash, since it requires no trust. It's the first proof-of-work system and it was an inspiration for blockchains.
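For anyone who hasn't seen it, the core of hashcash is tiny: the sender grinds for a nonce whose hash starts with N zero bits, which is expensive to produce but takes one hash to verify. A toy version (not something Lemmy actually does):

```python
# Minimal hashcash-style proof of work.
import hashlib

def verify(challenge: str, nonce: int, bits: int = 20) -> bool:
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    value = int.from_bytes(digest, "big")
    return value >> (256 - bits) == 0          # first `bits` bits must be zero

def mint(challenge: str, bits: int = 20) -> int:
    nonce = 0
    while not verify(challenge, nonce, bits):  # the costly part, done by the sender
        nonce += 1
    return nonce
```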
Nowadays email spam filters, especially the proprietary ones from Google or Verizon/Yahoo, really make indie mail servers hard to maintain; mail always gets labeled as spam even with DKIM, DMARC, a correct SPF record, and a clean, reputable public IP.
I don't know what the answer is, but I hope it is something more environmentally friendly than burning cash on electricity. I wonder if there could be some way to prove time spent but not CPU.
Maybe we can show a breakdown of which servers the votes are coming from so anything sus can be found out right away. It would be easy enough to identify a bot farm that way, I'd think.
This is something that will be hard to solve. You can't really effectively discern between a large instance with a lot of real users and an instance with a lot of fake users made to look real. Any kind of protection I can think of, for example based on the activity of the users, can simply be faked by the bot server.
The only solution I see is to just publish the vote% or vote counts per instance, since that's what the local server knows, and let us personally ban instances we don't recognize or care about, so their votes won't count in our feed.
that would be the best way to do it, i guess if you go further you could let users filter which instances they would like to "count" and even have whole filter lists made by the community.
I like that idea. A twist on it would be to divide the votes on a post by the total vote count or user count for that instance, so each instance has the same proportional say as any other. e.g. if a server with 1000 people gives 1000 upvotes, those count the same as a server with 10 people giving 10 votes.
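A sketch of that proportional weighting, with hypothetical names and numbers:

```python
# Each instance contributes the fraction of its users who voted, not its raw vote count.

def proportional_score(votes_by_instance: dict[str, int],
                       users_by_instance: dict[str, int]) -> float:
    score = 0.0
    for instance, votes in votes_by_instance.items():
        users = users_by_instance.get(instance, 0)
        if users > 0:
            score += min(votes, users) / users  # 1000/1000 counts the same as 10/10
    return score
```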
Wouldn't that make it actually a lot worse? As in, if I just make my own instance with one user total, I'll just singlehandedly outvote every other server.
I think it would actually be pretty easy to detect because the bots would vote very similarly to each other (otherwise what's the point), which means it would look very different from the distribution of votes coming from an organic user base
So far, the majority of content that approaches spam I've come across on Lemmy has been posts on !fediverse@lemmy.ml which highlight an issue attributed to the fediverse, but which ultimately have a corollary issue on centralised platforms.
Obviously there are challenges to address running any user-content hosting website, and since Lemmy is a community-driven project, it behooves the community to be aware of these challenges and actively resolve them.
But a lot of posts, intentionally or not, verge on the implication that the fediverse uniquely has the problem, which just feeds into the astroturfing of large, centralized media.
I disagree. I just got massively bandwagon-downvoted into oblivion in this thread and noticed that as soon as a single downvote hits, it's like blood in the water and the piranhas will instantly downvote, even if it's nonsensical. Downvotes act as a guide for people who don't really think about the message contents and need instructions on how to vote. I'd love it if comments had their votes hidden for 1 hour after posting.
IMO, likes need to be handled with supreme prejudice by the Lemmy software. A lot of thought needs to go into this. There are so many cases where the software could reject a likely fake like that would have near zero chance of rejecting valid likes. Putting this policing on instance admins is a recipe for failure.
I think the point is that the Fediverse is severely limited by this vulnerability. It's not supposed to solve that specific problem, but that problem might need to be addressed if we want the Fediverse to be able to do what we want it to do (put the power back in the hands of the users)
Yes but you presumably had to go through a captcha to make each one, whereas here someone can spin up an instance and 'create' 1 million accounts immediately.
Don't store incoming data from remote instances into the "Main DB" immediately. Store it in SUBORDINATE DATABASES!
The logic for arranging these subordinate databases can be simple; depending on which instance you're communicating with, you select a subordinate database like so:
First, we need a "Main Delay" database. This one is shared by all the instances we both federate with and mark as trusted, and we merge its records into the main database on a set schedule, to give ourselves a little time to roll back the clock if something betrays that trusted status.
Second, we need unique little databases for each instance that we federate with but do not yet trust. These little DBs are merged into "Main Delay", then Main, on a longer time-delay schedule. This gives us even more time to roll back large-scale attacks, spam, or flooding via ActivityPub, as well as time to just smack the "Defederate" button as soon as an instance starts to misbehave and, optionally, jettison the garbage data that made defederation necessary.
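In rough pseudo-Python, the routing could look something like this (table names, delays, and the `drop_database` helper are all made up for illustration):

```python
# Staged-merge routing for incoming federated activities.
from datetime import timedelta

TRUSTED_DELAY = timedelta(hours=1)      # "Main Delay" -> Main
UNTRUSTED_DELAY = timedelta(hours=24)   # per-instance staging DB -> "Main Delay"

def staging_target(instance: str, trusted: set[str], federated: set[str]) -> str:
    """Pick which staging database an incoming activity should land in."""
    if instance in trusted:
        return "main_delay"
    if instance in federated:
        return f"staging_{instance.replace('.', '_')}"
    return "reject"   # not federated at all: drop it

def on_defederate(instance: str) -> None:
    # Jettison the quarantined data before it ever reaches the main database.
    drop_database(f"staging_{instance.replace('.', '_')}")  # hypothetical helper
```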
I don't have experience with systems like this, but just as sort of a fusion of a lot of ideas I've read in this thread, could some sort of per-instance trust system work?
The more any instance interacts positively (posting, commenting, etc.) with main instance 'A,' that particular instance's reputation score gets bumped up on main instance A. Then, use that score with the ratio of votes from that instance to the total amount of votes in some function in order to determine the value of each vote cast.
This probably isn't coherent, but I just woke up, and I also have no idea what I'm talking about.
Something like that already happened on Mastodon! Admins got together and marked instances as "bad". They made a list. And after a few months, everything went back to normal. This kind of self organization is normal on the fediverse.
Well, you see Kif, my strategy is so simple an idiot could have devised it: reputation is adjusted by "votes", so that users can upvote or downvote one another.
As mentioned: It's not the silver bullet solution but something that raises the bar for abuse.
The reputation score is built up over time on the specific server, based on the upvotes and downvotes you've received.
So, yes, this can be abused itself as well - but it requires a lot more effort.
I wonder if it's possible ...and not overly undesirable... to have your instance essentially put an import tax on other instances' votes. On the one hand, it's a dangerous direction for a free and equal internet; but on the other, it's a way of allowing access to dubious communities/instances, without giving them the power to overwhelm your users' feeds.
Essentially, the user gets the content of the fediverse, primarily curated by the community of their own instance.
Voting does allow the cream to rise to the top, which is why reddit was much better than a forum.
Honestly, I think part of the problem is that companies don't have an incentive to fight bots or spam: higher numbers of users and engagement make them look better to investors and advertisers.
I don't think it's that difficult of a problem to solve. It should be quite possible to detect patterns between real users and bots.
I keep thinking about this. The only thing votes do that a forum can't is filter massive quantities of content through an equally massive userbase to get pages of great, constantly refreshing posts. In a forum you can just filter by comments/hour and give free promotion to new posts.
I like upvotes, otherwise I’d have stayed on forums. It’s also one of the only ethical algorithmic sorting methods as long as you can whitelist your members.
I’ve always wondered if it would help to have to reply in order to give an up/downvote but I assume it would likely just result in more spam. Still, I hope people are thinking of new ways to try things
ChatGPT is for chatting; you're talking about regular ol' machine learning. I imagine you could use one of OpenAI's other AI models that support data insights rather than simple text generation.
I would imagine this is the same with bans. I imagine there will eventually be a reputation-watchdog set of servers, which might be used instead of this whole everyone-follows-the-same-modlog approach. The concept of trusting everyone out of the gate seems a little naive.
Here’s an idea: adjust the weights of votes by how predictable they are.
If account A always upvotes account B, those upvotes don’t count as much—not just because A is potentially a bot, but because A’s upvotes don’t tell us anything new.
If account C upvotes a post by account B, but there was no a priori reason to expect it to based on C’s past history, that upvote is more significant.
This could take into account not just the direct interactions between two accounts, but how other accounts interact with each of them, whether they’re part of larger groups that tend to vote similarly, etc.
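A bare-bones version of that weighting, using only the pairwise history between a voter and an author (all data structures here are hypothetical):

```python
# Weight a vote by how surprising it is given past behaviour.

def vote_weight(voter: str, author: str,
                history: dict[tuple[str, str], tuple[int, int]]) -> float:
    """
    history[(voter, author)] = (upvotes_given, posts_seen).
    If the voter upvotes nearly everything this author posts, the vote carries
    little new information, so weight it down toward zero.
    """
    upvotes, seen = history.get((voter, author), (0, 0))
    if seen == 0:
        return 1.0                     # no prior expectation: full weight
    predictability = upvotes / seen    # how often this voter already upvotes this author
    return 1.0 - predictability * 0.9  # never drop entirely to zero
```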
What if account B only ever posts high quality content? What if everybody upvotes account B because their content is so good? What if they rarely post so it would be reasonable that a smaller subset of the population has ever seen their posts?
Your theory assumes large volumes of constant posts seen by a wide audience, but that's not how these sites work; your ideal would censor and disadvantage many accounts.
If an account is upvoted because it’s posting high-quality content, we’d expect those votes to come from a variety of accounts that don’t otherwise have a tendency to vote for the same things.
Suppose you do regression analysis on voting patterns to identify the unknown parameters determining how accounts vote. These will mostly correlate with things like interests, political views, geography, etc.—and with bot groups—but the biggest parameter affecting votes will presumably correlate with a consensus view of the general quality of the content.
But accounts won’t get penalized if their votes can be predicted by this parameter: precisely because it’s the most common parameter, it can be ignored when identifying voting blocs.
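As a toy illustration of that factoring idea (real vote data would be huge and sparse, so this is only the shape of it, not a workable pipeline):

```python
# Factor the user-by-post vote matrix, treat the dominant component as "general
# quality", and look for voting blocs in the remaining components.
import numpy as np

def bloc_coordinates(votes: np.ndarray, n_factors: int = 5) -> np.ndarray:
    """votes: users x posts matrix of +1/-1/0. Returns per-user coordinates with
    the dominant (consensus) component removed."""
    centered = votes - votes.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    coords = u * s                      # user loadings on each latent factor
    return coords[:, 1:n_factors + 1]   # skip factor 0, the shared "quality" axis

# Accounts whose coordinates sit unusually close together (and vote in lockstep)
# would be candidates for a coordinated bloc.
```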
I wonder if an instance could only allow votes by users who are part of instances that require email verification or some other verification method. I would imagine that would heavily help reduce vote manipulation on that particular instance.
This alone wouldn't help because I can just set up an instance that requires email verification (or any other kind) and automate it still since I can make infinite emails with my own domain.
Votes are just a number that determine what everybody sees. This will be manipulated by all the bad actors of this world once Lemmy becomes mainstream. Politicians, dictators, Hollywood, tech companies....
But that's kinda the point of all posts. You post because you want people to see something and you want your post to be popular so it can be seen by the largest amount of people.
That can only be done after the fact, and people can just create new ones constantly can they not? There needs to be a different pro-active defense to watch for the signs of manipulation and counter them as they happen.
Whitelist federation is one strategy. Rather than defaulting to federation with every instance a proactively moderated instance would only federate with approved requests.
Wouldn't a detection system be way better? I can see a machine learning model handling this rather well. Correlate the main accounts to their upvoters across all their posts and create a flag if it returns positive. It would be more of a mod tool, really.
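You might not even need ML to start; something as simple as measuring how much an author's upvoter sets overlap across their posts could serve as the flag (the threshold here is just a guess):

```python
# Flag authors whose posts keep getting upvoted by the same set of accounts.
from itertools import combinations

def upvoter_overlap(posts_upvoters: list[set[str]]) -> float:
    """Average Jaccard overlap between the upvoter sets of an author's posts.
    Organic audiences churn; a bot audience looks nearly identical every time."""
    pairs = list(combinations(posts_upvoters, 2))
    sims = [len(a & b) / len(a | b) for a, b in pairs if a | b]
    return sum(sims) / len(sims) if sims else 0.0

def flag_author(posts_upvoters: list[set[str]], threshold: float = 0.8) -> bool:
    return upvoter_overlap(posts_upvoters) >= threshold
```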
I have already run into a very obvious Russian troll factory account and it really drags down the quality of the place. Freedom of speech shouldn't extend to war criminals, and I'd rather leave any clusterfuck that allows it, whether through will or incompetence.
If we stop spam accounts from brand-new or low-usage servers, both of those checks could be easily gamed (emulated activity, pre-created instances left to marinate).
I don't know much about how making new instances works, but could someone create instances in large quantities with smaller populations, with the goal of giving human moderators too much work to defederate them all?
There are legitimate reasons for creating a “low-usage” server to host your personal account, so you have full control over federating etc.
If we start assuming all small instances are spam by default, we’ll end up like email now—where it’s practically impossible for small sites to run their own mail servers without getting a corporate stamp of approval from Google.
This would actually be a bit more difficult. It would be easy for me to set up lemmy1.derproid.com, lemmy2.derproid.com, etc., but then you could just defederate from *.derproid.com, so that's no problem. However, setting up lemmy1.com, lemmy2.com, etc. is more expensive because you would need to register and pay for each of those domains individually.
That's not to say it's impossible but there is a bigger barrier to it.
I agree, but it's also worth keeping in mind that a bot swarm approach could be much more distributed. There used to be a guy on the Fediverse that set up "relay accounts" on many, many instances with public signups, prior to hooking them all together with a single app and making them spit out torrential fountains of garbage.
It is 100% possible to abuse other people's public services to make remediation more complicated. Blocking a bad instance or a series of bad instances is easy. Dealing with a run-away spam problem from dozens of friendly servers is way harder.
You don’t. Ranked content is a solution for owners of social media platforms to avoid paying moderators. It’s a no brainer if you want a cheap automatic advertising platform but isn’t great and requires constant intervention if you’re not monetizing somehow.