1y ago

YSK: Your Lemmy activities (e.g. downvotes) are far from private

I was unsure where to cross-post this. But maybe we should discuss this to make sure Lemmygrad users are staying safe? Similar to the unspoken rule that we strongly discourage people using their real names or giving away too many personal details.

cross-posted from: https://mylemmy.win/post/89871

Edit: obligatory explanation (thanks mods for squaring me away)...
What you see via the UI isn't "all that exists". Unlike Reddit, where everything is a black box, there are a lot more eyeballs who can see "under the hood". Any instance admin, proper or rogue, gets a ton of information that users won't normally see. The attached example demonstrates that while users will only see upvote/downvote tallies, admins can see who actually performed those actions.
Edit: To clarify, not just YOUR instance admin gets this info. This is ANY instance admin across the Fediverse.
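
To make the point above concrete, here is a minimal sketch of why a receiving instance can tell who voted. It assumes votes travel between instances as ActivityStreams-style `Like`/`Dislike` activities that name the voter in the `actor` field; the URLs, usernames, and the `describe_vote` helper are made up for illustration and are not Lemmy's actual payload or code.

```python
import json

# Hypothetical vote activity, loosely modelled on the ActivityStreams "Like" /
# "Dislike" types. All URLs and usernames below are made up for illustration.
raw_activity = json.dumps({
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Dislike",
    "actor": "https://instance-a.example/u/alice",
    "object": "https://instance-b.example/post/123",
})


def describe_vote(raw: str) -> dict:
    """Extract what a receiving server can read from a vote activity."""
    activity = json.loads(raw)
    score = {"Like": 1, "Dislike": -1}.get(activity.get("type"))
    if score is None:
        raise ValueError("not a vote activity")
    return {
        "voter": activity["actor"],    # the voter's identity arrives in the clear
        "target": activity["object"],  # what they voted on
        "score": score,                # +1 for an upvote, -1 for a downvote
    }


vote = describe_vote(raw_activity)
print(vote["voter"], vote["score"])
```

The UI only ever shows the summed score, but under this assumption every server that receives the activity also receives the `actor`.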

28 comments

Well, this is a bit uncomfortable. I wasn't aware that votes were shared as well; IIRC an admin said that the occasional downvote brigades we've had couldn't be reversed without losing posts, so I assumed the votes were anonymized somehow 😐
I wonder how feasible it would be to only send an anonymous vote count for every post/comment to other instances, keeping the details within the instance's database only. Federation is already "broken" when an older database has to be restored or even when someone is banned from an instance, so it doesn't seem to be fundamentally necessary for all instances to match
More importantly, is more sensitive data also shared? I would hope that IP addresses, which posts you've viewed, etc. aren't stored anywhere, or at least not forwarded to other instances
- That sounds like a good solution to me.
  Just knowing that any such data is collected or shared, and when, could help to improve one's privacy.
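
The count-only idea suggested above could look roughly like this. It is a sketch under stated assumptions, not how Lemmy works today: `LocalVoteStore`, `export_tally`, and the payload shape are all hypothetical names invented for the example.

```python
from collections import defaultdict


class LocalVoteStore:
    """Keeps per-user votes locally; exports only anonymous tallies."""

    def __init__(self) -> None:
        # post_id -> {user_id: +1 or -1}; keyed by user so a repeat vote
        # overwrites the earlier one instead of being counted twice.
        self._votes: dict[str, dict[str, int]] = defaultdict(dict)

    def vote(self, post_id: str, user_id: str, score: int) -> None:
        if score not in (1, -1):
            raise ValueError("score must be +1 or -1")
        self._votes[post_id][user_id] = score

    def export_tally(self, post_id: str) -> dict:
        """Build the hypothetical count-only payload for other instances."""
        scores = self._votes.get(post_id, {}).values()
        return {
            "post": post_id,
            "upvotes": sum(1 for s in scores if s == 1),
            "downvotes": sum(1 for s in scores if s == -1),
        }


store = LocalVoteStore()
store.vote("post/1", "alice", 1)
store.vote("post/1", "alice", -1)  # alice changes her mind; still one vote
store.vote("post/1", "bob", 1)
tally = store.export_tally("post/1")
print(tally)
```

The trade-off in this sketch is trust: other instances would have to accept the home instance's totals as given, since they could no longer check for duplicate or fake votes themselves.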

There's a lot of info and discussion on this post that explains why. In short, voting has never been private on other platforms either: votes must be tied to users, otherwise a user could vote more than once per post. This data must also be federated so that posts on other instances get the same safeguard.
Lemmy isn't designed as a privacy platform; it's a social-media-style link aggregator powered by ActivityPub. Federation brings decentralization: it's possible to not share data with other instances, but it will have to be shared in some way with any linked instances. There are pros/cons to each style: the current issues with Reddit show the problems with centralization, and there's going to be an adjustment period as more people join Lemmy who don't already know about the Fedi.
- I see that, but I didn't vote on other platforms. I knew I'd be giving data directly to the owners and their five eyes operatives. I know this platform is public, so I'm careful with my words. Now I know I should be careful with voting, too.
  The people who want our details are incredibly creative at how to interpret data that seems innocuous.
  Edit: that link is very informative, thanks. I should confirm that I assume everything on the internet is public in one way or another and confirm that I don't have any major concerns with the general security architecture of the Lemmy software or the way Lemmygrad is run. I just thought it's something that we should talk about to make sure that we're not increasing the chances of being doxxed by giving away useful metadata.
  Edit 2: when I say, 'I don't have any major concerns with the general security architecture', people should know that I'm not qualified to judge this from the coding side of things!
  
  For sure, people definitely should be educated on what data is open (posts/comments), closed (voting on Lemmy, though kbin seems to show votes publicly), "private" (DMs, which are explicitly described as not private, with advice to use Matrix etc. for actual encryption), or secure (Matrix). I feel like a lot of us on Lemmygrad are more aware of privacy than the average netizen, but it wouldn't hurt to have a primer for new users.
  I think for social media the best thing would just be compartmentalization of identities, so the usual advice of don't give away too much of who you are and keep usernames separate unless you want them to be connected/known.

"or giving away too many personal details" but hey! take our demographics survey!
- I see the irony. I suppose two differences are that, one, the survey was optional, and two, although participants might not necessarily know who was reading the answers, the answers weren't (I don't think) generally accessible.
  Voting data is not generally accessible, either, but it appears to be accessible to admins of other instances. If this is the case, it wouldn't be difficult to set up an instance, make a post through any account about a specific topic, and observe the data to see who up/downvotes it. This could narrow down the list of people that a bad actor might want to target for further data harvesting.
  Considering how many billions are put into surveillance, data collection, and controlling (social) media, I don't think we should discount this fear as farfetched. Even if, on its own, it's fairly benign data.
  I've also noticed that some of the new instances/users are a lot happier with 'local' communities. If this grows into, say, a list of 'things to do in my area' communities, it wouldn't take much for a fascist to identify local targets by a series of what I'm going to call 'voting traps'.
  Maybe I'm paranoid.
- Exactly why I didn't; I still don't know how much data flows through lemmy-actual's veins to be comfortable like that.
  
  The nice thing about these projects is that the community can review what kind of data is collected because it is FOSS. Obviously, things like votes, comments, DMs, posts, and posts marked as read are all logged in the DB and tied to your account. One could extrapolate a lot about a person by extracting and doing data analysis against that data. Because of how federation works, it also means you can't just rely on your instance operators to be trustworthy.
  DMs, I think, operate more like an email message and are likely not federated in a way that lets hosts other than the origin and destination view them. But the origin host and the destination host obviously could run a database query and pull your DMs. It should be noted that this is also true of email unless you are using encrypted mail.
  At some point, DMs could be built to be end-to-end encrypted with PGP if the devs/community desire that, but that's not how it works now.
  I'm sure that as a Lemmy instance operator, you can also use your host server to log connections (read: IP addresses), but I'm unsure if Lemmy itself logs that information in its database alongside your account information (probably not?). As a good operator you would probably want to log connections, so you can find patterns and remove bad actors trying to, say, DDoS your box.
  However, if more robust moderation tools were to be implemented, which include an IP-based ban, then that would have to be tied to your account to make it work.
  There are platforms like Nostr for example, where everything is encrypted, even the primary content, and you have to provide the system some kind of encryption key to even view the feed.

I don't see why anyone should care about votes, but if IP addresses were shared it would be concerning.
- It logs the timing as well, which could be sensitive data. For example, if an employer were to gain access to this information and tie it to an account that you think is anonymous, they'll know when you weren't working but were getting paid to be at work. Or it could be used to determine when someone is at home or out. Or be used as evidence for holding certain views.
  It's unlikely for a single employer to get that data. But I wouldn't put it past the five eyes to set up an instance, mine data, and use analysts/AI to cross-reference it with other user metadata.
  It's like J Sakai says in his security pamphlet. It's bad practice to give any information to feds, even 'benign' data because it helps them to build profiles on you and others. To offer a rather extreme example, if they know you were upvoting a comment at 16.07 EST, they can guess it wasn't you at a protest at the same time. This means they can narrow down the suspect list to the other handful of people with your build, etc, who go to protests. Being lax with data means the feds have an easier time undermining the efforts of people who are tight with their data.
  It's a privacy issue that I hadn't considered. I knew our admins could see this data. I didn't realise it was visible to other instances.
  
  Good points, I didn't consider all of that either.
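
As a rough illustration of the timing concern raised in this thread, the sketch below turns a list of vote timestamps into an hour-of-day activity profile. The timestamps are made up and `activity_by_hour` is a hypothetical helper; the only assumption is that whoever holds the vote data also has a timestamp for each vote.

```python
from collections import Counter
from datetime import datetime

# Made-up vote timestamps for one hypothetical account (ISO 8601, UTC).
vote_times = [
    "2023-06-12T09:15:00+00:00",
    "2023-06-12T09:40:00+00:00",
    "2023-06-13T09:05:00+00:00",
    "2023-06-13T21:30:00+00:00",
]


def activity_by_hour(timestamps: list[str]) -> Counter:
    """Count votes per hour of day (0-23)."""
    return Counter(datetime.fromisoformat(ts).hour for ts in timestamps)


profile = activity_by_hour(vote_times)
print(profile.most_common(1))  # the hour in which this account votes most
```

Even this tiny sample suggests a habit (most votes land in the same morning hour), which is the kind of "benign" pattern that becomes useful once it is cross-referenced with other metadata.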

Aaand this is particularly why I keep my lefty Lemmygrad account separate from my general-use account. Don’t need to be witch-hunted by other instances because of some liberal mod.

To be honest, as a dev very experienced in SQL, I'd love an opportunity to query info like this online, like SELECT-only permissions on a browser-based editor that removes any sensitive info before hitting the client. I'd love to play with this dataset and find cool trends. Am I weird?
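
A read-only, sanitized interface like the one described here might start from a database view that never exposes user IDs. This is only a sketch using an in-memory SQLite database: the `votes` table and `public_vote_totals` view are a hypothetical schema, not Lemmy's actual tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema; not Lemmy's actual tables.
    CREATE TABLE votes (
        user_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL,
        score   INTEGER NOT NULL CHECK (score IN (1, -1)),
        PRIMARY KEY (user_id, post_id)
    );
    INSERT INTO votes VALUES (1, 10, 1), (2, 10, -1), (3, 10, 1), (1, 11, 1);

    -- The public view exposes only per-post totals, never user_id.
    CREATE VIEW public_vote_totals AS
    SELECT post_id,
           SUM(score = 1)  AS upvotes,
           SUM(score = -1) AS downvotes
    FROM votes
    GROUP BY post_id;
""")

# A browser-based editor would be allowed to SELECT from the view only.
rows = conn.execute(
    "SELECT post_id, upvotes, downvotes FROM public_vote_totals ORDER BY post_id"
).fetchall()
print(rows)
```

The view by itself does not enforce anything; a real setup would also need database permissions that restrict the public role to `SELECT` on the view. Aggregates over posts with very few votes could still hint at who voted, so a minimum-count threshold would probably be worth considering.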
