
Seeking feedback: how should lemm.ee move forward with external images? (related to frequent broken images)

Hey folks!

I am looking for feedback from active lemm.ee users on what you all value when it comes to images on Lemmy. I'll go into a bit of detail about what our options are, and then I would ask you to voice your opinion about the issue in the comments.

First, some context for those who don't know. Lemmy software can be configured to handle images in three different ways:

  1. Store images locally - whenever an external image is posted somewhere, lemm.ee will download a permanent local copy. When you view posts, you are seeing our local copy of the image.
  2. Proxy all images - similarly to the first option, lemm.ee will download a local copy of external images, however, this copy is temporary. It will be automatically deleted shortly after, and if users open the relevant post/comment again in the future, there will be another attempt to download a temporary copy at that point.
  3. Pass through external images directly - lemm.ee never downloads any external images, users will always connect directly to the source servers to load the images.
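
To make the difference concrete, here is a minimal sketch of which host a user's browser ends up contacting under each of the three configurations. The `ImageMode` enum and the `host_contacted_by_browser` function are hypothetical names made up for this illustration; they are not Lemmy's actual settings or code.

```python
from enum import Enum
from urllib.parse import urlparse


class ImageMode(Enum):
    """Hypothetical labels for the three configurations described above."""
    STORE_LOCALLY = "store"
    PROXY = "proxy"
    PASS_THROUGH = "pass_through"


def host_contacted_by_browser(
    mode: ImageMode, image_url: str, instance_host: str = "lemm.ee"
) -> str:
    """Return the host that sees the user's IP address when the image loads."""
    if mode is ImageMode.PASS_THROUGH:
        # The browser fetches the image straight from the source server.
        return urlparse(image_url).hostname or ""
    # Stored and proxied images are both served by the instance itself.
    return instance_host
```

With pass-through, the external host sees each user directly; with the other two modes it only ever sees the instance.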

There are pros and cons to each configuration.

Storing images locally

Benefits:

  1. Your IP address is never leaked to external image hosts, as you never connect directly to the source server. External image hosts only see the IP address of the lemm.ee server.
  2. External servers don't become bottlenecks for opening lemm.ee posts. If an external server is slow, it won't matter, because the image is always available locally.

Downsides:

  1. As time goes on, our storage will fill up with hundreds of gigabytes of useless images, most of which will never be viewed again after the relevant posts fall off the front page.
  2. Many big external image hosts will rate limit bigger Lemmy servers, causing broken images when we fail to make a local copy.
  3. Crucially: some people love to spend their time uploading illegal content to online servers. There are tools to try and filter out such content, but these are not perfect. The end result is that there is a high chance of some content like this inadvertently reaching lemm.ee storage and staying there permanently. This downside is why lemm.ee has not, and will not, use this particular configuration.

Proxying images

Benefits: In addition to the same benefits as exist for the permanent local storage, by only temporarily making local copies for the moment they are requested by our users, we free up a ton of storage & remove the risk of permanently storing illegal content on our servers.

Downsides: The key downside is that external rate limits hit us much harder, as we will be requesting external images far more often. This results in a lot of constant broken images on lemm.ee.

Passing through external images

Benefits:

  1. Images are rarely broken, unless the source server goes down.
  2. The images never touch our servers, removing a lot of risk with illegal content as well as with storage costs.

Downsides:

  1. Our users lose a degree of privacy. Every external image that is loaded in your browser will result in the remote server getting a request directly from your computer to fetch that image - this is pretty much the same as if you had visited that external server directly, which lets them log your IP address if they wish.
  2. When remote servers are slow, it can slow down the entire page load in some cases.

Current situation

Initially, lemm.ee was using the third option of passing through images. As soon as support for option 2, image proxying, was implemented in Lemmy, we switched to it, mainly for the privacy benefits. However, after many months of being blocked by more and more external servers, it is clear that image proxying is seriously degrading the user experience on lemm.ee. We often end up with broken images, and our users have to deal with the results.

I still believe image proxying is a really valuable feature, but I am starting to believe it is a better fit for small instances, which make far fewer requests to external servers.

As a result, I am now seriously considering switching back to the previous method of passing through external images.

This is where you come in - I would ask you as users to please let me know which you value more: the privacy that you get from image proxying, or the better user experience you get from directly passing through images from their source. Please let me know in the comments how you feel. If I get enough feedback from people who are against image proxying, then I will be switching it off for lemm.ee soon. Thanks for reading & sharing your thoughts, and I hope you have a great weekend!

77 comments
  • Not a lemm.ee user, but here's my thoughts on #2 since it affects me via federation:

    I am not a fan of how Lemmy chose to implement image proxying, specifically federating the proxied URL.

    That frequently prevents my instance from fetching a thumbnail locally (option 1 above), which, ironically, increases the load on your server, as my instance has to fetch it from your proxy every time instead of just once to generate a local copy here.

    From a UI development standpoint, the proxied thumbnail URLs also make it harder to detect the image type (gif, static image, video) to handle rendering. It also complicates other proxying/caching methods I have in place. Ultimately, in the UI I develop, I've had to resort to passing thumbnail images through a function to un-proxy them so they can be handled sanely.

    So I generally wish that admins avoid Lemmy's proxying until it no longer federates the proxied URL and does something sane like just return that for the local API calls.
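
A minimal sketch of what such an un-proxy helper might look like, not the commenter's actual code. It assumes the proxied URL uses the path `/api/v3/image_proxy` with the original address percent-encoded in a `url` query parameter; that path, and the `unproxy` name, are assumptions for this illustration and should be checked against the Lemmy version in use.

```python
from urllib.parse import parse_qs, urlparse

# Assumed proxy endpoint path; verify against the Lemmy version in use.
PROXY_PATH = "/api/v3/image_proxy"


def unproxy(url: str) -> str:
    """Return the original image URL if `url` is a proxied one, else `url` unchanged."""
    parsed = urlparse(url)
    if parsed.path != PROXY_PATH:
        return url
    # parse_qs percent-decodes the value, recovering the original address.
    values = parse_qs(parsed.query).get("url")
    return values[0] if values else url
```

Once the original URL is recovered, its file extension can be inspected again to decide how to render it.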

  • I like 3 also. If you can't get a good user experience, privacy matters less because there are no users, and anyone can spin up a super privacy-enhanced instance if they want.

    I was thinking however that a super stand alone image server would be a great thing for Lemmy, and pixelfed and the Fediverse in general.

    Imgur blew up and was the go to for image hosting on Reddit for years, until Reddit realized it was leaking traffic and users to them and started their own. But hosting images has a lot of potential headaches like copyright violations and big corps suing you into oblivion, in addition to inadvertently hosting illegal stuff. The Fediverse will need some good image hosting servers and video hosting servers as part of the plan in the long run though.

  • I'm in favour of Option 3, privacy concerns considered.

    User experience is big for me here, the broken images are something of a frustration that I've been dealing with for a while now, so the option to combat that is a clear winner for me.

    Also, I want to thank you for coming to us for feedback, yet another reason I'm glad I decided to settle here on Lemm.ee.

  • Thanks for your hard work as always.

    I'm in favor of moving away from proxying. Too many images break, and proxying in general is very wasteful: having to constantly download images from potentially small servers would definitely get you rate limited.

    Passing through external images is OK. Many people post external links to sites like imgur and catbox anyway because of the file size limits.

    I think the end goal would always be to store images locally though - or at least to cache them for extended periods of time. Don't large instances like Lemmy World and huge Mastodon instances work this way? How do they manage the risk?

  • I think it's important for us to be mindful of content retention for posterity's sake if we want Lemmy to compete with Reddit long-term. If possible, I'd hope we can avoid dead image links like we see with old forums and photobucket pics, for example.

  • Option 3.

    Privacy concerns aside, which I am willing to bear, Option 3 is the most sustainable option.

    Looking at our status page, the projected monthly expenses are greater than the revenues. If passing through external images allows us to reduce operational costs and ensure lemm.ee's sustainability despite the loss of a degree of privacy, then it's a tradeoff I'm willing to make.

    Thanks, admins and mods, for everything that you do!

  • Option 3

    Reasoning:

    • Upside 2: 100% best for lemm.ee health; lowest legal risk, lower cost to run.
    • Downside 1: I think it comes down to what lemm.ee is trying to provide as a user experience; in my use and expectation, it's not for masking my IP, making me anonymous or similar. It's for reading and interacting with people, looking at memes and reading lots of news stories. I have no expectation my IP is masked from remote sites - I open all external news links in a Private tab anyways (to stop cookies and other junk) so they're already getting my IP anyway. "why should images be any different, really?" There are other lemmy instances out there catering to extreme privacy.
    • Downside 2: this could be, and should be, handled by better page-loading threading in the code; the content surrounding an image is just HTML, and loading the image is a secondary task. If rendering the page is reliant upon 200 OK image loads, that feels like a deficiency in design, and it needs to be async threaded to "lazy load" and not block.

    At a high level, many other solutions - Mastodon, even Nostr webapps and phone apps, which are all about being anonymous for some folks - load content directly from the source and do not proxy it. The switch back to option 3 falls in line with what every other generic service/solution does in the social web space.

  • Can't you store them in a cache that keeps images that have been accessed in the last 48 hours (or whatever) and deletes others? Should someone request these images after that, cache them again for 48 hours.
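
A minimal sketch of the cache this comment describes, assuming an in-memory store: each access resets a 48-hour timer, and entries that have not been accessed within that window are dropped and re-fetched on the next request. The `SlidingImageCache` class and its `fetch` callback are hypothetical names for this illustration, not part of Lemmy or pict-rs.

```python
import time
from typing import Callable, Dict, Tuple


class SlidingImageCache:
    """Keeps an image only while it has been accessed within the last `ttl_seconds`."""

    def __init__(
        self,
        fetch: Callable[[str], bytes],
        ttl_seconds: float = 48 * 3600,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # downloads the image from the source server
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the behaviour can be tested
        self._entries: Dict[str, Tuple[bytes, float]] = {}  # url -> (data, last access)

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[1] <= self._ttl:
            data = entry[0]          # still fresh: serve the cached copy
        else:
            data = self._fetch(url)  # missing or stale: download again
        self._entries[url] = (data, now)  # every access restarts the timer
        return data

    def evict_stale(self) -> int:
        """Delete entries not accessed within the TTL; returns how many were removed."""
        now = self._clock()
        stale = [u for u, (_, last) in self._entries.items() if now - last > self._ttl]
        for u in stale:
            del self._entries[u]
        return len(stale)
```

In this sketch, popular images are fetched from the source once and stay cached for as long as people keep viewing them, while `evict_stale` would be run periodically to bound storage.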

  • I think proxying is very important, otherwise anyone can simply upload an iplogger or possibly a more advanced fingerprinting image. This, in combination with observing federated actions, will make it very easy to deanonymise almost every single user who interacts in any way, even by upvoting.

    Can u simply increase the time period that nginx caches images for to avoid some of the rate limiting issues? Otherwise perhaps using proxy lists to proxy requests from lemm.ee to the image hosts is doable (im not sure about the legality of this tho).

    Have u emailed the image hosts letting them know what u do and asking if they can remove ur rate limit (idk if they would be receptive to this without a financial incentive).

  • The only issue I have is catbox being broken on my phone. If proxying fixes that then I'd support that one. We don't need to keep the image on our servers forever I think.

  • Frankly, one of the big reasons why I like the Fediverse is that you don't have to depend on the source keeping up their stuff - we get our own copies. That includes both the text (the posts) and the media (the images), and to me, that's one third of the point of the entire thing. If I wanted an image that can only be seen at the dumpster cage that is Twitter, I'd go over there shrug and move on.

    Of course that's not always workable because storage is (despite everything) not cheap, media grows large (e.g. I can understand saving pictures, but heaven forbid trying to save a video) and there's still not a good way to deal with """problematic""" storage. So, my recommendation and expectation would be something that functions like Option 2: Proxying Images. Basically, we download our own copy but only store it "while it matters".

    How to Supplement Options

    Now, here are some proposals I would raise to have their feasibility studied. Any combination of one or more of these could, if implemented, help in enhancing or even supplanting the chosen method for storing remote media. I personally see them more as a means to enhance Option 2.

    • Only save copies of images (perpetual or proxy) that are below certain thresholds: file size, resolution, trusted hosts, etc... Sure, that still means you have to download every image at least once to evaluate it, but at least we get some sort of automated guarantee that for "easy" stuff, we won't be making more connections than necessary. The big win of this option is that lemm.ee is not paying for the larger resource cost of fetching larger images after their post is made. I'm guessing the main drawback would be the bikeshedding required to decide which images get saved locally.
    • Related option to the above: only save (perpetual) thumbnails or downscaled versions of images, never the real ones. Would pretty much instantly cover the case of e.g. most memes. The big drawback I can see to this is that managing those images for deletions would be harder unless a Lemmy instance can keep a searchable map of hashes from each image to their thumbnail and vice versa.
    • Require that any image is linked from an imagehost or filehost that is trustable for durability. Big drawback: notoriously more effort (and potential loss of privacy) for users means this actually disincentivizes posting rich content.
    • Pool up resources and get some cooperative work going with some other instance(s) to set up a shared proxy agent that downloads the images for us, so that lemm.ee doesn't have to host the images but can have the clients fetch them without sacrificing privacy. I feel this one incentivizes posting rich content because you can get some level of assurance that it'll remain available and be "cheap" to access from across the Fediverse, but requires more instances to chip in.
    • Images? Pfff. Text is where it's at.
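
The first proposal in the list above (only keeping copies below certain thresholds) could be sketched roughly as follows. The `SavePolicy` fields, the limits, and the example host are all made-up values for illustration; picking the real ones is exactly the bikeshedding mentioned in that bullet.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class SavePolicy:
    """Hypothetical thresholds; every value here is an illustrative assumption."""
    max_bytes: int = 2_000_000           # largest file size worth keeping
    max_pixels: int = 4_000_000          # largest width * height worth keeping
    trusted_hosts: frozenset = frozenset({"files.example.org"})


def should_save_copy(
    url: str, size_bytes: int, width: int, height: int,
    policy: SavePolicy = SavePolicy(),
) -> bool:
    """Decide, after the first download, whether to keep a local copy."""
    host = urlparse(url).hostname or ""
    return (
        size_bytes <= policy.max_bytes
        and width * height <= policy.max_pixels
        and host in policy.trusted_hosts
    )
```

Images that fail the check would still be shown, but by some other route (proxying or pass-through) rather than from a saved copy.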
  • I am probably one of the power users on Lemm.ee who post a lot of images.

    I really like the 1st option, but it's not financially feasible. So, I am choosing the 2nd option. I used to post on catbox.moe, but lemm.ee has been getting rate limited by catbox lately. So, I chose a paid image hoster. It's actually a screenshot hoster, but I am using it like an unlimited image hoster xD don't know if it's allowed or not. The 3rd option can leak IP addresses. So, no.

    TL;DR I like the 2nd option. But whatever you do, I am with you.
