
Lemmy Federation Architecture Change Proposal

https://github.com/LemmyNet/lemmy/issues/3245

I posted far more detail on the issue than I am putting here.

But, just to bring some math in- with the current full-mesh federation model, assuming 10,000 instances-

That will require nearly 50 million connections.

Each comment. Each vote. Each post, will have to be sent 50 million separate times.

In the proposed hub-spoke model, we can reduce that by over 99%, so that each post/vote/comment/etc. only has to be sent 10,000 times (plus n*(n-1)/2 times, where n = number of hub servers).
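
A quick sketch of the arithmetic above, for anyone who wants to check it. It counts connections between instances using the two formulas in this post. The hub count of 10 is an assumed value picked only for illustration, and the function names are made up for this example.

```python
def full_mesh_links(instances: int) -> int:
    """Distinct instance pairs in a full mesh: n * (n - 1) / 2."""
    return instances * (instances - 1) // 2


def hub_spoke_links(instances: int, hubs: int) -> int:
    """One link per instance, plus a full mesh among the hub servers."""
    return instances + hubs * (hubs - 1) // 2


n = 10_000
hubs = 10  # assumption: not a number from the proposal

mesh = full_mesh_links(n)
hub_spoke = hub_spoke_links(n, hubs)
reduction = 1 - hub_spoke / mesh

print(f"full mesh: {mesh:,} connections")
print(f"hub-spoke: {hub_spoke:,} connections")
print(f"reduction: {reduction:.2%}")
```

With 10,000 instances the full-mesh figure comes out at 49,995,000, which is the "nearly 50 million" quoted above.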

The current full-mesh architecture will not scale, and I predict that exponential growth will continue.

Let's work on a solution to this problem together.

41 comments
  • But, just to bring some math in- with the current full-mesh federation model, assuming 10,000 instances-

    That will require nearly 50 million connections.

    Each comment. Each vote. Each post, will have to be sent 50 million separate times.

    Well your whole premise is just utterly wrong.

    The way federation actually works:

    A user on lemmy.ml subscribes to a community on lemmy.world. Say, !funny@lemmy.world

    Assume that this user is the first lemmy.ml user to do so - basically what happens is the lemmy.world community sees that a member of a never before seen instance just subscribed. !funny@lemmy.world then adds lemmy.ml to its list of instances it needs to tell whenever something happens in the community.

    No matter how many users of lemmy.ml subscribe, this only happens once.

    Now when a user of sh.itjust.works upvotes a post on !funny@lemmy.world, the sh.itjust.works instance then tells !funny@lemmy.world of this change. It accepts the change, then tells everyone on its list of instances that have subscribers on them.

    So essentially, sh.itjust.works talks to lemmy.world, lemmy.world tells everyone else. There is no "full mesh". The instance hosting the community is the "hub", everything else is a spoke.

    So if there's 10,000 instances, and they all just so happen to have at least one subscriber to some community, each change will be sent out 9,999 times. Your "50 million" premise is just completely wrong and I'm not sure where it's coming from.

    • It's not wrong. We just have opposite ideas here:

      The 50 million is based on the formula for a full-mesh network, where all instances talk to each other. In the case of Lemmy, this would be an absolute worst-case scenario, where every instance is subscribed to a community on every other instance.

      In your example of only 10,000 messages, you are assuming that all 10,000 instances in existence are ONLY looking at a single community on a single server.

      Let's say those 10,000 instances all decide to look at a community on another server. Now you have 20,000 connections.

      Let's add another community, hosted on yet another instance. That is 30,000 connections.

      TL;DR:

      My example is based on the worst-case scenario. (A pretty unachievable one at that!)

      Your example is based on the best-case scenario.

      Realistically, the actual outcome would be somewhere much closer to the best-case scenario (as communities seem to lump up on the big servers). However, when planning architecture, you always assume the worst-case scenario.

      • No - you said:

        Each comment. Each vote. Each post, will have to be sent 50 million separate times.

        That won't ever happen. Unless there's 50 million instances. That's not worst case, it's just not a case.

        There is no case in the current implementation where any one action is replicated more times than there are total instances.

        And it doesn't matter what "model" you assume: each action will have to federate to each instance eventually. That count is, at minimum, the total number of instances.

        Let's say those 10,000 instances all decide to look at a community on another server. Now you have 20,000 connections.

        Looking does nothing. Each instance essentially hosts a copy of the "host instance" for each community. Only interactions (comments, likes, posts, etc.) are federated.

  • Activities aren't sent on every "connection" in the network in the current model. There is neither indirect transmission nor polling, so even though there's a theoretical 50 mil connections in the scenario you gave, any one activity will already only be sent up to 10k times. That's why instances are required to use TLS and be internet accessible: so they can receive direct communication. I agree with you that there are some difficult scaling issues with federation, but your representation of it is inaccurate.

  • Other people in the thread have already made this point: even with a full mesh network, the number of remote calls made for a single activity is equal to the number of instances subscribing to that activity (plus one if the activity originates from an instance that's not the host of the activity).

    A hub/spoke model doesn't change this, it just moves the load from the host instance to the hub. The number of connections is still the same: if N instances need to receive the activity, N calls will have to be made. If anything this adds 1 more call from the host instance to the hub.

    Even peer-to-peer distribution of activities, mentioned by @hazelnoot@beehaw.org, wouldn't actually change the amount of calls being made. You still have N servers that have to receive the activity, so you need at least N calls overall. What this would do is redistribute the load better over instances, so the host doesn't have to make all N calls. It would definitely be an improvement, but it would not be easy to implement successfully, and it would almost surely break ActivityPub compatibility.

    The only thing I can think of that would actually reduce the overall network load, though, is batching: sending multiple activities/updates together in a single message. AFAIK this is not supported by ActivityPub, though, so implementing it would mean breaking compatibility, and also implementing an entirely updated version of the protocol (which is a massive undertaking).
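
The delivery flow described in the replies above (an instance tells the community's host, the host tells every instance with subscribers) can be sketched as a small counting model. This is only an illustration, not Lemmy's code: the `Community` class, its methods, and the `instance-N.example` names are hypothetical, and it assumes the host announces to every subscribed instance, including the one the activity came from. The `batched_messages` helper models the batching idea from the last reply, which per that reply is not something ActivityPub supports today.

```python
from math import ceil


class Community:
    """Hypothetical model of a community and the instances that follow it."""

    def __init__(self, host: str) -> None:
        self.host = host
        self.subscriber_instances: set[str] = set()

    def subscribe(self, instance: str) -> None:
        # Only the first subscriber from an instance changes anything;
        # later subscribers from the same instance are absorbed by the set.
        if instance != self.host:
            self.subscriber_instances.add(instance)

    def deliveries_for(self, origin: str) -> int:
        # One inbound call if the activity starts on a non-host instance,
        # then one outbound call per subscribed instance (assumption: the
        # origin instance is announced to as well).
        inbound = 0 if origin == self.host else 1
        return inbound + len(self.subscriber_instances)


def batched_messages(activities: int, receivers: int, batch_size: int) -> int:
    """Messages needed if each receiver gets activities in fixed-size batches."""
    return receivers * ceil(activities / batch_size)


funny = Community(host="lemmy.world")
for i in range(9_999):
    funny.subscribe(f"instance-{i}.example")
funny.subscribe("instance-0.example")  # second subscriber, same instance

print(funny.deliveries_for("lemmy.world"))
print(funny.deliveries_for("instance-0.example"))
print(batched_messages(activities=100, receivers=9_999, batch_size=1))
print(batched_messages(activities=100, receivers=9_999, batch_size=25))
```

In this model the per-activity count never exceeds the number of instances, and only batching lowers the total number of messages; the hub/spoke and peer-to-peer ideas would change who makes the calls, not how many are made.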
