Lemmy requires active users to manually search for communities and discover content. Instances can choose to defederate from other instances, but I want instances that block as few other users as possible so I can decide for myself what content I see.

I want to add a column to this script to analyze Lemmy instances and identify communities that have high user activity but low blocking of users.

Initially I was thinking of adding a column that calculates the ratio of:

(active users) / (total blocked users)

However, this runs into a divide-by-zero error if there are no blocked users.

I’ve thought of a few ways to handle the ZeroDivisionError case, but there could be a better metric entirely that avoids this issue or gives a good measure of high activity + low blocking.

Does anyone have ideas for a better metric or ratio to use here?

Some context on what the data looks like:

  • “active users” = number of active users in the past month
  • “total blocked users” = sum of active users from all instances blocking or being blocked by this instance

Let me know if you have any suggestions! I’m open to different formulas or metrics beyond a simple ratio.

Appreciate any help!

  • pe1uca@lemmy.pe1uca.dev
    8 months ago

    I want instances that block as few other users as possible so I can decide for myself what content I see.

    Then you want to selfhost, otherwise you’ll always be at the will of someone else to decide which instances they want to federate with.

    Even then, you’ll still want to make sure that instances known for spam, bots, or shady content have been blocked.

    • lysdexic@programming.dev
      8 months ago

      Even then, you’ll still want to make sure that instances known for spam, bots, or shady content have been blocked.

      Is there a list tracking these instances?

      • Ategon@programming.devM
        8 months ago

        Fediseer tends to be what most people use to track and list this sort of thing (and it’s what’s used for this instance).

        https://gui.fediseer.com/instances/censured?page=1

        A lot of the blocks people have are of Mastodon instances as well, so if you’re only interested in thread content, that probably needs to be taken into account too, since Mastodon instances tend to have more people than Lemmy instances.

  • MagicShel@programming.dev
    8 months ago

    It’s hard to say what algorithm would serve you better. Seems like this does what you are seeking it to do. It’s not how I’d do it, but I don’t prioritize unblocked users. To fix the divide by zero, I’d assign a multiplier for zero blocked users. It might be 1, so that no blocked users is mathematically the same as one blocked user. But maybe free speech is so important to you that you give it a multiplier of 2 or whatever.

    active_users / [MAX(1,blocked_users)]

    With the 1 in place, zero blocked users gets a multiplier of 1. Change the 1 to 0.5 for a multiplier of 2.

    I didn’t look at the script but it probably has a Max function which just returns the higher of two numbers, effectively putting a lower bound on the possible values.
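A minimal sketch of the clamped ratio above, in Python (assumed here only because the post mentions ZeroDivisionError; the script itself was not checked). The function name `clamped_ratio` and the `floor` parameter are made up for illustration:

```python
def clamped_ratio(active_users: int, blocked_users: int, floor: float = 1.0) -> float:
    """Return active_users / max(floor, blocked_users).

    `floor` is a hypothetical knob: it is the smallest value the
    denominator may take, so it must be positive to avoid dividing by zero.
    """
    if floor <= 0:
        raise ValueError("floor must be positive")
    # max() puts a lower bound on the denominator, so 0 blocked users
    # is treated as `floor` blocked users.
    return active_users / max(floor, blocked_users)


print(clamped_ratio(100, 0))        # 100.0 (same as 1 blocked user)
print(clamped_ratio(100, 0, 0.5))   # 200.0 (zero blocked users counts double)
print(clamped_ratio(100, 4))        # 25.0
```

Note that the floor only changes the result when blocked_users is below it, so every instance with at least one blocked user keeps its plain ratio.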

  • odium@programming.dev
    8 months ago

    Just add 1 to the denominator.

    Simple is best.

    The max(1,total_blocked) method will make instances with 1 blocked and 0 blocked appear to be equal.
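To make the difference concrete, here is a small Python sketch comparing the two approaches. The function names are hypothetical and the numbers are made-up sample values, not real instance data:

```python
def plus_one_ratio(active_users: int, total_blocked: int) -> float:
    # Add 1 to the denominator: it can never be zero for non-negative counts.
    return active_users / (total_blocked + 1)


def max_ratio(active_users: int, total_blocked: int) -> float:
    # Clamp the denominator to at least 1.
    return active_users / max(1, total_blocked)


for blocked in (0, 1, 2):
    print(blocked, plus_one_ratio(100, blocked), max_ratio(100, blocked))
```

With 100 active users, the +1 version gives 100.0 for 0 blocked and 50.0 for 1 blocked, while the max version gives 100.0 for both, so only the +1 version ranks the instance with no blocks higher.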

    • dumples@kbin.social
      8 months ago

      Also note that if you don’t want to significantly change the proportions, add 1 to both the top and the bottom. It removes the divide-by-zero error and won’t significantly alter the ratios. It’s used often in data science to avoid this problem.
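A rough Python sketch of the add-1-to-both version, used to rank a few instances. The instance names and counts below are invented placeholders, and `smoothed_ratio` and `rank_instances` are hypothetical helper names, not part of the original script:

```python
def smoothed_ratio(active_users: int, total_blocked: int) -> float:
    # Add 1 to both numerator and denominator so the denominator
    # is never zero and large ratios barely move.
    return (active_users + 1) / (total_blocked + 1)


def rank_instances(instances: dict[str, tuple[int, int]]) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from highest to lowest score."""
    scored = [(name, smoothed_ratio(active, blocked))
              for name, (active, blocked) in instances.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder data: name -> (active users, total blocked users)
sample = {
    "alpha.example": (1200, 0),
    "beta.example": (5000, 300),
    "gamma.example": (800, 3),
}

for name, score in rank_instances(sample):
    print(f"{name}: {score:.2f}")
```

Here the instance with no blocks comes out on top without any special-casing, and an instance with 999 active users and 9 blocked scores exactly 100.0, close to its unsmoothed ratio of 111.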