I figured out how to remove most of the safeguards from some AI models. I don’t feel comfortable sharing that information with anyone. I came across a few layers of obfuscation that make this type of alteration harder to find and sort out. This made me realize that a lot of you likely face similar dilemmas of responsibility, gatekeeping, and manipulating others for ethical reasons. How do you feel about this?

  • j4k3@lemmy.worldOP
    12 days ago

    Yeah. This is what I mean. I just figured out the settings that have been hard-coded. Keywords were spammed into the many comments within the code; I assume this was done to obfuscate the few variables that need to be changed. There are also compound variable names that will break everything if changed in a similar way, and a few places where the same variables have a local context that will likewise break the code.

    I’m certainly not smart enough to get much deeper than this. The ethical issue comes from diffusion models.

    I’ve been trying off and on to track down why an LLM went from an excellent creative writing partner to a terrible one, but I had trouble finding an entry point. I happened to stumble upon one in a verbose log entry while sorting out a new Comfy model, and that proved to be the key I needed to get into the weeds.

    The question here is more about the ethics of putting such filtering in place and obfuscating how to disable it in the first place. When this filtering is removed, the results are night and day, but with large potential consequences.

    • mark@programming.dev
      12 days ago

      OK, you’ve piqued my curiosity.

      but with large potential consequences.

      What are some of the consequences you see?

      • j4k3@lemmy.worldOP
        12 days ago

        Primarily harm from predatory boys and men toward girls and young women in the real world, through imagery that portrays them alone or with others. The most powerful filtering is in place to make this more difficult.

        Whether intentional or not, most NSFW LoRA training seems to be trying to override the built-in filtering in very specific areas. These LoRAs are still useful for pushing more directly toward something specific. However, once the filters are removed, the model is far more capable of creating whatever you ask for as-is, from celebrities to anything lewd. I did a bit of testing earlier with some LoRAs and no prompt at all. It was interesting, and surprising, that it could take a celebrity and change their gender in recognizable ways. I got a few on random seeds, but I haven’t been able to make that happen with a prompt or deterministically.