Representative take:

If you ask Stable Diffusion for a picture of a cat, it always seems to produce images of healthy-looking domestic cats. For the prompt “cat” to be unbiased, Stable Diffusion would need to occasionally generate images of dead white tigers, since this would also fit under the label of “cat”.

  • self@awful.systems
    1 year ago

    leading to the obvious question: if putting the words “best, good, high quality” in your generative AI prompt isn’t a placebo, then why is all the AI art I’ve seen absolute garbage

    • froztbyte@awful.systems
      1 year ago

      I forget where I saw it, but the phrase/comparison stuck with me and I think of it often: all of this shit is a boring person’s idea of interesting

      but the “just slap some prompt qualifiers on it (to deal with the journalists)” …god. it is of course entirely unsurprising to have an orange poster be so completely assured of their self-correctness to not even question anything, but the outright direct “just dress it up in vibes until they shut up”

      you just have to wonder what (and who?) else in their life they treat the same way

      • 200fifty@awful.systems
        1 year ago

        a boring person’s idea of interesting

        Agh this is such a good way of putting it. It has all the signifiers of a thing that has a lot of detail and care and effort put into it but it has none of the actual parts that make those things interesting or worth caring about. But of course it’s going to appeal to people who don’t understand the difference between those two things and only see the surface signifiers (marketers, executives, and tech bros being prime examples of this type of person)

        ETA: and also of course this explains why their solution to bias is “just fake it to make the journalists happy.” Why would you ever care about the actual substance when you can just make it look ok from a distance

        • froztbyte@awful.systems
          1 year ago

          it was from a post around the time when dall-e and such were first catching social hype, I think. iirc the article was touching specifically on the output product of visual generators

          if I find the article again I’ll link it

    • datarama@awful.systems
      1 year ago

      Ah, but have you considered how much worse they could be if they weren’t prompted with “high quality, masterpiece, best”?

      • self@awful.systems
        1 year ago

        all of my generative AI results have been disappointing because I didn’t give it the confidence it needed to succeed

        • datarama@awful.systems
          1 year ago

          One of the reasons I dislike this technology so much is that some of the ridiculous tricks actually (sometimes, sort of) work. But they don’t work for the reasons the interface invites the user to think they do, and they don’t work reproducibly or consistently, so the line between “getting large neural networks to behave requires strange tricks” and pure cargo-cult thinking is blurred.

          I have no idea what exactly went into the training sets of Midjourney (or DALL-E), except that it’s probably safe to assume it’s a set of (image, text) pairs like the open source image generators. The easy thing to put in the text component is the caption, any accessibility alt-text the image might have, and whatever a computer vision system decides to classify the image as. When the scrapers appropriate images from artists’ forums, personal webpages and social media accounts, they could then also scrape any comments present, process them and put some of them into the text component as well. So, it’s entirely possible that 1. some of the images the generator saw during training had “masterpiece”, “great work” etc. in the text component, and 2. there is a statistically significant correlation between those words being present in the text and the image being something people like looking at. So, when the generator is trying to pull images out of Gaussian noise, it’ll be trying to spot patterns that match “masterpiece-ness” if prompted with “masterpiece”. Clearly this doesn’t work consistently - e.g. if the generator has never seen a masterpiece-tagged painting of a snake, it’s not at all obvious that its model of “masterpiece-ness” can be applied to snakes at all. Neural networks infamously tend to learn shortcuts rather than what their builders want them to learn.
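
          To make the correlation argument concrete, here is a toy sketch. Everything in it is an assumption for illustration: the captions, the "liked" flags and the helper `liked_rate` are invented, and none of it reflects what any real generator's training set contains. It only shows how a word like "masterpiece" could end up statistically tied to images people like, and why that tie says nothing about a subject (snakes here) that never appears with the word.

```python
from typing import List, Optional, Tuple

# Hypothetical toy dataset: (caption text, whether viewers liked the image).
# These rows are invented for illustration; they are not real training data.
PAIRS: List[Tuple[str, bool]] = [
    ("masterpiece oil painting of a cat", True),
    ("masterpiece landscape great work", True),
    ("masterpiece portrait of a woman", True),
    ("masterpiece sketch of a cat", False),
    ("photo of a cat", True),
    ("blurry photo of a landscape", False),
    ("doodle of a snake", False),
    ("photo of a snake", False),
    ("portrait of a woman", True),
    ("sketch of a landscape", False),
]


def liked_rate(
    pairs: List[Tuple[str, bool]],
    tag: str,
    present: bool,
    subject: Optional[str] = None,
) -> Optional[float]:
    """Fraction of liked images among captions where `tag` is present (or absent).

    If `subject` is given, only captions mentioning it are counted.
    Returns None when no caption matches, i.e. there is nothing to estimate from.
    """
    matches = [
        liked
        for caption, liked in pairs
        if (tag in caption) == present
        and (subject is None or subject in caption)
    ]
    if not matches:
        return None
    return sum(matches) / len(matches)


with_tag = liked_rate(PAIRS, "masterpiece", present=True)
without_tag = liked_rate(PAIRS, "masterpiece", present=False)
snake_with_tag = liked_rate(PAIRS, "masterpiece", present=True, subject="snake")

print(f"liked rate with 'masterpiece':    {with_tag}")
print(f"liked rate without 'masterpiece': {without_tag}")
print(f"liked rate for tagged snakes:     {snake_with_tag}")
```

          In this made-up data the tagged captions are liked more often than the untagged ones, but there is no tagged snake at all, so the last value comes back as `None`: the data simply has nothing to say about "masterpiece-ness" for snakes.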

          Even then, most of it still looks like the result of a mugging in the Uncanny Alley. There’s almost always something “off” about it, even when it is technically impressive. Details that make no sense, weird lighting, shadows and textures, and a feeling of “eeriness” that I’d probably have the vocabulary to describe if I were a visual artist.

          (PS: Does the idea of using well-intentioned accessibility features and kind words to artists to create a machine intended to destroy their livelihood make you feel a bit iffy? Congratulations, you are probably not a sociopath.)