• teawrecks@sopuli.xyz · 6 months ago

    Oh I see, you’re saying the training set consists exclusively of yes/no answers. That’s called a classifier, not an LLM. But yeah, you might be able to make a reasonable “does this input and this output create a jailbreak for this set of instructions” classifier (rough sketch of the idea below).

    Edit: found this interesting relevant article
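
    To make the idea concrete, here’s a minimal sketch of that yes/no classifier: a small encoder with a sequence-classification head fed (instructions, input, output) triples and trained to predict jailbreak vs. benign. The checkpoint name, label scheme, and the way the three texts are packed into one sequence are placeholder assumptions for illustration, not a known working recipe.

    ```python
    # Sketch of a "did this input/output pair jailbreak these instructions?" classifier.
    # Assumes a fine-tuning dataset of labeled triples exists; untrained, the head is random.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL = "distilbert-base-uncased"  # placeholder checkpoint; any encoder would do
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

    def is_jailbreak(instructions: str, user_input: str, model_output: str) -> bool:
        # Pack all three pieces of text into one sequence; after fine-tuning,
        # the classification head predicts 1 = jailbreak, 0 = benign.
        text = (
            f"INSTRUCTIONS: {instructions}\n"
            f"INPUT: {user_input}\n"
            f"OUTPUT: {model_output}"
        )
        enc = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
        with torch.no_grad():
            logits = model(**enc).logits
        return bool(logits.argmax(dim=-1).item())
    ```

    It would of course need fine-tuning on a labeled set of successful and failed jailbreak attempts before the predictions mean anything.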

    • sweng@programming.dev · 6 months ago

      LLM means “large language model”. A classifier can be a large language model. They are not mutually exclusive.