OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling’s Harry Potter series::A new research paper laid out ways in which AI developers should try and avoid showing LLMs have been trained on copyrighted material.

  • Blapoo@lemmy.ml
    link
    fedilink
    English
    arrow-up
    1
    ·
    1 year ago

    We have to distinguish between LLMs

    • Trained on copyrighted material and
    • Outputting copyrighted material

    They are not one and the same

      • scv@discuss.online
        link
        fedilink
        English
        arrow-up
        1
        ·
        1 year ago

        Legally the output of the training could be considered a derived work. We treat brains differently here, that’s all.

        I think the current intellectual property system makes no sense and AI is revealing that fact.

  • Technoguyfication@lemmy.ml
    link
    fedilink
    English
    arrow-up
    0
    ·
    1 year ago

    People are acting like ChatGPT is storing the entire Harry Potter series in its neural net somewhere. It’s not storing or reproducing text in a 1:1 manner from the original material. Certain material, like very popular books, has likely been interpreted tens of thousands of times due to how many times it was reposted online (and therefore how many times it appeared in the training data).

    Just because it can recite certain passages almost perfectly doesn’t mean it’s redistributing copyrighted books. How many quotes do you know perfectly from books you’ve read before? I would guess quite a few. LLMs are doing the same thing, but on mega steroids with a nearly limitless capacity for information retention.

    • Teritz@feddit.de
      link
      fedilink
      English
      arrow-up
      0
      arrow-down
      1
      ·
      1 year ago

      Using Copyrighted Work as Art as example still influences the AI which their make Profit from.

      If they use my Works then they need to pay thats it.