Yes, if you are solving the exact problem that the original code solved, and that original code was labeled as solving that exact problem, then that’s a very good reason for the LLM to produce that code.
Researchers have shown that an LLM was able to reproduce the verbatim text of the first 4 Harry Potter books with 96% accuracy.
> that an LLM was able to reproduce the verbatim text of the first 4 Harry Potter books with 96% accuracy.
Kinda weird argument. In their research (https://forum.gnoppix.org/t/researchers-extract-up-to-96-of-...) the LLM was explicitly asked to reproduce the book. There are people out there who can do that without LLMs; by this logic, everything they write is a copyright infringement of every book they can reproduce.
> Yes, if you are solving the exact problem that the original code solved, and that original code was labeled as solving that exact problem, then that’s a very good reason for the LLM to produce that code.
I think you're overestimating LLMs' ability to generalize.
The point about Harry Potter was just that the verbatim text of popular works in the training set is in there.
It’s the same as when you ask a model to generate an Italian plumber with overalls and it produces something close enough to Mario to be a copyright violation.
If you ask it to solve a very specific problem for which there is a solution well represented in its training set, you can definitely get back enough verbatim snippets to cause problems.
It’s also not a theoretical problem: you can Google for studies showing real-world production of verbatim code with non-adversarial prompts.
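If you want to check this on your own output, here's a rough sketch of how you might measure verbatim overlap between generated code and a known source. The function names and the `min_run` threshold are my own assumptions for illustration, not anything taken from the studies; it only uses `difflib` from the Python standard library.

```python
from difflib import SequenceMatcher


def longest_verbatim_run(generated: str, reference: str) -> str:
    """Return the longest contiguous substring shared by both texts."""
    # autojunk=False stops difflib from ignoring frequent characters
    # in long inputs, which would hide real matches in source code.
    matcher = SequenceMatcher(None, generated, reference, autojunk=False)
    match = matcher.find_longest_match(0, len(generated), 0, len(reference))
    return generated[match.a : match.a + match.size]


def verbatim_fraction(generated: str, reference: str, min_run: int = 20) -> float:
    """Fraction of `generated` covered by shared runs of at least `min_run` chars.

    Hypothetical metric: `min_run` is an arbitrary cutoff so that short,
    incidental matches (keywords, brackets) are not counted.
    """
    if not generated:
        return 0.0
    matcher = SequenceMatcher(None, generated, reference, autojunk=False)
    # get_matching_blocks() yields non-overlapping, in-order matching runs,
    # so reordered copies of the reference may be undercounted.
    covered = sum(
        block.size for block in matcher.get_matching_blocks() if block.size >= min_run
    )
    return covered / len(generated)


if __name__ == "__main__":
    reference = "def add(a, b):\n    return a + b\n"
    generated = "# my helper\ndef add(a, b):\n    return a + b\n"
    print(repr(longest_verbatim_run(generated, reference)))
    print(verbatim_fraction(generated, reference))
```

This is character-level and only catches exact copies. Renamed variables or a translation to another language would need a different comparison, which is kind of the whole debate here.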
This is not an argument against coding in a different language, though. It would be like having it restate Harry Potter in a different language with different main character names, and reshuffled plot points.
If you find a single paragraph that is a direct translation with different names, that’s definitely enough for copyright infringement.
Reshuffling plot points is doing a lot of heavy lifting here. Just looking at a specific chapter near the end of the book, if you change the order of the trials, change the names, and translate it into a different language, you’re still going to have a very hard time arguing that what you’ve produced isn’t a derivative work.