Not a strong case for NYT, but I’ve long believed that AI is vulnerable to copyright law and that this is likely the only thing to stop/slow its progression. Given the major issues with all AI and how inequitable and bigoted they are and their increasing use, I’m hoping this helps to start conversations about limiting the scope of AI or its application.
A human brain is just the summation of all the content it’s ever witnessed, though, both paid and unpaid. There’s no such thing as artwork that is completely 100% original, everything is inspired by something else we’re already familiar with. Otherwise viewers of the art would just interpret it as random noise. There has to be some amount of familiarity for a viewer to identify with it.
So if someone builds an atom-perfect artificial brain from scratch, sticks it in a body, and shows it around the world, should we expect the creator to pay licensing fees to the owners of everything it looks at?
This comparison doesn’t make sense to me. If the person then makes money off it: yes.
Otherwise the question would be if copyright law should be abolished entirely. E.g. if I create a new news portal with content copied from other sources, would that be okay then?
You are comparing a computer program to a human. Which… is weird.
Just because it’s weird to you doesn’t make it any less valid.
As a species we sit at the threshold of artificial life, created by us. Seems silly to think that such a monumental jump would not be accompanied by substantial changes in our made up rules of engagement.
Might be a fundamental difference in opinion. I don’t see us anywhere near anything related to artificial life.
What they’ve built there is a product, a computer program, and they used other folks’ data to build it without getting their permission. I also cannot go and just copy and paste source code from all over the internet to build my program. There are licenses attached to it that determine what you can or can’t do with it.
I feel like just because the term “learning” is involved people no longer view it as simply building or programming a system. Which it is.
Every idea you’ve ever profited from was inspired by something you saw in the past. That’s my point. There are no ideas that exist entirely within a vacuum, they all stem from something else, we just draw a line arbitrarily and say “this idea is too much like that other idea”. But if you combine 3 other ideas into something that is sufficiently non-obvious (which is entirely relative) then we call it “novel” and “original”.
I think the line should probably be, either it’s a tool and you need to license any work it references, OR it’s conscious, has rights, gets paid, and is a person. I think most tech companies would much rather stay in the former camp, not having to answer any ethical dilemmas if they don’t have to. But on the other hand, the first company to make something that people consider actually “conscious” will make history.
You are comparing a computer program to a human. Which… is weird.
Sounds like you have about 100 years of philosophical discussion, AI research, and scifi to catch up on 😄.
I am so fucking sick of this “AI art is just doing what humans do” bullshit. It is so utterly devoid of any kind of critical thinking that it sounds like a 100% bad faith argument every time it comes up.
AI can only give you a synthesis of exactly what you feed it. It can’t use its life experience, its upbringing, its passions, its cultural influences, etc to color its creativity and thinking, because it has none and it isn’t thinking. Two painters who study and become great artists, and then also both take time to study and replicate the works of Monet can come away from that experience with vastly different styles. They’re not just puking back a mashup of Monet’s collected works. They’re using their own life experience and passions to color their experience of Impressionism.
That’s something an AI can never do, and it leaves the result hollow and meaningless.
There is so so so so so much more to human experience, life experience, and just being alive than simply absorbing “content.”
No offense, but I get the sense that you don’t actually know how ML works and you’re just familiar with pop science descriptions of it. Am I wrong?
It’s an incredibly bold claim to say that a human brain is doing something an AI could never do. That is a very antiquated notion, to the point that I would say it’s 100% devoid of any critical thinking.
Now if you’re arguing that there is a supernatural plane of some kind that cannot be measured in any way, and is fully responsible for our consciousness, then that’s a different story, there’s nothing I can say to change your mind.
There is so so so so so much more to human experience, life experience, and just being alive than simply absorbing “content.”
That’s the thing though, it’s all the same “content” to a living brain. Your brain doesn’t distinguish between your lived experiences and watching cat videos, the experience of watching those videos is also a lived experience.
I know it’s tempting to say humans (or living creatures) are special and unique in their ability to experience emotions and consciousness etc, but the reality is, you’re a biological machine. You take inputs via various senses, chemical reactions happen throughout your body, and the illusion of memory and experience is created. Now either prove to me that this phenomenon is not replicable in a lab or virtual setting, or get off your high horse and join the actual discussion that needs to happen.
A human brain is just the summation of all the content it’s ever witnessed, though, both paid and unpaid.
But copyright is entirely artificial. The deal is that the law says you have to pay when you copy a bunch of copyrighted text and reprint it into new pages of a newly bound book. The law also says you don’t have to pay when you are giving commentary on a copyrighted work, or parodying a copyrighted work, or drawing inspiration from a copyrighted work to create something new but still influenced by that copyrighted work. The question for these lawsuits is whether using copyrighted works to train these models and generate new text (or art or music) is infringement of those artificial, human-made, legal rights.
As an example, sound recording copyrights only protect the literal copying of a sound recording. Someone who mimics that copyrighted recording, no matter how perfectly, doesn’t actually infringe on the recording copyright (even if they might infringe on the composition copyright, a separate and distinct copyright). But a literal duplication process of some kind would be infringement.
We can have a debate about whether the law draws the line in the correct places, or whether the copyright regime could be improved, and other normative discussions about what the rules should be in the modern world, especially about whether the rules in one area (e.g., the human brain) are consistent with the rules in another area (e.g., a generative AI model). But it’s a separate discussion from what the rules currently are. Under current law, the human brain is allowed to perform some types of copying and processing and remixing that some computer programs are not.
I agree with the summary of the situation in your first paragraph.
Your second paragraph about sound mimicry, as far as I’m aware, is not accurate. Musicians have been ordered to pay for much less than rote mimicry; even simple things like using the same melody or beat as a backing track have been ruled as infringement. In the US, at least.
And I agree with the 3rd paragraph.
So I believe my original question still stands: should an artificial brain be required to pay licensing fees for everything it sees?
I’m not familiar with the situation, but I imagine if South Park went around suing people for using their stuff, people wouldn’t take them seriously. Virtually everything in South Park relies on their abuse of Fair Use. Just because it IS infringement, doesn’t mean you have to sue them.
It looks like there are a few other tracks that Bobby Prince is responsible for that made it into Doom. In this interview he states that he made them for fun, labeled the files to not be used in the final game, and was surprised id even had copies. When Romero made the decision to include them in the game, Bobby (who is/was a lawyer apparently) says he was sure they would get sued.
The exclusive rights of the owner of copyright in a sound recording under clauses (1) and (2) of section 106 do not extend to the making or duplication of another sound recording that consists entirely of an independent fixation of other sounds, even though such sounds imitate or simulate those in the copyrighted sound recording.
So if I want to go record a version of “I Will Always Love You” that mimics and is inspired by Whitney Houston’s performance, I actually only owe compensation to the owner of the musical composition copyright, Dolly Parton. Even if I manage to make it sound just like Whitney Houston, her estate doesn’t hold any rights to anything other than the actual sounds actually captured in that recording.
So if someone builds an atom-perfect artificial brain from scratch, sticks it in a body, and shows it around the world, should we expect the creator to pay licensing fees to the owners of everything it looks at?
That’s unrelated to an LLM. An LLM is not a synthetic human brain. It’s a computer program and sets of statistical data points from large amounts of training data to generate outputs from prompts.
If we get real general-purpose AI some day in the future, then we’ll need to answer those sorts of questions. But that’s not what we have today.
The discussion is about law surrounding AI, not LLMs specifically. No we don’t have an AGI today (that we know of), but assuming we will, we will probably still have the laws we write today. So regardless of when it happens, we should be discussing and writing laws today under the assumption it will eventually happen.
To me there’s a bit of a difference because humans are not controllable and cannot (legally) be slaves. So in the case of this hypothetical artificial brain, that brain could leave and take the profits of its work elsewhere, with the creator no longer benefiting.
Yes, it will be interesting if a court is ever receptive to the notion that we’re creating something close to actual “consciousness”, because it does lead to ethical dilemmas very quickly.
I think it’s probably in the tech companies’ best interests to play along with the copyright requirements and treat the AI like a tool for as long as possible.
Yeah I’ve heard a lot of people talking about the copyright stuff with respect to image generation AIs, but as far as I can see there’s no fundamental reason that text generating AIs wouldn’t be subject to the same laws. We’ll see how the lawsuit goes though I suppose.
Neither is infringement. Artists attempting to bully platforms into not training on them doesn’t change the fact that training on information would be black and white fair use if it didn’t have absolutely nothing in common with copyright infringement. Learning from copyrighted material is not distributing it.
If the court doesn’t just ignore the law, which has nothing that could theoretically be interpreted to support the idea that training is infringement in any way, this case will be the precedent that sets AI training free.
And you, as an individual, should want that. Breaking the ability to learn from prior art is still literally guaranteed to disenfranchise the overwhelming majority of creators in all formats, because there are massive IP holders who have the data sets to build generative AI and produce unlimited “free” content, while no individual will be able to do the same because they’ll have nothing to train on. If you think Disney has a monopoly now, wait until they can train AI on 100 years of 95% of TV and movies and no one else can make AI.
Well I hear what you’re saying, although I don’t much appreciate being told what I should want the outcome to be.
My own wants notwithstanding, I know copyright law is notoriously thorny – fair use doubly so – and I’m no lawyer. I’d be a little bit surprised if NYT decides to raise this suit without consulting their own lawyers though, so it stands to reason that if they do indeed decide to sue then there are at least some copyright lawyers who think it’ll have a chance. As I said, we’ll see.
Copyright law does not prevent learning from copyrighted material. There is no potential infringement for fair use to be applied to. Nothing is being copied and shared.
If they’re suing, they’re doing it because they think they can manipulate a ruling that does not in any way follow the law and because the benefit if they can do so is huge, not because any intelligent rational human being can read the law and possibly interpret anything as infringement. It’s not ambiguous in any way. There can’t be infringement if you don’t distribute someone else’s work.
It seems like you’re working under the core assumption that the trained model itself, rather than just the products thereof, cannot be infringing?
Generally if someone else wants to do something with your copyrighted work – for example your newspaper article – they need a license to do so. This isn’t only the case for direct distribution, it includes things like the creation of electronic copies (which must have been made during training), adaptations, and derivative works. NYT did not grant OpenAI a license to adapt their articles into a training dataset for their models. To use a copyrighted work without a license, you need to be using it under fair use. That’s why it’s relevant: is it fair use to make electronic copies of a copyrighted work and adapt them into a training dataset for an LLM?
You also seem to be assuming that a generative AI model training on a dataset is legally the same as a human learning from those same works. If that’s the case then the answer to my question in the last paragraph is definitely, “yes,” since a human reading the newspaper and learning from it is something that, as you say, “any intelligent rational human being” would agree is fine. However, as far as I know there’s not been any kind of ruling to support the idea that those things are legally equivalent at this point.
Now, if you’d like to start citing code or case law go ahead, I’m happy to be wrong. Who knows, this is the internet, maybe you’re actually a lawyer specializing in copyright law and you’ll point out some fundamental detail of one of these laws that makes my whole comment seem silly (and if so I’d honestly love to read it). I’m not trying to claim that NYT is definitely going to win or anything. My argument is just that this is not especially cut-and-dried, at least from the perspective of a non-expert.
Electronic copies are copies. Copying the story is a copy. You need a license to copy someone’s work. You unconditionally do not need a license to learn from it and use that knowledge for any purpose you wish. There are no laws that could possibly be interpreted to require this.
Derivative works are copies using substantial portions of someone else’s original work. You need a license to adapt a book into a movie because you’re copying their whole story, characters, etc. You don’t need a license to tell a similar story from a similar idea because you are not. Literally everything that has been created in the past 10,000 years is built on the ideas of others. Everything is a derivative work if you think learning from an article is. You’re allowed to summarize copyrighted material and present your own interpretation of it to others. You’re allowed to do so commercially. It isn’t copying.
The New York Times owns their articles. They own their specific packaging of the facts inside. They don’t and unconditionally can’t own the facts themselves. Nothing they own is being copied. Having files in memory is not copyright infringement. It’s the literally guaranteed result of publishing anything digitally.
There is nothing that OpenAI is doing that any law in existence even loosely implies might need a license.
You don’t need a license to learn from a story, but if learning requires you first to make an enduring copy of the story on your laptop then you could be violating copyright.
And neural nets generally require a local enduring copy of their training data, which means they too could be violating copyright.
But there is no one learning from it. It serves as a building block / source material to build these LLMs. I feel like the fact that it’s called learning gives folks the impression that it’s similar to what a human would do.
“AI” isn’t intelligent, but that has literally zero relevance.
Seeing copyrighted material and forming takeaways does not in any way resemble copyright infringement. It’s not the fact that a human is doing so that matters. It’s the fact that no sort of analysis constitutes copying or copyright infringement.
But they aren’t forming takeaways from it. They literally used that material to build this system. I also cannot just go around and take arbitrary data from anywhere and use it to build my own program. There are licenses attached to it and I have to be mindful of whose work I can use to build my system and whose I can’t use without explicit permission.
Building this system isn’t looking at other folks’ material and forming takeaways from it. It’s literally using that material as input for building the system.
And yes, you absolutely can use entirely New York Times articles as research material to write your own article based on conclusions from them. You can’t outright copy paste their articles, but you can freely use information learned from their articles however the hell you want.
It’s the exact same thing. “AI” looks at their articles, integrates information, and does not retain the actual article. That has no similarity in any way to copyright infringement.
It is similar to what humans do. The principal difference is that the AI tech we have (as of yet) can’t learn how to learn: Those systems come with pre-determined rules to learn, we come with pre-determined rules on how to learn how to learn.
And yes, AIs abstract the knowledge they get fed. What they have trouble with is not forgetting how to play soccer when learning how to cook spaghetti: without the capacity to learn to learn, they can’t vary the encoding of information between topics, so everything gets mushed together, with new information blindly overwriting unrelated old information.
Since the DMCA, just the circumvention of copyright protection measures is a crime. It’s stupid, but the point is that even if training AI on the data is completely legal, if the data was protected with something that is also used to protect copyright, and you needed to circumvent that to get access even for legitimate purposes, you’ve broken a law.
Copyright has been made obtuse and stupid and damaging to society by big IP holders, it’s just now there’s big corps on the “infringing” side too. This will get interesting.
I’m slightly optimistic. It might slow down the progression of those language models now, but I hope that it becomes a “benign disincentive” in the long run, forcing a shift from LLM to better models.
It’s pretty apparent that AI developers are training their applications using stolen images and data.
This was always going to end up in the courts.
Seems a tad dramatic no?
Because it’s parroted relentlessly by people that think it sounds right but refuse to scrutinize it because they want to call us luddites.
No.
Then explain how Bobby Prince could literally steal a South Park song to make “Shawn’s Got the Shotgun” to make Doom 2.
It is. The recording copyright is separate from the musical composition copyright. Here’s the statute governing the rights to use a recording:
This thread is about ChatGPT, an LLM. It is not a general purpose AI.
Fair use isn’t relevant.
Copyright law does not prevent learning from copyrighted material. There is no potential infringement for fair use to be applied to. Nothing is being copied and shared.
If they’re suing, they’re doing it because they think they can manipulate a ruling that does not in any way follow the law and because the benefit if they can do so is huge, not because any intelligent rational human being can read the law and possibly interpret anything as infringement. It’s not ambiguous in any way. There can’t be infringement if you don’t distribute someone else’s work.
It seems like you’re working under the core assumption that the trained model itself, rather than just the products thereof, cannot be infringing?
Generally, if someone else wants to do something with your copyrighted work – for example your newspaper article – they need a license to do so. This isn’t only the case for direct distribution, it includes things like the creation of electronic copies (which must have been made during training), adaptations, and derivative works. NYT did not grant OpenAI a license to adapt their articles into a training dataset for their models. To use a copyrighted work without a license, you need to be using it under fair use. That’s why it’s relevant: is it fair use to make electronic copies of a copyrighted work and adapt them into a training dataset for an LLM?
You also seem to be assuming that a generative AI model training on a dataset is legally the same as a human learning from those same works. If that’s the case then the answer to my question in the last paragraph is definitely, “yes,” since a human reading the newspaper and learning from it is something that, as you say, “any intelligent rational human being” would agree is fine. However, as far as I know there’s not been any kind of ruling to support the idea that those things are legally equivalent at this point.
Now, if you’d like to start citing code or case law go ahead, I’m happy to be wrong. Who knows, this is the internet, maybe you’re actually a lawyer specializing in copyright law and you’ll point out some fundamental detail of one of these laws that makes my whole comment seem silly (and if so I’d honestly love to read it). I’m not trying to claim that NYT is definitely going to win or anything. My argument is just that this is not especially cut-and-dried, at least from the perspective of a non-expert.
Electronic copies are copies. Copying the story is copying. You need a license to copy someone’s work. You unconditionally do not need a license to learn from it and use that knowledge for any purpose you wish. There are no laws that could possibly be interpreted to require one.
Derivative works are copies using substantial portions of someone else’s original work. You need a license to adapt a book into a movie because you’re copying their whole story, characters, etc. You don’t need a license to tell a similar story from a similar idea because you are not. Literally everything that has been created in the past 10,000 years is built on the ideas of others. Everything is a derivative work if you think learning from an article is. You’re allowed to summarize copyrighted material and present your own interpretation of it to others. You’re allowed to do so commercially. It isn’t copying.
The New York Times owns their articles. They own their specific packaging of the facts inside. They don’t and unconditionally can’t own the facts themselves. Nothing they own is being copied. Having files in memory is not copyright infringement. It’s the literally guaranteed result of publishing anything digitally.
There is nothing that OpenAI is doing that any law in existence even loosely implies might need a license.
You don’t need a license to learn from a story, but if learning requires you first to make an enduring copy of the story on your laptop then you could be violating copyright.
And neural nets generally require a local enduring copy of their training data, which means they too could be violating copyright.
Isn’t copyright only an issue when a copy is sold?
But there is no one learning from it. It serves as a building block / source material to build these LLMs. I feel like the fact that it’s called learning gives folks the impression that it’s similar to what a human would do.
“AI” isn’t intelligent, but that has literally zero relevance.
Seeing copyrighted material and forming takeaways does not in any way resemble copyright infringement. It’s not the fact that a human is doing so that matters. It’s the fact that no sort of analysis constitutes copying or copyright infringement.
But they aren’t forming takeaways from it. They literally used that material to build this system. I also cannot just go around and take arbitrary data from anywhere and use it to build my own program. There are licenses attached to it and I have to be mindful of whose work I can use to build my system and whose I can’t use without explicit permission.
Building this system isn’t looking at other folks’ material and forming takeaways from it. It’s literally using that material as input for building the system.
Yes they are.
And yes, you absolutely can use entirely New York Times articles as research material to write your own article based on conclusions from them. You can’t outright copy paste their articles, but you can freely use information learned from their articles however the hell you want.
It’s the exact same thing. “AI” looks at their articles, integrates information, and does not retain the actual article. That has no similarity in any way to copyright infringement.
It is similar to what humans do. The principal difference is that the AI tech we have (as of yet) can’t learn how to learn: Those systems come with pre-determined rules to learn, we come with pre-determined rules on how to learn how to learn.
And yes, AIs abstract the knowledge they get fed. What they have trouble with is not forgetting how to play soccer when learning how to cook spaghetti: without the capacity to learn to learn, they can’t vary the encoding of information between topics, so everything gets mushed together, with new information blindly overwriting unrelated old information.
Since the DMCA, just the circumvention of copyright protection measures is a crime. It’s stupid, but the point is that even if training AI on the data is completely legal, if the data was protected with something that is also used to protect copyright, and you needed to circumvent that to get access even for legitimate purposes, you’ve broken a law.
Copyright has been made obtuse and stupid and damaging to society by big IP holders, it’s just now there’s big corps on the “infringing” side too. This will get interesting.
I’m slightly optimistic. It might slow down the progression of those language models now, but I hope that it becomes a “benign disincentive” in the long run, forcing a shift from LLMs to better models.