• 5 Posts
  • 471 Comments
Joined 1 year ago
Cake day: August 27th, 2023

  • Reddit and newspapers selling their data preemptively has to do with LLMs. Can you clarify what scenario you are aiming for? It sounds like you want the courts to rule that AI companies need to ask each individual redditor whether they can use that person’s comments for training. Personally, I don’t see this happening.

    Getty gives itself the right to license all uploaded photos and has already trained a generative model on them, btw.


  • They won’t need to, they will get it from Getty. All these websites have a ToS that makes it very clear they can do whatever they want with what you upload. The courts will simply never side with the small-time photographer who makes $50 a month with his stock photos hosted on someone else’s website. The laws will be in favor of data brokers and the handful of big AI companies.

    Anyone self-hosting will simply not get a call. Journalists will keep the same salary while the newspaper’s owner gets a fat bonus. Even Reddit already sold its data for 60 million and none of that went anywhere but spez’s coke fund.


  • If we can’t train on unlicensed data, there is no open-source scene. Even worse, AI stays but it becomes a monopoly in the hands of the few who can pay for the data.

    Most of that data is owned and aggregated by entities such as record labels, Hollywood, Instagram, Reddit, Getty, etc.

    The field would still remain hypercompetitive for artists and other trades that are affected by AI. It would only put all the new AI-based tools behind expensive, censored subscription models owned by either Microsoft or Google.

    I think forcing all models trained on unlicensed data to be open source is a great idea, but actually rooting for civil lawsuits, which essentially entail a huge broadening of copyright laws, is simply foolhardy imo.