Michael Ten @lemmy.world to Technology@lemmy.worldEnglish · 2 years ago

OpenAI transcribed over a million hours of YouTube videos to train GPT-4

www.theverge.com

  • cross-posted to:
  • technology@lemmy.ml
How OpenAI, Google, and Meta deal with the limits of data online.
  • Dkarma@lemmy.world · 2 years ago · 12 up, 25 down

    Removed by mod

    • Defaced@lemmy.world · 2 years ago · 15 up, 2 down

      You completely miss my point. Are you saying data such as copyrighted published works and medical records are free? Because I did not in any way consent to sharing medical records with OpenAI: https://www.businessinsider.com/openai-chatgpt-generative-ai-stole-personal-data-lawsuit-children-medical-2023-6?op=1

      Now I realize this is an alleged offense, but it’s still fucked up. As for wanting to be the first to make an LLM, I have no desire to take on that amount of responsibility and liability. Sam Altman is chasing money and nothing more.

    • BreakDecks@lemmy.ml · 2 years ago · 11 up, 2 down

      There’s a distinct difference between quotation and plagiarism. A search engine does the former, LLMs do the latter.

      • Knock_Knock_Lemmy_In@lemmy.world · 2 years ago · 3 up, 5 down

        No. If you write a truly unique combination of words then an LLM will be very unlikely to reproduce them.

        An LLM is only likely to plagiarise you if your writing is similar to others'.

        • BreakDecks@lemmy.ml · 2 years ago · 2 up, 1 down

          [citation needed]

          • Knock_Knock_Lemmy_In@lemmy.world · 2 years ago · 2 up, 3 down

            https://blog.gdeltproject.org/do-llms-truly-create-or-merely-arrange-just-how-much-of-an-llms-writing-is-original/

            • BreakDecks@lemmy.ml · 2 years ago · 1 up

              “The differences between human and machine-generated text overlap support the image of LLMs as more ‘arrangers’ than ‘creators’ of text.”

              So plagiarism…

              • Knock_Knock_Lemmy_In@lemmy.world · 2 years ago · 1 up

                It only plagiarises you if you write something similar to lots of other people.

                Write something original and, even if it is in their training dataset, LLMs are highly unlikely to reproduce it.

    • EurekaStockade@lemmy.world · 2 years ago · 8 up, 1 down

      Fuck Google too
