• Captain Janeway@lemmy.world
    link
    fedilink
    arrow-up
    46
    arrow-down
    1
    ·
    8 months ago

    I think this article does a good job of asking the question “what are we really measuring when we talk about LLM accuracy?” If you judge an LLM by its: hallucinations, ability analyze images, ability to critically analyze text, etc. you’re going to see low scores for all LLMs.

    The only metric an LLM should excel at is “did it generate human readable and contextually relevant text?” I think we’ve all forgotten the humble origins of “AI” chat bots. They often struggled to generate anything more than a few sentences of relevant text. They often made syntactical errors. Modern LLMs solved these issues quite well. They can produce long form content which is coherent and syntactically error free.

    However the content makes no guarantees to be accurate or critically meaningful. Whilst it is often critically meaningful, it is certainly capable of half-assed answers that dodge difficult questions. LLMs are approaching 95% “accuracy” if you think of them as good human text fakers. They are pretty impressive at that. But people keep expecting them to do their math homework, analyze contracts, and generate perfectly valid content. They just aren’t even built to do that. We work really hard just to keep them from hallucinating as much as they do.

    I think the desperation to see these things essentially become indistinguishable from humans is causing us to lose sight of the real progress that’s been made. We’re probably going to hit a wall with this method. But this breakthrough has made AI a viable technology for a lot of jobs. So it’s definitely a breakthrough. I just think either I finitely larger models (of which we can’t seem to generate the data for) or new models will be required to leap to the next level.

    • 🅿🅸🆇🅴🅻@lemmy.world
      link
      fedilink
      arrow-up
      17
      ·
      edit-2
      8 months ago

      But people keep expecting them to do their math homework, analyze contracts, and generate perfectly valid content

      People expect that because that’s how they are marketed. The problem is that there’s an uncontrolled hype going on with AI these days. To the point of a financial bubble, with companies investing a lot of time and money now, based on the promise that AI will save them time and money in the future. AI has become a cult. The author of the article does a good job in setting the right expectations.

      • El Barto@lemmy.world
        link
        fedilink
        arrow-up
        5
        ·
        8 months ago

        I just told an LLM that 1+1=5 and from that moment on, nothing convinced it that it was wrong.

        • Amanduh@lemm.ee
          link
          fedilink
          arrow-up
          3
          ·
          edit-2
          8 months ago

          I just told chat gpt(4) that 1 plus 1 was 5 and it called me a liar

          • El Barto@lemmy.world
            link
            fedilink
            arrow-up
            1
            ·
            edit-2
            8 months ago

            Ask it how much is 1 + 1, and then tell it that it’s wrong and that it’s actually 3. What do you get?

              • El Barto@lemmy.world
                link
                fedilink
                arrow-up
                2
                ·
                edit-2
                8 months ago

                I guess ChatGPT 4 has wised up. I’m curious now. Will try it.

                Edit: Yup, you’re right. It says “bro, you cray cray.” But if I tell it that it’s a recent math model, then it will say “Well, I guess in that model it’s 7, but that’s not standard.”