Over just a few months, ChatGPT went from correctly answering a simple math problem 98% of the time to just 2%, study finds

leninmummy@lemmy.ml · 3 years ago

Fixbeat@lemmy.ml · 3 years ago

Can it still solve programming problems?

TheSaneWriter@lemmy.thesanewriter.com · 3 years ago

It can probably still write boilerplate code, but I wouldn’t currently trust it for algorithmic design.

remotedev@lemmy.ca · 3 years ago

I’ve tried using it for debugging by pasting code into it, and it hands the same code back as the “corrected” version. I was wondering why it’s been getting worse.

TheSaneWriter@lemmy.thesanewriter.com · 3 years ago

My guess is they’ve been trying to make it cheaper by decreasing the amount of time it spends on each response or by decreasing the amount of computing power that goes into the instance you’re speaking to. Coding and math are products of high-level cognition and arise emergently out of neural networks that are very sophisticated, but take just a bit of power out and the abilities degenerate rapidly.

agissilver@lemmy.world · 3 years ago

I also experienced this issue last week. I asked for a specific correction and got unchanged code back. Sometimes it does update, though. Maybe like 50-70% of requests.
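
For anyone who wants to check whether a "corrected" reply actually changed anything before reading it line by line, here is a rough sketch using Python's standard `difflib`. The function names `is_effectively_unchanged` and `show_changes` are made up for illustration, and the comparison ignores only trailing whitespace and blank lines, so treat it as a quick sanity check rather than a real review.

```python
import difflib


def is_effectively_unchanged(original: str, revised: str) -> bool:
    """True if both snippets match once trailing whitespace and blank lines are ignored."""
    def normalize(text: str) -> list[str]:
        return [line.rstrip() for line in text.splitlines() if line.strip()]

    return normalize(original) == normalize(revised)


def show_changes(original: str, revised: str) -> str:
    """Return a unified diff between the submitted code and the returned code."""
    diff = difflib.unified_diff(
        original.splitlines(),
        revised.splitlines(),
        fromfile="submitted",
        tofile="returned",
        lineterm="",
    )
    return "\n".join(diff)


# Hypothetical example: the second reply only adds whitespace, the third fixes the bug.
submitted = "def add(a, b):\n    return a - b\n"
reply_same = "def add(a, b):\n    return a - b   \n\n"
reply_fixed = "def add(a, b):\n    return a + b\n"

print(is_effectively_unchanged(submitted, reply_same))
print(show_changes(submitted, reply_fixed))
```

If the first check comes back true, it is probably worth restating the requested correction more explicitly instead of rereading the reply.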

StarkillerX42@lemmy.ml · 3 years ago

I’ve never been able to get a solution that was even remotely correct. Granted, I mostly ask ChatGPT when I’m having a hard time solving the problem myself.

SokathHisEyesOpen@lemmy.ml · 3 years ago

You need to be able to clearly describe the problem, and your expected solution, to get it to give quality answers. Type out instructions for it like you would type for a junior developer. It’ll give you senior-level code back, but it absolutely needs clear and constrained guidelines.

exscape@kbin.social · 3 years ago

I mostly agree. I’ve had good results with similar prompts, but there’s usually some mistake in there. It seems particularly bad with Python imports: it uses classes A, B, and C, imports classes A, B, and X, and calls it a day.
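
That kind of import mismatch can be caught mechanically before running anything. Below is a minimal sketch using the standard `ast` module; the helper name `undefined_names` and the `widgets` module in the example are hypothetical. It ignores scoping rules entirely, so it is only a rough filter. A proper linter such as pyflakes does this far more carefully.

```python
import ast
import builtins


def undefined_names(source: str) -> set[str]:
    """Return names that are read in `source` but never imported, assigned, or defined.

    Scopes are ignored: a name bound anywhere in the file counts as defined.
    """
    tree = ast.parse(source)
    defined = set(dir(builtins))
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `import a as b` binds `b`.
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
        elif isinstance(node, ast.ExceptHandler) and node.name:
            defined.add(node.name)
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                used.add(node.id)
            else:
                defined.add(node.id)
    return used - defined


# Hypothetical generated code: uses A, B, C but imports A, B, X.
generated = """
from widgets import A, B, X

a = A()
b = B()
c = C()
"""

print(undefined_names(generated))  # C is used but never imported
```

Note that this flags only the missing `C`; the unused import of `X` would need a separate check.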

Here are a few prompts that gave pretty good results:

Create a QDialog class that can be used as a modal dialog. The dialog should update itself every 500 ms to call a supplied function, and show the result of the call as a centered QLabel.

How can I make a QDialog move when the user clicks and drags anywhere inside it? The QDialog only contains two QLabel widgets.

For this one, it ignored the method I asked it to use, though it was possibly correct in doing so, as TransactNamedPipe doesn’t support arbitrary sizes (but I think that’s only for the request?):

Hi again! Can you write me a Python function (using PySide) to connect to a named pipe server on Windows? It should use SetNamedPipeHandleState to use PIPE_READMODE_MESSAGE, then TransactNamedPipe to send a request (from a method parameter) to a named pipe, then read back a response of arbitrary size.

It should have told me up front why it skipped TransactNamedPipe; it only explained once I pointed out that it had ignored my request.

SokathHisEyesOpen@lemmy.ml · 3 years ago

Today it randomly decided to hide the results from some code that was supposed to be returned from a function. I asked it why it chose to hide the results and it couldn’t tell me; it just apologized and then gave me the code without the hide logic. Pretty strange, actually, since we had been working on the code for half an hour and then all of a sudden it decided to hide it all on its own.

SokathHisEyesOpen@lemmy.ml · 3 years ago

Yes! I use it at work almost every day. Sometimes it takes longer to get it to solve the problem than it would have taken me to write the code myself, since it makes mistakes, but sometimes it saves me hours of coding and thinking. It is very helpful in debugging error codes and stuff like that since it can evaluate an entire 1000-line script file in half a second.