Maybe we need a way to generate checksums during version creation (like file version history) and during test runs, which would be submitted alongside the code as a sort of proof of work that AI couldn’t easily recreate. It would make code creation harder for actual developers as well, but it might reduce the number of people trying to quickly contribute code the LLMs shit out.
A lightweight plugin that runs in your IDE, maybe. So anytime you are writing and testing code, the plugin is updating a validation file that shows what you were doing and the results of your tests and debugging. You could then write an algorithm that gives the validation file a confidence score and either triggers manual review or passes obviously bespoke code straight through.
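Very roughly, something like the sketch below (Python, purely illustrative; the event format, the append_event name, and the hash-chaining are my own assumptions, not an existing plugin): each edit or test run gets appended to the validation file with a timestamp and a hash chained to the previous entry, so the history would be awkward to fabricate after the fact.

    # Hypothetical sketch of the plugin's logging hook; entry fields are made up.
    import hashlib, json, time

    def append_event(log_path, kind, payload):
        """Record one event (an edit snapshot or a test result), chained to the
        previous entry so the history can't easily be rewritten afterwards."""
        try:
            with open(log_path) as f:
                entries = json.load(f)
        except FileNotFoundError:
            entries = []
        prev_hash = entries[-1]["hash"] if entries else ""
        entry = {"t": time.time(), "kind": kind, "payload": payload}
        digest = hashlib.sha256(
            (prev_hash + json.dumps(entry, sort_keys=True)).encode()
        ).hexdigest()
        entry["hash"] = digest
        entries.append(entry)
        with open(log_path, "w") as f:
            json.dump(entries, f, indent=2)

    # e.g. append_event("validation.json", "edit", {"file": "foo.py", "sha": "..."})
    #      append_event("validation.json", "test", {"cmd": "pytest", "result": "fail"})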
What exactly would you checksum? All intermediate states that weren’t committed, and all test run parameters and outputs? If so, how would you use that to detect an LLM? The current agentic LLM tools also make several edits and run tests on the thing they’re writing, then keep editing until their tests pass.
So the presence of test runs and intermediate states isn’t really indicative of a human writing code, and I’m skeptical that distinguishing between the steps a human would take and the steps an LLM would take is any easier or quicker than distinguishing based on the end result.
You could timestamp changes and progress to a file, record test results and output, and give an approximate algorithmic confidence rating for how bespoke the process of writing that code was. Even agentic AI rapidly spits out code like a machine would, whereas humans take time and think about things as they go. They make typos and go back and correct them. Tests fail, and debugging looks different between an agent and a human. We need to fingerprint how agents write code and compare what agentic code looks like when run through this sort of validation versus what it looks like when humans do the same.
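As a rough illustration of what that fingerprinting could look like over the same kind of event log (again just a sketch; the features and the 0.8 threshold are guesses, not measured differences between humans and agents):

    # Crude "bespokeness" scorer over the hypothetical validation log above.
    import json
    from statistics import median

    def bespoke_confidence(log_path):
        with open(log_path) as f:
            entries = json.load(f)
        edits = [e for e in entries if e["kind"] == "edit"]
        tests = [e for e in entries if e["kind"] == "test"]
        if len(edits) < 2:
            return 0.0                                 # too little history to judge
        gaps = [b["t"] - a["t"] for a, b in zip(edits, edits[1:])]
        score = 0.0
        if median(gaps) > 30:                          # humans pause to think; agents burst
            score += 0.4
        if max(gaps) > 10 * max(min(gaps), 1):         # uneven pacing, not machine-regular
            score += 0.2
        failures = [t for t in tests if t["payload"].get("result") == "fail"]
        if failures and len(tests) > len(failures):    # fail, debug, then pass cycles
            score += 0.4
        return min(score, 1.0)

    # route to manual review when bespoke_confidence("validation.json") < 0.8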
This basically amounts to a key/interaction logger in the IDE. I’d suspect this would keep many people from contributing to projects that use something like that; at least, I wouldn’t install such a plug-in.
It would be a keylogger within the IDE. How else do you prove you were the one doing the work? Otherwise, AI slop. I guess pick your poison.
This could, in theory, also be used by universities to validate submitted papers and weed out AI essays.