AI coding tools are improving fast. If you are not working in the code yourself, it can be hard to notice how much is changing, but GPT-5 and Gemini 2.5 made a whole new set of developer tricks possible to automate, and last week Sonnet 4.5 did it again.
At the same time, other skills are progressing more slowly. If you use AI to write emails, you are probably getting about the same value you got a year ago. Even when the underlying model improves, the product does not always benefit – especially when the product is a chatbot doing a dozen different jobs at once. AI is still making progress, but it is no longer as evenly distributed as it used to be.
The difference between the two is simpler than it seems. Coding applications benefit from billions of easily measurable tests, which can train them to produce workable code. This is reinforcement learning (RL), arguably the biggest driver of AI progress over the past six months, and one that grows more elaborate all the time. You can do reinforcement learning with human graders, but it works best when there is a clear pass-fail metric, so you can repeat the process billions of times without having to stop for human input.
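To make that contrast concrete, here is a minimal, hypothetical sketch of the loop the paragraph describes: sample an output, apply an automatic pass-fail check as the reward signal, and repeat with no human in the loop. The names (`generate_solution`, `passes_check`) are invented for illustration and do not come from any real training stack; this is toy code showing the shape of the process, not a working RL implementation.

```python
import random

def passes_check(solution: str) -> bool:
    """Automatic pass-fail grader: cheap, instant, no human required.
    (Stubbed here; a real grader would compile and run the code.)"""
    return "return" in solution

def generate_solution() -> str:
    """Stand-in for sampling a completion from a model."""
    candidates = ["def f(x): return x + 1", "def f(x): pass"]
    return random.choice(candidates)

# The whole point is scale: many graded rollouts, zero humans.
reward_total = 0.0
for step in range(100_000):
    solution = generate_solution()
    reward = 1.0 if passes_check(solution) else 0.0  # clear pass/fail metric
    reward_total += reward  # a real system would update model weights here

print(f"pass rate: {reward_total / 100_000:.2%}")
```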
As the industry leans increasingly on reinforcement learning to improve products, we are seeing a real divide between capabilities that can be automatically graded and those that cannot. Skills suited to RL, like bug-fixing and competitive math, are improving rapidly, while skills like writing make only incremental progress.
In short, there is a reinforcement gap – and it is becoming one of the most important factors in what AI systems can and cannot do.
In some ways, software development is the perfect subject for reinforcement learning. Even before AI, there was an entire subdiscipline devoted to testing how software would hold up under pressure – largely because developers needed to make sure their code would not break before deploying it. So even the most elegant code still has to pass unit tests, integration tests, security tests, and so on. Human developers use these tests routinely to validate their own code and, as a senior director for developer tools at Google told me recently, they are just as useful for validating AI-generated code. More than that, they are useful for reinforcement learning, because they are already systematized and repeatable at scale.
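As a rough illustration of why existing test suites slot so neatly into RL, here is a sketch that wraps an ordinary pytest run as a reward function. The file layout and function names are invented for the example; the only real tool invoked is pytest, which must be installed for the snippet to run.

```python
import subprocess
import tempfile
from pathlib import Path

def reward_from_test_suite(candidate_code: str, test_code: str) -> float:
    """Write model-generated code next to an existing test file,
    run pytest on the directory, and turn the exit code into a reward."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "solution.py").write_text(candidate_code)
        Path(tmp, "test_solution.py").write_text(test_code)
        result = subprocess.run(
            ["pytest", "-q", tmp],  # the same tests humans already run
            capture_output=True, text=True,
        )
        return 1.0 if result.returncode == 0 else 0.0  # systematized, repeatable

# Usage: the same suite that gates a human's pull request grades the model.
tests = "from solution import add\ndef test_add():\n    assert add(2, 3) == 5\n"
print(reward_from_test_suite("def add(a, b):\n    return a + b\n", tests))  # 1.0
```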
There is no easy way to validate a well-written email or a good chatbot response; these skills are inherently subjective and harder to measure at scale. But not every task falls neatly into an “easy to test” or “hard to test” bucket. We do not have a ready-made testing kit for quarterly financial reports or actuarial science, but a well-capitalized accounting startup could probably build one from scratch (a hedged sketch of what that might look like follows below). Some testing kits will work better than others, of course, and some companies will be smarter about how to approach the problem. But the underlying testability of the process will be the deciding factor in whether it can be turned into a functional product instead of just an exciting demo.
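For instance, a first pass at a testing kit built from scratch for financial reports might just be a pile of machine-checkable invariants. The schema and rules below are invented for illustration and do not reflect any real accounting standard; the point is only that a subjective-looking artifact can sometimes be decomposed into objective checks.

```python
def grade_report(report: dict) -> float:
    """Score a generated financial report against checkable invariants."""
    checks = [
        # Revenue minus expenses should equal reported net income.
        report["revenue"] - report["expenses"] == report["net_income"],
        # Assets should equal liabilities plus equity (balance sheet identity).
        report["assets"] == report["liabilities"] + report["equity"],
    ]
    return sum(checks) / len(checks)  # partial credit: fraction of rules passed

report = {"revenue": 100, "expenses": 60, "net_income": 40,
          "assets": 500, "liabilities": 300, "equity": 200}
print(grade_report(report))  # 1.0 – both invariants hold
```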
Some processes turn out to be more testable than you would think. If you had asked me last week, I would have put AI-generated video in the “hard to test” category, but the immense progress made by OpenAI’s new Sora 2 model suggests it may not be as hard as it looks. In Sora 2, objects no longer pop in and out of existence. Faces hold their shape, resembling a specific person rather than a loose collection of features. Sora 2 footage respects the laws of physics in ways both obvious and subtle. I suspect that if you looked behind the curtain, you would find a robust reinforcement learning system for each of these qualities. Taken together, they make the difference between photorealism and an entertaining hallucination.
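That last sentence is speculation, but it is easy to imagine what one such grader could look like. The sketch below scores a toy “video” (a list of per-frame object sets) for object permanence by penalizing objects that vanish for a frame and then reappear; face consistency and physics would presumably each get their own scorer. Everything here, including the flicker heuristic and the weights in the final comment, is invented for illustration.

```python
def object_permanence_score(frames: list[set[str]]) -> float:
    """Penalize objects that are present, vanish for one frame, then reappear.
    Each frame is modeled as the set of object IDs visible in it."""
    flickers = 0
    for prev, cur, nxt in zip(frames, frames[1:], frames[2:]):
        flickers += len((prev & nxt) - cur)  # present, gone, back again
    return 1.0 / (1.0 + flickers)  # 1.0 means no flicker at all

# A toy clip where "cup" blinks out of existence for one frame.
clip = [{"cup", "hand"}, {"hand"}, {"cup", "hand"}]
print(object_permanence_score(clip))  # 0.5 – one flicker detected

# A full reward might combine several such graders, e.g.:
# reward = 0.4 * object_permanence + 0.3 * face_consistency + 0.3 * physics
```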
To be clear, this is not a hard-and-fast rule of artificial intelligence. It is a consequence of the central role reinforcement learning currently plays in AI development, and it could easily change as models evolve. But as long as RL is the main tool for bringing AI products to market, the reinforcement gap will only grow – with serious implications for both startups and the broader economy. If a process ends up on the right side of the reinforcement gap, startups will probably succeed in automating it – and anyone who does that work today may end up looking for a new career. The question of which healthcare services are RL-trainable, for example, has enormous implications for the shape of the economy over the next 20 years. And if surprises like Sora 2 are any indication, we may not have to wait long for an answer.