Why AI Code Generation Doesn't Work (But It Could)
OpenAI just made their first major acquisition, buying Windsurf for $3 billion. With the popularity of Cursor and Claude Code, they clearly feel that being a base model provider just isn't enough. Integrating the AI directly into a code editor has proven remarkably popular: 92% of professional programmers report having used AI in their software development. What is Windsurf? It boils down to ChatGPT + text editor. Programmers do a lot of copy-pasting back and forth with their language model of choice, and when you combine the two, now every programmer in the world is 100x faster! I didn't know that, and I'm a developer. Actually, I still don't know that, because programmers are now producing 41% more bugs.
In a recent study, independent researchers tracked data at Microsoft, Accenture, and another Fortune 100 company and found that AI contributed to a 20% increase in pull requests. A pull request is when a developer asks their team to merge their changes into the main project. On its own, that productivity metric doesn't account for errors; it just measures how often someone says they're done and asks for review. They also found the increase was almost exclusively accounted for by junior developers; for senior developers the increase was barely statistically significant. Another study, from Uplevel, tracked usage of Copilot, Microsoft's code generation tool, and found it created a 41% increase in bugs. This left senior programmers spending most of their time debugging the software, or in other words, cleaning up the mess the AI-generated code created. There are plenty more studies like these, such as one from GitClear that tracked several indicators of worsening code quality in AI-generated code with much the same results. Developers are spending most of their time fixing the code the AI makes.
If the developers reading this remain unconvinced, take it from Turing Award winner Edsger Dijkstra, arguably the most famous computer scientist aside from Turing himself. He has a whole essay, "On the foolishness of 'natural language programming'", in which he remarks that mathematics and the natural sciences once had a similar desire to do math and science in natural language instead of formalized symbols, and that this contributed to a dark age in which little progress was made. I particularly like this quote:
Instead of regarding the obligation to use formal symbols as a burden, we should regard the convenience of using them as a privilege: thanks to them, school children can learn to do what in earlier days only genius could achieve.
Thinking in terms of formal symbolism and being able to model things mathematically is what gave us the tools we have today. Right now, we are just connecting the dots: I have an LLM, I plug it into my editor because it's right there in front of me. We are designing around whatever is directly in front of our face, but without defining the problem we are trying to solve, we'll always be stuck making incremental progress and we'll never get anywhere. To make the jump from incremental progress to a revolutionary leap, you need a paradigm shift. To get a paradigm shift, you need to understand what paradigm you are currently in, and the only way to understand it is to formalize it. Formalizing lets you build abstractions by zeroing in on only a few key concepts and designing around those. Any good designer or problem solver instinctively finds ways to look at small, simple concepts independently of each other rather than trying to fit the entire universe into their head at once, because we are fundamentally incapable of fitting the whole universe in our head. The act of ignoring irrelevant properties is the power of mathematical modelling, the basis of all modern invention. It is how we got to LLMs in the first place. The paradigm shift came from mathematical models, and this is exactly why editor tools like Windsurf aren't going to work: even humans can't understand a codebase in its entirety, let alone LLMs.
Many programmers think of abstraction as a matter of opinion, a relative term that can be debated ad nauseam, and so it is debated ad nauseam. But when we talk about things only in natural language, no progress gets made. Things that can't be formalized end up trudging through endless debate, and nobody can agree on what abstraction is, even though it is the tool we rely on to solve complex problems that can't fit in our head or be reasoned through with natural language. So let's formalize it. Many people think this is impossible; it's actually a simple, precise concept that can be trivially formalized. It is difficult for programmers to do this even though we use formal systems, programming languages, all day, because those languages differ so much and their terminology is heavily overloaded. We use terms like interface, class, object, and abstraction loosely, and they mean very different things from language to language. If you think about what an abstraction is in natural language, you will be imprecise. If you think about it in JavaScript, you will be imprecise. These languages can't encode everything. Even strongly typed languages that attempt to formalize more concepts about the system don't fully encapsulate everything: every bug in the industry has passed a type checker. Words in natural language are just as convoluted; they evolve over time from origins dating back thousands of years.
The word abstraction can be traced back earliest from Latin roots:
Latin: abstractio — meaning "a drawing away, a separation"
From ab- ("away from") + trahere ("to draw, drag")
So etymologically, abstraction literally means “a drawing away”. The act of separating something from something else.
It has evolved as such:
Late Latin: abstractio – the action of pulling away
Middle French: abstraction – the act of removing or separating
Late Middle English: adopted into English with the meaning of "withdrawal from worldly affairs" or "separation in thought"
In Modern Usage:
By the 1600s, it evolved into the philosophical and later technical meaning we know today — distilling the essential features while ignoring the specific details.
So the term itself is convoluted. When we use it to solve very complex problems, many people take it to mean "hiding": taking a bunch of complicated stuff, attempting to sweep it under the rug, and looking only at inputs and outputs. That's not entirely it either. It inherently means separating things. In engineering we call this modularity: I can make completely isolated modules with no knowledge of each other, and the details of each are abstracted from the others. But you can't formalize something without coming up with a structured definition of it. Software developers tend to resist this kind of thinking, because as soon as somebody mentions formal math, programmers associate it with the academic programming languages that never seem to build anything useful. But put programming languages aside for a second:
Galileo - “The book of nature is written in the language of mathematics”.
Newton - invented calculus (the chain rule is how LLMs are actually trained), which also underpins all of mechanical engineering.
Gauss - "Mathematics is the queen of the sciences"; advanced number theory, without which we wouldn't have computers.
Claude Shannon - information theory; we wouldn't have the internet without it.
Von Neumann - devised the von Neumann architecture, the blueprint of virtually every modern computer.
Turing - theorized the computer, mathematically.
Elon Musk - credits Tesla and SpaceX to mathematical first-principles thinking.
Terence Tao - (the Mozart of math) credits scientific invention to pure mathematical insight.
You want the secret to problem solving? It's the Wikipedia page for Algebraic Structure.
Let this serve as a hitchhiker's guide to critical thinking. If you don't know where your solution sits on that list, consider yourself lost. It makes painfully obvious that our different fields of math, science, programming, and logic are all describing the same concepts. Let it be a map of which abstraction you're thinking of. You're welcome; this is the most important page on the internet. Guard it like a sacred treasure. If you're trying to formalize something and you can't translate it into some kind of algebraic structure, you're not abstracting, and your solution isn't going to make any lasting progress. You must always be asking which properties you are concerned with and which you are not. Once you have defined the minimal set of properties required to solve a problem, you can start building abstractions, and therefore you can progress. Any algebraic structure creates abstractions. This is why math classes are so focused on laws like associativity and commutativity: those laws let you go from arithmetic on numbers to algebra. Algebra is an abstraction of numbers because you deliberately declare that you only care about certain properties of those numbers, not what they are. Ignoring some relationships and focusing only on others is how you can define universal results like the quadratic formula, calculus, and every other theory in math. The ability to form abstractions is the ability to reason about every possible thing, in a sense the "entire universe" of possibilities at once, with very simple operations. This separation is the essence of how progress happens. It is how Einstein arrived at relativity and how Feynman paved the path toward quantum mechanics. The secret behind the geniuses is that their capacity for understanding is not remarkably far from your own.
They are brilliant calculators at their core. Many think of them as purely intuitive, but this is sleight of hand. They built their theories with algebra and abstraction; they calculated their way into them, and the intuitive explanations came later, when they presented the theories. That sleight of hand creates the illusion that the cart came before the horse.
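One concrete way to see "caring only about properties" in code: a monoid is an algebraic structure consisting of a set, an associative operation, and an identity element. A minimal Python sketch, written by relying only on those two laws, works unchanged for numbers, strings, and lists, because the concrete values have been abstracted away:

```python
from functools import reduce

def combine_all(items, op, identity):
    """Fold any sequence using only the monoid laws:
    op must be associative, and identity must satisfy
    op(identity, x) == x for every x. Nothing else about
    the values is assumed."""
    return reduce(op, items, identity)

# Integers under addition form a monoid (identity 0)
total = combine_all([1, 2, 3, 4], lambda a, b: a + b, 0)

# Strings under concatenation form a monoid (identity "")
word = combine_all(["ab", "cd", "ef"], lambda a, b: a + b, "")

# Lists under concatenation form a monoid (identity [])
flat = combine_all([[1], [2, 3], []], lambda a, b: a + b, [])

print(total, word, flat)  # 10 abcdef [1, 2, 3]
```

One function, three completely different kinds of data: that is abstraction doing the work, not opinion.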
Computer science was born from Boolean algebra, but unfortunately strayed from it in the way our languages evolved. We're in a similar dark age with the "art of computer programming", though many smaller domains within it do have formal symbolism. The way LLMs were built tells the same story: the Transformer model and the attention mechanism that gave birth to large language models were built by taking observations from neuroscience and human attention and expressing them through linear algebra. That powerful discovery was, again, given to us by formal symbolism.
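That linear algebra is remarkably compact. Below is a stripped-down sketch of scaled dot-product attention for a single query in plain Python, not the full multi-head Transformer: score each key against the query, softmax the scores into weights, and return the weighted sum of the values.

```python
import math

def attention(q, keys, values):
    """Scaled dot-product attention for one query vector.
    score_i = (q . k_i) / sqrt(d), weights = softmax(scores),
    output = sum_i weights_i * v_i."""
    d = len(q)
    scores = [sum(qj * kj for qj, kj in zip(q, k)) / math.sqrt(d)
              for k in keys]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return out, weights

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
out, weights = attention(q, keys, values)
# The query aligns with the first key, so the output is pulled
# toward the first value: that is "attention" in its entirety.
```

Everything else in a Transformer is this operation repeated, batched, and stacked. The discovery is a formula, not a metaphor.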
Now what is the path forward? We know we need to create abstractions to make progress; that much is evident. But how? The issue with these language models is that they can't fit an entire codebase in their head. Neither can humans. So we have a constraint, and in that constraint lies the answer: the LLM should not have the entire codebase in its head. To solve the problem, we must remove the codebase from the LLM. From this constraint we must define an algebraic structure that can represent a codebase without anyone needing to read it. So what must be abstracted? The code. To make LLMs build large codebases, you must abstract the code away from the LLM's interface. It must simply be hidden, and then you can build a structure of properties around it. This puts the code itself in a black box that the LLM is no longer burdened with understanding. At Omega, this is exactly what we did. Tools that do not abstract away the code will be doomed to incremental progress. It's not that this is impossible in an editor, but it's significantly harder, so we took an easier approach. Omega doesn't care what the code says, because all Omega cares about is tables. Tables have a simple set of laws that governs all of them universally (i.e., an algebraic structure): they have inputs, and they have columns.
These tables do have code inside them, but because of the universal laws of tables, inputs and columns, we are no longer concerned with what that code is. This makes for a very nice developer experience as well, because we almost never read the code inside the tables; we just look at the data. Neither the developer nor the AI needs to concern themselves with the contents of the code. Every table stays roughly the size of a LeetCode problem, a scale at which ChatGPT already ranks among the top 50 programmers in the world. So the problem is solved systematically, through a paradigm shift, not an incremental improvement. Code generation tools that hook into your text editor may be forever doomed to incremental improvements, because it's very difficult to imagine a universal abstraction that can ignore the entire codebase and still build it. To solve this problem, they would need a separate software development kit that accomplishes the same thing, and at that point, why use a text editor at all? Text editors are simply not well suited to AI code generation, so there's little point in trying to make them work with it. I rarely check the code in these tables, because we can look at the data and see that it's obviously correct. The developer and the AI only need to ask, "well, are these the PDF files in the Google Drive?" If we see PDF files, we know the code is doing its job and we move on. Whatever happens inside the table is made permanently irrelevant. The AI and the developer only need the table's description to know what it's intended to do, and the data to confirm it's being produced.
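Omega's actual interface isn't shown in this post, so here is only a hypothetical sketch of the idea; the names (`Table`, `run`, the contract check) are illustrative assumptions, not Omega's API. The point it demonstrates is the structure: the code is a hidden field, and correctness is judged against the table's declared inputs and columns, plus a glance at the data it produces.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Table:
    """Hypothetical 'code table': the transformation code is a black
    box; the visible surface is its description, inputs, and columns."""
    description: str
    inputs: list                 # names of the inputs the table consumes
    columns: list                # columns it promises to produce
    _code: Callable[..., list]   # hidden: neither dev nor LLM reads this

    def run(self, *args):
        rows = self._code(*args)
        # The check operates on the table's contract (every row has
        # exactly the declared columns), never on the code itself.
        for row in rows:
            assert set(row) == set(self.columns), "row violates contract"
        return rows

pdfs = Table(
    description="PDF files found in a drive folder",
    inputs=["files"],
    columns=["name", "size_bytes"],
    _code=lambda files: [
        {"name": f, "size_bytes": 0} for f in files if f.endswith(".pdf")
    ],
)

rows = pdfs.run(["a.pdf", "b.txt", "c.pdf"])
# Verification is done by looking at the data, not the code:
# "are these the PDF files?" -- a.pdf and c.pdf, so yes.
```

Under this sketch, swapping the lambda for a thousand-line implementation changes nothing for the reviewer: the description, the columns, and the rows are the whole interface.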
These code generation tools will continue to produce bugs as long as they operate on complicated codebases, and now we know exactly what that means: a codebase is complicated when there are relationships between all of the code. A modern codebase is a folder containing lots of files of raw text. Those files can reference each other in millions of different ways; they draw lines everywhere, so you often have to keep everything in your head at once. This is why test problems don't translate into real software, and they never will as long as real software looks this way. Of course these codebases do contain abstractions, but I am speaking of an abstraction over the codebase itself. For an LLM to complete its work, there must be a framework built for LLMs; the frameworks we use today were never designed with LLMs in mind. It is time to go out with the old and in with the new. At Omega we are making natural language programming actually work, and because of this first-principles approach we are the only code generation tool that doesn't have to read the code to confirm it's correct; we are concerned only with certain properties of the code, through the abstraction of code tables. I look at the input and the output; the code itself is abstracted completely, which means the AI doesn't need to fit millions of lines in its head at once. The future of code generation is inhibited only by text editors.
This is why a concrete understanding of abstraction building is so valuable for problem solving. Insights fall out from building a model and doing the work, they are not things that come to you in the shower.
The problem with large codebases is that they have so much code inside them. The solution to AI code generation involves abstracting away parts of the code; in our case, all of it. We have completely removed the LLM's requirement to read the code whatsoever, so this scales to a codebase of any size. The size really doesn't matter anymore. This is what real solutions, as opposed to incremental improvements, look like.