AI Share of Code Was a Useful Number. It Isn't Anymore.

May 30, 2026

There was a time when "AI share of code" numbers made the hottest headlines. Anthropic started sharing its figure and now puts it at essentially 100% for its own code¹. Big companies like Google then picked it up, putting it in earnings calls² and blog posts³, each figure a little higher than the last. I started watching our own version of it too.

To keep it concrete, I settled on one definition: of all the code changed in our primary branch over a month, what percentage could be attributed to AI? I'd check it regularly and use it as a rough read on how things were going.
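As a minimal sketch of that definition: assume every change that landed on the primary branch already carries an attribution label. The `Change` record and its field names are hypothetical, made up for illustration, and the `by_ai` label is assumed to be known, which is exactly the part that turns out to be hard.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """One change that landed on the primary branch (hypothetical record)."""
    lines_added: int
    lines_removed: int
    by_ai: bool  # assumed to be known; in practice this label is the hard part


def ai_share(changes: list[Change]) -> float:
    """Percentage of changed lines (added + removed) attributed to AI."""
    total = sum(c.lines_added + c.lines_removed for c in changes)
    if total == 0:
        return 0.0  # nothing changed this month, so there is no share to report
    ai = sum(c.lines_added + c.lines_removed for c in changes if c.by_ai)
    return 100.0 * ai / total


month = [
    Change(lines_added=120, lines_removed=30, by_ai=True),
    Change(lines_added=40, lines_removed=10, by_ai=False),
]
print(ai_share(month))  # 75.0
```

The arithmetic is trivial. Everything interesting hides in how `by_ai` gets filled in and in which changes make it into the list at all.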

[Figure: Defining AI Share of Code]

I've stopped. Here's why.

You Can't Really Ground It

On paper this looks like an easy metric to calculate. Every line of code that changes — added or removed — eventually lands on the main branch. So for each one, just ask: did an AI write it, or did a human?

The trouble starts the moment you try to actually get the answer. Each tool plays by its own rules.

Cursor counts code that changed inside a Cursor session⁴. Anything an engineer edits outside of it is invisible. So the number isn't "how much of our code is AI-written" — it's "how much is AI-written, in the parts Cursor happened to be watching." I tried to get something firmer by looking at the AI share on our main branch. Cursor will give you that — not just the percentage, but the two numbers behind it: the AI-attributed lines on top and the total lines changed on the bottom. So I checked the denominator against our git logs for the same month. It was a tiny fraction of what git showed. And for some engineers, the main-branch AI share came back as zero, even though they were clearly using Cursor a lot. Cursor couldn't connect their commits to what landed on main.
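The denominator check can be sketched roughly like this. It assumes you have the text output of `git log --numstat` for the primary branch and the month in question, plus the tool's reported total as a plain number. The function names and the sample data are hypothetical, not anything Cursor or git provides.

```python
def total_lines_changed(numstat_output: str) -> int:
    """Sum added + removed lines from `git log --numstat` style output."""
    total = 0
    for line in numstat_output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # blank separator lines between commits
        added, removed, _path = parts
        if added == "-" or removed == "-":
            continue  # binary files are reported as "-" and have no line counts
        total += int(added) + int(removed)
    return total


def denominator_coverage(tool_total_lines: int, git_total_lines: int) -> float:
    """Fraction of git's changed lines that the tool's denominator accounts for."""
    if git_total_lines == 0:
        return 0.0
    return tool_total_lines / git_total_lines


# Made-up text in the shape produced by something like:
#   git log main --since=<start> --until=<end> --numstat --pretty=format:
sample = "10\t2\tsrc/app.py\n\n-\t-\tassets/logo.png\n300\t88\tsrc/widgets/table.py\n"
git_total = total_lines_changed(sample)
print(git_total)  # 400
print(denominator_coverage(40, git_total))  # 0.1
```

A coverage value far below 1.0 means the tool's percentage describes only a small slice of what actually landed, whatever the percentage itself says.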

Codex doesn't hand you this number at all — by design. OpenAI's governance docs surface usage and activity (active users, tokens, threads), but explicitly put "lines of code generated" on the list of things they don't track, calling it "a bit of a noisy proxy for productivity" that "can incentivize the wrong behavior"⁵. There's no "X% of the code was AI-written" figure to read at all.

Claude Code is the most careful of the three about what it counts⁶. It attributes at the merge boundary — matching Claude Code sessions against the lines of each merged PR — and keeps only "effective" lines (more than three characters, no empty or bracket-only lines), dropping anything a human substantially rewrote. But it doesn't hand you the number I was after: you get a deliberately conservative count of AI-assisted lines, not a primary-branch share. It ignores the PR's destination branch entirely, so it's every merged PR, not "what reached main."
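To make the "effective lines" idea concrete, here is a rough sketch of such a filter. It is my reading of the description above, not Claude Code's actual implementation. Measuring length after trimming whitespace and the exact set of characters treated as brackets are both assumptions, and the step that drops human-rewritten lines is not modeled.

```python
# Characters treated as "bracket-like" punctuation (an assumption for this sketch).
BRACKET_CHARS = set("()[]{}<>;,")


def is_effective(line: str) -> bool:
    """True if a line has more than three characters and is not bracket-only."""
    stripped = line.strip()
    if len(stripped) <= 3:
        return False  # empty or too short to count
    if all(ch in BRACKET_CHARS or ch.isspace() for ch in stripped):
        return False  # closing punctuation only, e.g. "}));"
    return True


lines = ["", "}", "});", "}));", "x=1", "return total", "  const total = sum(items);"]
print(sum(is_effective(line) for line in lines))  # 2
```

Only the last two sample lines count, which shows how much a filter like this can shrink a raw line count.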

Now multiply that by every tool an org runs at once. You can't add the numbers up: they count different things, measured at different points in the lifecycle, double-counting some code and missing the rest. The single org-wide metric doesn't really exist. I'm curious how Google arrived at the 75%³ figure for AI share of code.

It No Longer Separates Anyone

Even if I trusted the number, it has stopped doing any work.

A year ago, AI use was still new, and the number told you something real: whether the investment in these tools was paying off. Could AI actually produce code that survived review and made it to the primary branch? That was a genuine open question. Now almost every engineer on my team uses AI to write code, and that code gets merged into the primary branch. The question is settled, so the metric no longer tells me anything.

It Counts the Wrong Part

Here's the part that bothers me most. Even a perfect, trustworthy number would only describe the easy part of the work.

Writing code may have been time-consuming, but other parts of the job — deciding what to build, breaking it down, testing, and knowing it's correct — were always just as important. With no extra scaffolding, agents seem to do a great job at writing code, while those other areas still need attention.

[Figure: What AI share of code leaves out]

Testing is a good example. Agents have gotten genuinely good at it — they'll write the tests, run them, and catch a lot on their own. But not always all the way. In a B2B SaaS product, the configurable parts are messy — feature flags, permission levels, settings that every customer wires up differently, sometimes in combinations that live only inside a single account. An agent can't check what it can't reach, so the last slice of testing still lands on a person.

A small example: we recently had a security-flagged issue in our frontend repo — a dangerouslySetInnerHTML that didn't need to be there, left over from making some text render as bold. The agent removed it cleanly, but broke i18n. When the agent was asked to match how we handle this elsewhere, it did a genuinely good job. Then it came to testing. The code lived in a widget that only renders if a dashboard is configured to include it. The agent couldn't trace which configurations would surface it, and couldn't click through to confirm the fix. So the last mile fell to a human — configuring a dashboard to surface the widget, then verifying the fix held.

And testing is only one example. Research, planning, and review, the parts that were hard to begin with, now have AI in the loop too. But that hasn't made them cheaper. The human time needed to steer and verify each step has gone up, especially when you want a satisfactory result without getting sucked into endless iteration cycles. AI share of code captures none of it. It tracks the one part of the job that shrank and stays silent on the parts that didn't.

What I'm Tracking Instead

I didn't stop tracking. I stopped tracking the typing.

Out: the typing. In: the work around it.

Feature velocity: are features reaching customers faster? Velocity isn't engineering's alone to move, though: a feature still waits on a PRD from product and a design from UX before code is even the bottleneck. Coding agents compress the engineering slice; they don't touch the rest. So the out-of-the-box lift they give to writing code doesn't automatically translate into features shipping faster, and a flat feature count is not, by itself, a verdict on engineering. If the number isn't moving, look one layer down: is the developer experience improving, and is the experimentation aimed at improving it picking up? The payoff often shows up in how the work gets done before it shows up in what ships.

Quality: how are the features holding up in the QA environment? Unlike feature velocity, quality is something engineering has a stronger grip on. It doesn't get better on its own, however: agents are good at writing tests, but that doesn't always translate into a better experience for users. The out-of-the-box boost that coding agents give to writing code isn't matched by an equal boost to quality. So keep an eye on whether quality is keeping up.

Exploration: one of the clearest things AI changes is how far an engineer can move outside familiar ground — into a repository they've never opened, a part of the system they don't own. Measure that directly: are engineers contributing to codebases that are new to them? Of course, this takes some cultural change too — engineers have to feel safe experimenting and stepping into code they don't know.
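One way to measure this, as a sketch under stated assumptions: take (engineer, repository) pairs from commit history before the period and during it, and list the repositories each engineer touched for the first time. The function name, the input shape, and the sample names are all hypothetical.

```python
from collections import defaultdict


def explorers(prior, current):
    """Map each engineer to the repos they contributed to for the first time.

    `prior` and `current` are iterables of (engineer, repo) pairs taken from
    commit history before and during the period being measured.
    """
    seen = defaultdict(set)
    for engineer, repo in prior:
        seen[engineer].add(repo)

    new_repos = defaultdict(set)
    for engineer, repo in current:
        if repo not in seen[engineer]:
            new_repos[engineer].add(repo)
    return dict(new_repos)


prior = [("ana", "web"), ("ana", "api"), ("raj", "api")]
current = [("ana", "web"), ("ana", "billing"), ("raj", "api"), ("lee", "web")]

found = explorers(prior, current)
active = {engineer for engineer, _ in current}
print(sorted(found))  # ['ana', 'lee']
print(f"{len(found)} of {len(active)} active engineers")  # 2 of 3 active engineers
```

Note that a new hire like `lee` counts as exploring everywhere, since every repository is new to them. You may want to exclude engineers with no prior history before reading much into the figure.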

Autonomous agent completion: watch how much agents can take off people's plates end to end — the low-hanging fruit, and maybe some harder problems too, resolved without someone driving. That share is worth tracking in its own right.

Apparatus quality: Dex Horthy calls it apparatus engineering⁷ — the discipline of building and wielding the skills, rules, and hooks that shape how a coding agent behaves. How well a team is doing that work is a signal in its own right. The catch is that once "number of skills" becomes the target, people start adding skills just to grow the count, and a bloated apparatus makes the agent worse, not better. So watch that it's growing thoughtfully and that it's actually helping, not just that it exists.

None of this shows up in a share-of-code number. Which is the whole point. The things worth tracking now are the things that metric was never able to see.

Footnotes