AI Share of Code Was a Useful Number. It Isn't Anymore.

There was a time when the "AI share of code" number was the hottest headline. Anthropic started sharing it and now puts it at essentially 100% for itself [1], and then big companies like Google picked it up, putting it in earnings calls [2] and blog posts [3], each figure a little higher than the last. I started watching our own version of it too.

To keep it concrete, I settled on one definition: of all the code changed in our primary branch over a month, what percentage could be attributed to AI? I'd check it and use it as a rough gauge of how things were going.

I've stopped. Here's why.

You Can't Really Ground It

The first problem is that nobody agrees on what the number even is, because each tool can only count its own slice.

Take Cursor. It only counts code that changed inside a Cursor session [4]. Anything an engineer edits outside of it is invisible. So the number isn't "how much of our code is AI-written"; it's "how much is AI-written, in the parts Cursor happened to be watching." I tried to get something firmer by looking at the AI share on our main branch. Cursor will give you that: not just the percentage, but the two numbers behind it, the AI-attributed lines as the numerator and the total lines changed as the denominator. So I checked the denominator against our git logs for the same month. It was a tiny fraction of what git showed had actually shipped. And for some engineers, the main-branch AI share came back as zero, even though they were clearly using Cursor a lot. The tool just can't connect their commits to what landed on main.
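The denominator check above can be sketched roughly as follows. This is a minimal illustration, not how Cursor or anyone else computes it: it assumes "lines changed" means added plus deleted lines as reported by `git log --numstat`, and the function names and sample numbers are hypothetical.

```python
def lines_changed_from_numstat(numstat_output: str) -> int:
    """Sum added + deleted lines from `git log --numstat --pretty=format:` output.

    Each data line looks like "<added>\t<deleted>\t<path>".
    Binary files are reported as "-\t-\t<path>" and are skipped.
    """
    total = 0
    for line in numstat_output.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # blank separator lines between commits
        added, deleted, _path = parts
        if added == "-" or deleted == "-":
            continue  # binary file, no line counts
        total += int(added) + int(deleted)
    return total


def denominator_coverage(tool_total_lines: int, git_total_lines: int) -> float:
    """Fraction of git-visible changed lines that the tool's denominator covers."""
    if git_total_lines == 0:
        return 0.0
    return tool_total_lines / git_total_lines


# Hypothetical sample: two text files and one binary file.
sample = "10\t2\tsrc/app.py\n-\t-\tassets/logo.png\n\n5\t5\tREADME.md\n"
git_total = lines_changed_from_numstat(sample)
coverage = denominator_coverage(tool_total_lines=11, git_total_lines=git_total)
print(git_total, coverage)
```

In practice the input would come from something like `git log --since=<start> --until=<end> --numstat --pretty=format: main`. A coverage far below 1.0 is the "tiny fraction" situation described above.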

Other tools draw the line in other places. Claude Code attributes more carefully [5]: it checks at the merge boundary (what actually lands when a PR is merged) and normalizes each line, counting only "effective lines" with more than 3 characters and excluding empty lines and trivial punctuation. It's conservative on purpose. Codex doesn't seem to report an AI share of code at all [6].
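To make the "effective lines" idea concrete, here is a minimal sketch. It is not Claude Code's actual implementation: it assumes that normalizing a line means stripping surrounding whitespace, and that the "more than 3 characters" threshold is what filters out empty lines and short punctuation such as a lone closing brace.

```python
def is_effective_line(line: str) -> bool:
    """Treat a line as effective if it has more than 3 characters after stripping.

    Assumption: whitespace stripping stands in for the tool's normalization.
    Empty lines and short punctuation like "}" or ");" fall below the threshold.
    """
    return len(line.strip()) > 3


def count_effective_lines(lines: list[str]) -> int:
    """Count the lines in a change that pass the effective-line filter."""
    return sum(1 for line in lines if is_effective_line(line))


# Hypothetical lines from a merged change.
merged_lines = ["", "}", "  );", "return total", "x = 1", "foo"]
print(count_effective_lines(merged_lines))
```

Only `return total` and `x = 1` pass the filter here, which shows why a count like this comes out lower than a raw line count for the same change.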

Now multiply that by every tool an org uses at once — Cursor, Claude Code, Codex, and more. You can't add the numbers up: they count different things at different points, double-count some code and miss the rest. The single org-wide figure leadership wants doesn't really exist.
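A toy example shows why the per-tool numbers don't combine. Everything here is made up for illustration: ten shipped lines with invented IDs, a "ground truth" set that no real org has access to, and two hypothetical tools that each see only part of the change.

```python
# Ten lines landed on main, identified 0..9 (toy data).
shipped = set(range(10))
# Ground truth of AI-written lines, which in reality nobody can observe.
true_ai = {0, 1, 2, 3, 4, 5}

# Each hypothetical tool only sees its own slice of the change.
tool_a_seen = {0, 1, 2, 3}
tool_b_seen = {2, 3, 4, 6}


def share(ai_lines: set, seen_lines: set) -> float:
    """AI share as one tool would report it: AI lines it saw / lines it saw."""
    return len(ai_lines & seen_lines) / len(seen_lines)


share_a = share(true_ai, tool_a_seen)
share_b = share(true_ai, tool_b_seen)

# Naive roll-up: add the numerators and the denominators across tools.
naive_share = (
    len(true_ai & tool_a_seen) + len(true_ai & tool_b_seen)
) / (len(tool_a_seen) + len(tool_b_seen))

true_share = len(true_ai) / len(shipped)
double_counted = tool_a_seen & tool_b_seen
never_seen = shipped - (tool_a_seen | tool_b_seen)

print(share_a, share_b, naive_share, true_share)
print(sorted(double_counted), sorted(never_seen))
```

The naive roll-up lands well above the true share in this toy case, because two lines are counted twice and four shipped lines are never seen by either tool.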

I'm curious how Google arrived at its 75% figure for AI share of code [3].

It No Longer Separates Anyone

Even if I trusted the number, it has stopped doing any work.

A year ago, AI use was uneven, and the number told you something real: whether the investment in these tools was paying off. Could AI actually produce code that survived review and made it to the primary branch? That was a genuine open question. Now almost everyone uses AI to write code, so the number no longer distinguishes one team or engineer from another. The metric tells me nothing.

It Counts the Wrong Part

Here's the part that bothers me most. Even a perfect, trustworthy number would only describe the easy part of the work.

Writing code may have been time-consuming, but other parts of the job — deciding what to build, breaking it down, testing, and knowing it's correct — were always just as important. With no extra scaffolding, agents seem to do a great job at writing code, while those other areas still need attention.

Testing is a good example. Agents have gotten genuinely good at it — they'll write the tests, run them, and catch a lot on their own. But not always all the way. In a B2B SaaS product, the configurable parts are messy — feature flags, permission levels, settings that every customer wires up differently, sometimes in combinations that live only inside a single account. An agent can't check what it can't reach, so the last slice of testing still lands on a person.

A small example: we recently had a security-flagged issue in our frontend repo — a dangerouslySetInnerHTML that didn't need to be there, left over from making some text render as bold. The agent removed it cleanly, but broke our internationalization. When the agent was asked to match how we handle this elsewhere, it did a genuinely good job. Then it came to testing. The code lived in a widget that only renders if a dashboard is configured to include it. The agent couldn't trace which configurations would surface it, and couldn't click through to confirm the fix. So the last mile fell to a human — configuring a dashboard to surface the widget, then verifying the fix held.

And testing is only one example. Research, planning, and review, the parts that were hard to begin with, now have AI in the loop too. But that hasn't made them cheaper. The human time to steer and check each step has gone up, especially when you want a satisfactory result without getting sucked into iteration cycles. AI share of code captures none of it. It tracks the one part of the job that shrank and stays silent on the parts that didn't.

What I'm Watching Instead

I didn't stop measuring. I stopped measuring the typing.

Adoption: are more engineers reaching for an agent every day, and as their default first move? Watch what they use it for (quick fixes, full features, tests) and where (throwaway scripts or the repos that actually matter). When both widen, it's taking hold.

Feature velocity and quality: are features reaching customers faster, and do they hold up once they land? But a flat feature count isn't a verdict on its own. If the number isn't moving, look one layer down — is the developer experience improving, or is the experimentation aimed at improving it picking up? The payoff often shows up in how the work gets done before it shows up in what ships.

Reach: one of the clearest things AI changes is how far an engineer can move outside familiar ground — into a repository they've never opened, a part of the system they don't own. Measure that directly: are engineers contributing to codebases that are new to them? Of course, this takes some cultural change too — engineers have to feel safe experimenting and stepping into code they don't know.
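One rough way to measure reach is to count first-time contributions per engineer and repository. The sketch below assumes commit records are available as (author, repo, date) tuples; the names, repositories, and dates are hypothetical, and a real version would need to handle author aliases and bot accounts.

```python
from datetime import date


def new_codebase_contributions(commits, period_start):
    """Return (author, repo) pairs whose first-ever commit is on or after period_start."""
    first_commit = {}
    for author, repo, day in commits:
        key = (author, repo)
        # Track the earliest commit date seen for each author/repo pair.
        if key not in first_commit or day < first_commit[key]:
            first_commit[key] = day
    return sorted(key for key, day in first_commit.items() if day >= period_start)


# Hypothetical commit history across three repositories.
commits = [
    ("ana", "billing", date(2024, 11, 3)),
    ("ana", "billing", date(2025, 3, 9)),
    ("ana", "frontend", date(2025, 3, 12)),
    ("ben", "frontend", date(2024, 6, 1)),
    ("ben", "infra", date(2025, 3, 20)),
]

reach = new_codebase_contributions(commits, period_start=date(2025, 3, 1))
print(reach)
```

Tracking how the length of that list changes from month to month gives a direct signal of engineers moving into unfamiliar code.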

Autonomous agent completion: watch how much agents can take off people's plates end to end — the low-hanging fruit, and maybe some harder problems too, resolved without someone driving. That share is worth tracking in its own right.

Apparatus quality: Dex Horthy calls it apparatus engineering [7], the discipline of building and wielding the skills, rules, and hooks that shape how a coding agent behaves. How well a team is doing that work is a signal in its own right. The catch is that once "number of skills" becomes the target, people start adding skills just to grow the count, and a bloated apparatus makes the agent worse, not better. So watch that it's growing and that it's actually helping, not just that it exists.

None of this shows up in a share-of-code number. Which is the whole point. The things worth watching now are the things that metric was never able to see.

Footnotes

  1. Anthropic on the share of its own code now written by AI (podcast)
  2. “Over 30% of Google’s new code is now AI-generated” — Sundar Pichai (Moneycontrol)
  3. Sundar Pichai, Google Cloud Next 2026 (Google blog)
  4. How Cursor tracks AI code in git commits (Cursor docs)
  5. How Claude Code attributes AI-written lines (Anthropic docs)
  6. OpenAI Codex enterprise governance (OpenAI docs)
  7. Dex Horthy on “apparatus engineering” (podcast)