
AI Coding Entropy: What It Is, Why It Compounds, and How to Stop It

When a human engineer writes bad code, it spreads at human speed. When an AI agent writes bad code, it spreads at agent speed — across every file, every PR, every sprint. Understanding entropy in AI coding contexts is the difference between a codebase that improves over time and one that quietly falls apart.

What is AI coding entropy?

In thermodynamics, entropy is the tendency of a system to move from order toward disorder without external energy input. In software, entropy has always meant technical debt accumulation: inconsistent patterns, deprecated dependencies, violated conventions, and code that's harder to change than it should be.

AI coding entropy is the same phenomenon, accelerated. When AI agents generate code, they take their cues from the existing codebase as much as from their training data. If the codebase contains inconsistent patterns, the agent replicates them. If a deprecated approach is still present in 40% of files, the agent treats it as valid. If a bad pattern exists anywhere, it can end up everywhere.

The OpenAI harness engineering team used the term directly: without deliberate counter-pressure, entropy accumulates in AI-generated codebases faster than in human-written ones. The agent is not judicious. It does not distinguish between the code the team is proud of and the code the team left in place because there was never time to fix it.

Why it compounds faster than human entropy

Factor | Human-written code | AI-generated code
--- | --- | ---
Pattern replication speed | One engineer, one PR | Agent, hundreds of files per session
Judgment about quality | Senior engineers filter bad patterns | Agent treats all existing patterns as valid
Bad pattern surface area | Limited by human throughput | Limited only by context window size
Detection lag | Code review catches it in the PR | May be in 50 files before review
Correction cost | Fix the PR | Refactor across the codebase

The asymmetry is stark. A human engineer makes a mistake in one place. An AI agent makes the same mistake consistently, at scale, in a way that looks intentional because it's consistent. Reviewers often approve consistent-looking code that is consistently wrong.

The five sources of entropy

1. Pattern drift from the existing codebase

The agent learns by example. If 30% of your files use one error-handling pattern and 70% use another, the agent will use whichever appears more often in the files it reads for context. It doesn't know which is the intended standard.
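To put numbers on the 30/70 example, here is a toy model. It is an assumption, not a description of how any particular agent chooses patterns: suppose the agent reads k context files drawn independently at random and copies whichever error-handling pattern most of them use. The chance that it reproduces the dominant pattern is then a binomial tail sum.

```python
from math import comb


def majority_probability(share: float, k: int) -> float:
    """Probability that the dominant pattern wins a majority among k sampled files.

    Toy model (an assumption): `share` is the fraction of files using the
    dominant pattern, context files are sampled independently, and the agent
    copies the majority pattern. k must be odd so there are no ties.
    """
    if k % 2 == 0:
        raise ValueError("use an odd k to avoid ties")
    need = k // 2 + 1  # smallest number of votes that forms a majority
    return sum(
        comb(k, i) * share**i * (1 - share) ** (k - i)
        for i in range(need, k + 1)
    )


for k in (1, 5, 9):
    print(k, round(majority_probability(0.7, k), 3))
```

With a 0.7 share this prints 0.7 for k = 1, about 0.837 for k = 5, and about 0.901 for k = 9. Under this model, more context does not help the agent find the intended standard; it makes the agent copy whatever is already most common more reliably.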

2. Outdated AGENTS.md or convention docs

If the documented standard was written six months ago and the codebase has evolved, the agent follows the document — not the actual current practice. It will consistently produce code that was correct in the past but isn't today.

3. AI slop: coherent but low-quality patterns

The OpenAI team uses this term for code that passes review (it compiles, tests pass, functionality is correct) but introduces subtle quality degradation. Unnecessary abstractions, verbose error messages, duplicate logic that doesn't warrant a pattern. Not wrong. Just expensive to maintain.

4. Missing mechanical enforcement

Rules written in AGENTS.md but not backed by linters or CI checks are advisory. The agent follows them when it happens to, and misses them when context is crowded. Without mechanical enforcement, the rule exists only probabilistically.

5. Legacy code left in place

Code that predates the current conventions is entropy waiting to spread. The agent reads it as a valid pattern. Teams that accept 'we'll fix it later' accumulate a growing surface area of patterns the agent will replicate.

How to detect entropy before it compounds

Entropy in AI-generated codebases has early warning signals. Most teams ignore them until the debt is structural:

  • Multiple implementations of the same pattern. If you have three different ways of handling the same concern (logging, error handling, config access), the agent is not receiving a clear signal about which is canonical.
  • Reviewer comments that repeat across PRs. If code reviewers are leaving the same comment on multiple agent-generated PRs, the harness is missing a rule.
  • PRs that are "correct but something feels off." This is usually AI slop — technically valid but carrying quality debt that doesn't show up in tests.
  • Convention docs that haven't been touched in 90 days. Active codebases evolve. Documentation that doesn't evolve with them is stale. Stale docs produce entropy.
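The first and last of these signals are easy to check mechanically. A minimal sketch, assuming a TypeScript codebase, logging as the concern under test, and convention docs stored as Markdown under a docs/ directory; the pattern names and regexes are hypothetical examples, not a standard.

```python
import re
import time
from pathlib import Path

# Hypothetical example: competing ways of logging in a TypeScript codebase.
LOGGING_PATTERNS = {
    "console": re.compile(r"\bconsole\.(log|error|warn)\("),
    "structured_logger": re.compile(r"\blogger\.(info|error|warn)\("),
    "stdout_write": re.compile(r"\bprocess\.stdout\.write\("),
}


def competing_implementations(sources: dict[str, str], patterns: dict[str, re.Pattern]) -> dict[str, int]:
    """Count how many files use each competing implementation of one concern."""
    return {
        name: sum(1 for text in sources.values() if rx.search(text))
        for name, rx in patterns.items()
    }


def stale_docs(doc_mtimes: dict[str, float], now: float, max_age_days: int = 90) -> list[str]:
    """Return convention docs last modified more than max_age_days ago."""
    cutoff = now - max_age_days * 86_400
    return sorted(path for path, mtime in doc_mtimes.items() if mtime < cutoff)


def scan(repo: Path, docs_dir: str = "docs") -> tuple[dict[str, int], list[str]]:
    sources = {str(p): p.read_text(errors="ignore") for p in repo.rglob("*.ts")}
    # File mtime is a rough proxy: in a fresh clone it reflects checkout time.
    # `git log -1 --format=%ct -- <path>` gives the last commit time instead.
    mtimes = {str(p): p.stat().st_mtime for p in (repo / docs_dir).rglob("*.md")}
    return competing_implementations(sources, LOGGING_PATTERNS), stale_docs(mtimes, time.time())
```

More than one non-zero count for the same concern means the agent has no clear canonical pattern to copy; every path in the stale list is a candidate for review.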

The three harness mechanisms that stop entropy

1. Taste invariants as linters

Every convention that matters should be mechanically enforced. Not written down — enforced. A linter that fails CI catches every violation, regardless of whether the agent happened to read the relevant section of AGENTS.md in the context window for this session. Linters are the only truly reliable harness mechanism.
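A lint rule does not need to be elaborate to be enforced. For JavaScript and TypeScript, ESLint's built-in no-console rule covers the console.log case directly; the sketch below shows the same mechanism as a standalone Python check, assuming a hypothetical convention that application code under src/ must use a structured logger. What makes it enforcement rather than advice is the non-zero exit status, which fails the CI job no matter what the agent read.

```python
import re
import sys
from pathlib import Path

# Hypothetical convention: application code must use the structured logger,
# never console.log / console.error / console.warn.
RULE_ID = "no-raw-console"
VIOLATION = re.compile(r"\bconsole\.(log|error|warn)\(")


def find_violations(path: str, text: str) -> list[str]:
    """Return one 'path:line: message' string per offending line."""
    return [
        f"{path}:{lineno}: [{RULE_ID}] use the structured logger instead of console.*"
        for lineno, line in enumerate(text.splitlines(), start=1)
        if VIOLATION.search(line)
    ]


def main(root: str = "src") -> int:
    problems: list[str] = []
    for p in sorted(Path(root).rglob("*.ts")):
        problems += find_violations(str(p), p.read_text(errors="ignore"))
    for problem in problems:
        print(problem, file=sys.stderr)
    # A non-zero exit status is what makes the CI job fail.
    return 1 if problems else 0
```

In CI, the job's entry point would call sys.exit(main()) so that any violation fails the build.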

2. Structured knowledge architecture

A short AGENTS.md that points to a structured docs directory. Docs that are narrow, owned, versioned, and written for system readers. Rules stated before reasoning. Counter-examples included. This prevents the agent from defaulting to inconsistent existing patterns because the authoritative pattern is always findable.
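The knowledge architecture can be checked mechanically too, so that AGENTS.md stays short and every pointer it contains actually resolves. A minimal sketch, assuming AGENTS.md references docs with relative Markdown links and using a hypothetical 100-line budget for "short"; both are illustrative assumptions.

```python
import re
from pathlib import Path

MAX_AGENTS_MD_LINES = 100  # assumption: an illustrative "short" budget
# Captures relative link targets such as [errors](docs/errors.md), dropping any #anchor.
LINK = re.compile(r"\]\(([^)#\s]+)")


def check_agents_md(text: str, existing: set[str]) -> list[str]:
    """Return problems: AGENTS.md over budget, or pointers to docs that don't exist."""
    problems = []
    n = len(text.splitlines())
    if n > MAX_AGENTS_MD_LINES:
        problems.append(f"AGENTS.md has {n} lines (budget {MAX_AGENTS_MD_LINES})")
    for target in LINK.findall(text):
        if target.startswith(("http://", "https://")):
            continue  # external links are out of scope for this check
        target = target.removeprefix("./")
        if target not in existing:
            problems.append(f"broken pointer: {target}")
    return problems


def check_repo(repo: Path) -> list[str]:
    text = (repo / "AGENTS.md").read_text()
    existing = {p.relative_to(repo).as_posix() for p in repo.rglob("*") if p.is_file()}
    return check_agents_md(text, existing)
```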

3. Continuous garbage collection

A background agent that runs on a regular cadence, scans the codebase for divergence from documented conventions, and opens targeted refactoring PRs. This prevents entropy from accumulating silently. Each PR is small, focused, and reviewable in under a minute.

Garbage collection: the mechanism most teams skip

Linters and good docs are preventive. Garbage collection is corrective. Both are necessary because no harness is perfect from day one: entropy accumulated before the harness was built, and it continues to accumulate in edge cases the harness doesn't yet cover.

The OpenAI team replaced their Friday cleanup sessions — which had been consuming 20% of engineering time — with background agents. The agents ran on a weekly schedule, found specific deviations from documented conventions, and opened PRs that each addressed one instance of one violation. Most PRs could be reviewed and merged in under a minute because they were narrow and targeted.

Garbage collection runs work best when they are:

  • Scoped to one convention per run. "Find all files that use console.log instead of the structured logger and replace them" — not "improve code quality."
  • Scheduled, not ad hoc. Weekly or bi-weekly runs on a cron schedule, not when someone remembers to run them.
  • Opening small PRs, not large ones. A PR that changes 200 files for one pattern is a liability. Ten PRs that each change 20 files are reviewable.
  • Informing the harness, not replacing it. If the same violation keeps appearing in GC runs, the linter is missing a rule. Fix the linter.
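These rules can be sketched as the planning step of a single GC run. Everything here is hypothetical: the convention name, the 20-files-per-PR batch size (borrowed from the example above), and the three-run threshold for escalating a recurring violation to a linter rule. The code only plans PRs; it does not call any Git hosting API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedPR:
    convention: str
    files: tuple[str, ...]
    title: str


def plan_gc_run(convention: str, violating_files: list[str], max_files_per_pr: int = 20) -> list[PlannedPR]:
    """Split one convention's violations into small, single-purpose PRs."""
    files = sorted(set(violating_files))
    batches = [files[i:i + max_files_per_pr] for i in range(0, len(files), max_files_per_pr)]
    return [
        PlannedPR(convention, tuple(batch), f"gc({convention}): fix {len(batch)} file(s) [{n}/{len(batches)}]")
        for n, batch in enumerate(batches, start=1)
    ]


def needs_lint_rule(history: dict[str, list[int]], runs: int = 3) -> list[str]:
    """Conventions whose violations showed up in each of the last `runs` GC runs.

    history maps convention -> violation count per run, oldest first.
    """
    return sorted(
        c for c, counts in history.items()
        if len(counts) >= runs and all(x > 0 for x in counts[-runs:])
    )
```

A scheduler such as a weekly cron job would call plan_gc_run for exactly one convention, hand each PlannedPR to whatever agent or bot makes the edits, and feed the per-run counts into needs_lint_rule so recurring violations become linter work instead of more GC.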

The amplification law

There is a useful mental model for thinking about AI coding quality: agents amplify whatever quality was already present in the codebase, good or bad.

HIGH QUALITY

Consistent conventions + linters + structured docs → agent amplifies quality → codebase improves faster than humanly possible

LOW QUALITY

Inconsistent patterns + no linters + stale docs → agent amplifies entropy → codebase degrades faster than humanly possible

The harness is the intervention point. It does not guarantee that the agent produces perfect code. It guarantees that whatever quality the harness encodes is applied consistently, so that quality compounds instead of entropy.

Frequently asked questions

Is AI coding entropy the same as technical debt?

Related but distinct. Technical debt is the accumulated cost of past shortcuts. AI coding entropy is the ongoing tendency toward disorder in systems without explicit counter-pressure. Technical debt is a stock; entropy is a flow. A harness addresses both: linters and structured docs reduce the rate of new entropy (the flow), and garbage collection pays down existing debt (the stock).
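The stock/flow distinction can be made concrete with a toy simulation. The numbers are invented for illustration: the stock is a backlog of convention violations, the flow is new violations per week, linters are modeled as cutting the inflow, and garbage collection as removing a fixed number of violations per week.

```python
def simulate_debt(weeks: int, inflow: float, gc_capacity: float, start: float = 0.0) -> list[float]:
    """Backlog (stock) per week, given new violations (flow) and GC fixes.

    Each week: add `inflow` new violations, then GC removes up to
    `gc_capacity` of them. Illustrative numbers, not measurements.
    """
    stock = start
    history = []
    for _ in range(weeks):
        stock += inflow
        stock -= min(stock, gc_capacity)  # GC can't remove more than exists
        history.append(stock)
    return history


# No harness: 30 new violations/week, no GC.
no_harness = simulate_debt(10, inflow=30, gc_capacity=0, start=200)
# GC only: same inflow, GC fixes 20/week, so the backlog still grows 10/week.
gc_only = simulate_debt(10, inflow=30, gc_capacity=20, start=200)
# Linters cut inflow to 5/week, GC fixes 20/week: the backlog drains.
linters_and_gc = simulate_debt(10, inflow=5, gc_capacity=20, start=200)
print(no_harness[-1], gc_only[-1], linters_and_gc[-1])  # 500 300 50
```

With GC alone the backlog keeps growing because the flow outpaces cleanup; only when linters shrink the flow does the stock start to drain, which is also why linters should come before GC.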

Does this apply to all AI coding tools or just agentic ones?

It applies more severely to agentic tools (which generate large volumes of code autonomously) than to copilot-style autocomplete (where a human accepts or rejects each suggestion). But the pattern-replication problem exists at any scale of AI assistance.

How do I know if my codebase already has significant AI coding entropy?

Signs: multiple implementations of the same pattern with no clear winner; convention docs that haven't been updated in months; code reviewers leaving the same comments repeatedly on AI-generated PRs; a growing list of 'we'll fix it later' items that keep being deferred.

Can I fix existing entropy, or only prevent future entropy?

Both. Prevention is linters and docs. Remediation is garbage collection — background agents that find and fix existing violations. The order matters: build the linters first so that fixed code stays fixed. Running GC without linters means the violations return.
