
Knowledge Architecture for AI Coding Agents: Beyond the AGENTS.md File

AGENTS.md is the entry point. But the quality of an AI coding harness is determined by what the file points to — the docs directory, its structure, its ownership model, and the mechanisms that keep it true over time. Most teams build the entry point and skip the architecture.

What is knowledge architecture for AI agents?

Knowledge architecture is the discipline of deciding what an AI coding agent is allowed to know, how that knowledge is organised, how it's accessed, and how it stays accurate over time.

It is distinct from context engineering (what information is assembled for a specific request) and from prompt engineering (how instructions are phrased). Knowledge architecture is the upstream layer: the repository of truth the agent draws from, built and maintained before any specific request is made.

The distinction matters because context engineering and prompt engineering are per-request concerns, while knowledge architecture is a persistent system. Getting it right once pays off on every subsequent agent run. Getting it wrong (typically one large file, no ownership, and no staleness mechanism) compounds quietly in the other direction.

Writing for a system reader, not a human reader

Most engineering documentation is written for a human who skims, infers, and asks follow-up questions. AI agents don't skim. They read linearly within a context window, treat everything they read as authoritative unless explicitly told otherwise, and cannot ask clarifying questions at the moment they encounter ambiguity.

The properties of documentation written for an agent reader:

| Property | Human docs | Agent docs |
|---|---|---|
| Rule placement | Rule explained after context and rationale | Rule stated first, then rationale (the agent acts on what it reads first) |
| Ambiguity tolerance | Reader infers intent from context | Zero tolerance: ambiguity produces unpredictable behaviour |
| Length | Comprehensive is rewarded | Concise is rewarded: the token budget is finite |
| Examples | Helpful but optional | Essential: rules without examples are interpreted inconsistently |
| Counter-examples | Rarely included | Critical: the agent needs to know what NOT to do as much as what to do |
| Freshness signal | Implicit (version history) | Explicit: last-reviewed date and owner in each file |

The docs directory: structure that works

The docs directory is the harness knowledge base — the system of record for conventions, architectural decisions, and operational procedures the agent should follow. It should be structured for findability by both the agent and the engineers who maintain it.

Recommended structure

docs/
├── conventions/          # How code should be written
│   ├── index.md          # Overview + links to each rule file
│   ├── naming.md
│   ├── logging.md
│   ├── errors.md
│   ├── file-size.md
│   └── anti-patterns.md
│
├── architecture/         # Why the system is built the way it is
│   ├── index.md          # Overview + links to ADRs
│   ├── adr-001-database.md
│   ├── adr-002-auth.md
│   └── adr-003-api-versioning.md
│
├── testing/              # What and how to test
│   ├── strategy.md
│   ├── fixtures.md
│   └── coverage-targets.md
│
├── operations/           # How to run and change the system
│   ├── runbooks/
│   └── incident-patterns.md
│
└── integrations/         # How external systems connect
    ├── index.md
    └── [service-name].md

Key structural decisions that affect agent behaviour:

  • One concern per file. A file called "conventions.md" that covers naming, logging, errors, and file structure is too broad. When the agent loads context for a specific task, it loads files — not sections of files. Narrow files mean more targeted context loading.
  • Index files at every directory level. An index.md that lists what's in the directory and when each file was last reviewed gives the agent a navigation layer without loading every file.
  • Stable file names. A link from AGENTS.md to docs/conventions/logging.md should keep working when the file's content changes. Renaming files and relying on redirects is an anti-pattern in harness docs: a link in AGENTS.md that no longer resolves silently removes that rule from the harness, because the agent never loads the file.
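Because a broken link fails silently, it is worth checking in CI. A minimal sketch, assuming AGENTS.md refers to doc files by their repo-relative `docs/...md` path (either bare or inside a markdown link); the function names are hypothetical:

```python
import re
from pathlib import Path

# Repo-relative doc references such as docs/conventions/logging.md.
DOC_REF = re.compile(r"(docs/[\w./-]+\.md)")


def broken_doc_links(repo_root: Path) -> list[str]:
    """Return docs/ paths mentioned in AGENTS.md that do not exist on disk."""
    agents_md = (repo_root / "AGENTS.md").read_text(encoding="utf-8")
    refs = sorted(set(DOC_REF.findall(agents_md)))
    return [ref for ref in refs if not (repo_root / ref).is_file()]


def main(repo_root: Path) -> int:
    """Print each broken link and return a CI-style exit code."""
    missing = broken_doc_links(repo_root)
    for ref in missing:
        print(f"AGENTS.md links to missing file: {ref}")
    # A non-zero code fails the job, so a rename cannot silently drop a rule.
    return 1 if missing else 0
```

A CI step would call `main(Path("."))` and pass the result to `sys.exit`; how that step is wired depends on the CI runner.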

Ownership: who keeps each doc true?

Documentation without explicit ownership rots at a predictable rate: it starts accurate, becomes partially accurate as the codebase evolves, and eventually reflects a system that no longer exists. For human readers, partial accuracy is annoying. For agent readers, partial accuracy produces consistent errors.

Every doc file in the harness knowledge base should have three ownership attributes, ideally in the file header:

---
owner: platform-team
last-reviewed: 2026-05-01
review-interval: 90d
---

# Structured Logging Conventions

**Rule:** All log calls must use the structured logger...

The review interval should match how fast the relevant part of the system evolves. Conventions for a stable utility library change less often than conventions for a product that ships every week. Over-reviewing wastes time; under-reviewing silently introduces staleness.

Treat ownership failures as harness failures. If a doc hasn't been reviewed in twice its review interval, the harness is running on potentially false information. Flag it — don't silently let it persist.
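The escalation rule is simple date arithmetic. A sketch, assuming `review-interval` is always a whole number of days with a `d` suffix, as in the header example; the status labels are illustrative:

```python
from datetime import date, timedelta


def parse_interval(value: str) -> int:
    """Parse a review-interval such as '90d' into days (assumed format)."""
    if not value.endswith("d") or not value[:-1].isdigit():
        raise ValueError(f"unsupported review-interval: {value!r}")
    return int(value[:-1])


def review_status(last_reviewed: date, interval_days: int, today: date) -> str:
    """Classify a doc by how long it has gone without review.

    'current' - within its review interval
    'due'     - past its interval; the owner should review it
    'failing' - past twice its interval; treat it as a harness failure
    """
    age = today - last_reviewed
    if age > timedelta(days=2 * interval_days):
        return "failing"
    if age > timedelta(days=interval_days):
        return "due"
    return "current"
```

Keeping "due" and "failing" separate lets routine reminders go to the owner while only the second state is reported as a harness problem.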

Context scoping: what should the agent know per task?

Not every task requires the full knowledge base. Loading all docs for every task is wasteful (tokens) and counterproductive (noise). The knowledge architecture should support scoped loading.

Three context scoping patterns:

Task-type scoping

Different task types load different doc sets. A backend API change loads conventions/naming.md, conventions/errors.md, and architecture/adr-002-auth.md. A frontend component loads a different subset. The mapping from task type to doc set is itself a harness artifact.
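Since the mapping is a harness artifact, it can live in the repository as plain data. A minimal sketch: the task-type names and the frontend doc set are hypothetical, and the backend set mirrors the example above.

```python
# Hypothetical task types. The backend set mirrors the example in the text;
# the frontend set is an illustrative guess.
TASK_DOCS: dict[str, list[str]] = {
    "backend-api": [
        "docs/conventions/naming.md",
        "docs/conventions/errors.md",
        "docs/architecture/adr-002-auth.md",
    ],
    "frontend-component": [
        "docs/conventions/naming.md",
        "docs/testing/strategy.md",
    ],
}

# Small navigation layer loaded for every task (an assumption, not a requirement).
ALWAYS_LOAD = ["docs/conventions/index.md"]


def docs_for_task(task_type: str) -> list[str]:
    """Return the doc files to load for a task type, always-on docs first."""
    try:
        specific = TASK_DOCS[task_type]
    except KeyError:
        raise KeyError(f"no doc set defined for task type {task_type!r}") from None
    return ALWAYS_LOAD + specific
```

Raising on an unknown task type, rather than falling back to loading everything, makes a missing mapping visible instead of quietly reintroducing the noise that scoping is meant to remove.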

File-proximity scoping

Load the docs most relevant to the files being modified. A change to the payments module loads the payment-specific runbook and the relevant ADR. A change to a utility loads the general conventions only.
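File-proximity scoping can be sketched as a lookup from each changed path to its nearest mapped directory. The `src/payments` layout, the runbook file name, and the choice of ADR are assumptions for illustration:

```python
from pathlib import PurePosixPath

# Hypothetical mapping from source directories to the docs that govern them.
PROXIMITY_DOCS: dict[str, list[str]] = {
    "src/payments": [
        "docs/operations/runbooks/payments.md",   # assumed runbook name
        "docs/architecture/adr-001-database.md",  # assumed "relevant ADR"
    ],
    "src/utils": [],  # utilities get the general conventions only
}
GENERAL_DOCS = ["docs/conventions/index.md"]


def docs_for_changes(changed_files: list[str]) -> list[str]:
    """Return general docs plus those mapped to the nearest directory of each change."""
    selected = list(GENERAL_DOCS)
    for path in changed_files:
        # .parents runs from the closest directory outward, so the first hit is the most specific.
        for parent in PurePosixPath(path).parents:
            docs = PROXIMITY_DOCS.get(str(parent))
            if docs is not None:
                for doc in docs:
                    if doc not in selected:
                        selected.append(doc)
                break
    return selected
```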

On-demand retrieval

The agent receives only the index at the start of a task and retrieves specific doc files when it needs them. More complex to implement but produces the most token-efficient harness for large doc bases.
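One way to sketch on-demand retrieval is to expose two read-only operations that an agent framework could register as tools: one returns a section's index.md, the other returns a single doc file. The class and method names are hypothetical, and tool registration depends on the framework, so it is omitted.

```python
from pathlib import Path


class DocRetriever:
    """Serve the docs index up front and individual doc files on request."""

    def __init__(self, docs_root: Path):
        self.docs_root = docs_root.resolve()

    def read_index(self, section: str) -> str:
        """Return docs/<section>/index.md, the navigation layer for that section."""
        return self.read_doc(f"{section}/index.md")

    def read_doc(self, relative_path: str) -> str:
        """Return one doc file, refusing paths that escape the docs root."""
        target = (self.docs_root / relative_path).resolve()
        if not target.is_relative_to(self.docs_root):
            raise PermissionError(f"{relative_path} is outside the docs root")
        if target.suffix != ".md" or not target.is_file():
            raise FileNotFoundError(relative_path)
        return target.read_text(encoding="utf-8")
```

The path check matters because the agent chooses which file to request; restricting reads to the docs root keeps retrieval inside the knowledge base.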

Staleness detection and correction

A knowledge base that can't detect its own staleness will drift from reality silently. The harness needs mechanisms to surface divergence between what the docs say and what the code does.

Three mechanisms in increasing order of automation:

  • PR checklist item. Every PR that changes a pattern or convention should answer: "Did you update the relevant docs/ file?" Put the question in the PR template, and use CODEOWNERS so that changes to the affected code paths require review from the doc owners. Low automation, high coverage if the team follows it.
  • Scheduled staleness scan. A weekly CI job that reads each doc file, checks its last-reviewed date against its review-interval, and opens an issue for any file overdue. Makes staleness visible without requiring an agent.
  • Semantic divergence agent. A background agent that reads each convention doc, scans a sample of recently changed files, and flags cases where the code patterns diverge from the documented standard. More sophisticated — catches cases where the code evolved without updating the docs.
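The scheduled staleness scan is straightforward to script. A sketch, assuming the header format shown earlier and a deliberately simple `key: value` parser rather than a full YAML library. It returns issue titles instead of calling an issue tracker's API; a CI job would do that as a separate step.

```python
import re
from datetime import date
from pathlib import Path

# Leading block delimited by '---' lines, as in the header example.
FRONT_MATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def read_header(text: str) -> dict[str, str]:
    """Parse 'key: value' lines from the leading block (a simplification, not full YAML)."""
    match = FRONT_MATTER.match(text)
    header: dict[str, str] = {}
    if match:
        for line in match.group(1).splitlines():
            key, sep, value = line.partition(":")
            if sep:
                header[key.strip()] = value.strip()
    return header


def overdue_docs(docs_root: Path, today: date) -> list[str]:
    """Return one issue title per doc that is overdue or lacks a usable header."""
    titles = []
    for path in sorted(docs_root.rglob("*.md")):
        rel = path.relative_to(docs_root).as_posix()
        header = read_header(path.read_text(encoding="utf-8"))
        try:
            reviewed = date.fromisoformat(header["last-reviewed"])
            interval = int(header["review-interval"].removesuffix("d"))
        except (KeyError, ValueError):
            titles.append(f"{rel}: missing or invalid review header")
            continue
        age = (today - reviewed).days
        if age > interval:
            owner = header.get("owner", "no owner")
            titles.append(f"{rel}: {age} days since review (interval {interval}d, owner: {owner})")
    return titles
```

Reporting files with no header at all, not just overdue ones, follows from treating ownership failures as harness failures.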

Knowledge vs rules: what belongs where

A common confusion: treating all harness content as documentation. But there are two distinct categories, and mixing them produces a system that enforces neither well.

Knowledge (docs/)

Architecture context. Rationale. Historical decisions. What the system is and why. Read by the agent to understand; followed by convention.

Read by the agent; not mechanically enforced

Rules (linters / CI)

Taste invariants. Format rules. Structural constraints. Things that are always true and can be checked automatically.

Mechanically enforced: a violation fails the check and cannot be merged

The principle: if a convention matters enough that a violation should block a PR, it should be a linter rule, not a docs entry. Docs inform; linters enforce. A well-designed harness uses both — docs for context and rationale, linters for the invariants that must hold.
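As an illustration of moving a convention into a linter, the logging rule from the header example could become a check that rejects bare `print()` calls. A minimal sketch using Python's standard `ast` module; treating `print()` as the violation is an assumption about what "use the structured logger" means in a given codebase. In practice an existing lint rule (for Python, flake8-print's T201, which Ruff also implements) is often a better choice than a custom script.

```python
import ast


def find_print_calls(source: str, filename: str = "<string>") -> list[str]:
    """Flag bare print() calls as violations of a hypothetical structured-logging rule."""
    tree = ast.parse(source, filename=filename)
    findings = []
    for node in ast.walk(tree):
        # Only direct calls to the builtin name 'print' are flagged.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "print"
        ):
            findings.append(
                f"{filename}:{node.lineno}: use the structured logger instead of print()"
            )
    return findings
```

The doc still carries the rationale; the check only guarantees the invariant.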

Frequently asked questions

How is knowledge architecture different from a wiki?

A wiki is optimised for human navigation and collaborative editing. A harness knowledge base is optimised for system consumption: narrow files, explicit ownership, machine-actionable rules, staleness detection. Wikis grow toward comprehensiveness; harness docs should stay minimal and precise.

Do I need a knowledge architecture if I only use AI for autocomplete?

Autocomplete tools read the surrounding code for context, not a structured docs directory. Knowledge architecture matters most for agentic tools that take multi-step actions. As you move up the autonomy ladder — from autocomplete to agent — the architecture becomes more important.

How do I handle knowledge architecture for a legacy codebase with no existing conventions?

Start by documenting what is true, not what you wish were true. If the codebase has three error handling patterns, document all three and mark one as preferred going forward. The agent needs to know what exists — including the patterns you're deprecating — to avoid amplifying them.

What is the minimum viable knowledge architecture?

An AGENTS.md under 100 lines that covers: what the codebase does, the tech stack, how to run it, and three to five explicit conventions with examples. A docs/conventions/ directory with one file per convention. A process for reviewing docs when a convention changes. That's enough to start. Build from there.
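The measurable parts of that minimum can be checked automatically. A sketch covering only the line limit and the presence of convention files; whether AGENTS.md actually explains the codebase, stack, and conventions still needs a human reviewer. The function name is hypothetical.

```python
from pathlib import Path

MAX_AGENTS_LINES = 100  # "under 100 lines", from the guidance above


def minimum_viable_problems(repo_root: Path) -> list[str]:
    """Report which measurable parts of the minimum setup are missing."""
    problems = []
    agents = repo_root / "AGENTS.md"
    if not agents.is_file():
        problems.append("AGENTS.md is missing")
    else:
        count = len(agents.read_text(encoding="utf-8").splitlines())
        if count >= MAX_AGENTS_LINES:
            problems.append(f"AGENTS.md has {count} lines; keep it under {MAX_AGENTS_LINES}")

    # One file per convention; index.md alone does not count as a convention.
    conventions = repo_root / "docs" / "conventions"
    rule_files = (
        [p for p in conventions.glob("*.md") if p.name != "index.md"]
        if conventions.is_dir()
        else []
    )
    if not rule_files:
        problems.append("docs/conventions/ has no convention files")
    return problems
```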

Related reading: AGENTS.md: the complete field guide, AI didn't remove engineering judgment — it moved it upstream, and the harness engineering checklist.