Three layers, one system
In 2023, "prompt engineering" was often treated as the whole job: find the magic system message, add a few examples, ship. As agents gained tools, memory, and side effects, failures moved elsewhere. Models hallucinated because the service catalog was missing, not because an adjective was wrong. They burned budgets because every turn re-sent a megabyte of instructions. They took destructive actions because nothing in the runtime said no.
Three terms now describe complementary work—each solves a different class of failure:
- Prompt engineering — shape behavior through instructions and examples in the message you send right now.
- Context engineering — decide what facts, history, and tool output land in the finite context window before the model reasons.
- Harness engineering — build the runtime around the model: tool loops, retries, policy, observability, and human gates so probabilistic output becomes dependable software.
Confusing them leads to expensive mistakes: polishing prose while the agent still cannot see ownership data, or stuffing more tokens into the window while nothing verifies tool results before a production change runs.

Prompt engineering: the ask
Prompt engineering is the craft of instructing the model clearly: role, constraints, output format, tone, and when to refuse. Few-shot examples, chain-of-thought nudges, and structured outputs (JSON schema, tool-choice hints) all live here. It matters. Ambiguous asks produce ambiguous actions.
It also has a ceiling. Prompts cannot invent facts that were never retrieved. They cannot undo a tool that returns stale JSON. They cannot substitute for an approval workflow when the blast radius is a customer database. Prompt engineering optimizes how the model interprets what it already has—not whether that material is true, complete, or safe to act on.
Invest here when: outputs are inconsistent for the same inputs, formatting breaks downstream parsers, or the model needs explicit boundaries ("never delete," "always cite the source field").
Stop here when: failures trace to missing data, wrong tools, or ungoverned side effects—no amount of rewording fixes a blank catalog.
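As a rough illustration of the prompt layer, the sketch below pairs a system prompt that states boundaries and an output format with a parser that checks replies against that format. The prompt wording, the key names (`answer`, `source`, `refused`), and the `parse_reply` helper are all assumptions made up for this example, not a prescribed schema.

```python
import json

# Hypothetical system prompt: role, explicit boundaries, and output format.
SYSTEM_PROMPT = """\
You are an operations assistant.
Constraints:
- Never delete resources.
- Always cite the source field of the record you used.
Respond with a single JSON object with exactly these keys:
  "answer" (string), "source" (string), "refused" (boolean).
"""

# Assumed reply schema for this sketch: key name -> expected Python type.
REQUIRED_KEYS = {"answer": str, "source": str, "refused": bool}


def parse_reply(raw: str) -> dict:
    """Validate a model reply against the format the prompt asked for."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for key, expected in REQUIRED_KEYS.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        if not isinstance(data[key], expected):
            raise ValueError(f"{key} must be {expected.__name__}")
    # The prompt demands a citation, so an uncited answer is a format failure.
    if not data["refused"] and not data["source"]:
        raise ValueError("non-refusal answers must cite a source")
    return data
```

The parser can only confirm that a citation is present. Whether the cited record exists or is current is outside what the prompt layer can check.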
Context engineering: the window
Context engineering treats the context window as a scarce, expensive resource you assemble deliberately. Instead of one static system prompt, you choose—per turn—what the model should see: relevant docs, service metadata, recent incident notes, prior tool results, compressed conversation history, and negative space (what to omit so signal stays high).
Typical techniques include retrieval over a knowledge base, graph-aware fetches (owners, dependencies, environments), summarization and compaction of long threads, progressive disclosure (metadata first, full instructions only when a skill activates), and hygiene on tool output (truncate logs, strip secrets, attach provenance). The goal is grounded reasoning: the model should argue from evidence your platform controls, not from weights alone.
Context engineering is where many teams discover token economics. Sending ten thousand tokens of runbooks on every "what's the status?" is a context problem, not a prompt problem. So is failing an on-call query because the agent never pulled the owning team from the catalog.
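A minimal sketch of per-turn context assembly under a token budget might look like the following. The `Snippet` type, the relevance scores, the four-characters-per-token estimate, and the 400-character truncation limit are illustrative assumptions. A real system would use its own retriever and the model's actual tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    source: str       # provenance label, e.g. "catalog" or "runbook"
    text: str
    relevance: float  # assumed score from a retriever; higher is more useful


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly four characters per token.
    return max(1, len(text) // 4)


def assemble_context(snippets: list[Snippet], budget: int, max_chars: int = 400) -> str:
    """Pick the most relevant snippets that fit within a token budget."""
    chosen = []
    used = 0
    for s in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        body = s.text[:max_chars]                 # truncate long logs or tool output
        entry = f"[source: {s.source}]\n{body}"   # attach provenance to each entry
        cost = estimate_tokens(entry)
        if used + cost > budget:
            continue  # omit what does not fit; a shorter, lower-ranked item may still fit
        chosen.append(entry)
        used += cost
    return "\n\n".join(chosen)
```

With a small budget, a short catalog record naming the owning team makes it into the window while a long runbook is left out. That is the behavior a "what's the status?" query wants.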
Invest here when: answers are generic, hallucination rates drop when you manually paste docs, or multi-turn sessions explode cost because nothing gets pruned or targeted.
Stop here when: the model sees the right facts but the loop still double-commits, skips verification, or bypasses policy—those are harness failures.
Harness engineering: the loop
A harness is everything that turns a chat completion into an agent: the orchestration loop (plan → call tool → observe → repeat), timeouts and retries, sandboxing, structured logging, eval hooks, cancellation, and the gates between suggestion and execution. Harness engineering is software engineering applied to unreliable components—much like you would wrap an external API you do not fully trust.
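The orchestration loop described above can be sketched in a few lines. This is one possible shape under stated assumptions, not a reference implementation: `next_action` stands in for the model call, the action dictionaries with `tool` and `args` keys are a made-up format, and the retry policy is deliberately simple.

```python
import time
from typing import Callable, Optional


def run_loop(
    next_action: Callable[[list], Optional[dict]],
    tools: dict[str, Callable[..., str]],
    max_steps: int = 5,
    max_retries: int = 2,
    backoff_s: float = 0.0,
) -> list:
    """Drive plan -> call tool -> observe until the planner stops or a limit is hit."""
    observations: list = []
    for _ in range(max_steps):                     # hard cap on loop iterations
        action = next_action(observations)         # stand-in for the model call
        if action is None:                         # planner signals it is done
            break
        tool = tools.get(action["tool"])
        if tool is None:
            observations.append({"tool": action["tool"], "error": "unknown tool"})
            continue
        for attempt in range(max_retries + 1):
            try:
                result = tool(**action.get("args", {}))
                observations.append({"tool": action["tool"], "result": result})
                break
            except Exception as exc:
                if attempt == max_retries:         # retries exhausted: record the error
                    observations.append({"tool": action["tool"], "error": str(exc)})
                else:
                    time.sleep(backoff_s * (attempt + 1))
    return observations
```

The step cap and the retry cap are what keep an unreliable component from running unbounded. The planner only ever sees observations the harness chose to record.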
Concrete harness concerns include: which tools exist and who may invoke them; idempotency and dry-run modes; human-in-the-loop approvals for high-risk actions; comparing tool results to policy; circuit breakers when costs or error rates spike; and tracing so you can reconstruct why an agent restarted a service at 2 a.m.
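Several of these concerns can be combined in one small gate that sits between a proposed action and its execution. The sketch below is hypothetical: the `Gate` class, the role and action names, the status strings, and the idempotency key format are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Gate:
    allowed: dict    # role -> set of permitted action names
    high_risk: set   # actions that need a human approval first
    seen_keys: set = field(default_factory=set)
    audit: list = field(default_factory=list)

    def execute(self, actor, role, action, key, run, approved=False, dry_run=False):
        """Decide whether to run an action and record the decision in the audit log."""
        if action not in self.allowed.get(role, set()):
            status = "denied"              # role-based permission check
        elif key in self.seen_keys:
            status = "duplicate"           # idempotency: a completed key never runs twice
        elif action in self.high_risk and not approved:
            status = "needs_approval"      # human-in-the-loop gate
        elif dry_run:
            status = "dry_run"             # report what would happen, change nothing
        else:
            try:
                run()
                self.seen_keys.add(key)    # mark the key only after success
                status = "executed"
            except Exception:
                status = "failed"
        self.audit.append({"actor": actor, "action": action, "key": key, "status": status})
        return status
```

Every branch writes an audit record, including denials. That is what makes it possible to reconstruct afterwards why a given action did or did not run.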
Coding agents made the term visible: the model proposes edits, but the harness applies patches, runs tests, enforces directory boundaries, and stops the loop on failure. The same pattern applies to operational agents: the model proposes a runbook step; the harness checks RBAC, opens a change ticket, and records audit fields before anything touches production.
Invest here when: the agent can act on the world, costs scale with users, you need SOC-friendly audit trails, or "it worked in the demo" does not survive parallel users and partial outages.
How they stack
Think of a single turn flowing downward: the harness decides whether a turn may run and which tools are available. Context engineering fills the window with the right slice of your estate. Prompt engineering tells the model how to use that slice (format, caution, priorities). After the model answers, the harness steps in again: it validates, executes, logs, or blocks.
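One way to picture that flow is a single function that calls each layer in order. Every parameter name here (`may_run`, `select_tools`, `build_context`, `build_prompt`, `model`, `review`) is a hypothetical placeholder for whatever your platform provides. The sketch shows only the ordering of the layers.

```python
def run_turn(request, *, may_run, select_tools, build_context, build_prompt, model, review):
    """One turn: harness gate, context, prompt, model call, harness review."""
    if not may_run(request):                        # harness: may this turn run at all?
        return {"status": "blocked", "reply": None}
    tools = select_tools(request)                   # harness: which tools are exposed
    context = build_context(request)                # context engineering: fill the window
    prompt = build_prompt(request, context, tools)  # prompt engineering: how to use it
    reply = model(prompt)                           # the model call itself
    return review(request, reply)                   # harness again: validate, execute, log, or block
```

Because each layer is a separate callable, any one of them can be swapped or tested without touching the others.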

Skipping a layer shows up predictably. Prompt-only agents sound confident and know nothing. Context-rich but harness-free agents know plenty and still break prod. A harness without context becomes rigid automation wearing language-model lipstick: expensive and brittle.

A practical maturity ladder
1. Copilot / Q&A: prompt engineering plus light context (paste docs, small RAG). Harness is mostly rate limits and logging.
2. Tool-using assistant: context engineering becomes mandatory, because tool outputs and retrieval must be curated per turn. Harness defines the tool surface and error handling.
3. Operational agent: harness engineering dominates. That means approvals, shared policy with humans, idempotency, and the same actions whether the user types in a console or an IDE via MCP.
Most platform teams are climbing from (1) to (3) right now. The hype cycle still markets (1) skills; production pain lives at (2) and (3).
Where a control plane fits
Day 2 Ops agents fail when context is fragmented across wikis, tickets, and tribal knowledge, and when the harness in the IDE diverges from the harness in the runbook console. A unified platform separates concerns without splitting truth: catalog and integrations feed a durable graph (context substrate); governance and approvals wrap actions the same way in every channel (harness). Prompt templates still matter for tone, output shape, and safety copy, but they ride on top of engineering discipline rather than replacing it.
For token-heavy agents, pair context engineering patterns like progressive disclosure with a harness that measures cost per workflow—not just per request.
- Prompt: Clear instructions for format, refusal, and tool-use etiquette.
- Context: Live services, owners, dependencies, and signals, not a one-off scrape into a vector store.
- Harness: Same guarded actions, audits, and approvals whether the actor is a human or an MCP client.
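Measuring cost per workflow rather than per request, as suggested above, mostly comes down to choosing the aggregation key. The sketch below assumes a hypothetical `CostMeter` class, a caller-supplied workflow identifier, and a single flat price per thousand tokens. Real pricing usually differs between prompt and completion tokens.

```python
from collections import defaultdict


class CostMeter:
    """Aggregate token usage by workflow rather than by individual request."""

    def __init__(self, price_per_1k_tokens: float):
        self.price = price_per_1k_tokens   # assumed flat price, for illustration only
        self.tokens = defaultdict(int)
        self.requests = defaultdict(int)

    def record(self, workflow_id: str, prompt_tokens: int, completion_tokens: int) -> None:
        # Each model call adds to the running total for its workflow.
        self.tokens[workflow_id] += prompt_tokens + completion_tokens
        self.requests[workflow_id] += 1

    def cost(self, workflow_id: str) -> float:
        return self.tokens[workflow_id] / 1000 * self.price

    def over_budget(self, workflow_id: str, budget: float) -> bool:
        # A harness could use this to trip a circuit breaker mid-workflow.
        return self.cost(workflow_id) > budget
```

A workflow that looks cheap per request can still be expensive in total if it takes many turns. That is the case a per-request metric hides.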
Closing frame
Prompt engineering is not obsolete—it is the thinnest top layer. Context engineering answers "what should the model know for this turn?" Harness engineering answers "what happens when it is wrong, and who is accountable?" Production agents need all three; the teams that win treat them as separate specialties that share one runtime, not as synonyms for typing harder into a chat box.
Editorial—general discussion only.