Engagement · Tooling

Why we ship CLAUDE.md on day one
The agent-native scaffolding that goes into every Widal repo.

Nils Widal · 7 min read

Every Widal engagement starts the same way. Before the first feature ticket, before the first migration, before any code lands on a branch, we commit a file at the root of the repo called CLAUDE.md. It is small, opinionated, and read by both humans and the coding agent on every session. By the end of the first week, it sits next to an eval suite and a set of CI canaries, and that bundle is the real deliverable of the week. The feature work that follows is downstream of it.

People sometimes read this as a tooling preference, a Claude Code-specific quirk. It is not. It is where we acknowledge a shift that has already happened: the model is no longer reading only the prompt. It is reading the repository. The unit of control moved with it.

Prompt engineering was a workaround for not owning the repo. Once you own the repo, conventions are the prompt.

What CLAUDE.md actually is

CLAUDE.md is a plain markdown file at the root of the project. Claude Code reads it at session start (other agent-native IDEs and CLIs read similar files, sometimes under other names) and treats its contents as persistent context, the way a senior engineer treats the README, the style guide, and the on-call runbook combined. It is not a system prompt. It is the part of the repo where you write down the things a new contributor would otherwise have to learn by breaking something.

The reason it goes in on day one, rather than after the first sprint, is that every unwritten convention is one the agent will either invent or violate. Both are expensive. Inventing means you get plausible code that does not match the rest of the codebase. Violating means the next reviewer spends their afternoon undoing it. A two-page file prevents both.

What we put in it

The shape varies by codebase, but the sections are stable. We write them in roughly this order, because that is the order in which an agent (or a new engineer) needs them.

  1. Project goals. One short paragraph. Not the marketing pitch. The actual operating goal of the codebase this quarter. What it has to do, who it serves, what it absolutely must not break.
  2. Code conventions. Language version, package manager, lint and format commands, test command, import order, naming, error handling style, logging style, file layout. Boring and load-bearing.
  3. Named entry points. The handful of files and functions that anchor the system. The HTTP router, the job worker, the policy bundle loader, the seed script. With paths. An agent reading the repo cold does not need to grep for these.
  4. MCP servers in scope. Which Model Context Protocol tools are wired up for this repo, what credentials they have, and what they are allowed to touch. Anything not on the list is off-limits by default.
  5. Eval gates. The commands that must pass before a change is considered done, and the thresholds they enforce. Not just unit tests. Calibration runs, golden transcripts, adversarial cases, latency budgets.
  6. Do-not-touch zones. Paths the agent should never modify without an explicit human instruction. Migrations that ran in production. Generated files. Trusted policy bundles. Anything compliance-relevant.
  7. Human review process. When a draft is ready, what happens. Who reviews. What the PR template requires. How escalations work. What "done" means in this repo.

That is the whole file. It is not a wiki. If a section grows beyond a screen, it belongs somewhere else and the file links to it. The discipline is in keeping it the size a person actually re-reads.
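If you want the section list and the size discipline enforced rather than remembered, a small lint in CI can do it. This is a minimal sketch, assuming the file uses `## ` headings named as in the skeleton below; the exact heading prefixes, `lintClaudeMd`, and the 60-line "one screen" budget are our assumptions, not a standard.

```typescript
// Hypothetical CLAUDE.md lint: checks that the seven expected sections exist
// and that no section outgrows a screen. Heading prefixes follow our skeleton;
// MAX_SECTION_LINES is an assumed budget, tune it to taste.
const REQUIRED_SECTIONS = [
  "Project",
  "Conventions",
  "Entry points",
  "MCP servers",
  "Eval gates",
  "Do not touch",
  "Review",
] as const;

const MAX_SECTION_LINES = 60;

interface Section {
  title: string;
  lines: string[];
}

// Split the file on level-2 headings ("## ..."); text before the first one is ignored.
export function parseSections(markdown: string): Section[] {
  const sections: Section[] = [];
  let current: Section | null = null;
  for (const line of markdown.split("\n")) {
    if (line.startsWith("## ")) {
      current = { title: line.slice(3).trim(), lines: [] };
      sections.push(current);
    } else if (current) {
      current.lines.push(line);
    }
  }
  return sections;
}

// Return human-readable problems; an empty array means the file passes.
export function lintClaudeMd(markdown: string): string[] {
  const sections = parseSections(markdown);
  const problems: string[] = [];
  for (const required of REQUIRED_SECTIONS) {
    if (!sections.some((s) => s.title.startsWith(required))) {
      problems.push(`missing section: ${required}`);
    }
  }
  for (const s of sections) {
    // Count non-blank lines so spacing does not eat the budget.
    const size = s.lines.filter((l) => l.trim() !== "").length;
    if (size > MAX_SECTION_LINES) {
      problems.push(`section "${s.title}" has ${size} lines; move detail to /docs and link it`);
    }
  }
  return problems;
}
```

In CI you might call `lintClaudeMd(readFileSync("CLAUDE.md", "utf8"))` and fail the job whenever the result is non-empty.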

A minimal skeleton

For a fresh repo we tend to start with something close to this and then specialize. It is intentionally short.

# CLAUDE.md

## Project
[one paragraph: what this repo is, who uses it, the
operating goal this quarter, the must-not-break list]

## Conventions
- Runtime: Node 22, pnpm 9
- Lint / format: pnpm lint, pnpm format
- Tests: pnpm test (unit), pnpm test:e2e
- Style: see /docs/style.md (imports, errors, logging)

## Entry points
- HTTP: src/server/router.ts
- Workers: src/jobs/registry.ts
- Policy bundle loader: src/policy/load.ts
- Seed: scripts/seed.ts

## MCP servers in scope
- filesystem (read-only on /docs)
- github (PR comments only)
- internal-eval (pnpm eval:* scripts)
Anything else: ask before wiring.

## Eval gates (must pass before "done")
- pnpm test
- pnpm eval:golden  (>= 0.95 pass)
- pnpm eval:adversarial  (no regressions vs main)
- pnpm eval:latency  (p95 <= budget in /docs/slo.md)

## Do not touch without explicit instruction
- /migrations/*  (production-applied)
- /policy/bundles/*.signed.json
- /infra/terraform/prod/*
- Anything under /generated/*

## Review
- Draft PR -> CODEOWNERS auto-assigns
- Required: green CI, one human reviewer, eval deltas posted
- Escalation: Slack #eng-oncall, then phone tree in /docs/oncall.md
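The do-not-touch list is most useful when something checks it. A minimal sketch, assuming the patterns are copied from the skeleton into CI and the changed-file list comes from something like `git diff --name-only origin/main...HEAD`. The matcher handles only the two pattern shapes the skeleton uses (a trailing `/*` meaning "anything under this directory" and a `*` inside a file name), so it is not a general glob engine; `protectedHits` is a hypothetical name.

```typescript
// Hypothetical pre-merge guard: report changed files that fall inside a
// do-not-touch zone. Patterns mirror the CLAUDE.md skeleton.
const DO_NOT_TOUCH = [
  "/migrations/*",
  "/policy/bundles/*.signed.json",
  "/infra/terraform/prod/*",
  "/generated/*",
];

// Convert one pattern to a RegExp:
//  - a trailing "/*" matches everything below that directory, recursively
//  - any other "*" matches within a single path segment
function patternToRegExp(pattern: string): RegExp {
  const escape = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const recursive = pattern.endsWith("/*");
  const core = recursive ? pattern.slice(0, -1) : pattern; // "/generated/*" -> "/generated/"
  const body = core.split("*").map(escape).join("[^/]*");
  return new RegExp("^" + body + (recursive ? ".+" : "") + "$");
}

const RULES = DO_NOT_TOUCH.map((p) => ({ pattern: p, re: patternToRegExp(p) }));

// Return every changed file that hits a protected pattern, with the pattern it hit.
export function protectedHits(changedFiles: string[]): { file: string; pattern: string }[] {
  const hits: { file: string; pattern: string }[] = [];
  for (const raw of changedFiles) {
    // git prints repo-relative paths ("migrations/x.sql"); the patterns are rooted.
    const file = raw.startsWith("/") ? raw : "/" + raw;
    const rule = RULES.find((r) => r.re.test(file));
    if (rule) hits.push({ file: raw, pattern: rule.pattern });
  }
  return hits;
}
```

A non-empty result should fail the check unless the PR carries whatever explicit human override the team agrees on; how that override is expressed is left open here.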

From prompt engineering to repo conventions

The mental model worth swapping is this. A year ago, getting a model to behave meant writing a long system prompt with examples, warnings, role-play framings, and the occasional all-caps plea. That work is still useful in places, but it was always a substitute for context the model could not see. Coding agents can see the context now. They read the file tree, the lockfile, the tests, the CI config, the open PRs. They will read CLAUDE.md first if you put it there.

The result is a different center of gravity. Less time tuning instructions in a chat window, more time making the repo legible. Naming things well stops being a virtue and becomes a control surface. A clean module boundary is a safety boundary. A test named after the failure mode it prevents is a guardrail. The agent inherits the discipline you already practice as a senior engineer, and the cost of that discipline drops because the agent helps you maintain it.

The model reads the repo, not just the prompt. Make the repo readable on purpose.

What lives alongside it

CLAUDE.md is the index. Three other artifacts tend to land in the same week, and they reference each other.

  • Skills. Small, single-purpose capabilities the agent can compose. We keep them in .claude/skills/ with one markdown file each: when to use it, what it does, what it does not do. The eval suite tests them in isolation.
  • Slash commands. Repo-local shortcuts for the workflows the team runs constantly. Things like /triage, /eval, /release-notes. Each one is a thin wrapper around a real script, not a magic incantation.
  • Tool registry. A single source of truth for which MCP servers and external tools are wired in, with credentials managed outside the repo. Anything the agent can reach for is on the list. Anything off the list is not available.
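The one-file-per-skill rule in the first bullet is easy to check mechanically. A minimal sketch, assuming each skill file sits directly under `.claude/skills/` and spells its three parts as the headings `## When to use`, `## What it does`, and `## What it does not do`; those heading strings and both function names are our own convention for illustration, not a Claude Code requirement.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Headings each skill file is expected to carry (assumed convention).
const SKILL_HEADINGS = ["## When to use", "## What it does", "## What it does not do"];

// Check one skill file's text; returns the headings it is missing.
export function missingSkillHeadings(text: string): string[] {
  const lines = text.split("\n").map((l) => l.trim());
  // Exact line match, so "## What it does" is not satisfied by "## What it does not do".
  return SKILL_HEADINGS.filter((h) => !lines.includes(h));
}

// Scan a skills directory and report missing headings per markdown file.
export function checkSkillsDir(dir = ".claude/skills"): Record<string, string[]> {
  const report: Record<string, string[]> = {};
  for (const name of readdirSync(dir)) {
    if (!name.endsWith(".md")) continue;
    const missing = missingSkillHeadings(readFileSync(join(dir, name), "utf8"));
    if (missing.length > 0) report[name] = missing;
  }
  return report;
}
```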

The point is that the agent's surface area is enumerable. A new engineer can read CLAUDE.md plus the skills folder plus the tool registry and know exactly what the agent can and cannot do in this codebase. That readability is what makes the system reviewable.
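The tool registry bullet and the enumerability point above come down to an allow-list with deny-by-default semantics. A minimal sketch, assuming the registry is a checked-in TypeScript module: the server names and scopes are copied from the skeleton, while the `ToolEntry` shape, the action names, `canUse`, and the `GITHUB_TOKEN` variable name are hypothetical. Credentials stay out of the repo; an entry names an environment variable, never a secret.

```typescript
// Hypothetical tool registry: the single list of MCP servers the agent may use.
interface ToolEntry {
  server: string;           // MCP server name as wired in this repo
  scope: string;            // human-readable limit, mirrored in CLAUDE.md
  allowedActions: string[]; // coarse action names this repo permits (illustrative)
  credentialEnv?: string;   // env var holding the credential; never the secret itself
}

export const TOOL_REGISTRY: readonly ToolEntry[] = [
  { server: "filesystem", scope: "read-only on /docs", allowedActions: ["read"] },
  { server: "github", scope: "PR comments only", allowedActions: ["comment"], credentialEnv: "GITHUB_TOKEN" },
  { server: "internal-eval", scope: "pnpm eval:* scripts", allowedActions: ["run"] },
];

// Deny by default: unknown servers and unlisted actions are both refused.
export function canUse(server: string, action: string): boolean {
  const entry = TOOL_REGISTRY.find((t) => t.server === server);
  return entry !== undefined && entry.allowedActions.includes(action);
}

// Render the registry in the same shape as the "MCP servers in scope" section,
// so CI can diff the two and catch drift between them.
export function renderScopeSection(): string {
  return TOOL_REGISTRY.map((t) => `- ${t.server} (${t.scope})`).join("\n");
}
```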

The week one deliverable

On a forward-deployed engineering engagement, the first week is not a feature week. The deliverable is a starter kit the customer team inherits and can keep using after we are gone. Three things, in this order.

  1. CLAUDE.md. Reviewed line by line with the team lead. Conventions, entry points, MCP scope, do-not-touch zones, review process. Signed off as the source of truth.
  2. An eval suite. Golden transcripts, adversarial cases, latency targets, and a small set of capability regressions, all runnable from one command. Wired to CI.
  3. CI canaries. A handful of always-on probes that would catch the most expensive failures we can imagine in this repo. Schema drift, secret leakage, regressions on the eval gates, missing provenance. Each canary has a named owner.
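The named-owner requirement in item 3 is easiest to keep when the owner is part of the canary's definition. A minimal sketch of a canary registry with a single secret-leakage probe; the owner handle is a placeholder, the function names are hypothetical, and the one regex (the `AKIA` prefix plus 16 uppercase letters or digits used by AWS access key IDs) stands in for a real secret scanner, which would check far more patterns.

```typescript
// Hypothetical canary registry: every probe carries the owner who answers for it.
interface CanaryResult {
  ok: boolean;
  detail?: string;
}

interface Canary {
  name: string;
  owner: string; // named owner, paged when the canary fires
  check: (input: { changedText: string }) => CanaryResult;
}

interface CanaryFailure {
  name: string;
  owner: string;
  detail: string;
}

// Illustrative only: matches the shape of an AWS access key ID.
const AWS_ACCESS_KEY_ID = /AKIA[0-9A-Z]{16}/;

export const CANARIES: Canary[] = [
  {
    name: "secret-leakage",
    owner: "@canary-owner", // placeholder handle
    check: ({ changedText }) =>
      AWS_ACCESS_KEY_ID.test(changedText)
        ? { ok: false, detail: "possible AWS access key ID in diff" }
        : { ok: true },
  },
];

// Run every canary against the diff text and collect failures with their owners.
export function runCanaries(changedText: string): CanaryFailure[] {
  const failures: CanaryFailure[] = [];
  for (const c of CANARIES) {
    const r = c.check({ changedText });
    if (!r.ok) failures.push({ name: c.name, owner: c.owner, detail: r.detail ?? "failed" });
  }
  return failures;
}
```

Schema drift, eval-gate regressions, and missing provenance would each be another entry in `CANARIES`, each with its own owner.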

That kit is the floor. Every feature shipped after week one inherits it. Every contributor, human or agent, lands inside it. When we leave, the team is not holding a list of habits we modeled. They are holding files that enforce those habits.
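The eval gates are the clearest case of a file enforcing a habit. A minimal sketch of the pass/fail step, assuming each eval script emits a summary the gate can read; the `EvalSummary` and `GateConfig` fields and `evaluateGates` are hypothetical, while the three rules are the ones from the skeleton's eval-gate section (golden pass rate of at least 0.95, no adversarial regressions against main, p95 latency within budget).

```typescript
// Hypothetical shape of the numbers the eval scripts report.
interface EvalSummary {
  goldenPassRate: number;        // fraction of golden transcripts passing, 0..1
  adversarialFailures: string[]; // ids of adversarial cases that fail
  p95LatencyMs: number;          // measured p95 latency
}

interface GateConfig {
  minGoldenPassRate: number; // 0.95 in the skeleton
  p95BudgetMs: number;       // from /docs/slo.md
}

// Compare a branch run against a main run; return the list of violated gates.
// Assumes both runs cover the same adversarial case set.
export function evaluateGates(branch: EvalSummary, main: EvalSummary, config: GateConfig): string[] {
  const failures: string[] = [];
  if (branch.goldenPassRate < config.minGoldenPassRate) {
    failures.push(`golden pass rate ${branch.goldenPassRate} < ${config.minGoldenPassRate}`);
  }
  // "No regressions vs main": cases failing on the branch that do not fail on main.
  const mainFailures = new Set(main.adversarialFailures);
  const regressions = branch.adversarialFailures.filter((id) => !mainFailures.has(id));
  if (regressions.length > 0) {
    failures.push(`adversarial regressions: ${regressions.join(", ")}`);
  }
  if (branch.p95LatencyMs > config.p95BudgetMs) {
    failures.push(`p95 ${branch.p95LatencyMs}ms > budget ${config.p95BudgetMs}ms`);
  }
  return failures;
}
```

Posting the returned strings on the PR is one way to meet the skeleton's "eval deltas posted" review requirement.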

Honest about the tradeoffs

CLAUDE.md is not magic. It is a file. Files rot.

The maintenance burden is real. A file the repo has outgrown starts to lie, quietly. The agent keeps following the old conventions; the team has moved on. We treat the file as code, with PR review, CODEOWNERS, and a quarterly re-read on the calendar. When the file disagrees with the code, one of them is wrong and you fix it.

Version drift is a related trap. The agent reads CLAUDE.md at session start, but it also reads the current state of the repo. If a convention in the file no longer matches the code, the agent will hedge, or worse, average the two. The fix is not a smarter agent. The fix is to keep the file true.
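One cheap way to catch this drift is a canary that checks CLAUDE.md's concrete claims against the tree. A minimal sketch, assuming entry points are written as `- Label: path` lines under a `## Entry points` heading, as in the skeleton above; the function names are hypothetical, and the check only verifies that each listed path still exists, which catches renames and deletions but not a convention that quietly changed meaning.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Pull "- Label: path" lines out of the "## Entry points" section only.
export function entryPointPaths(claudeMd: string): string[] {
  const paths: string[] = [];
  let inSection = false;
  for (const line of claudeMd.split("\n")) {
    if (line.startsWith("## ")) {
      inSection = line.trim() === "## Entry points";
      continue;
    }
    const m = inSection ? /^-\s*[^:]+:\s*(\S+)\s*$/.exec(line) : null;
    if (m) paths.push(m[1]);
  }
  return paths;
}

// Return listed entry points that no longer exist relative to repoRoot.
export function staleEntryPoints(claudeMd: string, repoRoot: string): string[] {
  return entryPointPaths(claudeMd).filter((p) => !existsSync(join(repoRoot, p)));
}
```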

And the discipline cost is non-trivial. Writing down a convention is harder than holding it in your head. Naming the do-not-touch zones means admitting which ones exist. Specifying review means actually doing review. The file rewards teams that already want this and punishes teams that do not. We are fine with that tradeoff. The teams that want it tend to be the ones we want to work with.

The principle

An agent-native repository is one a senior engineer and a coding agent can both read in fifteen minutes and arrive at the same mental model. CLAUDE.md is the cheapest possible commitment to that property. Skills, slash commands, the tool registry, the eval suite, and the CI canaries are the rest of it. Together they convert a working style into a working repo.

We ship the file on day one because everything else we do that quarter is downstream of whether the repo is legible. If it is, the engagement compounds. If it is not, every PR is a re-litigation of conventions we already settled in the kickoff. The file is short. The discipline is long. Both are the work.
