Stanford ran a course last fall called CS146S, The Modern Software Developer, taught by Mihail Eric. It is the first university course we have seen that treats AI coding not as a tool you reach for but as the whole software lifecycle, rebuilt: model mechanics, context engineering, agent patterns, security, code review, and operations, in sequence. The materials are public. We read the syllabus, the slides, and the assignments, end to end.
We agree with the spine of it. The central claim, that you stop being the person who writes the code and become the person who manages a fleet of agents that do, is correct. So is the corollary: the specification is the new source code, and the generated code is a lossy projection of it. This is the first time we have seen someone write that down as a curriculum instead of a thread. If you build software for a living, take the course.
One assumption runs quietly through every lecture, though, and it is the assumption we do not get to make. The course assumes a codebase where the worst outcome of a bad agent run is a broken build. Ours assume the worst outcome is a protected-health-information leak, a wrong clinical answer delivered with confidence, or an audit you cannot pass. Five lectures change when you move to that end of the curve. Here they are.
CS146S assumes the worst case is a broken build. We assume the worst case is a patient.
1. The spec carries policy, or it ships a violation
The course is right that you version the spec, not the prompt, and that a thrown-away prompt is thrown-away source code. We have argued the same thing from the repo side. The delta is what a spec has to contain before it is safe to hand to an agent.
In a general codebase, the spec answers what to build. In ours it also has to answer who is allowed to see this, what consent was given, where the PHI boundary sits, how long the data may be kept, and when the system must refuse. A spec that omits all of that still compiles. The agent will happily build it. It just builds a violation, and the violation passes tests, because nothing in the test suite knew the rule existed.
So our specifications are written against policy, and the policy lives in the repo as enforceable artifacts rather than in a deck. That is the whole argument of compliance as a code property: HIPAA, SOC 2, and HITRUST as things that show up in the diff, not as slides shown to an auditor after the fact.
A spec that compiles can still be illegal.
2. Autonomy is earned with hooks, not bought by lowering guardrails
CS146S teaches a calibrated autonomy spectrum, and the gradient is right: trivial tasks run unattended, medium tasks get the eighty-twenty split between agent and human, complex tasks get checkpoints. Where we part ways is the framing of the high-autonomy end as a YOLO profile, the one with the guardrails turned down for speed. That framing makes autonomy and safety trade against each other. They do not have to.
We run autonomous, long-horizon agent runs, and we run plenty of them. We just do not buy the autonomy by turning guardrails down. We buy it by building the guardrails into the framework the agent runs inside, as hooks and gates it cannot route around. An unattended run is safe not because a person is watching it, but because the scaffolding refuses to let it do the unsafe thing in the first place.
That scaffolding is a real artifact, and we open-sourced it. RePPIT Health is a Claude Code plugin that runs the research-propose-plan-implement-test-secure loop with approval gates between phases, autonomous iteration inside a phase, and a secure step that runs HIPAA, SOC 2, and HITRUST checks. Its sharpest move is separating what code can verify from the organizational controls it cannot, so a clean autonomous run never mints a false compliance pass. The agent can iterate on its own for an hour. It still cannot advance past a gate it has not earned.
Read that way, the course's cautionary tale, an agent in a low-guardrail mode talked into rewriting its own configuration, is not an argument against autonomy. It is an argument against autonomy with nothing enforcing the limits. The fix is not a human leaning over the keyboard. It is a hook the agent cannot talk past. And when uncertainty genuinely spikes, the safe action is still to stop, the way the triage architecture abstains rather than guess. Autonomy and abstention are not opposites. They are the same framework, holding the line in both directions.
We don't buy autonomy by turning the guardrails down. We build them into the rails the agent runs on.
3. A poisoned context is a safety event, not a quality bug
The course's taxonomy of context failure is the best short version we have read: poisoning, where an early error fixates the agent; distraction, where quality regresses once the window grows past a hundred thousand tokens; confusion, where too many tools degrade tool choice; and clash, where contradictory context can drop accuracy by close to forty percent. We have hit all four in client repos, more than once.
The taxonomy transfers. The consequence does not. When context poisoning makes a general coding agent repeat a wrong import, you get a red build and a quick fix. When it makes a clinical agent repeat a wrong contraindication, you get a wrong answer with a clean citation attached, and nothing turns red. So we curate context the way we would curate a clinical input: provenance on every document, typed evidence assembled and checked before the workflow runs, and the do-not-touch zones in CLAUDE.md drawn on the compliance boundaries, not on convenience. The same instinct drives the safe intake agent, where an OCR-borne instruction hidden in an uploaded document is treated as an attack, not as context.
4. One AI scan is a signal, never a control
One of the more useful lectures is blunt about AI security scanning: the false-positive rate runs north of eighty percent, and the scans are non-deterministic, returning a different bug count run to run on the same code. The course's lesson is don't trust a single scan. Ours is the stronger version of the same sentence.
A check you can re-run for a different number is a signal. It is never a gate. We let the agent propose findings all day, and we let none of those proposals decide whether a change ships. Deterministic policy decides. That line, proposal from the model and decision from the gate, is why the secure step in the RePPIT workflow sits outside the model's reach. The model is not the thing holding the gate.
A check you can re-run for a better number is not a control.
5. The human-only zone is the clinical zone
The course's code-review quadrant is exactly right, and we use it as written. There is a gold zone where AI review excels: simple bugs, performance regressions, style consistency, known vulnerability classes. We run the agent hard there. There is an annoyance zone of abstract best-practice noise that we suppress. And there is a human-only zone of tribal knowledge and business logic.
The delta is where we draw the human-only boundary. Clinical logic, consent semantics, and abstention behavior never cross into the gold zone, no matter how clean the diff looks, because surface-correct is the failure mode that hurts you here: code that passes review, passes tests, and still encodes the wrong rule. The reviewer's real job is the one thing CS146S says stays human, alignment on what the system is allowed to do, and in a regulated system that job is most of the job.
What the course gets right, and we won't relitigate
None of the above is a rebuttal. The spine of CS146S is correct, and we are not interested in arguing the parts we agree with. It is a discipline, not a vibe. The bottleneck has moved from writing code to validating it. The engineer who wins the next decade is the one who can specify, decompose, and verify, not the one who types fastest. We believe all of that, and we have bet a consulting practice on it.
What we are doing is reading the same syllabus from the high-stakes end of the curve. The regulated version is not a different course. It is the same loop, run with the safety margins moved in and the cost of a missing guardrail measured in something worse than a rollback. If you are building anything that touches a patient, take CS146S first. Treat this as the appendix.
The principle
CS146S is the general syllabus for this era of software, and it is a good one. The regulated version keeps every move it teaches, manage the fleet, version the spec, verify relentlessly, and runs them with a fail-closed default and a policy bundle the agent answers to. The difference is not philosophy. It is where you put the guardrails, and how much it costs when one of them is missing.
We put them in the repo, on day one of the engagement, because in this domain the guardrails are not overhead on the work. They are the work.
CS146S is public and free, and if you build software for a living you should work through it end to end. There is also a professional edition on Maven, built for working engineers and teams rather than students, that goes deeper on the production end of the lifecycle. That link carries our discount, baked in.
- RePPIT Health, the compliance-gated coding-agent workflow this piece keeps pointing at.
- Why we ship CLAUDE.md on day one, how context engineering becomes a repo property, not a chat habit.