The instinct, when you notice a recurring problem with how an AI agent handles a task, is to add a rule to the instruction file. “Always save research output before continuing.” “Check for existing files before creating new ones.” “Confirm with the user before deleting anything.”
These rules work. They also cost tokens every time the agent runs, because the instruction file is loaded into context at the start of every session. The rule sits in the agent’s context window whether or not it’s relevant to the current task. And unlike code, a rule in an instruction file relies on the agent deciding to apply it — which means there’s always a failure mode where the agent, under task pressure or in an edge case, doesn’t.
The better abstraction, when the rule is mechanical rather than judgment-based, is a CLI wrapper.
A CLI wrapper enforces the rule at the tool level. The agent calls the tool; the tool produces the output; the output is automatically structured correctly. Zero deliberation. Zero token cost. The agent never has to think about the rule because the rule is baked into what the tool does. The structure is guaranteed by the tool’s behavior, not by the agent’s compliance.
This matters more than it first appears. Every mechanical decision baked into a tool is a decision the agent never has to make again. Every rule that lives in an instruction file is a decision the agent has to make on every invocation. The accumulation of these decisions is a meaningful fraction of the agent’s cognitive overhead — tokens spent on compliance rather than on the task.
The distinction is between what should be a judgment call and what should be a structural guarantee. Judgment calls belong in instruction files: “when the user’s request is ambiguous between X and Y, ask one clarifying question before proceeding.” The agent needs to apply judgment in context. A structural guarantee belongs in a tool: “every research output is saved to a structured file with a consistent schema.” The agent should never have to think about this; it should just happen.
The practical pattern that emerges: when you notice you’re adding the same mechanical instruction to multiple skill files, that instruction is telling you there should be a CLI. When you find yourself debugging why the agent didn’t follow a rule that you thought was clear, ask whether the rule is actually a judgment requirement (and needs better specification) or a mechanical requirement (and should be a tool).
A fleet of small, well-named CLIs with consistent behavior is a different kind of infrastructure than a long instruction file with many rules. The CLIs are reliable in a way that rules aren’t — not because the rules are poorly written, but because the rules require agent compliance and the tools require only agent invocation. The system that relies on tools for structure and instruction files for judgment is more robust than the system that uses instruction files for both.
There’s a deeper principle at work. An AI agent is a reasoning engine. You get the most value from that reasoning engine when it’s applied to the things that actually require reasoning — ambiguous situations, novel cases, judgment calls that depend on context. Every time the agent spends reasoning capacity on a mechanical decision that could have been enforced structurally, you’ve wasted that capacity on something a well-designed tool would have handled for free.
Build the CLI. Let the agent reason about the things that require reasoning.
P.S. The litmus test: if you could describe what the tool should do as a deterministic function — same input always produces same output structure — it should be a CLI. If the description requires the word “depends,” it should be in the instruction file. Most things that end up as rules in instruction files are actually deterministic functions waiting to become tools.