We fine-tune models with gradient descent. We nudge agents with skill files. Same goal, roughly ten thousand times cheaper.
This is one of those observations that sounds too simple to be interesting until you think through the implications for how enterprises should approach AI agent deployment at scale.
Fine-tuning — adjusting a model’s weights through additional training on domain-specific data — is the standard answer to the question “how do we make this model behave differently?” It’s powerful, it’s durable, and it’s expensive. A fine-tuning run requires curated training data, compute, careful evaluation to avoid degrading the model’s general capabilities while improving specific ones, and ongoing maintenance as the base model evolves. For large organisations with genuinely specific requirements, it’s sometimes the right answer.
But most organisations that think they need fine-tuning actually need something else: consistent, inspectable, correctable behavioral patterns that can be installed and updated without a training run.
That’s what a skill file does. A skill is a structured document that defines how an AI agent should approach a specific class of task — what to consider, what sequence to follow, what traps to avoid, what the output should look like. When the agent encounters a task that matches the skill’s scope, the skill’s guidance shapes how it approaches the work. The behavior change is immediate, transparent, and reversible. If the skill produces the wrong behavior, you edit the file and reload it. No training run required.
The iteration cycle is qualitatively different. Write the skill, observe the behavior, edit the file, reload. The complete cycle can happen in minutes. With fine-tuning, the equivalent cycle — collect data demonstrating the desired behavior, fine-tune, evaluate — takes days or weeks, and each iteration is expensive enough that you do it rarely and carefully.
There’s a deeper principle here that I think the enterprise AI community hasn’t fully processed. The act of writing a skill forces you to articulate what “good” looks like for a specific class of work. That articulation is valuable independently of the skill’s effect on agent behavior. Most organisations that think their AI systems behave inconsistently haven’t clearly specified what consistent would look like. The discipline of writing the behavior down — in enough detail that an agent can follow it — is a clarity exercise that reveals ambiguity in the underlying requirement.
What you can’t write down clearly, you can’t evaluate clearly. And what you can’t evaluate clearly, you can’t improve.
The enterprise application runs further than most current deployments recognise. Consider an organisation deploying AI agents across multiple business functions — compliance review, credit assessment, customer communication, code review. Each of these functions has institutional knowledge about what good looks like: what to check, in what sequence, with what level of documentation, escalating under what conditions. That knowledge lives in experienced practitioners, in training materials, in informal norms. It’s hard to transfer to an AI system through fine-tuning because it’s rarely codified in a form that serves as training data.
But it can be codified in skills. The compliance expert who knows what a high-quality regulatory review looks like can write a skill that installs that expertise in the agent. The credit analyst who has calibrated what counts as a legitimate exception versus a red flag can write a skill that makes those distinctions explicit. The result isn’t a fine-tuned model — it’s an agent that carries institutional knowledge that was previously unavailable to it.
The maintenance advantage compounds over time. When policies change, the skill changes. When the institutional understanding of good practice evolves, the skill evolves. When a new practitioner has a better approach, they can contribute a revision. The behavioral infrastructure stays current with the organisation’s actual understanding rather than being frozen at the point of the last training run.
This isn’t a substitute for fine-tuning in the cases where fine-tuning is actually the right answer — domain-specific language patterns, specific technical formats, consistent style at scale. It’s a recognition that a large fraction of the “we need to fine-tune” problem is actually a “we need to articulate what we want” problem, and skill files are a faster path to that.
P.S. The most useful forcing function for any enterprise AI deployment: before doing anything else, write the skills. Not to run them immediately, but because the process of writing them surfaces the ambiguity that’s going to cause problems later. The gaps in the skills are the gaps in the requirement. Better to find them in a text file than in a production incident.