The Expert Illusion

Every agent framework, every prompt template, every “awesome prompts” repo starts the same way: “You are an expert software engineer with 20 years of experience…”

It feels right. It sounds authoritative. And according to a growing body of research, it does almost nothing.

What the research says

An EMNLP 2024 paper — “When ‘A Helpful Assistant’ Is Not Really Helpful” — tested 162 different persona roles across 2,410 factual questions on four LLM families. Their finding: persona prompting adds no measurable improvement on accuracy-based tasks. Automatically selecting the “best” persona per question performed no better than random selection.

A Vanderbilt/William & Mary study testing across 4,000+ benchmark questions on GPT-3.5 and GPT-4 found the same split: expert personas help slightly on open-ended tasks (brainstorming, creative writing, advice — gains of 0.3–0.9 on their scoring scale) but show no effect on closed or factual tasks. Multi-agent persona setups sometimes increased hallucinations.

Separately, Araujo et al. (2025) showed that adding irrelevant persona details — a name, colour preferences, a backstory — degraded accuracy by up to 30 percentage points on objective benchmarks. The model latches onto the narrative noise instead of the task.

The intuition makes sense once you think about it. When you tell a model “you are an expert Python developer,” you’re not giving it new knowledge. The model already has all the Python knowledge it was trained on. You’re adding a narrative wrapper around the same capability. It’s like putting a lab coat on a doctor who’s already board-certified — it might impress the patient, but it doesn’t change the diagnosis.

What actually works

The contrast with what does improve performance is instructive:

Specific procedures beat general personas. “Review this code for SQL injection vulnerabilities by checking every user input path” outperforms “You are a world-class security expert.” The first tells the model what to do. The second tells it who to be — and it already is.

Structured reasoning beats role-play. Chain-of-thought prompting, where you ask the model to show its work step by step, consistently outperforms persona-based prompting across every benchmark. The improvement comes from changing how the model thinks, not how it identifies.

Checklists beat expertise claims. In my own agent orchestration setup, the code review agents that work best aren’t the ones with the most elaborate “you are a senior staff engineer at Google” preambles. They’re the ones with specific review checklists: check for OWASP top 10, verify error propagation paths, identify missing test coverage for edge cases. The value is in the protocol.

Why this matters for agent builders

The agent ecosystem is drowning in persona-first design. Browse any agent marketplace and you’ll find hundreds of agents that are essentially “You are an expert in X” followed by a knowledge dump the model already has. It’s prompt engineering as interior decoration — it looks like you did something, but the load-bearing walls haven’t moved.

If you’re building agents for production use, the research points to a different approach:

  1. Write procedures, not personas. Instead of “you are an expert debugger,” write “reproduce the error, isolate whether it’s in the tool or the prompt, form one hypothesis, test it, verify the fix AND check for regressions.”

  2. Add knowledge the model lacks. Your codebase conventions, your deployment constraints, your team’s past mistakes — this is context the model genuinely doesn’t have. A paragraph of real project context outweighs a page of expert persona.

  3. Constrain the output, not the identity. “Return findings as Blocker/Major/Minor with one sentence each” shapes behaviour more than “you are a meticulous code reviewer who never misses a bug.”

The irony is that the simplest, most boring prompts — the ones that just say what to do, in what order, with what constraints — consistently outperform the elaborate persona constructions that dominate the prompt engineering discourse.

The expert illusion persists because it feels like it should work. We’re pattern-matching to how we’d brief a human consultant: establish credibility first, then ask the question. But LLMs aren’t consultants. They don’t have impostor syndrome to overcome. They don’t need to be told they’re qualified before they’ll commit to an answer. They just need clear instructions.

Drop the lab coat. Write the checklist.