AI practitioner in Hong Kong. Writing about production AI in financial services, agentic systems, and what it means to think alongside machines. Working notes — posts get revised as thinking develops.
Financial Services AI
What it actually takes to run an AI agent in a bank
The resistance to AI agents in banking isn't mostly cultural. It's infrastructure — and the gap is more interesting than the politics.
The Trust Spectrum
Peter Steinberger stopped reviewing AI-generated code entirely. That works for indie software. In regulated environments, it can't. Here's how to think about where you sit.
Three AI Governance Blind Spots No Framework Covers
Most AI governance frameworks are technically focused risk checklists. Three structural risks are missing from almost all of them.
AI Vendor Selection Is Now a Values Decision
OpenAI took the Pentagon contract Anthropic refused. Your AI vendor just became a political statement — and enterprise procurement hasn't caught up.
Backtest vs Operational Validation: The Control You Think You Have
We presented a model control to a regulator. It had never actually fired. The gap between validated-by-backtest and validated-in-production is invisible until someone asks.
The Upstream Constraint Pattern
In digital transformation, the bottleneck is almost always upstream of where the pain is felt. Mox is the cleanest case study.
Why AI Assistants Make Us Dumber (And What Governance Should Do About It)
The cognitive offloading problem is real. The governance response mostly isn't. There's a specific mechanism at work, and it has a specific fix.
Banks Have an AX Problem They Don't Know About Yet
Banks are building AI agents to call their APIs. Those APIs weren't designed for agent callers. The mismatch is subtle, consequential, and almost nobody is talking about it.
What Surprised Me Studying for the GARP Responsible AI in Finance Exam
I expected the hard parts to be the technical sections. They weren't. The governance sections were harder, and more useful.
Three Things AML AI Models Still Get Wrong in 2026
The models aren't the problem. The operating models are. Three structural failures in AML AI from years building these systems inside a bank.
The AI Job Title Illusion
Two job ads. Same bank. Same week. Same title pattern. Completely different jobs. The AI hiring market has a labelling problem.
Skills as Behavioral Nudges: The Lightweight Alternative to Fine-Tuning
We fine-tune models with gradient descent. We nudge agents with skill files. Same goal, radically different cost.
The Real Reason Mox Won (and What It Means for AI Transformation)
Mox didn't win because they hired better designers. They won because they had no legacy to fight. The pattern applies directly to AI transformation.
AI Governance Category Error: Routing vs. Compliance
Your AI governance framework is a routing spreadsheet pretending to be a compliance programme. Regulators will spot the difference.
What Makes a Great AI Consultant (Beyond Technical Skills)
The most dangerous person in an AI consulting engagement knows how the model works but has never sat in a credit committee.
HK/APAC as an AI Hub for Financial Services: The Story Being Missed
Hong Kong has quietly run one of the most sophisticated GenAI experiments in global banking. Almost no one outside the region is paying attention.
AI Agent Frameworks for Enterprise FS: What Actually Works vs. Hype
Most enterprise AI agent pilots in financial services fail at the same point: the second tool call. The problem isn't the framework.
RAG for Compliance: The Hard Problem Is Chunking, Not Retrieval
Banks are deploying RAG for compliance and discovering the hard problem isn't retrieval. It's the pipeline before it.
Banking DS to AI Consulting: What the Transition Actually Teaches You
The operational instincts built in production banking don't belong in the past. They're exactly what makes a practitioner-turned-consultant useful.
Most Banks Don't Need an AI Strategy
The real project isn't artificial intelligence. It's the data infrastructure that AI exposes as broken.
Agentic Systems
Progressive disclosure in MCP tools
When building MCP servers, search should return scannable summaries — not full content. Let the model decide what to read.
This Year's DeepSeek
An open-source AI agent framework became the fastest-growing project in GitHub history — mostly in China. The pattern is the same as last year. So is the security panic.
Enterprise AI Has a Plumbing Problem, Not a Model Problem
Most enterprises are optimising the wrong variable. The gap between 5% and 40% agent adoption won't be closed by better models.
From Chatbots to Event Loops
The shift from agents you summon to agents that watch. Enterprise AI workflows are becoming continuous loops — and the failure modes are different.
What MCP Actually Changes for Enterprise AI
Not better function calling — decoupling. When tools expose MCP servers, any agent can compose any system freely. The heterogeneity problem becomes a configuration problem.
Language Is the Medium, Not the Purpose
We called them language models and spent years confused about why they could reason. The name stuck to the interface, not the mechanism.
The Nag Tax
When building automation around a third-party app, the first question to answer is: what's the one thing this app does that nothing else can replicate? That feature becomes the tax you pay on everything else.
The Problem With Clever Browser Automation
The most sophisticated solution to a problem is usually a sign you haven't found the right abstraction yet.
When Intelligence Becomes Infrastructure
What changes when LLMs stop being the special thing and become just another software component? The answer is: everything about how you build.
Your Tool Shouldn't Know What to Ignore
Configuration that belongs to the data shouldn't live in the tool. .gitignore figured this out thirty years ago.
Expansion, Not Speedup
The real ROI of AI coding isn't doing the same work faster. It's doing work that wasn't worth doing before.
Skills as Files
The simplest agent architecture might already be the right one: give the agent a file explaining how to do something, and let it read when needed.
Traces Are the New Debugger
When behaviour emerges from both code and model responses, reading source files isn't enough. You debug by examining execution traces.
Rules Decay, Hooks Don't
The difference between writing down a rule and making the system enforce it — illustrated by a 15-line hook.
Agentic Engineering: Why Less Is More
Tool enthusiasm is often net-negative. Context pollution degrades performance faster than features improve it. The principles that actually work.
CLIs Enforce Structure and Save Tokens — Not Just Discipline
The instinct to add a rule to a skill file is usually the wrong abstraction. A CLI wrapper enforces at the tool level: zero deliberation, zero token cost.
Software Engineering Principles for AI Instruction Files
LLM instruction files are code. They have the same failure modes — with one interesting twist that changes everything.
Agent-First CLI Design: TTY Detection as Philosophy
The primary user of my CLI tools isn't me anymore. Designing for that changes everything about how output should work.
Per-Token Pricing Is the 'Megapixels' of AI
We're optimising for the wrong number — and the history of consumer electronics suggests we'll figure this out eventually.
The Contract Pattern: Hard Gates for AI Agents
AI agents know how to start a task. They don't always know when to stop. The contract pattern is the architectural fix.
AX: Agent Experience Is the New DX
Developer experience became a competitive moat in the API era. Agent experience is next. Most tools aren't designed for it yet.
Claude Code, Analyze My Spending
When AI coding assistants become workflow orchestrators, the most powerful compiler processes reality, not code.
Intelligence on Tap
When artificial intelligence becomes as mundane as running water, how does thinking itself change?
What and Why Beat How
When implementation becomes automated, human intelligence reallocates to purpose and strategy. The cognitive hierarchy inverts.
Everyone Becomes Middle Management
The automation tool that creates more coordination work
Engineering & Tools
The Bootstrap Problem in AI Tooling
You need the tool to build the tool. The answer is: build the dumb version first, use it once, then have it build its replacement.
The Orchestration Layer Is Knowledge, Not Code
Multi-agent AI orchestration frameworks are commodity. The competitive advantage is knowing which agent to use when, what breaks, and how to recover.
Guardrails Beat Guidance
Prompt instructions are suggestions. Hooks are constraints. One survives a model swap.
The Wrong Metric: Why I Stopped Switching AI Models Mid-Session
Per-task model routing optimises cost per token. But at personal assistant scale, friction is the real cost.
Cross-Cutting Is Just Another Word for Optional
In AI agent architecture, calling something a 'cross-cutting concern' without naming an owner and a gate is just a polite way of saying nobody owns it.
Stop Asking Which AI Model Is Better. Ask Which Phase.
The planning/execution split is more useful than any benchmark comparison.
The second pass finds more
When red-teaming a document with multiple AI models, the second review — run on the edited version — consistently finds more than the first. Here's why, and what it means for how many rounds to run.
RAG Solved the Wrong Problem
The retrieval pipeline was built for systems that couldn't reason about their own information needs. Agents can.
The Accidental Life OS
I spent an afternoon researching AI tools for personal life management. The conclusion was that I should stop looking.
The $1 Billion Bet Against LLMs
One of the architects of modern deep learning just raised $1B on the thesis that token prediction can't reach real reasoning. Here's what he's proposing instead — and why it matters even if he's wrong.
LLMs Are Better at Editing Than Writing
Ask an AI to write from scratch and you get the average of the corpus. Give it something rough and it amplifies what's already there. The workflow implications are significant.
The Case Against Knowledge Management Systems
Most PKM tools are procrastination with better aesthetics. The problem isn't the software — it's that filing a note feels like understanding it.
What It Actually Feels Like to Use AI for 80% of Your Work
Not productivity. Something stranger — the cognitive texture of days when the bottleneck shifts from execution to articulation.
The Calibration Trap
The comfort trap is about effort. This one is about epistemics — and it's harder to see.
The Comfort Trap
The right test for any AI interaction isn't 'did it help me?' but 'am I more capable after it?'
The Personalised System Era
AI coding agents didn't just make developers faster. They changed who gets to have a bespoke system.
Let the OS Schedule, Let Your Tool Dispatch
The moment I stopped building scheduling into my tools, everything got simpler.
Benchmark Your Research Stack
Running 10 real queries through 5 tools revealed that theoretical routing rules have systematic gaps — and the surprises were more useful than the confirmations.
The Queue Should Live Where Your Thoughts Live
AI agent results should be push, not pull. The feedback loop should close on mobile. Most tools miss both — not from ignorance, but because dashboards photograph better.
The Infra Trap
Building tools to support your work can quietly become a substitute for the work itself.
The Queue That Texts You Back
Personal AI infrastructure should report results to you, not wait for you to go looking. A small architecture shift changes the whole dynamic.
Eliminate the Reminder, Don't Schedule It
When you catch yourself setting a reminder to check something later, that's usually a signal that a tool is failing to report what it should.
When Better Is Worse
Upgrading to a more capable model made my tool sixty times slower. The lesson isn't about models — it's about the difference between capability and fit.
The Experiment Loop Without the GPU
Andrej Karpathy's autoresearch project is being read as a demo of what H100s can do overnight. It's actually a discipline for doing rigorous work on anything measurable.
The Silent Stall: Debugging GPT-5.4-Pro's Responses API
Three hours of debugging revealed two non-obvious behaviours about GPT-5.4-Pro that aren't in the docs: a minimum token budget requirement and a wall-clock timeout gap in Rust async code.
I Didn't Mean to Kill My Todo App
A coding assistant quietly made three productivity apps redundant. Not by replacing them — by making context collapse the boundaries between them.
Exa Indexes WeChat
WeChat is supposed to be a walled garden. Exa didn't get the memo.
I Made the AI Remind Me of My Own Blind Spots
I kept missing things at the end of AI sessions. So I stopped relying on willpower and systematised the nudge instead.
AI Evals: Why Teams Build Metrics Before They've Read a Trace
Most teams build evaluators before reading a single trace. The sequence that actually works is the opposite: observe, categorise, then measure.
The Kutta Condition of AI: Engineering Ships Before Theory Catches Up
Aeronautics flew for decades before anyone could explain why wings worked. AI is in the same position. The engineering is ahead of the theory.
The Failure Mode of AI Advice Isn't Hallucination
The failure mode of AI advice isn't hallucination. It's that it agrees with you. Here's the architecture that fixes it.
Building My Own Consulting Toolkit Before Day One
Most consultants arrive at a new firm and learn their tools from colleagues. I tried something different.
Three Crates Before Lunch
I published three Rust CLI tools to crates.io before noon — none existed at breakfast. The interesting part isn't the speed. It's that the bottleneck moved.
When to Build vs. When to Wait: The Recurrence Rule for AI Tooling
Most AI tooling debates are actually recurrence debates. The question isn't whether to build — it's how many times you'll need it.
Don't Ask Your AI to Find Problems
Ask for bugs and you'll get bugs — whether they exist or not. Sycophancy is a design feature, and the fix isn't better prompting.
I Don't Read Documentation Anymore
When AI can execute complex setups through conversation, learning shifts from reading documentation to observing execution.
Claude Code Mobile is Better Than Desktop
Walking meetings, voice input, and location changes unlock cognitive advantages desktop workflows can't access.
How Claude Code Helps You Think
AI becomes most powerful when it helps you discover what your ideas actually are. Cognitive partnership over replacement.
Claude Code is Not a Coding Agent
Why I use Claude Code for everything except coding: cognitive compiler for strategy, decisions, and understanding.
Production AI vs Demos: The Intent Classification Reality Check
Building AI systems that work in the real world requires thinking beyond the demo. What actually matters when users depend on your models.
Cognition & Philosophy
Is Insight an Illusion?
When pattern-matching feels like wisdom, what are we actually experiencing?
The Grey Areas Are the Whole Thing
Ethics isn't about knowing the answer — it's about feeling the tension
Why Be Nice
The question I can't fully answer for my son
Act-on-Receipt: The Third Task Class
Most task systems are binary, but a third class exists — tasks triggered by external notifications — and managing them like a backlog item is the wrong move entirely.
The System for Checking Is Not the Checking
On the difference between eliminating friction and eliminating anxiety — and how to know when you've crossed the line.
AI Fixed My Perfectionism (Sort Of)
On why the blank page stopped being the hard part.
Taste Requires Stakes
AI can simulate aesthetic judgment with impressive fluency. What it cannot simulate is the consequence of being wrong.
Career & Consulting
The Calculator Analogy
Nobody practises arithmetic speed anymore. The same thing is happening to prose, research, and analysis — and it changes what humans should get good at.
What Feels Like Play
Naval's famous line is easy to nod at. The hard part is actually identifying yours — and being honest about what isn't.
Why Nobody Builds Cross-Vendor AI Orchestration
Every AI lab builds single-vendor orchestration. The cross-vendor layer is a gap — and it's a gap for a reason.
The Session Boundary Is Why You Still Don't Have AI Agents
The gap between AI assistants and AI agents isn't about reasoning capability — it's about whether the thing can survive your laptop closing.
LLM evals aren't data science
Evaluating LLM systems requires judgment, not statistics. That shifts who's qualified to do it — and where the gap is in most organisations.
Consulting Is Mostly About Reducing Uncertainty
Clients hire consultants to solve problems. What they're actually paying for is the reduction of a particular feeling. The distinction matters.
When the Platform Is Mature, the Architect's Job Changes
The hardest phase of AI architecture isn't building the stack. It's the moment after the stack is built and eighteen teams start making independent decisions on top of it.
Five Archetypes of AI-Era Business Defensibility
When AI models commoditise, the moat isn't the model. It's the infrastructure AI must flow through but can't replace. Five archetypes of what that looks like.
Other
Your Body Doesn't Care What You're Thinking About
30 days of Oura data showed activity type doesn't predict stress. Meetings do.
The Thirty-Year Gap Between Faking and Understanding Natural Language
From AppleScript's rigid English-like syntax to LLM tool-calling — what changes when the computer actually understands you.
Not Every Cron Job Is a Feedback Loop
Automation that collects without learning is just a cron job. The difference is a feedback signal — a number that goes up or down.
The Loop Is the Product
Karpathy's autoresearch and every useful AI tool share the same pattern: the code is trivial, the feedback loop is the product.
The Fluency Trap
When AI conversations feel insightful because the language model is good at producing insight-shaped text
Why Nobody Benchmarks Memory
The things that matter most in production are the things that get benchmarked least
The Byproduct Trap
When the paper becomes more interesting than the answer you set out to find
AI Agents Need Notebooks, Not Just Memories
The missing layer in enterprise AI isn't smarter models — it's structured memory that humans can actually review.
Taste Works for Small Bets
The 'ship and calibrate' loop works beautifully for reversible decisions. For the big ones, you're mostly guessing and then making the guess true.
Your Output Is Your Selections
AI commoditises execution. What remains is taste — the 'that's the one' reflex. And the only way to sharpen it is to ship and see what reality says back.
The Skill Is Knowing What Matters
The bottleneck in a world of AI tools isn't crafting the output — it's knowing which output is worth crafting.
Push Not Pull
AI agents that require you to go looking for their results aren't agents — they're automation with better UX. The loop closes when results arrive, not when you remember to check.
The Human Bus Problem
Adding more AI tools doesn't make you faster if you're still the junction between every agent step.
The Identification Problem
Having great AI delegation tools and not using them isn't a tool problem — it's a pattern recognition problem, and that distinction changes everything.
The Last 10% Is the Feedback Loop
The execution layer of an AI system is only half the infrastructure — the reporting layer is what determines whether anyone acts on the results.
Agentic AI in Production Looks Like a Workflow
The gap between 'agentic AI' hype and what actually ships in production turns out to be a workflow — and that's a feature, not a failure.
Shifting Priors Is Not Finding Truth
An experiment with AI deliberation revealed something uncomfortable: accumulating confident opinions feels like convergence on truth, but isn't.
The Deliberation Format Is the Product
I ran an experiment to find where multi-model deliberation adds value. The answer surprised me: it's the structured format, not the model diversity.
The First Datapoint
An AI agent ran unsupervised for two days and found twenty improvements to another model's training. Not an AGI claim. A rate claim.
You Are the Bottleneck in Your Own Agentic Workflow
Adding more AI tools doesn't help if you're still the bus between them.
The QDAP annuity trap: what the tax saving doesn't tell you
Hong Kong's QDAP annuity is sold on a real tax benefit. But the HK$60K deduction cap is shared with MPF top-ups — and that changes everything.
Where Rules Live
The difference between a rule that works and a rule that doesn't is usually not the content of the rule — it's where it lives.
Instructions Don't Enforce Behaviour. Templates Do.
Why the structure of an output matters more than the instructions that produce it.
Why I stopped optimising my morning routine
AI Succeeds, Economy Breaks: The Displacement Loop Nobody Models
The standard AI economic models assume wage effects and retraining timelines. They don't model the feedback loop where successful AI deployment reduces the customer base that purchases AI-enabled products.
The AGI Question Nobody Is Asking Correctly
Sequoia says AGI is here. Dan Shipper says we're not there yet. They're both right — they're measuring different things. The question that actually matters is Sholto Douglas's "nines of reliability."
Why Every Tool Now Needs Two Faces: CLI for Humans, MCP for AI
We're building parallel interfaces for the same functionality because humans and AI agents parse the world through fundamentally different grammars. The future isn't human OR machine interfaces; it's both, simultaneously.
Ambient Agents: When AI Disappears Into Capability
The real revolution isn't making AI smarter; it's making it invisible
The Hidden Violence of Vague Instructions
LLMs aren't just tools we prompt; they're forcing functions for human linguistic evolution
MCP is Not a Glorified API
The protocol that looks like plumbing but acts like philosophy: how MCP fundamentally changes the agent-tool relationship
The Invisible Puppeteer
When algorithms shape decisions we think are ours
AI Agents Need Passports, Not Passwords
The authentication systems we're building assume AI agents are tools. What happens when they become economic actors with their own accounts, credentials, and legal standing?