Production MCP: A Practitioner's Guide

Most MCP writing in 2026 falls into two buckets: marketing for a vendor's product, or tutorials for a hackathon demo. This is neither.

Shipping nine production MCP servers at one mid-sized engineering firm over three months produced a coherent framework. The thesis: for mid-market companies (five to twenty key data sources, accessible domain experts, no Fortune-500-scale complexity), MCP plus your existing identity infrastructure is the entire AI platform. No RAG. No agent frameworks. No AI-platform vendors.

This guide walks the framework end to end. Each section links to a detailed post on the topic, and to the practitioner report for the full method.

Why most enterprise AI fails

MIT puts the enterprise AI failure rate above 80%. The industry blames data quality. That diagnosis is mostly wrong.

In the field, the data is usually fine. What's broken is meaning: the AI doesn't know what your data means, how the systems relate, which source answers which question. The technician field on the work order is empty, but the real technician name lives in the time-booking records of a parallel system, and nobody told the AI that.

Most "AI data problems" are documentation problems wearing a different name. Fix the documentation in the one place the model reliably reads on every call (the tool description) and the data starts working.

Full argument with concrete examples: Your Data Is Fine. Your AI Doesn't Know What It Means.

The six-level maturity ladder

Once you accept that meaning lives in tool descriptions, you can grade MCP servers by how seriously they treat that layer. Across 52 tools on nine APIs, a ladder emerged:

Level 1: API Mapper (~70%). One tool per endpoint. One-sentence descriptions. The model invents whatever the description leaves out, and fails.
Level 2: Functional (~20%). Tools grouped sensibly, longer descriptions, no domain knowledge. The ceiling most commercial implementations aim for.
Level 3: Metadata-Rich (~8%). Knowledge graphs living next to the tool. The agent rarely reads side-channels; I built two of these layers and removed both.
Level 4: Self-Teaching (<2%). Domain knowledge lives inside the tool description, discovered by AI from real data, validated by experts. Production-ready.
Level 5: Interactive App (emerging). The server returns rendered UI.
Level 6: Secure Write App (frontier). The server writes back, gated by the IdP and structured around explicit user intent.

Most public servers sit between Levels 1 and 2. MCP isn't dead; most servers are empty.

Full breakdown: The Six Levels of MCP Servers.

Tool descriptions are the work

An analysis of 856 tools across 103 servers (MCP Tool Descriptions Are Smelly!, arXiv:2602.14878) found that 97.1% of descriptions contain at least one critical smell: unstated limitations, missing usage guidelines, opaque parameters. That's not fringe. That's the baseline.

A tool description is not a sentence. It's an operational manual, structured into blocks, each one added because the agent failed without it. After 52 production tools, the pattern converged on eight blocks: RETURNS, WHEN TO USE, WHEN NOT TO USE, QUERY STRATEGY, INTERPRETATION, EXAMPLES, CROSS-REFERENCES, FAILURE MODES.

Put the same data behind a Level 1 tool and a Level 4 tool, and in production you get two entirely different products.
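A minimal sketch of the eight-block structure, assuming a server that builds its description strings in Python. The helper build_description, the work-order tool and the get_time_bookings tool it cross-references are hypothetical, and the block contents are illustrative rather than taken from a production server. The fixed block order and the refusal to build a description with a block missing are choices made for this sketch, not requirements of MCP.

```python
# Hypothetical sketch: render a tool description from the eight blocks.
BLOCKS = (
    "RETURNS",
    "WHEN TO USE",
    "WHEN NOT TO USE",
    "QUERY STRATEGY",
    "INTERPRETATION",
    "EXAMPLES",
    "CROSS-REFERENCES",
    "FAILURE MODES",
)


def build_description(summary: str, blocks: dict) -> str:
    """Return the summary followed by all eight blocks in a fixed order.

    Raises ValueError if any block is missing or empty, or if an unknown
    block name is passed.
    """
    missing = [name for name in BLOCKS if not blocks.get(name, "").strip()]
    if missing:
        raise ValueError("missing blocks: " + ", ".join(missing))
    unknown = sorted(set(blocks) - set(BLOCKS))
    if unknown:
        raise ValueError("unknown blocks: " + ", ".join(unknown))
    parts = [summary.strip()]
    for name in BLOCKS:
        parts.append(f"{name}:\n{blocks[name].strip()}")
    return "\n\n".join(parts)


# Illustrative content for a hypothetical work-order search tool.
description = build_description(
    "Search work orders by site, status or date range.",
    {
        "RETURNS": "One row per work order: id, site, status, opened_at, technician.",
        "WHEN TO USE": "Questions about the state or history of specific jobs.",
        "WHEN NOT TO USE": "Hours worked per job; use get_time_bookings instead.",
        "QUERY STRATEGY": "Filter by site first, then narrow by date range.",
        "INTERPRETATION": "An empty technician field does not mean nobody did the job.",
        "EXAMPLES": "'Open jobs at site 12' -> site=12, status='open'.",
        "CROSS-REFERENCES": "The technician who did the work is in get_time_bookings.",
        "FAILURE MODES": "Very wide date ranges are slow; split them into months.",
    },
)
```

The returned string would go into the tool's description field. Failing on an empty block turns "we forgot WHEN NOT TO USE" into an error at build time rather than a wrong answer in production.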

The eight-block pattern with examples: How to Write MCP Tool Descriptions: The 8-Block Pattern from 7 Production Servers.

Introspective Context Engineering for MCP

The hardest part of writing good tool descriptions is that the domain knowledge they require lives in your team's heads, not in any document. Asking a domain expert to dictate 500 words of operational guidance per tool produces dry, incomplete text. Asking them to review AI-discovered patterns produces precision in a fraction of the time.

That insight became a five-phase pattern I call Introspective Context Engineering for MCP (ICE):

Examine: point an AI at real data. Ask it to discover patterns, flag confusing fields, generate hypotheses.
Flag: the AI marks each pattern with a confidence level (certain, probable, uncertain).
Validate: the domain expert reviews. Confirms, corrects, or rejects.
Encode: validated patterns are written into the tool description and schema.
Iterate: production usage exposes new gaps; the cycle repeats.
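The Flag, Validate and Encode phases need somewhere to keep state between the AI and the expert. A minimal sketch, assuming patterns are tracked as Python records; the class names, verdict values and example patterns are hypothetical. Putting the least confident patterns at the front of the review queue is one possible choice, not something the method prescribes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Confidence(Enum):
    CERTAIN = "certain"
    PROBABLE = "probable"
    UNCERTAIN = "uncertain"


class Verdict(Enum):
    CONFIRMED = "confirmed"
    CORRECTED = "corrected"
    REJECTED = "rejected"


@dataclass
class Pattern:
    field: str
    hypothesis: str                    # Examine: what the AI thinks the field means
    confidence: Confidence             # Flag: the AI's own confidence
    verdict: Optional[Verdict] = None  # Validate: set by the domain expert
    correction: Optional[str] = None   # expert's wording when the verdict is CORRECTED


_REVIEW_ORDER = {Confidence.UNCERTAIN: 0, Confidence.PROBABLE: 1, Confidence.CERTAIN: 2}


def review_queue(patterns: List[Pattern]) -> List[Pattern]:
    """Patterns still awaiting a verdict, least confident first."""
    pending = [p for p in patterns if p.verdict is None]
    return sorted(pending, key=lambda p: _REVIEW_ORDER[p.confidence])


def encode(patterns: List[Pattern]) -> str:
    """Encode: render confirmed and corrected patterns as description lines."""
    lines = []
    for p in patterns:
        if p.verdict is Verdict.CONFIRMED:
            lines.append(f"- {p.field}: {p.hypothesis}")
        elif p.verdict is Verdict.CORRECTED:
            lines.append(f"- {p.field}: {p.correction}")
    # REJECTED and unreviewed patterns never reach the description.
    return "\n".join(lines)


# Hypothetical example: one pattern confirmed, one corrected by the expert.
patterns = [
    Pattern("technician", "Often empty; the name is in time bookings.", Confidence.PROBABLE),
    Pattern("status", "'C' means closed.", Confidence.UNCERTAIN),
]
queue = review_queue(patterns)  # status first, then technician
patterns[0].verdict = Verdict.CONFIRMED
patterns[1].verdict = Verdict.CORRECTED
patterns[1].correction = "'C' means cancelled, not closed."
```

The output of encode is what would be pasted into a tool's INTERPRETATION block; the Iterate phase simply adds new Pattern records when production exposes a gap.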

This inverts the traditional metadata-curation pipeline. Instead of asking humans to write down everything (which doesn't scale), the AI asks questions and the human approves answers (which does). The output is structured domain knowledge in the only place the agent reliably reads it.

ICE is what makes Level 4 reachable in days rather than quarters. It's also why the pattern works for mid-market companies and stalls at Fortune 500 scale: it needs a domain expert who can sit with you for an afternoon, not a 40-person governance committee.

The full five-phase method with the feedback architecture: practitioner report.

The feedback loop

A good MCP server isn't built once. It evolves. Three patterns show up in production telemetry that no design session anticipates:

The agent calls the same tool three times with tightening filters → the description didn't say which filter to try first.
The agent invents a parameter that doesn't exist → the schema left ambiguity about what's available.
The agent uses a tool for a query that belongs to a different tool → both WHEN TO USE blocks need sharpening.

Each one is invisible without instrumentation. The pattern that works: every tool accepts a queryIntent string (one sentence from the agent describing what it's trying to find) and logs it alongside parameters. The logs reveal what the agent thought it was doing, which exposes the metadata gap exactly.
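A sketch of the queryIntent pattern as a Python decorator, assuming tools are plain functions taking keyword arguments and that logs go through the standard logging module. The decorator and the search_work_orders tool are hypothetical. In a real server, queryIntent would also have to appear in each tool's input schema so the agent knows to send it; that part is not shown.

```python
import functools
import json
import logging

logger = logging.getLogger("mcp.calls")


def log_query_intent(tool_name: str):
    """Require a queryIntent keyword argument and log it with the other parameters."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*, queryIntent: str = "", **params):
            if not queryIntent.strip():
                raise ValueError(
                    "queryIntent is required: one sentence on what you are trying to find"
                )
            logger.info(json.dumps(
                {"tool": tool_name, "queryIntent": queryIntent.strip(), "params": params},
                sort_keys=True,
                default=str,  # tolerate dates and other non-JSON parameter values
            ))
            # The intent is for the logs only; the tool body never receives it.
            return fn(**params)
        return wrapper
    return decorator


@log_query_intent("search_work_orders")
def search_work_orders(site=None, status=None):
    # Hypothetical tool body; a real server would query the backing system here.
    return []
```

With logging configured at INFO level, a call such as search_work_orders(queryIntent="Open jobs at site 12", site=12, status="open") writes one JSON line holding the tool name, the intent and the parameters, which is the record you read when hunting for the metadata gap.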

Fixes are small. Minutes per fix, not weeks per design cycle.
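Once calls are logged, the first of the three production patterns (the same tool called again and again) can be flagged mechanically. A rough heuristic sketch, assuming each log entry is a dict with session, tool and optionally queryIntent keys, listed in call order; the function name and threshold are hypothetical. It only counts consecutive calls to the same tool and does not check that the filters actually tightened.

```python
from itertools import groupby
from typing import Dict, List


def repeated_calls(calls: List[dict], threshold: int = 3) -> List[Dict]:
    """Flag runs of `threshold` or more consecutive calls to one tool within a session."""
    by_session: Dict[str, List[dict]] = {}
    for call in calls:  # calls are assumed to be in call order
        by_session.setdefault(call["session"], []).append(call)

    flagged = []
    for session, session_calls in by_session.items():
        for tool, run in groupby(session_calls, key=lambda c: c["tool"]):
            run = list(run)
            if len(run) >= threshold:
                flagged.append({
                    "session": session,
                    "tool": tool,
                    "count": len(run),
                    # The intents show what the agent was trying each time.
                    "intents": [c.get("queryIntent") for c in run],
                })
    return flagged
```

The flagged intents are the raw material for the fix: they show, in order, what the agent tried before it got an answer, which points at the filter the QUERY STRATEGY block should have recommended first.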

The full pattern with the queryIntent design: Your MCP Server Should Get Smarter Every Week.

What the MCP spec gets wrong

After 52 tools in production, six gaps in the current MCP spec became hard to ignore. The headline ones:

Resources are the spec's answer to reference documentation. No tested client surfaces them reliably. I built two Resource layers and removed both. Everything has to live in the tool description.
Enums are loose. Different clients render them differently; some don't show allowed values to the model at all. The agent invents values and the tool rejects them.
No standard for tool-level RBAC. Every team rolls its own auth pattern; few survive an enterprise audit.
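Until the spec settles how enums reach the model, one server-side mitigation is to repeat the allowed values in prose and to make every rejection self-correcting. A sketch assuming a JSON Schema property for a hypothetical status parameter; the values and the normalisation rule are illustrative.

```python
# Hypothetical status values for a work-order tool.
ALLOWED_STATUS = ("open", "in_progress", "closed")

STATUS_SCHEMA = {
    "type": "string",
    "enum": list(ALLOWED_STATUS),
    # Repeat the values in prose, because some clients never show the enum to the model.
    "description": "Work order status. One of: " + ", ".join(ALLOWED_STATUS) + ".",
}


def validate_status(value: str) -> str:
    """Normalise obvious variants, then reject unknown values with an actionable error."""
    normalised = value.strip().lower().replace(" ", "_").replace("-", "_")
    if normalised not in ALLOWED_STATUS:
        raise ValueError(
            f"Unknown status {value!r}. Allowed values: {', '.join(ALLOWED_STATUS)}."
        )
    return normalised
```

An agent that sends "In Progress" gets in_progress back; one that sends "done" gets an error naming open, in_progress and closed, which is enough for it to retry correctly instead of inventing another value.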

Not reasons to abandon MCP. Reasons to write servers around the parts of the spec that work and push for changes where they don't.

Full list: Six Things the MCP Spec Should Fix.

MCP plus identity is the platform

This is the reframe most "do I need an AI platform?" conversations miss. The platform isn't a product you buy. It's two pieces you already have, composed differently:

A frontier model that speaks MCP (Claude, GPT or Gemini; the protocol is the same).
The corporate identity layer the business already pays for (Entra, Okta, Google Cloud Identity).

Wire MCP servers through enterprise identity and the AI platform is built. Tools authenticate against the existing IdP, scope access by role, log every call to the audit trail every other corporate system uses. A field engineer sees only their own work orders. A controller sees aggregated financials, not raw payroll. Every action is attributable to a named identity.

That's the entire enterprise AI security model. No prompt-firewall vendor. No AI governance platform. No model gateway. Just access control the company already runs, enforced at the MCP tool boundary. Anthropic's 2026 roadmap leads with enterprise authentication. The pattern works.
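Enforcement at the tool boundary can be sketched as follows, assuming the server has already validated the caller's IdP-issued token and extracted a subject and a set of roles. The Caller type, the role names, the rows and the audit logger are hypothetical stand-ins, not any specific IdP's API.

```python
import logging
from dataclasses import dataclass

audit_log = logging.getLogger("audit")  # stand-in for the corporate audit trail


@dataclass(frozen=True)
class Caller:
    # Assumed to be extracted from an IdP token the server has already validated.
    subject: str
    roles: frozenset


# Hypothetical rows standing in for the backing system.
WORK_ORDERS = [
    {"id": 1, "assignee": "alice@example.com", "hours": 3.0},
    {"id": 2, "assignee": "bob@example.com", "hours": 5.5},
]


def get_work_orders(caller: Caller):
    """Scope the result by role at the tool boundary and audit every call."""
    if "controller" in caller.roles:
        # Controllers get aggregates, never the raw rows.
        result = {
            "count": len(WORK_ORDERS),
            "total_hours": sum(r["hours"] for r in WORK_ORDERS),
        }
    elif "field_engineer" in caller.roles:
        # Field engineers see only work orders assigned to them.
        result = [r for r in WORK_ORDERS if r["assignee"] == caller.subject]
    else:
        audit_log.warning("denied subject=%s tool=get_work_orders", caller.subject)
        raise PermissionError(f"{caller.subject} may not call get_work_orders")
    audit_log.info("allowed subject=%s tool=get_work_orders", caller.subject)
    return result
```

If a caller holds both roles, this sketch returns the aggregate; the real precedence rule belongs to whoever owns the role model in the IdP.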

Complete argument with the use/skip table: MCP Is the AI Platform.

Without an enterprise budget

Enterprise AI gets sold as something only large companies can afford. The path that justifies that price tag (custom chat UI, prompt platform, RAG pipeline, agent framework, model gateway, observability stack) commits a smaller company to a build budget, a platform team, an aging pinned model, and an unpredictable per-token bill, all at once.

The alternative is one structural decision: rent the surface, own the domain. Subscribe to the vendor's chat client at €19/seat/month. Connect your existing IdP. Put your engineering hours into MCP servers for the systems no vendor will ever connect for you. No frontend to maintain, no platform team to staff, no orchestration platform to buy. The model upgrades on the vendor's clock, for free.

The full architecture with the rent/own breakdown and the framework-absorption argument: Enterprise AI Without an Enterprise Budget.

Where to start

The recipe that worked, applied nine times:

Pick one domain expert who has too much to do. The accountant who keeps getting margin questions. The fleet manager who keeps getting asked where the vans are. The energy specialist who keeps pulling weather data manually. The person whose week would get visibly better if a specific question got an instant answer.
Pick one specific question that person gets asked every week. Not a category. One actual question, with a known correct answer, that takes them more than five minutes manually.
Build one MCP tool that answers it. Wrap whatever access the data lives behind (REST, GraphQL, SQL, file). Write the description as if explaining the data to a sharp new hire who has never seen this domain.
Wire it through your existing IdP so only that expert (and people they explicitly authorise) can call it.
Hand it to the expert. Watch what they try. Log every call. They will try things you didn't anticipate.
Update the tool description to fix what you saw. Add WHEN NOT TO USE blocks. Add examples. Tighten the QUERY STRATEGY. Ship the fix the same day if you can.
Pick the next question. Build the next tool. Add it to the same server if it's the same domain, a new server if it's a new domain.

Three months of that, applied across domains, produces a portfolio of working MCP servers and a team that uses them unprompted. Not a project plan. A practice.

The skip list, equally important:

Don't build a framework first.
Don't build a registry, a router, or an "MCP platform" first.
Don't try to "do AI strategy" before you've shipped one useful tool.
Don't ask permission. Domain experts will thank you; committees will slow you down.

The full method

This guide is the practitioner's overview. The practitioner report holds the complete framework: the Introspective Context Engineering for MCP method, the six-level maturity model, the WriteIntent pattern for secure writes, and an analysis of where the MCP ecosystem is heading.

If you ship one tool that one colleague uses unprompted within a week, you've already crossed the line that 80% of enterprise AI projects never cross. The rest is repeating that loop until your business has one MCP-shaped layer instead of many integration projects.

The model is the agent. The IdP is the security boundary. MCP is the platform. Build for the platform, not around it.

Go deeper: read the full practitioner report, The Missing Layer, or explore the working code in the mcp-metadata-demo server.