How to Build an MCP Server: Production Setup in 60 Minutes

Build a production MCP server step-by-step. Skip the framework overhead. Real code, real examples, shipping today.

MCP servers are how AI systems actually talk to your tools. Not through some middleware abstraction, not wrapped in a framework. Directly. An MCP server exposes a standardized interface that lets AI models like Claude read files, run commands, and query APIs without vendor lock-in. You write one server, and Claude Desktop, Claude Code, Cline, and any other MCP-compatible client can use it. Unlike LangChain or AutoGen, the Model Context Protocol doesn't lock you into a runtime. Most tutorials give you a toy example. This one gets you to production in an hour.

What MCP Actually Is (And Why It Matters)

MCP (Model Context Protocol) is a spec, not a framework. Anthropic introduced it as an open standard in November 2024; Claude's apps and SDKs, along with a growing list of third-party clients, consume it. Full stop.

Here's what that means in practice. You define tools (functions the AI can call), resources (files or data the AI can read), and prompts (reusable prompt templates the client can offer to the user). Claude sees your tools the same way it sees any other tool definition. No wrapper layer. No custom routing. Real example: Stripe publishes an MCP server that lets AI agents work with its API. You can do the same with your internal APIs.

The difference between MCP and agent frameworks matters. CrewAI and LangGraph are orchestrators designed to manage multi-step workflows with memory and state. MCP is simpler. It's a protocol. You own the orchestration logic. You decide how Claude uses the tools. If you only need Claude to call a function and return the result, use MCP. If you need multi-agent workflows with memory across sessions, use LangGraph.

Production rule: keep your MCP server's handlers stateless and it scales horizontally with little effort. Agent frameworks store state, which means you need a database and session management. Pick the tool that matches your use case, not the one that sounds impressive at a pitch meeting.

MCP Server Architecture: The Minimal Path

An MCP server is a single process that exchanges JSON-RPC 2.0 messages with a client, either over stdin/stdout (the stdio transport) or over HTTP (the Streamable HTTP transport). It receives requests and sends back responses. That's it.
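To make the wire format concrete, here is a stdlib-only sketch of one tools/call exchange, built as plain Python dicts and framed the way the stdio transport expects (one JSON message per line). The method and field names follow the MCP spec; the helper functions are hypothetical, the initialize handshake that opens every session is left out, and in practice the SDK builds and parses these messages for you.

```python
import json
from typing import Any, Dict


def make_tool_call(request_id: int, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request asking the server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def make_tool_result(request_id: int, text: str, is_error: bool = False) -> Dict[str, Any]:
    """Build the matching JSON-RPC 2.0 response carrying one text content block."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,  # echoes the request id so the client can match them up
        "result": {
            "content": [{"type": "text", "text": text}],
            "isError": is_error,
        },
    }


def frame_stdio(message: Dict[str, Any]) -> str:
    """Serialize one message for the stdio transport: a single line of JSON."""
    line = json.dumps(message, separators=(",", ":"))
    if "\n" in line:  # json.dumps escapes newlines inside strings, so this shouldn't fire
        raise ValueError("stdio messages must not contain embedded newlines")
    return line + "\n"


request = make_tool_call(1, "lookup_company", {"email": "jane@acme.com"})
response = make_tool_result(1, '{"company": "Acme", "employees": 150}')
print(frame_stdio(request), end="")   # client -> server, written to the server's stdin
print(frame_stdio(response), end="")  # server -> client, written to the server's stdout
```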

Most production servers are 200 to 400 lines of Python or Node. No framework bloat required. The official MCP SDKs handle transport, the protocol handshake, and serialization. You write the business logic. Real example: a server that reads your Supabase schema, executes safe queries, and returns results takes about 150 lines.

Your server defines three types of capabilities. Tools are functions the AI can call ("query the database," "send an email"). Resources are files or data the AI can read ("the current customer list," "today's sales pipeline"). Prompts are reusable templates the client can surface to the user ("review this query for permission problems"). You register each one with a handler function. When Claude calls a tool, your handler runs. It returns data or an error. Claude sees the response and continues the conversation.
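The register-then-dispatch pattern is simple enough to model in a few lines. The sketch below is a hypothetical, stdlib-only toy, not the MCP SDK's API: it only shows how names and URIs map to handler functions. In the real Python SDK, FastMCP offers the equivalent @mcp.tool(), @mcp.resource(uri), and @mcp.prompt() decorators and handles the protocol side for you.

```python
from typing import Any, Callable, Dict


class ToyServer:
    """Hypothetical model of an MCP server's handler tables (not the SDK API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tools: Dict[str, Callable[..., Any]] = {}
        self.resources: Dict[str, Callable[[], str]] = {}
        self.prompts: Dict[str, Callable[..., str]] = {}

    def tool(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        self.tools[fn.__name__] = fn  # tools are addressed by name
        return fn

    def resource(self, uri: str) -> Callable[[Callable[[], str]], Callable[[], str]]:
        def register(fn: Callable[[], str]) -> Callable[[], str]:
            self.resources[uri] = fn  # resources are addressed by URI
            return fn
        return register

    def prompt(self, fn: Callable[..., str]) -> Callable[..., str]:
        self.prompts[fn.__name__] = fn  # prompts are templates filled with arguments
        return fn

    def call_tool(self, name: str, arguments: Dict[str, Any]) -> Any:
        if name not in self.tools:
            raise KeyError(f"unknown tool: {name}")
        return self.tools[name](**arguments)

    def read_resource(self, uri: str) -> str:
        return self.resources[uri]()


server = ToyServer("demo")


@server.tool
def add_lead(name: str, stage: str) -> dict:
    """Create a sales lead (stubbed: just echoes the input)."""
    return {"name": name, "stage": stage}


@server.resource("pipeline://today")
def todays_pipeline() -> str:
    return "3 open deals"


@server.prompt
def review_query(sql: str) -> str:
    return f"Check permissions before running this query:\n{sql}"


print(server.call_tool("add_lead", {"name": "Acme", "stage": "demo"}))
print(server.read_resource("pipeline://today"))
```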

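The architecture can be stateless where it counts. MCP opens a session with an initialize handshake, but if your handlers keep no data between calls, every tool call is independent. This matters because it means you can scale horizontally. Run three instances behind a load balancer. They're all identical. One caveat for the HTTP transport: if clients hold session IDs, route each session to the same instance or run the server in the stateless mode the official SDKs offer.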

Building Your First MCP Server: Step-by-Step

Start with the MCP Python SDK: pip install mcp. If you prefer JavaScript or TypeScript, install the TypeScript SDK instead (npm install @modelcontextprotocol/sdk). Both are equally straightforward.

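Create a server with FastMCP, the high-level API in the Python SDK, and register tool handlers with the @mcp.tool() decorator. The function's type hints become the tool's input schema, and its docstring becomes the description Claude reads. Here's the skeleton of a tool that takes a customer email, queries Apollo for company data, and returns it. The full version is about 20 lines of code.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my-server")

@mcp.tool()
def lookup_company(email: str) -> dict:
    """Fetch company data from Apollo for a given email."""
    # Your business logic here: call the Apollo API, validate input, handle errors.
    # The hard-coded result below is a placeholder.
    return {"company": "Acme", "employees": 150}

if __name__ == "__main__":
    mcp.run()  # uses the stdio transport by default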

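Test locally with the MCP Inspector before deploying. With the SDK's CLI extra installed (pip install "mcp[cli]"), mcp dev server.py launches it against your server; npx @modelcontextprotocol/inspector works for servers written in any language. It lists your tools and lets you call them, which catches bad signatures and typos immediately. Don't skip authentication. Use environment variables for API keys. If your server is reachable over HTTP, especially if it talks to an internal database, require a bearer token. Treat it like an API. Because it is one.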
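For the environment-variable and bearer-token advice, here is a minimal sketch, assuming your server runs over HTTP and you check a shared secret yourself. The variable names APOLLO_API_KEY and MCP_BEARER_TOKEN are hypothetical, and how the Authorization header reaches this function depends on your web framework. A stdio server is launched locally by the client, so the token check doesn't apply there; for anything beyond a shared secret, look at the OAuth-based authorization the MCP spec defines for HTTP transports.

```python
import hmac
import os
from typing import Optional

# Hypothetical variable names: set them in the service's environment, not in code.
API_KEY_ENV = "APOLLO_API_KEY"
TOKEN_ENV = "MCP_BEARER_TOKEN"


def load_api_key() -> str:
    """Read the upstream API key at startup and fail fast if it's missing."""
    key = os.environ.get(API_KEY_ENV)
    if not key:
        raise RuntimeError(f"{API_KEY_ENV} is not set")
    return key


def is_authorized(authorization_header: Optional[str]) -> bool:
    """Check an 'Authorization: Bearer <token>' header value against the expected token."""
    expected = os.environ.get(TOKEN_ENV, "")
    if not expected or not authorization_header:
        return False  # deny by default when either side is missing
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids short-circuiting on the first mismatch, which resists timing attacks
    return hmac.compare_digest(token.encode(), expected.encode())
```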

Connecting Claude to Your MCP Server

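To connect Claude Desktop or Claude Code, register the server in the client's MCP configuration and the client handles the rest. To wire it into your own app with the Anthropic SDK, use an MCP client (the Python SDK ships ClientSession and stdio_client) to launch the server and list its tools, pass those definitions in the tools parameter of messages.create(), and forward each tool_use block Claude returns to the server with call_tool. Claude sees your tools the same way it sees any other tool definition.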
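If you wire an MCP server into your own app with the Anthropic SDK, most of the glue is reshaping data between the two sides. The sketch below works on plain dicts shaped like the JSON each side uses: MCP's tools/list returns inputSchema (camelCase), the Messages API's tools parameter expects input_schema, and a tool result goes back to Claude as a tool_result block. Launching the server and calling list_tools/call_tool through the SDK's client session are left out, and the helper names are hypothetical.

```python
from typing import Any, Dict, List


def mcp_tools_to_anthropic(mcp_tools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Map MCP tool definitions (from tools/list) to the Messages API `tools` format."""
    return [
        {
            "name": t["name"],
            "description": t.get("description", ""),
            "input_schema": t["inputSchema"],  # camelCase in MCP, snake_case in the API
        }
        for t in mcp_tools
    ]


def tool_result_block(tool_use_id: str, mcp_result: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap an MCP tools/call result as a tool_result block for the next user turn."""
    text = "\n".join(
        c["text"] for c in mcp_result.get("content", []) if c.get("type") == "text"
    )
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,  # must match the id of Claude's tool_use block
        "content": text,
        "is_error": bool(mcp_result.get("isError", False)),
    }


listed = [{
    "name": "lookup_company",
    "description": "Fetch company data from Apollo for a given email.",
    "inputSchema": {
        "type": "object",
        "properties": {"email": {"type": "string"}},
        "required": ["email"],
    },
}]
tools = mcp_tools_to_anthropic(listed)
block = tool_result_block(
    "toolu_123",
    {"content": [{"type": "text", "text": "Acme, 150 employees"}], "isError": False},
)
```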

If your server returns an error (bad API key, timeout), what happens next depends on the client: the model may retry the call or explain the failure to the user. Design your tools to fail gracefully. Don't throw a raw database error at the user. Catch it. Log it. Return something human-readable.
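A minimal sketch of "catch it, log it, return something human-readable", assuming plain Python handlers: the decorator below (a hypothetical helper, not part of any SDK) logs the full traceback server-side and returns an MCP-style result with isError set, so the model sees a readable message instead of a stack trace. SDKs may already convert uncaught exceptions into error results, but the text they pass along can be the raw exception message.

```python
import functools
import logging
from typing import Any, Callable, Dict

log = logging.getLogger("mcp.tools")


def safe_tool(fn: Callable[..., Any]) -> Callable[..., Dict[str, Any]]:
    """Wrap a tool handler so failures come back as readable error results."""
    @functools.wraps(fn)
    def wrapper(**kwargs: Any) -> Dict[str, Any]:
        try:
            value = fn(**kwargs)
            return {"content": [{"type": "text", "text": str(value)}], "isError": False}
        except TimeoutError:
            log.exception("tool %s timed out", fn.__name__)
            message = "The upstream API timed out. Try again in a minute."
        except Exception:
            # The traceback goes to the logs; the model only sees a short summary.
            log.exception("tool %s failed", fn.__name__)
            message = f"{fn.__name__} failed because of an internal error."
        return {"content": [{"type": "text", "text": message}], "isError": True}
    return wrapper


@safe_tool
def count_orders(customer_id: str) -> int:
    """Hypothetical tool whose database is unreachable."""
    raise ConnectionError("could not connect to server: Connection refused")


print(count_orders(customer_id="cus_42")["content"][0]["text"])
```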

Real production setup: run your MCP server over the HTTP transport as a systemd service on a VPS (DigitalOcean, Hetzner, AWS). The Claude client connects via localhost or a secure tunnel. Logs matter. Use structured logging (JSON lines via Python's logging module, winston for Node), and on the stdio transport write logs to stderr, because stdout carries the protocol messages. Grep by tool name and timestamp when debugging.
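For structured logs in Python, a small custom formatter on the standard logging module is enough. This is a sketch under assumptions: the field names (tool, duration_ms) are illustrative, and the handler writes to stderr by default because on the stdio transport stdout is reserved for protocol messages.

```python
import json
import logging
import sys
import time
from typing import TextIO


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "msg": record.getMessage(),
            # Values passed via `extra=` become attributes on the record.
            "tool": getattr(record, "tool", None),
            "duration_ms": getattr(record, "duration_ms", None),
        }
        return json.dumps(entry)


def make_logger(stream: TextIO = sys.stderr) -> logging.Logger:
    """stderr by default: on the stdio transport, stdout carries JSON-RPC traffic."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("mcp-server")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # don't duplicate lines through the root logger
    return logger


logger = make_logger()
logger.info("tool call finished", extra={"tool": "lookup_company", "duration_ms": 412})
```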

Monitor tool latency. Timeouts are set by the client, and a slow tool stalls the whole conversation while Claude waits. Cache expensive queries in Redis: Apollo lookups, database joins, anything that could stall. A tool that returns instantly is a tool Claude will use. A tool that hangs is a tool your users stop trusting.

Avoiding The Framework Trap: Why MCP Beats Agent Frameworks

Most agencies use LangChain or AutoGen because they're familiar or because they saw a tutorial. Wrong reason. Pick the tool because it solves your specific problem.

LangGraph is for stateful, multi-turn workflows where Claude needs memory across sessions. n8n is for no-code teams who need visual workflow building with audit trails. MCP is for integrations. APIs. Data sources. If your use case is "Claude calls a tool, gets data, returns it to the user," use MCP. The difference matters because an MCP server with stateless handlers scales out by adding instances, while agent frameworks store state, which means you need a database and a way to manage sessions.

Skip the abstraction layer. Ship the protocol. You can always layer LangGraph on top if you need to later.

Deploying Your MCP Server: Real Production Setup

Three options. Pick one.

Option 1: VPS. Run it on a VPS as a persistent process. Use systemd or Docker. Point your Claude clients to localhost or a private tunnel. Cheapest. Most control. You own the ops.

Option 2: Serverless. Deploy it as a Lambda function or Vercel Function behind the HTTP transport. Keep the server stateless so any instance can handle any request; cold starts still add latency to the first call. Better for unpredictable traffic. Worse for tools that hold connections (database pools, long-running queries).

Option 3: Workflow platform. n8n can expose a workflow as an MCP server (through its MCP Server Trigger node) and can call other MCP servers from inside a workflow. This adds a layer but simplifies scaling for non-technical teams.

Real data: we deployed a Supabase + MCP server to a $5 DigitalOcean droplet. It handled 8,000 requests per day with zero restarts. That's production-grade. That's not luck. That's monitoring tool latency, caching expensive queries, and wrapping everything in error handling.

Common Mistakes And How To Avoid Them

Mistake 1: Tools that take too long. If an API call takes more than a few seconds, add a cache or an async queue. Streaming doesn't help here: Claude is waiting on the result. Your tool either responds fast or the conversation stalls.
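One way to keep a slow upstream from stalling the conversation is to give each tool body a deadline and return an error result when it's exceeded. This is a minimal thread-based sketch; the deadlines and the slow_report/fast_report functions are only for demonstration, and an async server would reach for asyncio.wait_for instead. A timed-out Python thread keeps running in the background, so set timeouts on the upstream client too.

```python
import concurrent.futures as cf
import time
from typing import Any, Callable, Dict

_pool = cf.ThreadPoolExecutor(max_workers=8)


def call_with_deadline(fn: Callable[..., Any], seconds: float, *args: Any) -> Dict[str, Any]:
    """Run a tool body with a deadline and return an MCP-style result either way."""
    future = _pool.submit(fn, *args)
    try:
        value = future.result(timeout=seconds)
    except cf.TimeoutError:
        # The worker thread can't be killed; it finishes in the background.
        return {
            "content": [{"type": "text", "text": f"Timed out after {seconds}s. Try a narrower query."}],
            "isError": True,
        }
    return {"content": [{"type": "text", "text": str(value)}], "isError": False}


def slow_report(days: int) -> str:
    time.sleep(1.0)  # simulates a slow upstream API
    return f"report for {days} days"


def fast_report(days: int) -> str:
    return f"cached report for {days} days"


print(call_with_deadline(slow_report, 0.2, 30)["isError"])  # True
print(call_with_deadline(fast_report, 1.0, 30)["isError"])  # False
```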

Mistake 2: Vague tool descriptions. Claude reads your tool's description (with FastMCP, that's the function's docstring). "Queries the database" is useless. "Fetches open sales leads from the Salesforce API and returns name, email, and deal stage" is clear. Invest 30 seconds writing descriptions that actually describe what happens.
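A cheap guard against vague descriptions is a check that runs in CI over your tool definitions. The sketch below is a hypothetical helper operating on spec-shaped dicts (name, description, inputSchema); word count is a crude proxy for clarity, so treat the threshold as a starting point rather than a rule.

```python
from typing import Any, Dict, List


def lint_tool(tool: Dict[str, Any], min_words: int = 8) -> List[str]:
    """Flag tool definitions whose descriptions are too thin for the model to use well."""
    problems: List[str] = []
    words = tool.get("description", "").split()
    if len(words) < min_words:
        problems.append(f"{tool['name']}: description has {len(words)} words (< {min_words})")
    properties = tool.get("inputSchema", {}).get("properties", {})
    for param, schema in properties.items():
        if "description" not in schema:
            problems.append(f"{tool['name']}: parameter '{param}' has no description")
    return problems


vague = {
    "name": "query_db",
    "description": "Queries the database",
    "inputSchema": {"type": "object", "properties": {"q": {"type": "string"}}},
}
clear = {
    "name": "list_open_leads",
    "description": "Fetches open sales leads from the Salesforce API and returns name, email, and deal stage",
    "inputSchema": {
        "type": "object",
        "properties": {"owner_email": {"type": "string", "description": "Email of the lead owner"}},
    },
}
for problem in lint_tool(vague):
    print(problem)
```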

Mistake 3: No error handling. If your tool throws, Claude gets either a raw exception message or a failed call, depending on the SDK. Wrap risky calls in try/except (try/catch in Node) and return a descriptive error message. "API timeout" is better than a Python traceback.

Mistake 4: Trusting user input. Always validate tool parameters. If a user asks Claude to delete records, make sure your tool checks permissions first. MCP's optional authorization controls who can connect to an HTTP server; it doesn't decide what each caller may do. You do.
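A minimal sketch of validate-then-authorize inside a destructive tool. The role table, delete_lead, and PermissionDenied are hypothetical. One assumption matters more than the rest: the caller's identity must come from your authenticated session or request context, never from an argument Claude fills in, because anything in the tool arguments can be influenced by the conversation.

```python
import re
from typing import Dict, Set

# Hypothetical role table; in practice, look roles up in your auth system.
ROLES: Dict[str, Set[str]] = {"ops@acme.com": {"admin"}, "intern@acme.com": {"viewer"}}


class PermissionDenied(Exception):
    """Raised when the authenticated caller lacks the required role."""


def delete_lead(caller: str, lead_id: str) -> Dict[str, str]:
    """Delete a lead after validating input and checking the caller's role.

    `caller` must come from the authenticated request context, not from
    tool arguments the model supplies.
    """
    if not re.fullmatch(r"[0-9]{1,12}", lead_id):
        raise ValueError("lead_id must be 1 to 12 ASCII digits")
    if "admin" not in ROLES.get(caller, set()):
        raise PermissionDenied(f"{caller} may not delete leads")
    # Perform the actual delete against your datastore here.
    return {"deleted": lead_id}
```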

Mistake 5: Shipping without monitoring. Add structured logs. Track tool call frequency. Alert if error rates spike. The first time you debug a production issue at 2 a.m., you'll wish you'd invested 30 minutes in observability.
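For "alert if error rates spike", a per-tool sliding window is enough to start. This sketch is a hypothetical in-process helper; the 5-minute window, 20% threshold, and minimum sample size are placeholder numbers, and where the alert goes (Slack, PagerDuty, email) is up to you.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class ErrorRateMonitor:
    """Track per-tool outcomes over a sliding window and flag error spikes."""

    def __init__(self, window_s: float = 300, threshold: float = 0.2, min_calls: int = 20) -> None:
        self.window_s = window_s
        self.threshold = threshold
        self.min_calls = min_calls  # avoid alerting on tiny samples
        self.events: Dict[str, Deque[Tuple[float, bool]]] = defaultdict(deque)

    def record(self, tool: str, ok: bool, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        q = self.events[tool]
        q.append((now, ok))
        while q and now - q[0][0] > self.window_s:
            q.popleft()  # drop events older than the window

    def should_alert(self, tool: str) -> bool:
        # Note: the window is pruned on record(), so a tool that stops being called keeps its last window.
        q = self.events[tool]
        if len(q) < self.min_calls:
            return False
        errors = sum(1 for _, ok in q if not ok)
        return errors / len(q) > self.threshold
```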

When To Use MCP vs. Other Approaches

Use MCP if Claude needs access to APIs or data sources and you want a single integration that works across Claude, Cline, and future clients.

Use plain tool use in the Anthropic SDK, or the Claude Agent SDK, if your tools serve a single application you control and you don't need other clients to reuse them.

Use n8n or Zapier if your team is non-technical or you need visual workflow building with audit trails.

Use LangGraph if you're building multi-turn, stateful agent workflows where Claude needs memory across sessions.

Most agencies skip MCP entirely because they don't know it exists. That's your competitive edge. You can ship integrations that work everywhere.

FAQ

What's the difference between MCP and LangChain?

MCP is a protocol. LangChain is a framework. MCP defines how an AI system talks to a tool. LangChain is a library that helps you build agents using that tool (or many tools). You can use LangChain to orchestrate MCP servers, or you can use MCP without LangChain. They're not competitors. LangChain is a full orchestration layer with chains, memory, and state management. MCP is just the connection spec.

How do I test my MCP server before deploying?

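Use the MCP Inspector: mcp dev server.py for a Python server (with the SDK's CLI extra installed), or npx @modelcontextprotocol/inspector for a server in any language. It connects to your server, shows you all registered tools, resources, and prompts, and lets you call each tool with test arguments to confirm the signatures and responses look right. Run it every time you add or change a tool. Catch bugs locally, not in production.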

Can I run multiple MCP servers at the same time?

Yes. Each server runs independently: stdio servers are launched as separate processes by the client, and HTTP servers each listen on their own URL. Clients such as Claude Desktop can connect to several servers at once. You might run one server for Salesforce integrations, another for billing, another for data analysis. This is how you scale beyond a single server's tool count.
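For Claude Desktop, running several servers side by side is a matter of listing each one under mcpServers in claude_desktop_config.json; the client launches each entry as its own stdio process. The generator below is a hypothetical convenience, the server names and /opt/mcp paths are placeholders, and the config file's location varies by operating system.

```python
import json
from typing import Dict, List


def desktop_config(servers: Dict[str, List[str]]) -> str:
    """Build a claude_desktop_config.json body with one stdio entry per server."""
    entries = {
        name: {"command": cmd[0], "args": cmd[1:]}  # first item is the executable
        for name, cmd in servers.items()
    }
    return json.dumps({"mcpServers": entries}, indent=2)


config = desktop_config({
    "salesforce": ["python", "/opt/mcp/salesforce_server.py"],
    "billing": ["python", "/opt/mcp/billing_server.py"],
})
print(config)
```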

If you want to talk through applying this to your stack, book a strategy call at cognival.co/book.
