How to Deploy AI Agents to Production: Real Stack, Not Theory

Skip the framework hype. Deploy production AI agents with n8n, Supabase, and Claude API. Real architecture for founders and ops leads.

A production AI agent runs one loop: it receives a request, makes decisions based on external data, acts on those decisions, and logs everything that happened. Not a chatbot. Not a toy. A system that handles 100+ daily requests, tracks every decision, scales without melting your AWS bill, and doesn't hallucinate customer data into Slack.

Most guides skip this. They show you a framework demo in a notebook and call it "deployment." You need something real. Here's what actually works in 2026.

Why Most AI Agent Frameworks Won't Ship to Production

LangChain and CrewAI are abstractions on top of abstractions. They hide the failure modes you need to see. When your agent breaks in production at 2 AM, you'll be debugging someone else's error handling logic instead of understanding what your agent actually did.

Framework-first thinking locks you into vendor decisions. The framework owns your error handling. The framework owns your logging. The framework decides how tools are called and what happens when they fail. You inherit all of that production risk without visibility.

Real production agents need: observability from day one, stateful memory that doesn't corrupt, and cost controls that fire before your bill hits $50k. A framework gives you none of this by default. You'll bolt it on later, after the outage.

Name the actual problem first. Is this a scheduling agent that moves meetings around? A data retrieval agent that pulls Salesforce records? A decision agent that scores leads? Each has different deployment needs. Treating them all the same is where most projects fail.

The Minimal Production Stack: n8n + Supabase + Claude API

Skip the framework. Use three tools that do one thing well.

n8n handles orchestration and webhooks. No custom backend needed for basic flows. Deploy to n8n Cloud ($20/month on the paid tier) or self-host on a $10 DigitalOcean droplet. n8n knows how to retry, schedule, and fan out tasks. Use it.

Supabase (PostgreSQL + managed auth) stores agent state, decision logs, and conversation history. It costs $25/month for the starter tier and gives you a query language that works. No DynamoDB learning curve. No Redis mystery timeouts. Postgres works.

Claude API is the brain. Call it directly. No wrapper. No framework layer. You pay per token used. At scale, you'll spend $100–300/month depending on query volume. That's the entire inference cost.

Using the figures above, this combo runs roughly $145–345/month and gives you observability out of the box. An outbound AI agent making 50 calls per day breaks down as: n8n ($20/mo on paid tier) + Supabase ($25/mo) + Claude API pay-as-you-go (~$100/mo). Your whole infrastructure is $145/month. Double that if you need backups or higher throughput.

State Management: The Part That Breaks Your Agent in Production

Agents need memory that persists across requests and recovers if your infrastructure restarts. Most teams get this wrong.

Don't use in-memory caches like Redis as your primary state store. You'll lose decision history and have to rebuild context from nothing after a deploy or a crash. Redis is for sessions. Postgres is for truth.

Store conversation threads and decision logs in Supabase as JSONB columns. JSONB is queryable, version-able, and debuggable. A lead-scoring agent needs to remember why it rejected a prospect 3 weeks ago. That decision lives in Postgres, not in your Python dict.

Use a combination: Redis for session-level caching to reduce query volume, Postgres for a permanent audit trail. When a user asks your agent "why did you reject me?", you query Postgres and pull the exact LLM output and confidence score from that day. This is why you log everything.
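
Here's a minimal sketch of what one decision-log row could look like. The `decision_log` table, its columns, and the `build_insert` helper are hypothetical names chosen for illustration, not anything Supabase ships. The `%s` placeholders assume a typical Python Postgres driver.

```python
import json
from datetime import datetime, timezone

# Hypothetical schema: one row per agent decision, details kept as JSONB.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS decision_log (
    id BIGSERIAL PRIMARY KEY,
    thread_id TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    payload JSONB NOT NULL
);
"""

INSERT_SQL = (
    "INSERT INTO decision_log (thread_id, created_at, payload) "
    "VALUES (%s, %s, %s::jsonb)"
)

# Example audit query: rejections for one thread, newest first.
WHY_REJECTED_SQL = (
    "SELECT created_at, payload->>'llm_output', payload->>'confidence' "
    "FROM decision_log "
    "WHERE thread_id = %s AND payload->>'decision' = 'reject' "
    "ORDER BY created_at DESC"
)


def build_insert(thread_id, decision, confidence, llm_output, now=None):
    """Return (sql, params) ready to pass to a Postgres driver's execute()."""
    now = now or datetime.now(timezone.utc)
    payload = {
        "decision": decision,
        "confidence": confidence,
        "llm_output": llm_output,
    }
    return INSERT_SQL, (thread_id, now.isoformat(), json.dumps(payload))
```

Keeping the raw LLM output and the confidence score in the same JSONB payload means the "why did you reject me?" question becomes a single query rather than a log-archaeology project.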

Connecting Your Agent to External Tools Without Breaking It

MCP (Model Context Protocol) is the standard now. Use it to define what your agent can and cannot touch. Each tool call needs explicit retry logic, timeout handling, and a fallback if the external service is down.

Build your MCP server in TypeScript or Python and deploy it on Vercel or a lightweight VPS. Don't embed it in your agent process. Without a timeout, when your agent calls Apollo for company data and Apollo is slow, your agent waits indefinitely and burns tokens. Add a 3-second timeout and a cached fallback, so the agent switches to yesterday's data instead of hanging.
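
The timeout-plus-cached-fallback pattern can be sketched in a few lines of standard-library Python. `call_with_fallback` and `slow_lookup` are hypothetical names, and the plain dict stands in for whatever cache you actually use. Treat it as a starting point under those assumptions, not a finished client.

```python
import concurrent.futures as cf
import time

# Shared worker pool so a slow call never blocks the caller's thread.
_pool = cf.ThreadPoolExecutor(max_workers=4)


def call_with_fallback(fetch, key, cache, timeout_s=3.0):
    """Call fetch(key); on timeout or error, fall back to the cached value.

    Returns (value, source) where source is "live" or "cache".
    Note: a timed-out call keeps running in its worker thread; this
    only stops the caller from waiting on it.
    """
    future = _pool.submit(fetch, key)
    try:
        value = future.result(timeout=timeout_s)
    except Exception:  # covers the timeout and any error raised by fetch
        if key in cache:
            return cache[key], "cache"
        raise  # no fallback available: surface the failure
    cache[key] = value  # refresh the fallback for next time
    return value, "live"


def slow_lookup(domain):
    """Stand-in for a slow external API call (hypothetical)."""
    time.sleep(0.3)
    return {"domain": domain, "employees": 120}
```

Returning the source alongside the value lets you log when the agent acted on stale data, which matters when you audit a decision later.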

Tools are where production agents fail most often. You ship without testing the failure case. The external API goes down. Your agent still tries to call it. Your token bill doubles. Implement graceful degradation from day one.

Observability: See What Your Agent Is Doing (or Not Doing)

Log every LLM call with input, output, tokens used, latency, and cost before you go live, not after. Use structured logging (JSON output) so you can query patterns later. "Show me all rejections where confidence was <0.6." You can't answer that question if you didn't log confidence scores.
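
A rough sketch of what that structured record might contain, plus the confidence query from the example. The field names and both helper functions are assumptions for illustration. In practice you would ship these JSON lines to your log store and run the filter there.

```python
import json
import time


def llm_log_record(prompt, output, input_tokens, output_tokens,
                   latency_ms, cost_usd, confidence=None):
    """Build one JSON-serializable record for a single LLM call."""
    return {
        "ts": time.time(),
        "input": prompt,
        "output": output,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "latency_ms": latency_ms,
        "cost_usd": cost_usd,
        "confidence": confidence,
    }


def low_confidence_rejections(log_lines, threshold=0.6):
    """Filter JSON log lines to rejections with confidence below threshold."""
    hits = []
    for line in log_lines:
        rec = json.loads(line)
        conf = rec.get("confidence")
        if rec.get("output") == "reject" and conf is not None and conf < threshold:
            hits.append(rec)
    return hits
```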

Set up alerts: token spend exceeds $X per hour, error rate >5%, agent response time >30 seconds. Tools like Datadog or Axiom ingest your logs and surface patterns. CloudWatch will bury you in noise.

Here's the specific metric that matters: if your agent's average decision confidence was 0.72 last week and is 0.58 today, something broke. This only shows up if you log it. Teams often ignore confidence scores, then wonder why their agent started making bad decisions.
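
One simple way to catch that kind of drop, assuming you already log a confidence score per decision. The function name and the 0.10 threshold are illustrative assumptions, so tune the threshold to your own agent.

```python
from statistics import mean


def confidence_dropped(baseline_scores, recent_scores, max_drop=0.10):
    """Return True when mean confidence fell by more than max_drop."""
    if not baseline_scores or not recent_scores:
        return False  # not enough data to compare
    return mean(baseline_scores) - mean(recent_scores) > max_drop
```

Run it on a schedule against last week's and today's scores and route a `True` result to the same alert channel as your spend and error-rate alerts.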

Cost Control: Stop Your AI Agent From Bankrupting You

Set hard limits per request, per user, per day. Not just "monitoring" them. Actually block the request if it exceeds the limit.
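
A minimal in-memory sketch of a guard that blocks instead of just reporting. `BudgetGuard` and `BudgetExceeded` are hypothetical names. A real deployment would likely keep the counters in Postgres and reset them daily, which this sketch leaves out.

```python
from collections import defaultdict


class BudgetExceeded(Exception):
    """Raised when a request would break a hard spending limit."""


class BudgetGuard:
    def __init__(self, per_request_usd, per_user_daily_usd, daily_usd):
        self.per_request_usd = per_request_usd
        self.per_user_daily_usd = per_user_daily_usd
        self.daily_usd = daily_usd
        self.spent_by_user = defaultdict(float)  # reset these once per day
        self.spent_total = 0.0

    def charge(self, user_id, estimated_cost_usd):
        """Record the spend, or raise BudgetExceeded before any call is made."""
        if estimated_cost_usd > self.per_request_usd:
            raise BudgetExceeded("per-request limit")
        if self.spent_by_user[user_id] + estimated_cost_usd > self.per_user_daily_usd:
            raise BudgetExceeded("per-user daily limit")
        if self.spent_total + estimated_cost_usd > self.daily_usd:
            raise BudgetExceeded("global daily limit")
        self.spent_by_user[user_id] += estimated_cost_usd
        self.spent_total += estimated_cost_usd
```

Call `charge` before the LLM request, not after. The point is that an over-budget request never reaches the API.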

Use Claude Haiku for routing decisions, then escalate to Opus only if it matters. At the prices quoted here, Haiku costs ~$0.0008 per 1K input tokens and Opus costs $0.015 per 1K, nearly 19x more. Check current pricing before you budget, since model prices change. Routing with Haiku and reserving Opus for complex cases can cut your bill roughly in half, depending on how many requests escalate.
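
To see how much routing saves, here's a back-of-the-envelope calculation using the per-1K prices quoted above. It is a simplification: it counts input tokens only and assumes both models see the same prompt. The function name is made up for this example.

```python
HAIKU_PER_1K = 0.0008   # USD per 1K input tokens, figure quoted above
OPUS_PER_1K = 0.015     # USD per 1K input tokens, figure quoted above


def routed_cost_ratio(escalation_rate):
    """Input-token cost of Haiku-then-Opus routing relative to Opus-only.

    Every request pays for Haiku; only the escalated share also pays for Opus.
    Assumes both models see the same number of input tokens.
    """
    routed = HAIKU_PER_1K + escalation_rate * OPUS_PER_1K
    return routed / OPUS_PER_1K
```

Under these assumptions, escalating about 45% of requests lands near half the Opus-only input cost, and a lower escalation rate saves more. If nearly everything escalates, routing costs slightly more than calling Opus directly.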

Batch requests where possible. One 100-item inference beats 100 single-item calls by 20–30%. Implement circuit breakers: if cost per decision exceeds $X, switch to a cached response or human escalation. Your agent doesn't need to run if the cost is too high.

Deployment: Where to Run Your Agent and Why It Matters

Vercel for Next.js-based agents. Serverless, autoscales, works with Supabase without latency issues. n8n Cloud or self-hosted n8n on a $10/month DigitalOcean droplet if you're orchestration-first. Deploy to the same region as your database. Vercel US East + Supabase US East = <50ms latency.

Don't run agents in Docker containers on your own infrastructure unless you're already running Kubernetes. The ops cost isn't worth it at early scale. Use webhooks, not polling. Webhooks are cheaper, faster, easier to debug.

Common Failure Modes in Production (and How to Dodge Them)

Agents hallucinate tool responses because you didn't validate outputs. Add a schema check after every tool call. External API dependencies fail and your agent goes silent. Implement graceful degradation and cached fallbacks.
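
The schema check doesn't need to be elaborate. A minimal sketch, assuming tool responses arrive as dicts, with `validate_tool_output` and `COMPANY_SCHEMA` as hypothetical names. A validation library would do the same job with richer rules.

```python
def validate_tool_output(data, schema):
    """Return a list of problems; an empty list means the output is usable.

    schema maps each required field name to the Python type it must have.
    """
    if not isinstance(data, dict):
        return ["tool output is not an object"]
    problems = []
    for field, expected_type in schema.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems


# Hypothetical shape for a company-lookup tool response.
COMPANY_SCHEMA = {"name": str, "employees": int}
```

If the list comes back non-empty, treat the tool call as failed rather than passing the output to the model.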

Token count explodes because conversation history keeps growing. Implement a sliding window: keep the last 10 messages plus your system prompt, archive older threads. You ship without rate limits and get bill-shocked. Set spending caps before day one.
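
The sliding window described above can be sketched like this, assuming the system prompt is passed to the model separately and `messages` holds only the conversation turns. The function name is illustrative.

```python
def sliding_window(messages, keep_last=10):
    """Split history into (context_to_send, archived_messages).

    messages is a list of {"role": ..., "content": ...} dicts without the
    system prompt, which is assumed to be passed to the model separately.
    """
    if keep_last <= 0:
        return [], list(messages)
    return messages[-keep_last:], messages[:-keep_last]
```

Write the archived half to Postgres before dropping it from the prompt, so the full thread is still available for audits.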

Specific failure: your sales agent calls the Instantly API to send emails. If Instantly goes down, the agent keeps trying and burns tokens, and no email gets sent. Add a retry count limit. After 3 failures, escalate to Slack. A human sees it and handles it manually.
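
A minimal sketch of that retry limit. `action` stands in for the email-sending call and `escalate` for whatever posts to Slack. Both are hypothetical callables you would supply, and no real Instantly or Slack API is shown here.

```python
def run_with_retry_limit(action, escalate, max_attempts=3):
    """Try action() up to max_attempts times, then hand off to a human.

    Returns the action's result, or None after escalating.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return action()
        except Exception as exc:  # in real code, catch the client's specific errors
            last_error = exc
    escalate(f"gave up after {max_attempts} attempts: {last_error}")
    return None
```

You may also want a delay between attempts. This sketch leaves it out to keep the control flow visible.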

FAQ

Should I use LangChain or CrewAI to deploy an AI agent to production?

Neither. Both frameworks add a layer of abstraction that costs you visibility and control when something breaks. You'll spend more time debugging their error handling than building your agent. Use direct API calls with explicit orchestration (n8n) and state management (Supabase). You'll ship faster and understand what's happening in production.

How do I handle state and memory when deploying AI agents at scale?

Store everything in Postgres as JSONB. Conversation threads, decision logs, confidence scores, rejected items with reasons. Query it later. Use Redis for session caching only, not truth. A sliding window approach keeps your context fresh without ballooning token costs. Implement versioning so you can rerun decisions if needed.

What's the cheapest way to deploy a production AI agent without overengineering?

Start with n8n ($20/month), Supabase ($25/month), and Claude API (pay-as-you-go, usually $50–150/month at low volume). Total: $95–195/month for a fully operational system. Add Vercel or a VPS only if you need custom logic. Most orchestration can live in n8n workflows. Most teams overspend on infrastructure before they've shipped anything.

If you want to talk through applying this to your stack, book a strategy call at cognival.co/book.

