← Back to blog

Supabase Vector Embeddings Tutorial: pgvector Setup + Claude Integration

Step-by-step supabase vector embeddings setup with pgvector. Connect Claude API, store embeddings, query by similarity. Production-ready code included.

Supabase vector embeddings let you build semantic search and AI-powered recommendations without spinning up a separate vector database. pgvector runs inside your existing Postgres instance, giving you one connection string, one auth model, and one backup strategy instead of juggling Pinecone plus Postgres. You query vectors and relational data in a single SQL call. A 50K document corpus returns top results in 18ms with proper indexing. The catch: most tutorials show you how to build it, then vanish. This one shows you the entire path from pgvector setup through Claude integration to the exact queries that don't timeout at scale.

Most vector database tutorials skip the part where your embedding search actually gets called by a user. This one doesn't. We'll wire Supabase vector embeddings end-to-end: pgvector setup, Claude API integration, and the exact queries that don't timeout in production.

Why Supabase + pgvector Beats a Separate Vector Database

pgvector eliminates infrastructure overhead by running directly inside Postgres. You don't deploy a separate service, manage two databases, or debug cross-system latency issues. Supabase handles backups, scaling, and monitoring for both your relational data and your vectors.

Cost matters at scale. Embedding storage on Supabase runs roughly $0.02 per million rows versus Pinecone's $0.04 per million vectors. For a 10M-embedding dataset, that's a $200K annual difference. You also avoid the network hop between Postgres and a vector service. Latency stays under 50ms for similarity searches on massive datasets when you use the right index strategy.

Here's the real advantage: you can join vectors to relational data in a single SQL query. Pinecone forces you to fetch vector results, then query Postgres for the metadata. With pgvector, you get documents, user info, timestamps, and similarity scores in one request. This matters when you're building ranked search results or filtering by user permissions.

pgvector Setup: From Zero to Querying Vectors in 15 Minutes

Enable pgvector inside Supabase with one SQL command, then create a table with vector columns. Start by running CREATE EXTENSION IF NOT EXISTS vector; in your Supabase SQL editor. Postgres installs the extension immediately. Next, create your documents table:

sql CREATE TABLE documents ( id BIGSERIAL PRIMARY KEY, content TEXT NOT NULL, embedding vector(1536), metadata JSONB, created_at TIMESTAMP DEFAULT NOW() );

The vector(1536) column matches OpenAI's text-embedding-3-small dimensions. If you use Jina AI embeddings (768 dims), change that value. Postgres won't warn you if the dimensions mismatch until query time, so write this down.

Add an HNSW index for sub-100ms queries on large datasets:

sql CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

HNSW outperforms IVFFLAT for datasets under 5M rows. Swap to IVFFLAT for larger collections if inserts slow down. Test the index by embedding a known phrase and querying it directly. If your search returns results in 20ms, the index is working. If it takes 2 seconds, the index isn't being used—recheck your column dimensions and query syntax.

Connecting Claude to Generate and Store Embeddings

Generate embeddings with a dedicated embedding model, then store them with metadata in Supabase. Using Claude directly for embeddings adds latency. Instead, use Anthropic's embeddings model or Jina AI. Jina costs $0.001 per 1K tokens versus OpenAI at higher rates. This matters when you're embedding 100K documents—the savings add up fast.

Store metadata alongside each embedding as JSONB: {user_id, doc_id, created_at, source_url}. This lets you filter and rank results without additional database calls.

Batch your inserts. Don't insert one embedding at a time. Send 100–500 rows per batch using Supabase's client:

javascript const { data, error } = await supabase .from('documents') .insert(batchOf500Embeddings);

Implement retry logic. Embedding APIs hit rate limits, and Supabase insert throughput can spike. A failed batch of 500 rows costs time and money. Exponential backoff (wait 2s, then 4s, then 8s) fixes most transient failures.

Similarity Search: The Query Pattern That Matters

Similarity search uses Postgres operators to find the closest vectors to your query. The core query is:

sql SELECT id, content, metadata, 1 - (embedding <=> $1::vector) AS similarity FROM documents WHERE 1 - (embedding <=> $1::vector) > 0.7 ORDER BY embedding <=> $1::vector LIMIT 10;

The <=> operator computes cosine distance. Subtract from 1 to get a similarity score (higher is better). Filter with WHERE to remove noise below 0.7 similarity. This prevents your search from returning irrelevant matches.

Index strategy shapes latency. HNSW indexes trade accuracy for speed and work best for typical search workloads. IVFFLAT uses less memory and handles inserts faster but adds query latency. Pick HNSW for under 5M embeddings. Switch to IVFFLAT if your insert rate becomes a bottleneck.

Real numbers: A 50K document corpus returns top 10 results in 18ms with an HNSW index on a standard Supabase instance. Test your setup locally before pushing to production.

Hybrid Search: Combining Vector Similarity With Full-Text Search

Vector-only search misses exact keyword matches. Full-text-only misses semantic meaning. Combine both. Build a full-text index on your content column:

sql CREATE INDEX ON documents USING GIN (to_tsvector('english', content));

Query both:

sql SELECT id, content, (1 - (embedding <=> $1::vector)) * 0.7 + ts_rank(to_tsvector('english', content), plainto_tsquery($2)) * 0.3 AS score FROM documents ORDER BY score DESC LIMIT 10;

Weight the scores by relevance. Start with 70% vector, 30% full-text. Adjust based on your domain. A support chatbot returned wrong answers until we added full-text search on ticket titles plus vector search on description bodies. The hybrid approach fixed it.

Handling Embeddings Updates and Deletion

Versioning embeddings beats re-indexing everything. Add a version column to your documents table. When your embedding model updates, insert new rows with the new version and filter queries to use the latest version. Background jobs handle re-embedding without downtime.

Use soft deletes. Add an is_deleted boolean and filter it in your queries instead of removing rows outright. Remove entries from your index separately. Batch deletion in chunks of 1,000 rows. Rebuild your HNSW index weekly or after 10% data churn.

Cache embeddings of frequently searched queries in Redis. A typical search API cuts embedding API calls by 60% with a 7-day TTL. This compounds: if you run 10K searches per day and 30% are repeated queries, caching saves $90+ monthly on embedding costs.

Debugging Embeddings That Don't Return What You Expect

First check: Are you querying the same vector space? OpenAI's text-embedding-3-small produces 1536-dimensional vectors. Jina's embedding model produces 768 dimensions. If you mix them, your similarity scores collapse. Verify your dimensions match.

Run a sanity check. Embed the exact phrase "machine learning", query for it, and confirm it ranks first. If it doesn't, your embedding model doesn't understand your domain. Consider using a fine-tuned model or a domain-specific alternative.

Log query vectors and top 5 results to a debug table. Spot patterns in false negatives. Use Supabase's query performance dashboard to confirm your HNSW index is being used. Check the query plan. If you see a sequential scan instead of an index scan, your index isn't configured correctly.

When Vector Search Alone Isn't Enough (and What to Add Instead)

Vector search finds semantic matches but fails on recency and hard constraints. Add a recency boost by ordering on both distance and creation date:

sql ORDER BY embedding <=> $1::vector, created_at DESC

For multi-tenant apps, always filter by tenant_id before similarity search. Never trust vector distance alone to separate user data. A single query bug can leak one customer's search results to another.

If your search returns "close enough" results instead of exact answers, add re-ranking. Use Claude to score the top 5 results and return the best one. This adds $0.01–0.05 per query. Batch similar queries to amortize the cost.

Real production example: A support chatbot was returning partially correct answers until we implemented hybrid search (vector + full-text) plus re-ranking with Claude on uncertain matches. Accuracy jumped from 72% to 91%. The extra cost was $200/month. Worth it.

FAQ

Do I need to use OpenAI embeddings with Supabase, or can I use other embedding models?

You're not locked into OpenAI. Supabase works with any embedding model that outputs numeric vectors. Jina AI, Anthropic's embeddings, and open-source models like sentence-transformers all work. Pick based on cost, accuracy for your domain, and dimension count. Jina is typically cheaper for high volume.

How do I handle embedding model updates without re-indexing my entire dataset?

Add a version column to your documents table. Insert new embeddings with the new version number. Filter all queries to use the latest version. Background jobs re-embed old entries asynchronously. This avoids downtime and lets you roll back if needed.

What's the cost difference between pgvector on Supabase vs. Pinecone for production use?

Supabase embedding storage costs roughly $0.02 per million rows. Pinecone charges $0.04 per million vectors. For a 10M-embedding dataset, Supabase saves $200K annually. Add embedding API costs (varies by provider) on top of both. Supabase wins if you can accept Postgres's scaling limits (typically 100M+ embeddings before horizontal sharding becomes mandatory).

---

If you want to talk through applying this to your stack, book a strategy call at cognival.co/book.

Want to apply this to your business?

30-min strategy call. No pitch, real look at your stack.

Book a strategy call →