We benchmarked four vector databases on the same workload. Each has a place. Here's how we'd pick today.

Vector Databases for AI: Pinecone, Weaviate, Chroma, pgvector

Choosing a vector database is the kind of decision teams agonize over and then live with for years. We've used four in different production contexts: Pinecone (managed), Weaviate (self-hosted), Chroma (embedded), and pgvector (a Postgres extension). All four can serve the basic vector search use case. They differ on operational shape, cost, and ecosystem fit. This post is a comparison from a team that has run each of them.

What we benchmarked

Same dataset (~280k documents, embedded with text-embedding-3-large at 3072 dimensions), same query workload (~5k queries/day), same evaluation set (200 hand-labeled queries with expected results).

Metrics:

Recall@10 (how often the top 10 results contained the expected match)
p50 / p95 query latency
Cost per month at our volume
Operational time per quarter
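For readers who want to reproduce the first two metrics, here is a minimal sketch of how they can be computed. It assumes one expected document id per labeled query and a list of per-query latencies in milliseconds; the function names and data shapes are ours for illustration, not part of any of the four databases.

```python
import statistics


def recall_at_k(results, expected, k=10):
    """Fraction of labeled queries whose expected doc id is in the top-k results.

    results:  dict mapping query id -> ranked list of returned doc ids
    expected: dict mapping query id -> the single doc id we labeled as correct
    """
    hits = sum(
        1 for query_id, doc_id in expected.items()
        if doc_id in results.get(query_id, [])[:k]
    )
    return hits / len(expected)


def latency_percentiles(latencies_ms):
    """Return (p50, p95) from a list of per-query latencies in milliseconds."""
    # quantiles(n=100) returns 99 cut points; index 49 is p50, index 94 is p95.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return cuts[49], cuts[94]
```

Running the same two functions against every database, on the same labeled set, is what makes the numbers comparable.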

The numbers

Database	Recall@10	p50 latency	p95 latency	Cost/month	Ops time/quarter
Pinecone (s1.x1)	96%	35ms	80ms	$720	<1 hr
Weaviate (self-hosted)	95%	22ms	60ms	$290 (compute)	~6 hr
pgvector	93%	45ms	110ms	$80 (existing PG)	~2 hr
Chroma (embedded)	94%	18ms	40ms	~$0 (in-app)	~1 hr

Recall is within margin of error across all four — the choice isn't really about quality at this scale. It's about ops profile and cost.
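To make "within margin of error" concrete, here is a rough check using a 95% Wilson score interval on a 200-query eval set. The hit counts (186 and 192) are simply 93% and 96% of 200; the choice of the Wilson interval is ours, and a paired test on per-query results would be a sharper tool.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Lowest and highest recall in the table, as hit counts out of 200 queries.
low_recall = wilson_interval(186, 200)   # 93%
high_recall = wilson_interval(192, 200)  # 96%
intervals_overlap = high_recall[0] < low_recall[1]
```

The two intervals come out at roughly 89% to 96% and 92% to 98%, so they overlap. With only 200 labeled queries, a 3-point gap is not enough to rank the databases on quality.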

Pinecone: easiest to operate, most expensive

What it gets right:

Zero ops. We provisioned an index, sent vectors, made queries. Nothing to patch, scale, or monitor.
Filters are first-class: metadata filters are part of the query itself and work without manual sharding.
Auto-scaling that we never had to think about.

What it gets wrong (or where it costs):

Price. ~$720/month for our load. The cost climbs with volume: one team has seen Pinecone bills above $5,000/month.
Vendor lock-in. We export embeddings periodically as a precaution but a real migration off Pinecone is a project.
Network latency to/from Pinecone (it's not in our VPC). p50 is ~35ms vs ~22ms for self-hosted Weaviate; about 10ms of that gap is the round trip to Pinecone's API.

When we'd pick Pinecone: a small team without ops capacity, rapid prototyping, when "just make it work" matters more than cost. We have it in production for one customer-facing app where reliability mattered more than the bill.

Weaviate: most flexible, most operational overhead

What it gets right:

Hybrid search (vector + BM25) is built-in. You can run both in one query and have the two result lists fused into a single ranking.
GraphQL API is nice for complex queries.
Schema-aware — metadata is typed, validation happens at insert.
Self-hostable with reasonable resource needs (our 280k vector index runs on a 4-vCPU/16GB instance).
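For anyone unsure what combining vector and BM25 results means in practice, here is a generic sketch of reciprocal rank fusion, one common way to merge two ranked lists. This is an illustration of the idea only, not Weaviate's internal implementation; Weaviate does the fusion server-side, and the constant k=60 is a conventional default we assume here.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc ids into one list, best first.

    Each document scores 1 / (k + rank) per list it appears in, so items
    ranked well by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending; break ties by doc id for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]   # hypothetical ids from the vector index
bm25_hits = ["b", "d", "a"]     # hypothetical ids from keyword search
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

With a database that lacks built-in hybrid search, this is roughly the piece you end up writing and maintaining yourself.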

What it gets wrong:

Operating it is real work. Backups, version upgrades, scaling, monitoring. We spend ~6 hours/quarter on Weaviate operations.
The Kubernetes operator is decent but not bulletproof. We've had two upgrades that needed manual intervention.
Schema migrations (changing a property type or the embedding dimensionality) are painful and usually require reindexing.

When we'd pick Weaviate: when hybrid search matters out of the box, when self-hosting is acceptable, when the volume is too high for Pinecone economics. We run it in production for internal knowledge search, where compliance preferred self-hosting.

pgvector: best fit when you already have Postgres

What it gets right:

Zero new infrastructure. We already had Postgres; pgvector is an extension.
Joining vector results with relational data (filtering by user, project, date) is a single SQL query rather than a round-trip dance between two databases.
Operationally identical to Postgres — same backups, same monitoring, same upgrade cadence.
Cost is essentially zero on top of existing Postgres infrastructure.
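The single-query join looks roughly like the sketch below. The tables and columns (doc_chunks, tickets, project_id, created_at) are hypothetical, and the %(name)s placeholders assume a psycopg-style driver; only the `<=>` cosine-distance operator and the `'[...]'` vector text format come from pgvector itself.

```python
def to_vector_literal(embedding):
    """Format a sequence of numbers in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Hypothetical schema: doc_chunks(id, ticket_id, body, embedding)
# joined to tickets(id, project_id, created_at).
FILTERED_SEARCH_SQL = """
SELECT c.id, c.body, c.embedding <=> %(query_vec)s::vector AS cosine_distance
FROM doc_chunks AS c
JOIN tickets AS t ON t.id = c.ticket_id
WHERE t.project_id = %(project_id)s
  AND t.created_at >= %(since)s
ORDER BY c.embedding <=> %(query_vec)s::vector
LIMIT 10;
"""


def build_params(embedding, project_id, since):
    """Bundle the query parameters for a driver call such as cur.execute()."""
    return {
        "query_vec": to_vector_literal(embedding),
        "project_id": project_id,
        "since": since,
    }

# With a psycopg cursor this would be used as:
#   cur.execute(FILTERED_SEARCH_SQL, build_params(vec, 42, "2024-01-01"))
```

The relational filter and the nearest-neighbor ordering live in one statement, which is the whole appeal: no second system to query and no ids to shuttle between two stores.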

What it gets wrong:

Performance is worse than purpose-built vector DBs at high scale. At our 280k document scale, fine. At 100M, probably not.
Index choice matters. HNSW (available since pgvector 0.5.0) gives a better speed/recall trade-off at query time than IVFFlat, but it builds more slowly and uses more memory.
Metadata filtering with vector search can be tricky to optimize: the planner sometimes picks the wrong index.

When we'd pick pgvector: most cases, honestly. If you have Postgres, the operational simplicity wins. We use it for our largest deployment (the customer-support knowledge base).

A specific gotcha: building the HNSW index on 280k vectors took ~12 minutes. Not bad, but on a fresh deployment you wait. Plan accordingly.

Chroma: best for embedded use cases

What it gets right:

Embedded mode runs in-process (Python). No separate service to deploy.
Simple API. Get something working in 20 lines of Python.
Good for scripts, notebooks, experimental work, single-machine apps.

What it gets wrong:

Not really a server-mode workhorse. The HTTP server exists but feels like an afterthought.
Persistence is via a SQLite database under the hood. Works for moderate scale; we wouldn't use it for 10M+ vectors.
Limited tooling for ops (backups, replication).

When we'd pick Chroma: prototypes, internal tools, tasks where the vectors live with the code. We use it for some internal experimentation tools but nothing customer-facing in production.

What we ended up picking when

Across our actual production deployments:

Customer support knowledge base (280k docs, 5k queries per day, QPD): pgvector. We already had Postgres, and joins to ticket metadata are helpful.
Public product docs Q&A (50k docs, 100 QPD): pgvector. Same reasoning. Low traffic.
RAG over our internal wiki (40k docs, 1k QPD): Weaviate. Hybrid search benefit; self-hostable for compliance.
A customer-facing semantic search feature with bursty traffic (180k docs, variable QPD): Pinecone. Reliability and zero ops won; we pay the cost.
Experimental multi-modal demo (image + text embeddings, ~5k items): Chroma. Not customer-facing; one Python script.

How we'd pick today

A simple decision tree:

Already have Postgres? Try pgvector first. Spend a day to see if it meets your latency/recall needs. For most teams, it will.
Need hybrid search out of the box, want to self-host? Weaviate.
Don't want any operational burden, willing to pay? Pinecone (or a comparable managed service such as Vertex AI Vector Search or MongoDB Atlas Vector Search).
Embedded scripting / notebook use? Chroma. (Or FAISS for raw speed if you don't need persistence.)
Massive scale (100M+ vectors)? None of the above is an obvious answer; this is its own design problem with options like Milvus, Vespa, or sharded custom solutions.
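The questions above can be sketched as a small function. The order of the checks is our assumption (the scale check comes first because it overrides the others), the 100M threshold is the one from the list, and the return value is a first candidate to benchmark, not a verdict.

```python
def pick_vector_db(has_postgres, vector_count, needs_hybrid=False,
                   can_self_host=False, embedded_only=False):
    """Return a first candidate to evaluate, following the questions in order."""
    if vector_count >= 100_000_000:
        # Past this point none of the four is an obvious answer.
        return "custom design (Milvus, Vespa, or sharding)"
    if has_postgres:
        return "pgvector"
    if needs_hybrid and can_self_host:
        return "Weaviate"
    if embedded_only:
        return "Chroma"
    # No Postgres, no hybrid requirement, not embedded: pay for managed.
    return "Pinecone or another managed service"


# Our customer-support knowledge base: Postgres already in place, 280k docs.
choice = pick_vector_db(has_postgres=True, vector_count=280_000)
```

A real decision has more inputs (compliance, budget, team skills), so treat the function as a checklist in code form.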

Things that don't differ much

People obsess over these; they barely matter at our scale:

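Exact ANN algorithm (HNSW, IVF, and so on). Weaviate, Chroma, and pgvector all offer HNSW, Pinecone manages its index for you, and all of them were good enough here.
Vector size limits. All four store 3072-dim vectors fine. One caveat: pgvector's HNSW and IVFFlat indexes support at most 2,000 dimensions on the vector type, so indexing at 3072 means using halfvec (pgvector 0.7.0+) or a shortened embedding.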
Recall numbers. Within ~3 percentage points across all four; differences are below the noise floor of our eval set.

Things that do differ a lot

What actually matters for the decision:

Cost at your volume. Pinecone (~$720/month) cost 9x what pgvector did ($80/month) for the same workload, and embedded Chroma was close to free.
Operational shape. Managed vs self-hosted is a different team capability question.
Integration with existing data. pgvector wins decisively if your domain data is already in Postgres.
Hybrid search needs. Weaviate ships it; others require building it.
Filtering capability. All support metadata filters but the speed at high cardinality varies. Test with your actual filter patterns.

What we'd tell a team choosing

Don't pick the trendiest option. Pick the one that fits your stack. A team that already runs Postgres should default to pgvector. A team without Postgres might pick differently.

Benchmark with your actual data. A sample of 10k documents and 100 queries gives you a real comparison in an afternoon. Synthetic benchmarks are often misleading.

Watch the operational cost, not just the bill. $290/month for self-hosted Weaviate is cheaper than $720/month for Pinecone — until you spend 8 hours/month maintaining it. Then they're comparable in total cost.
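The arithmetic behind that comparison is short enough to write down. The bills are the ones from our table; the hourly rate is something you have to supply yourself, and the sketch ignores the small amount of ops time Pinecone also needs.

```python
def total_monthly_cost(bill_usd, ops_hours_per_month, hourly_rate_usd):
    """Infrastructure bill plus the cost of engineer time spent on operations."""
    return bill_usd + ops_hours_per_month * hourly_rate_usd


def break_even_hourly_rate(cheaper_bill, pricier_bill, extra_ops_hours_per_month):
    """Hourly rate at which the cheaper bill stops being cheaper overall."""
    return (pricier_bill - cheaper_bill) / extra_ops_hours_per_month


# Weaviate ($290) vs Pinecone ($720) with 8 extra ops hours per month.
rate = break_even_hourly_rate(290, 720, 8)
weaviate_total = total_monthly_cost(290, 8, rate)
```

At 8 hours/month the $430 gap closes once an engineer hour costs about $54. At the ~6 hours/quarter we actually measured, self-hosting stays ahead until the hourly rate is several times higher than that.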

Avoid premature scale anxiety. "But what about when we have 100M vectors?" Most teams never get there. Optimize for the next 6-12 months of scale, and plan to revisit if you cross those thresholds.

Have an export plan. Even managed services should expose their data in a portable format. We snapshot embeddings to S3 monthly so a migration is feasible if needed.
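A portable snapshot can be as simple as one JSON object per line. The sketch below is one possible shape, not our exact job: the field names are our choice, and the upload to S3 is left out so the example stays self-contained.

```python
import io
import json


def dump_snapshot(records, fp):
    """Write (id, embedding, metadata) records to fp as JSON Lines; return count."""
    count = 0
    for doc_id, embedding, metadata in records:
        row = {"id": doc_id, "embedding": list(embedding), "metadata": metadata}
        fp.write(json.dumps(row) + "\n")
        count += 1
    return count


def load_snapshot(fp):
    """Yield (id, embedding, metadata) tuples back from a JSON Lines snapshot."""
    for line in fp:
        row = json.loads(line)
        yield row["id"], row["embedding"], row["metadata"]


# Round trip through an in-memory buffer; a real job would write to a file.
sample = [("doc-1", [0.25, -0.5], {"source": "wiki"})]
buffer = io.StringIO()
written = dump_snapshot(sample, buffer)
buffer.seek(0)
restored = list(load_snapshot(buffer))
```

Because every row carries its own id and metadata, the file can be re-imported into any of the four databases without depending on the source system's export format.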

The vector DB choice is rarely the bottleneck on AI quality. Retrieval logic, chunking, re-ranking, and prompt design matter more. Pick the DB that fits operationally, then move on to the work that actually moves the metric.

About Kiril Urbonas