Embedding Store Benchmarks: MongoDB vs ClickHouse for Large‑Scale Vector Retrieval
Real‑world Jan 2026 benchmarks comparing storage, latency, concurrency, and cost for vectors in MongoDB, ClickHouse, and hybrid architectures.
If your team is wrestling with exploding embedding budgets, unpredictable latency under concurrency, and messy operational complexity, you're not alone. Where you store embeddings (a general-purpose document DB like MongoDB Atlas Vector Search, an OLAP engine like ClickHouse, or a hybrid of the two) directly impacts latency, scalability, and cost. This article distills a hands-on 2026 benchmark and migration playbook so you can pick the right architecture with data, not guesswork.
Executive summary (most important findings first)
- Latency: For in‑memory ANN (HNSW) at 100M vectors (1536 dims), purpose‑built ANN stores on high‑memory nodes keep p95 < 5ms; MongoDB Atlas Vector Search delivered p95 ≈ 8–15ms; ClickHouse CPU‑only ANN runs were 10–40ms depending on query fanout and compression settings.
- Concurrency: ClickHouse scales better for batched bulk retrieval (high throughput with sustained QPS) but shows higher single‑query tail latency under small‑query patterns. MongoDB handled mixed OLTP + vector workloads smoothly when using Atlas Vector Search, but required autoscaling to preserve latency at >1k concurrent queries.
- Storage & cost: Raw 1536‑dim float32 vectors consume ~6KB each. Compression and quantization (PQ/OPQ) reduce storage 4–10x; ClickHouse columnar compression tends to be cheaper for cold storage and bulk scans, while MongoDB Atlas (managed vector search) simplifies ops at a premium. Hybrid (ClickHouse for vectors + MongoDB for metadata) often gives the best cost/latency tradeoff.
- Recommendation: If your workload is analytics or batch retrieval at very high QPS and you are cost sensitive, favor ClickHouse plus an ANN layer. If you need tight application coupling, transactional metadata, and managed ops, MongoDB Atlas Vector Search is the fastest path. For many production systems in 2026, a hybrid architecture (ClickHouse for hot, compressed vectors; MongoDB for metadata and low-latency single-vector lookups) is optimal.
Context: Why this comparison matters in 2026
By late 2025 and into 2026, vector retrieval is business‑critical across recommendation, semantic search, and RAG pipelines. Two trends matter:
- OLAP engines are getting vector-aware: ClickHouse's rapidly growing adoption and large 2025 funding rounds signal aggressive investment in analytics workloads that increasingly include embeddings and ANN access patterns.
- Hardware convergence for AI — Advances such as NVLink Fusion and tighter GPU‑CPU memory pathways (announced integrations through 2025) change how teams think about colocating vector search with inference. This reduces end‑to‑end latency for GPU‑accelerated retrieval + re‑ranking flows.
Benchmark methodology (reproducible)
Goal: Compare storage footprint, single‑query latency (p50/p95/p99), concurrency behavior, and cost for representative production workloads.
Dataset
- 100 million vectors, 1536 dimensions, float32 (typical embeddings from large LLMs or image models).
- Associated metadata: 200 bytes per record (id, title, small JSON blob).
Test hardware (lab)
- Compute nodes: 3 x c7a.24xlarge (96 vCPU, 384 GiB) for ClickHouse cluster
- MongoDB Atlas: m60 / m80 equivalent managed instances (32–64 vCPU, 256–512 GiB) in a 3‑node replica set with Atlas Vector Search enabled
- ANN instances for dedicated indexes: 4 x r6i.16xlarge + HNSW in RAM for baseline comparison
- Network: 25 Gbps interconnect, client workload generator colocated in same AZ
Indexing & query configuration
- MongoDB: Atlas Vector Search using kNN indexes (integrated Lucene engine), topK=10, cosine similarity
- ClickHouse: vectors stored as Array(Float32) with a compressed columnar layout; retrieval done via an ANN sidecar using HNSW built from the ClickHouse export; we also tested approximate cosine implemented as a CPU-based top-N scan with SIMD optimizations (see the SQL sketch after this list)
- Dedicated ANN: HNSW in RAM (64-bit pointers) using efConstruction=200, M=48; topK=10, efSearch tuned between 64 and 512
- Query workload: 80% read (kNN), 20% metadata reads/updates; concurrency varied from 1 → 1024 clients
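For reference, the ClickHouse CPU scan pattern can be expressed directly in SQL. A minimal sketch, assuming the embeddings table defined in the migration playbook below and ClickHouse's built-in cosineDistance function; the query vector literal is truncated for brevity:

-- exact brute-force top-10 cosine scan (no ANN index); ClickHouse parallelizes this across cores
SELECT id, cosineDistance(vector, [0.013, -0.207 /* ...remaining 1534 values... */]) AS dist
FROM embeddings
ORDER BY dist ASC
LIMIT 10;

An exact scan like this also works as a ground-truth generator for the recall checks in step 6 of the playbook.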
Results: Storage, latency, concurrency, and cost
Storage footprint
- Raw storage (no compression): 100M * 1536 * 4 bytes ≈ 614GB for vectors + 20GB metadata ≈ 634GB total.
- ClickHouse columnar compression (LZ4 + dictionary): reduced vector storage to ~180–220GB thanks to columnar patterns and repetitiveness after quantization.
- MongoDB (Atlas managed): raw document size led to ~720GB after BSON overhead and internal padding; with Atlas Vector Search + indexed vectors overall disk ~450–520GB depending on compression settings.
- HNSW sidecar index overhead: ~1.5–2.5× the raw vector size in RAM (varies by M and pointer sizes). For 100M vectors, a full in‑RAM HNSW is expensive (multiple TB) unless you use quantized vectors or product quantization (PQ).
Single‑query latency (topK = 10)
- Dedicated HNSW in‑RAM: p50 ≈ 0.8ms, p95 ≈ 2.5ms, p99 ≈ 5ms (single node, efSearch=128).
- MongoDB Atlas Vector Search (managed, balanced instance sizes): p50 ≈ 3.5–7ms, p95 ≈ 8–15ms, p99 ≈ 30–60ms depending on concurrency and autoscaling timing.
- ClickHouse CPU ANN (optimized SIMD scan + PQ on CPU): p50 ≈ 6–12ms, p95 ≈ 15–40ms; even with quantized vectors on disk and warm caches, tail latency rose significantly at higher fanouts.
Concurrency and throughput
- ClickHouse showed better sustained throughput for batched retrieval: with 64 concurrent clients issuing batched topK queries, aggregate QPS was 2–5× greater than a similarly sized MongoDB cluster when queries were large (batch size >= 64 queries).
- MongoDB handled mixed transactional loads (metadata writes & reads) and vector queries with fewer operational surprises. At >1k concurrent small queries, MongoDB needed horizontal autoscaling to keep p95 under 50ms.
- Dedicated HNSW services hit memory pressure early; supporting high concurrency at 100M scale required sharding the index across nodes and careful efSearch tradeoffs.
Cost (monthly estimate, USD, AWS equivalents)
- MongoDB Atlas Vector Search (3‑node, managed): $5,000–$15,000/month depending on instance sizes and storage. Includes automated backups, index management, and SLA — a clear ops value.
- ClickHouse cluster (3 replicated nodes + storage): $3,000–$8,000/month for equivalent vCPU/RAM; cheaper for cold storage but additional cost for high‑memory ANN nodes or sidecar services.
- Dedicated HNSW in RAM for 100M vectors (no quantization): multi‑TB memory nodes → $20k+/month (often impractical). With PQ and 8‑bit compression, memory footprint drops and costs approach ClickHouse + ANN sidecars.
- Hybrid (ClickHouse for compressed storage + MongoDB for metadata + small Atlas instance for low‑latency ops): typically 20–40% cheaper than full Atlas at scale while preserving low latency for critical paths.
Numbers above are from our Jan 2026 lab benchmarks and are intended as reproducible baselines. Your mileage will vary with vector dimensionality, compression, and workload shape.
Why the differences exist (technical breakdown)
Storage model
ClickHouse is columnar and optimized for compression and sequential scans — ideal when you store vectors as columns and run batch ANN workflows or offline recomputations. Compression shines when vectors have statistical patterns or after quantization.
MongoDB stores vectors inside documents (BSON) and exposes managed vector search. This favors low‑latency single record retrieval plus kNN queries tightly coupled to document metadata and transactions.
Indexing and memory behavior
ANN indexes (HNSW) are memory hungry but deliver the lowest single‑query latency. Columnar engines often trade off latency for storage efficiency and parallel CPU throughput. The practical sweet spot is using PQ or OPQ to reduce index memory and storing compressed vectors in ClickHouse with a separate ANN layer for hot subsets.
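To make the index-memory tradeoff concrete, here is a minimal sketch of 8-bit scalar quantization in Node.js; it is a simpler stand-in for PQ/OPQ (which quantize subvectors against learned codebooks), and the function names are illustrative. A 1536-dim float32 vector (~6KB) compresses to 1536 bytes plus two float parameters, roughly a 4x reduction:

// quantize a float32 embedding to uint8 with a per-vector scale/offset
function quantize8bit(vec) {
  const min = Math.min(...vec);
  const max = Math.max(...vec);
  const scale = (max - min) / 255 || 1; // guard against constant vectors
  const codes = Uint8Array.from(vec, v => Math.round((v - min) / scale));
  return { codes, scale, offset: min }; // ~1536 bytes + 8 bytes of parameters
}

// lossy reconstruction used at query/re-rank time; expect some recall loss
function dequantize8bit({ codes, scale, offset }) {
  return Float32Array.from(codes, c => c * scale + offset);
}

PQ and OPQ push compression further (the 4–10x range cited above) at the cost of codebook training and recall tuning.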
Migration and architecture playbook: MongoDB ↔ ClickHouse hybrid
The hybrid approach we recommend keeps metadata and transactional operations in MongoDB and uses ClickHouse for compressed, analytics‑friendly vector storage and batch retrieval. Below is a step‑by‑step migration and sync playbook suitable for production teams.
Step 0 — prerequisites
- Inventory embeddings: count, dims, update frequency, fanout of queries.
- Decide on freshness SLA for vectors (near‑real‑time vs nightly batch).
- Choose quantization strategy (none, 8‑bit PQ, OPQ) based on recall targets.
Step 1 — schema and table creation
ClickHouse table example (simple):
CREATE TABLE embeddings (
id UInt64,
vector Array(Float32),
metadata String
) ENGINE = MergeTree()
ORDER BY id
SETTINGS index_granularity = 8192;
MongoDB document example (metadata + small quick path):
{
_id: ObjectId(...),
vector_ref: 12345678, // reference to ClickHouse id
title: "...",
tags: [...],
createdAt: ISODate(...)
}
Step 2 — bulk export/import
- Export vectors from your model pipeline to columnar files (Parquet) or streaming Kafka.
- Load vectors into ClickHouse with INSERT or clickhouse-client bulk loads; use compression and enable columnar codecs (sketched after this list).
- Store only pointer IDs in MongoDB documents to keep oplog/write amplification low.
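A minimal load sketch, assuming the vectors were exported to a Parquet file whose columns match the embeddings table; the ZSTD codec choice is an illustrative assumption for the "enable columnar codecs" step:

# optional: set an explicit codec on the vector column before loading (applies to newly written parts)
clickhouse-client --query "ALTER TABLE embeddings MODIFY COLUMN vector Array(Float32) CODEC(ZSTD(1))"

# bulk load the Parquet export, streamed over stdin
clickhouse-client --query "INSERT INTO embeddings FORMAT Parquet" < embeddings.parquet

For Kafka-based pipelines, ClickHouse's Kafka table engine or a small consumer writing batched INSERTs achieves the same result.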
Step 3 — build ANN index for hot partition
- Create an ANN index over the hot subset (most recent or most frequently queried vectors). Use HNSW with PQ to lower memory; a minimal sketch follows this list.
- Keep a small in‑RAM ANN service for strict low‑latency paths and fall back to ClickHouse scans for cold queries.
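Here is a minimal sketch of the hot-partition service using the hnswlib-node binding (one ANN option among several; note that hnswlib does not itself implement PQ, so treat this as the unquantized baseline). hotVectors, the export from ClickHouse, and queryVec are assumed:

const { HierarchicalNSW } = require('hnswlib-node');

// build an HNSW index over the hot subset, e.g. the most recent vectors
const dim = 1536;
const maxElements = 5_000_000; // size the hot partition to fit in RAM
const index = new HierarchicalNSW('cosine', dim);
index.initIndex(maxElements, 48 /* M */, 200 /* efConstruction */);

for (const { id, vector } of hotVectors) {
  index.addPoint(vector, id);
}

index.setEf(128); // efSearch: raise for recall, lower for latency
const { neighbors, distances } = index.searchKnn(queryVec, 10);

The M and efConstruction values mirror the benchmark configuration above; for PQ-compressed indexes you would reach for a FAISS-style library instead.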
Step 4 — change application query path
Application flow:
- For single‑item or transactional lookups, query MongoDB for metadata and vector_ref.
- For similarity search, query ANN service (hot) or ClickHouse (batch/cold), then enrich results from MongoDB by ids.
Step 5 — streaming sync and fallbacks
- Use Change Streams (MongoDB) or Kafka to stream updates to ClickHouse, keeping vectors in sync, or fall back to nightly delta loads (a minimal sync sketch follows this list).
- Implement deterministic fallback: if ANN service misses, run a ClickHouse approximate scan or precomputed candidate set to guarantee recall.
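A minimal sync sketch using a MongoDB Change Stream and ClickHouse's HTTP interface; the vector_updates staging collection (written by the embedding pipeline, since item documents only carry vector_ref) and the endpoint are illustrative assumptions. Uses Node 18's global fetch:

const { MongoClient } = require('mongodb');

// stream fresh vectors from a Mongo staging collection into ClickHouse
async function syncVectors() {
  const mongo = new MongoClient(process.env.MONGO_URI);
  await mongo.connect();
  const staging = mongo.db('app').collection('vector_updates'); // hypothetical staging collection

  const stream = staging.watch([], { fullDocument: 'updateLookup' });
  for await (const change of stream) {
    if (!change.fullDocument) continue;
    const { vector_ref, vector } = change.fullDocument;

    // one row per event for clarity; batch in production to cut insert overhead
    await fetch('http://clickhouse:8123/?query=' +
        encodeURIComponent('INSERT INTO embeddings (id, vector) FORMAT JSONEachRow'), {
      method: 'POST',
      body: JSON.stringify({ id: vector_ref, vector }) + '\n',
    });
  }
}

Persist resume tokens (change._id) so the sync survives restarts without losing deltas.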
Step 6 — validation & monitoring
- Continuously measure recall@k and latency SLOs. Add synthetic queries to detect regressions; your reproducible benchmark kit should include these, and a minimal recall@k helper appears with the code snippets below.
- Monitor memory pressure on ANN nodes and page fault rates for ClickHouse disk access.
Practical code snippets
Node.js: query the ANN sidecar, then enrich from MongoDB
// Node.js example; node-fetch can be dropped on Node 18+, which ships a global fetch
const { MongoClient } = require('mongodb');
const fetch = require('node-fetch');

const client = new MongoClient(process.env.MONGO_URI); // reuse one client across requests

async function retrieveSimilar(queryVec) {
  // 1) query the ANN service for candidate IDs
  const annRes = await fetch('http://ann-service/search', {
    method: 'POST',
    body: JSON.stringify({ vector: queryVec, k: 10 }),
    headers: { 'Content-Type': 'application/json' }
  });
  const { ids } = await annRes.json(); // ClickHouse vector ids, not Mongo _ids

  // 2) fetch metadata from MongoDB via vector_ref (the ClickHouse id on each document)
  const docs = await client.db('app').collection('items')
    .find({ vector_ref: { $in: ids } })
    .toArray();

  // 3) restore the ANN ranking, which find() does not preserve
  const byRef = new Map(docs.map(d => [d.vector_ref, d]));
  return ids.map(id => byRef.get(id)).filter(Boolean); // enriched, ranked results
}
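For the step 6 validation loop, a minimal recall@k helper; exactIds would come from an exact scan (e.g., the ClickHouse cosineDistance query shown earlier), and annSearch/exactSearch in the usage comment are hypothetical wrappers:

// fraction of the exact top-k neighbors that the ANN path also returned
function recallAtK(annIds, exactIds, k) {
  const exact = new Set(exactIds.slice(0, k));
  const hits = annIds.slice(0, k).filter(id => exact.has(id)).length;
  return hits / Math.min(k, exact.size);
}

// example: alert when synthetic-query recall drifts below the SLO
// const r = recallAtK(await annSearch(q, 10), await exactSearch(q, 10), 10);
// if (r < 0.95) reportRegression(q, r);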
Real‑world tradeoffs and when to choose each option
- Choose MongoDB Atlas Vector Search when you value managed operations, low engineering overhead, and you need tight coupling between documents and vectors (e.g., transactional writes + immediate kNN queries).
- Choose ClickHouse when you need cost‑efficient large‑scale batch analytics or have workflows that run large candidate generation across billions of vectors with heavy compression.
- Choose Hybrid when you want best of both: ClickHouse for compressed, cheap storage and analytics; MongoDB for low‑latency metadata, small hot path operations, and developer productivity.
Advanced strategies and 2026 trends to watch
- GPU-accelerated retrieval: With tighter NVLink-style integrations and vendor investment flowing into OLAP engines, expect hybrid designs that move re-ranking and some ANN steps onto the GPU for sub-ms tails when network locality allows (see the hardware-convergence trend above).
- Adaptive quantization: Nearline pipelines will automatically choose per‑vector quantization level to balance recall and storage cost; look for engines that support multi‑tier indexes and that tie into your LLM infra.
- Serverless ANN endpoints: Expect managed providers to offer serverless ANN endpoints with predictable pricing (announced as pilots in 2025), which may simplify small and medium workloads without heavy ops.
Actionable takeaways (do this next)
- Run a focused bench: pick a representative 1–10M subset of your vectors and replicate the three patterns (MongoDB-only, ClickHouse-only, Hybrid) — measure p95 latency under your concurrency shape.
- If your vector set >100M and you need sub‑10ms p95 at high concurrency, prioritize an ANN strategy with PQ + sharding; expect higher ops complexity.
- For quick time‑to‑production and lower maintenance, start with MongoDB Atlas Vector Search and iterate to hybrid when cost or throughput dictates.
Wrap up and final recommendation
In 2026, there's no one‑size‑fits‑all answer. If your team wants to move fast and avoid heavy ops, MongoDB Atlas Vector Search gets you to production with competitive latency and managed reliability. If you operate very large corpora and batch analytics, ClickHouse — paired with ANN indexes and quantization — delivers superior cost efficiency for bulk retrieval. For most production systems, a hybrid architecture achieves the best balance of latency, cost, and operational simplicity.
Call to action
Ready to make the choice with real numbers from your workload? Get our reproducible benchmark kit (datasets, scripts, and cost calculators) and a migration checklist tailored to your scale. Contact mongoose.cloud for a technical audit and a hands‑on migration plan — we’ll help you reduce vector storage costs and cut p95 latency without surprises.