Benchmark: MongoDB Time-Series vs Wide Documents for Truck Telemetry
Benchmarking MongoDB time-series vs bucket vs wide documents for autonomous truck telemetry: throughput, storage, and query trade-offs.
Why this benchmark matters for autonomous trucking teams in 2026
Managing telemetry for fleets of autonomous or assisted trucks is one of the hardest scaling problems teams face today: very high write rates, mixed-size reads (per-vehicle history vs. fleet-wide aggregates), unpredictable retention windows, and hard operational constraints like storage costs and recovery time objectives. If your team is evaluating MongoDB models for telemetry — native time-series collections, the bucket pattern, or wide documents — you need concrete trade-offs, not folklore.
This benchmark uses a realistic Aurora-inspired autonomous trucking use case (triggered by integrations like Aurora + McLeod that surfaced in late 2025 and increased telemetry demand) to measure throughput, storage, and query cost across three modeling strategies on MongoDB 7.x-class clusters in early 2026. You’ll get test methodology, numbers you can reproduce, and an actionable migration playbook.
Executive summary — key findings up front
- Native time-series collections are the best default for medium-to-high-cardinality telemetry with strong range-query performance and excellent storage compression for recent-window workloads. They balance writes and reads well and simplify retention/TTL management.
- Bucket pattern (manual bucketing) gives the highest raw ingest throughput and best compression when you control bucket size, but you pay CPU cost for unpacking/aggregation and more complex application logic.
- Wide documents (per-truck append-to-array) are attractive for low-cardinality fleets (tens to low hundreds of trucks) when most reads are “fetch entire vehicle state” and writes are batched — but they fail at scale: updates are heavier, arrays hit size limits, and storage grows quickly.
- For Aurora-style autonomous fleets integrating with TMS platforms (high availability, predictable retention, and short recovery windows), time-series collections + controlled bucketing strategy is the best operational compromise in 2026.
Benchmark context: workload, dataset, and cluster
Use-case profile
Inspired by the operational needs created by integrations like Aurora’s driverless trucking link to TMS platforms (which increase telemetry volume when fleets are actively dispatched), we modeled a mixed workload:
- Fleet size: 10,000 trucks
- Average sample rate: 1 sample/second per truck (bursts up to 5 Hz simulated)
- Payload per sample: GPS (lat, lon), speed, heading, engine RPM, fuel level, ambient temperature, 3 diagnostic flags, and a variable-size JSON diagnostics blob (0–1 KB)
- Retention target: 30 days hot, 1 year archived (cold) storage
- Primary queries: per-truck recent-window (last 1 hour), fleet aggregate over 24 hours (max speed / total distance), and ad-hoc diagnostic lookups.
Test cluster and tooling
- MongoDB: 7.x (latest stable patch in early 2026)
- Cluster: 3-node replica set, each node: 16 vCPU, 64 GB RAM, 2 x NVMe SSDs. Workload generated from dedicated clients on separate hosts.
- Write driver: official Node.js MongoDB driver (v5+), using bulk inserts for time-series and bucketed writes; update operations for wide-doc approach.
- Load generator: open-source load harness (repro scripts provided in the repo referenced at the end).
- Dataset size for each benchmark run: simulated 7-day window of activity loaded into a cold cluster (≈60–90M samples depending on burst pattern).
Modeling approaches tested
1) Native time-series collection
Each telemetry sample is a single document in a MongoDB time-series collection with timeField set to ts and metaField set to truckId. We rely on MongoDB’s internal bucketing and compression optimizations introduced across the 7.x series.
// example document
{
"truckId": "truck-1234",
"ts": ISODate("2026-01-10T12:00:01Z"),
"loc": { "type": "Point", "coordinates": [-122.33, 47.61] },
"speed": 62.4,
"rpm": 1200,
"fuel": 42.3,
"diagnostics": { "code": 0 }
}
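Creating the collection itself is a one-time driver call; below is a minimal sketch of the `createCollection` options we would pass (the `granularity` value is an assumption for 1 Hz telemetry, and the 30-day TTL mirrors the hot-retention target above):

```javascript
// Options object for a time-series collection: timeField/metaField match
// the sample shape above; granularity and TTL are tuning assumptions.
const tsOptions = {
  timeseries: {
    timeField: 'ts',       // timestamp on each sample document
    metaField: 'truckId',  // groups samples into internal buckets
    granularity: 'seconds' // assumption for 1 Hz ingest
  },
  expireAfterSeconds: 30 * 24 * 3600 // 30-day hot window via TTL
};

// Driver call (not executed here):
// await db.createCollection('samples', tsOptions);
```

Picking `metaField` well matters: MongoDB buckets by metadata value, so a low-churn identifier like `truckId` keeps buckets dense.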
2) Bucket pattern (manual bucketing)
We grouped samples into minute-level buckets per truck: each bucket is a document with an array of measurements for that minute. Bucket size tuning (30s, 60s, 5min) was part of the experiment. Buckets are rotated and compacted every minute.
// example bucket
{
"truckId": "truck-1234",
"start": ISODate("2026-01-10T12:00:00Z"),
"end": ISODate("2026-01-10T12:00:59Z"),
"samples": [ {"ts": ..., "speed": ..., "loc": ...}, ... ]
}
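A minute-level bucket is addressed by flooring the sample timestamp; here is a sketch of the key computation and the upsert shape we used (field names follow the example above, and the `$setOnInsert` layout is one reasonable choice, not the only one):

```javascript
// Floor a timestamp to its minute-level bucket boundaries.
function bucketBounds(ts) {
  const start = new Date(Math.floor(ts.getTime() / 60000) * 60000);
  const end = new Date(start.getTime() + 59999);
  return { start, end };
}

// Build the upsert that appends a sample to its bucket, creating the
// bucket document on the first write of each minute.
function bucketUpsert(truckId, sample) {
  const { start, end } = bucketBounds(sample.ts);
  return {
    filter: { truckId, start },
    update: {
      $push: { samples: sample },
      $setOnInsert: { end }
    },
    options: { upsert: true }
  };
}

// Driver call (not executed here):
// const op = bucketUpsert('truck-1234', sample);
// await coll.updateOne(op.filter, op.update, op.options);
```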
3) Wide documents (per-truck append)
Each truck is a single document with an array of samples appended. Periodic compaction (rotate into archive document) is required to avoid hitting the 16MB BSON limit. Writes use update with $push and $slice in high-throughput bursts.
// wide doc
{
"truckId": "truck-1234",
"samples": [ {"ts": ..., "speed": ..., "loc": ...}, ... ]
}
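The append path for this model uses `$push` with `$each` and `$slice` to cap the array, which is what keeps the document under control between compactions. A sketch of the update document (the 3600-sample cap is an assumption: 1 Hz over a one-hour rolling window):

```javascript
// Build an update that appends one sample and keeps only the most
// recent maxSamples entries ($slice with a negative value keeps the tail).
function wideDocAppend(sample, maxSamples = 3600) {
  return {
    $push: {
      samples: { $each: [sample], $slice: -maxSamples }
    }
  };
}

// Driver call (not executed here):
// await coll.updateOne({ truckId }, wideDocAppend(sample), { upsert: true });
```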
Raw benchmark results (high level)
The following numbers are median values observed across several runs. They are intended as directional guidance; your mileage may vary based on hardware, network, and specific telemetry payloads.
1) Sustained ingest throughput (writes/sec)
- Time-series: ~75k inserts/sec sustained (single-cluster, distributed across all nodes).
- Bucket pattern (1-minute buckets): ~130k sample inserts/sec equivalent (each bucket update groups multiple samples; fewer documents created).
- Wide documents: ~30–45k sample updates/sec (write amplification due to full-document update and larger index footprint).
2) Storage per million samples (compressed on disk)
- Time-series: ~0.9–1.2 GB per 1M samples (good compression from native buckets).
- Bucket pattern: ~0.5–0.8 GB per 1M samples (better compression by grouping similar documents and avoiding per-doc overhead).
- Wide documents: ~2.0–2.8 GB per 1M samples (arrays and update churn increase storage; fragmentation higher).
3) Query latency (median, 95th percentiles)
Query: retrieve last 1 hour of data for a single truck (≈3.6k samples)
- Time-series: median 8 ms, p95 40 ms.
- Bucket pattern: median 12 ms, p95 85 ms (unwind & aggregation occasionally costly).
- Wide documents: median 6 ms, p95 30 ms (fast when arrays are small; degrades as arrays grow).
4) Fleet-wide 24-hour aggregate (max speed per truck across 10k trucks)
- Time-series: 3–6s (depends on index usage and shard parallelism).
- Bucket pattern: 2–4s (bucket-level aggregates reduce document count but require array unwinding for precision).
- Wide documents: 4–10s (heavy projection and in-memory processing when arrays are large).
Interpreting the numbers — trade-offs explained
Why bucket pattern gave best ingestion & storage
Manual bucketing reduces per-document overhead (fewer documents, larger arrays of tightly-typed samples), which improves write throughput and compression ratio. The trade-off: queries that need individual samples must unpack arrays which increases CPU and memory pressure during aggregation. The bucket pattern shines when you can choose bucket granularity (1 minute vs 5 minutes) to align with query patterns.
Why native time-series is the best operational default
MongoDB’s native time-series engine (7.x) manages bucket lifecycle and compression automatically, offers native optimizations for range queries on the timeField, and integrates with TTL-based retention. It avoids manual bucket maintenance logic and fits most typical telemetry access patterns (recent-window reads, rolling aggregates).
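The recent-window read that dominates this access pattern is a plain range filter on the metaField plus timeField; a sketch of the query document we benchmarked:

```javascript
// Build a find() filter for one truck's last windowMs of samples
// (default: one hour, matching the benchmark's per-truck query).
function recentWindowQuery(truckId, now, windowMs = 60 * 60 * 1000) {
  return {
    truckId,
    ts: { $gte: new Date(now.getTime() - windowMs) }
  };
}

// Driver call (not executed here):
// coll.find(recentWindowQuery('truck-1234', new Date())).sort({ ts: 1 })
```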
Why wide documents are risky at scale
Wide documents optimize for single-document reads (fetching an entire truck's history quickly), but writes cost more because large documents may be rewritten on each update, arrays risk hitting the 16 MB BSON limit, and update churn causes heavy fragmentation. They are a pragmatic choice for small fleets, or for use cases with infrequent, batched writes and mostly per-vehicle read-after-write semantics.
Operational considerations & 2026 trends
- Edge pre-aggregation: In 2025–2026 many fleets push initial aggregation to edge gateways. That reduces central write rates and makes bucket patterns even more effective because you receive pre-bucketed payloads.
- Cloud-managed autoscaling: Serverless and autoscaling tiers in managed MongoDB services are now common. Time-series collections auto-optimize well under auto-scaling, but manual bucketing can achieve lower cost at high sustained loads.
- Tiered storage: New 2025–2026 provider features allow automatic movement to colder tiers. Designing retention/compaction strategies (e.g., compress and archive older buckets) reduces long-term cost.
- Observability & SLOs: In 2026 operations teams expect built-in telemetry pipelines (change streams, streaming exports) to feed observability and incident response. Time-series collections integrate cleanly with change streams for near-real-time alerts.
Actionable advice — migration playbook
When to choose native time-series
- You have high cardinality (thousands+ trucks) and need predictable range query performance for recent windows.
- You want minimal operational complexity and built-in TTL retention and compression.
- Recommended steps:
- Create the time-series collection with the appropriate timeField and metaField.
- Index on {truckId: 1, ts: 1} for per-truck range queries.
- Configure TTL for the hot window; export older buckets to archived collections or cloud object storage.
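The export step can be an aggregation that `$merge`s documents older than the hot-window cutoff into an archive collection; a sketch of the pipeline (the `samples_archive` collection name is illustrative):

```javascript
// Build a pipeline that selects samples older than the hot-window
// cutoff and merges them into an archive collection.
function archivePipeline(cutoff, archiveColl = 'samples_archive') {
  return [
    { $match: { ts: { $lt: cutoff } } },
    { $merge: { into: archiveColl, whenMatched: 'keepExisting' } }
  ];
}

// Driver call (not executed here):
// await coll.aggregate(archivePipeline(cutoff)).toArray();
```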
When to implement manual bucketing
- When ingest is a bottleneck and you can tolerate array-unpacking CPU for analytics.
- When payloads are highly uniform and compress well within a bucket.
- Recommended implementation:
- Choose bucket size based on query window (1-minute typical for 1Hz telemetry).
- Keep bucket documents bounded (rotate when sample count or bytes threshold hit).
- Index on {truckId: 1, start: 1} and keep a materialized last-sample field for quick recent reads.
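The bounded-bucket rule above can be a simple predicate checked before each append; a sketch (both thresholds are assumptions you should tune to your payload sizes):

```javascript
// Decide whether a bucket must be rotated before appending another
// sample; both thresholds are illustrative defaults, not benchmarks.
function shouldRotate(bucket, maxSamples = 120, maxBytes = 256 * 1024) {
  // JSON length is a rough stand-in for BSON size; good enough for a
  // rotation heuristic, not an exact measurement.
  const approxBytes = JSON.stringify(bucket).length;
  return bucket.samples.length >= maxSamples || approxBytes >= maxBytes;
}
```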
When wide documents make sense
- Small fleets (hundreds of vehicles), mostly read-heavy, and where writes are batched and predictable.
- Use a hybrid: keep recent N samples in a wide document for ultra-fast read, and stream the full time-series to a separate time-series collection for analytics.
Migration recipe: time-series → bucket hybrid
- Start with a time-series collection for fast MVP and built-in retention.
- Measure ingest and storage. If write throughput or storage costs are above SLO, prototype a manual bucketed pipeline in a staging cluster using production sample shapes.
- Implement streaming export (Change Streams or Atlas Data Federation) to move/compact old buckets into cold storage periodically.
- Benchmark queries again; add materialized views or pre-aggregated collections for expensive fleet-wide queries.
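The expensive fleet-wide query from the benchmark (max speed per truck over 24 hours) is a natural candidate for such a pre-aggregated collection; a sketch of the pipeline (the output collection name is illustrative):

```javascript
// Pipeline: filter to samples newer than the cutoff, group per truck,
// take max speed, and merge results into a pre-aggregated collection.
function fleetMaxSpeedPipeline(cutoff) {
  return [
    { $match: { ts: { $gte: cutoff } } },
    { $group: { _id: '$truckId', maxSpeed: { $max: '$speed' } } },
    { $merge: { into: 'fleet_daily_max_speed', whenMatched: 'replace' } }
  ];
}
```

Run on a schedule (or refreshed incrementally), this turns a multi-second scan into a single indexed read per truck.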
Reproducible checklist and sample scripts
You should reproduce these results with your actual payload and cluster. Key items:
- Use the same payload shape and sample rate. Payload variability (diagnostic blob size) materially changes storage.
- Run sustained 30–60 minute warm-up followed by 2–4 hour steady-state tests for reliable measurements.
- Collect server-side metrics (op counters, CPU, memory, disk I/O) and driver-side latencies.
// simple loader outline (Node.js)
const { MongoClient } = require('mongodb');

async function writeSamples(uri, makeSample, batches = 100, batchSize = 1000) {
  const client = new MongoClient(uri);
  await client.connect();
  try {
    const db = client.db('telemetry');
    const coll = db.collection('samples'); // time-series or bucketed
    for (let i = 0; i < batches; i++) {
      // build one batch of sample documents and bulk-insert it
      const batch = Array.from({ length: batchSize }, makeSample);
      await coll.insertMany(batch, { ordered: false });
    }
  } finally {
    await client.close();
  }
}
Full scripts and the artifact used for these benchmarks are available in the public repo linked in the CTA below.
Real-world consideration inspired by Aurora + TMS integrations
Partnerships like Aurora’s 2025 integrations with TMS vendors increased the number of business-critical telemetry consumers in the supply chain. That creates new constraints: telemetry must be auditable for match-and-bill, be available to dispatch systems with low latency, and be compact to minimize cross-region egress. These requirements favor time-series collections with controlled bucketing and tiered archival so you can serve real-time TMS workflows while minimizing long-term cost.
Final recommendation
For most modern autonomous-truck telemetry use cases in 2026, start with native time-series collections. If sustained ingest or storage costs exceed your targets, migrate hot-path ingestion to a bucket pattern after validating query costs for unpacking. Reserve wide documents for small fleets or hybrid patterns where you keep a small rolling window per-truck in a wide doc and everything else in time-series.
Actionable takeaways
- Start with time-series for simplicity, TTL, and good default compression.
- Tune bucket size if you need top-end ingest and can accept higher CPU for analytics.
- Use edge or gateway pre-aggregation to reduce central write rates and cost.
- Measure end-to-end: disk storage, ingest latency, query p95, and recovery time for retention windows.
Next steps & call-to-action
Ready to reproduce these benchmarks on your data and cluster? We published the full benchmark harness, dataset generator, and Terraform scripts used to stand up the test cluster. If you’re evaluating managed options, our team at mongoose.cloud can run a free pilot against your telemetry samples, show concrete cost trade-offs, and recommend a migration plan tuned to your SLOs.
Try the repo: clone the benchmark scripts, run them against a staging cluster, and follow the migration playbook above. Or contact mongoose.cloud to run a zero-risk pilot using your payload and retention targets.