Benchmarks
This page presents Fila’s benchmark results: self-benchmarks measuring single-node performance, and competitive comparisons against Kafka, RabbitMQ, and NATS.
Results are from commit 1e5bb0e on 2026-03-26. Run the benchmarks on your own hardware for results relevant to your environment; see Reproducing results for instructions.
Self-benchmarks
Self-benchmarks measure Fila’s single-node performance across throughput, latency, scheduling, and resource usage. The benchmark suite is in crates/fila-bench/ and uses the Fila SDK as a blackbox client against a real server instance.
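As an illustration of the warmup-then-measure approach the suite uses, a minimal throughput harness might look like the following. This is a sketch using only the standard library; the real fila-bench harness drives the Fila SDK where this sketch uses a stand-in closure.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Run `op` for a warmup period (results discarded), then count how many
/// calls complete inside the measurement window and report ops/s.
fn measure_throughput(mut op: impl FnMut(), warmup: Duration, window: Duration) -> f64 {
    let start = Instant::now();
    while start.elapsed() < warmup {
        op(); // warmup iterations are discarded
    }
    let mut count: u64 = 0;
    let start = Instant::now();
    while start.elapsed() < window {
        op();
        count += 1;
    }
    count as f64 / start.elapsed().as_secs_f64()
}

fn main() {
    // Stand-in for an SDK enqueue call; a real harness would issue the RPC here.
    let mut seq: u64 = 0;
    let rate = measure_throughput(
        || {
            seq = seq.wrapping_add(1);
            black_box(seq);
        },
        Duration::from_millis(100), // the real suite uses a 1-second warmup
        Duration::from_millis(300), // and a 3-second measurement window
    );
    println!("{rate:.0} ops/s");
}
```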
Throughput
| Metric | Value | Unit |
|---|---|---|
| Enqueue throughput (1KB payload) | 5,051 | msg/s |
| Enqueue throughput (1KB payload) | 4.93 | MB/s |
Single producer, sustained over a 3-second measurement window after a 1-second warmup.
End-to-end latency
Round-trip latency: produce a message, consume it, measure the interval. 100 samples per load level.
| Load level | Producers | p50 | p95 | p99 |
|---|---|---|---|---|
| Light | 1 | 0.00 ms | 0.00 ms | 0.00 ms |
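The percentiles above are computed over the collected round-trip samples. As a reference, a nearest-rank percentile (a common convention; not necessarily the exact method fila-bench uses) can be sketched as:

```rust
/// Nearest-rank percentile: sort the samples, then take the value at
/// rank ceil(p/100 * n), clamped to the valid index range.
fn percentile(samples: &mut [f64], p: f64) -> f64 {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.saturating_sub(1).min(samples.len() - 1)]
}

fn main() {
    // 100 synthetic round-trip samples in ms, uniformly spread for clarity.
    let mut samples: Vec<f64> = (1..=100).map(|i| i as f64 / 10.0).collect();
    println!("p50 = {} ms", percentile(&mut samples, 50.0)); // 5.0
    println!("p95 = {} ms", percentile(&mut samples, 95.0)); // 9.5
    println!("p99 = {} ms", percentile(&mut samples, 99.0)); // 9.9
}
```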
Fair scheduling overhead
Compares throughput with DRR fair scheduling enabled vs plain FIFO delivery.
| Mode | Throughput (msg/s) |
|---|---|
| FIFO baseline | 1,307 |
| Fair scheduling (DRR) | 1,247 |
| Overhead | 3.1% |
The DRR scheduler adds minimal overhead compared to FIFO delivery (< 5% target).
Fairness accuracy
Messages enqueued across 5 fairness keys with weights 1:2:3:4:5. 2,000 messages per key (10,000 total), consuming a window of 5,000.
| Key | Weight | Expected share | Actual share | Deviation |
|---|---|---|---|---|
| tenant-1 | 1 | 6.7% | 6.7% | 0.2% |
| tenant-2 | 2 | 13.3% | 13.4% | 0.2% |
| tenant-3 | 3 | 20.0% | 20.0% | 0.1% |
| tenant-4 | 4 | 26.7% | 26.6% | 0.1% |
| tenant-5 | 5 | 33.3% | 33.3% | 0.1% |
The DRR scheduler distributes messages proportionally to weight within any delivery window. Max deviation is < 1%, well within the < 5% NFR target.
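To show why weighted DRR converges on proportional shares, here is a minimal deficit-round-robin sketch over the same 1:2:3:4:5 workload. This is illustrative only, not Fila's actual scheduler implementation; each round a key's deficit grows by `weight * quantum`, and one unit of deficit pays for one delivered message.

```rust
use std::collections::VecDeque;

/// A weighted key in a minimal deficit-round-robin scheduler (illustrative
/// sketch, not Fila's actual implementation).
struct DrrKey {
    weight: u32,
    deficit: u32,
    queue: VecDeque<u64>, // pending message ids
}

/// Deliver up to `budget` messages across keys, returning per-key counts.
fn drr_drain(keys: &mut [DrrKey], quantum: u32, budget: usize) -> Vec<usize> {
    let mut delivered = vec![0usize; keys.len()];
    let mut total = 0;
    while total < budget {
        let mut progressed = false;
        for (i, key) in keys.iter_mut().enumerate() {
            if total >= budget {
                break;
            }
            if key.queue.is_empty() {
                continue;
            }
            key.deficit += key.weight * quantum; // replenish proportional to weight
            while key.deficit >= 1 && total < budget {
                if key.queue.pop_front().is_none() {
                    break;
                }
                key.deficit -= 1;
                delivered[i] += 1;
                total += 1;
                progressed = true;
            }
        }
        if !progressed {
            break; // all queues drained
        }
    }
    delivered
}

fn main() {
    // Five keys with weights 1:2:3:4:5, 2,000 messages each, window of 5,000.
    let mut keys: Vec<DrrKey> = (1..=5u32)
        .map(|w| DrrKey { weight: w, deficit: 0, queue: (0..2000u64).collect() })
        .collect();
    let shares = drr_drain(&mut keys, 1, 5000);
    for (i, n) in shares.iter().enumerate() {
        println!("tenant-{}: {:.1}%", i + 1, 100.0 * *n as f64 / 5000.0);
    }
}
```

With these inputs the delivered counts land within a fraction of a percent of the ideal w/15 shares, mirroring the deviations in the table above.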
Lua script overhead
Measures per-message overhead of executing an on_enqueue Lua hook.
| Metric | Value | Unit |
|---|---|---|
| Throughput without Lua | 987 | msg/s |
| Throughput with on_enqueue hook | 943 | msg/s |
| Per-message overhead | 31.7 | us |
The Lua hook adds roughly 32 us of per-message overhead, well within the < 50 us NFR target.
Fairness key cardinality scaling
Scheduling throughput as the number of distinct fairness keys increases.
| Key count | Throughput (msg/s) |
|---|---|
| 10 | 1,479 |
| 1,000 | 818 |
| 10,000 | 509 |
Consumer concurrency scaling
Aggregate consume throughput with increasing concurrent consumer streams.
| Consumers | Throughput (msg/s) |
|---|---|
| 1 | 66 |
| 10 | 1,009 |
| 100 | 1,863 |
Memory footprint
| Metric | Value |
|---|---|
| RSS idle | 351 MB |
| RSS under load (10K messages) | 351 MB |
Memory usage is dominated by the RocksDB buffer pool, not message count.
RocksDB compaction impact
| Metric | p99 latency |
|---|---|
| Idle (no compaction) | 0.00 ms |
| Active compaction | 0.00 ms |
| Delta | < 0.39 ms |
Compaction has no measurable negative impact on tail latency in single-node benchmarks.
Batch benchmarks
Batch benchmarks measure throughput and latency of multi-message Enqueue (multiple EnqueueMessage items per EnqueueRequest) and compare it against single-message enqueue. These benchmarks are gated behind FILA_BENCH_BATCH=1 because they exercise batch-specific code paths and take additional time.
Enable with FILA_BENCH_BATCH=1:
```sh
FILA_BENCH_BATCH=1 cargo bench -p fila-bench --bench system
```
Multi-message enqueue throughput
Measures multi-message Enqueue throughput at various batch sizes with 1KB messages. Reports both messages/s and batches/s.
| Batch size | Throughput (msg/s) | Batches/s |
|---|---|---|
| 1 | — | — |
| 10 | — | — |
| 50 | — | — |
| 100 | — | — |
| 500 | — | — |
Batch size scaling
Measures throughput as a function of batch size (1 to 1000) to identify the point of diminishing returns.
| Batch size | Throughput (msg/s) |
|---|---|
| 1 | — |
| 5 | — |
| 10 | — |
| 25 | — |
| 50 | — |
| 100 | — |
| 250 | — |
| 500 | — |
| 1000 | — |
Auto-batching latency
Measures end-to-end latency (multi-message enqueue to consume) at various producer concurrency levels. Simulates client-side auto-batching by accumulating messages and flushing via the Enqueue RPC with 50 messages per request.
| Producers | p50 | p95 | p99 | p99.9 | p99.99 | max |
|---|---|---|---|---|---|---|
| 1 | — | — | — | — | — | — |
| 10 | — | — | — | — | — | — |
| 50 | — | — | — | — | — | — |
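The accumulate-and-flush behavior simulated above can be sketched as follows. This is a hypothetical client-side batcher, not the Fila SDK's actual accumulator API: messages buffer locally, and a full batch of 50 triggers one multi-message Enqueue.

```rust
/// Hypothetical client-side auto-batcher (illustrative, not the Fila SDK):
/// accumulate messages and flush once `max_batch` are pending.
struct AutoBatcher {
    max_batch: usize,
    pending: Vec<Vec<u8>>,
    flushed: Vec<Vec<Vec<u8>>>, // stand-in for issued Enqueue RPCs
}

impl AutoBatcher {
    fn new(max_batch: usize) -> Self {
        Self { max_batch, pending: Vec::new(), flushed: Vec::new() }
    }

    fn enqueue(&mut self, msg: Vec<u8>) {
        self.pending.push(msg);
        if self.pending.len() >= self.max_batch {
            self.flush();
        }
    }

    /// In a real client this would send one multi-message EnqueueRequest.
    fn flush(&mut self) {
        if !self.pending.is_empty() {
            self.flushed.push(std::mem::take(&mut self.pending));
        }
    }
}

fn main() {
    let mut batcher = AutoBatcher::new(50);
    for i in 0..120u32 {
        batcher.enqueue(i.to_le_bytes().to_vec());
    }
    batcher.flush(); // drain the final partial batch
    // 120 messages at batch size 50 flush as batches of 50, 50, and 20.
    let sizes: Vec<usize> = batcher.flushed.iter().map(|b| b.len()).collect();
    println!("{sizes:?}"); // [50, 50, 20]
}
```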
Batched vs unbatched comparison
Runs identical workloads (3,000 messages) with three approaches and reports throughput and speedup ratios.
| Mode | Throughput (msg/s) | Speedup |
|---|---|---|
| Unbatched | — | 1.0x |
| Explicit batch (size 100) | — | —x |
| Auto-batch (size 100) | — | —x |
Speedup ratios are computed relative to the unbatched baseline.
Delivery batching throughput
Measures consumer throughput with varying concurrent consumer counts. Messages are pre-loaded and continuously produced via multi-message Enqueue.
| Consumers | Throughput (msg/s) |
|---|---|
| 1 | — |
| 10 | — |
| 100 | — |
Concurrent producer batching
Measures aggregate throughput with multiple concurrent producers all using multi-message Enqueue (batch size 100).
| Producers | Throughput (msg/s) |
|---|---|
| 1 | — |
| 5 | — |
| 10 | — |
| 50 | — |
Subsystem benchmarks
Subsystem benchmarks isolate and measure each internal component independently, bypassing the full server stack. This helps identify where time is spent and which component dominates in different workloads.
Enable with FILA_BENCH_SUBSYSTEM=1:
```sh
FILA_BENCH_SUBSYSTEM=1 cargo bench -p fila-bench --bench system
```
RocksDB raw write throughput
Measures raw put_message throughput directly against RocksDB, bypassing scheduler, FIBP, and serialization. Isolates storage engine performance.
| Payload | Throughput (ops/s) | p50 latency | p99 latency |
|---|---|---|---|
| 1KB | — | — | — |
| 64KB | — | — | — |
Protobuf serialization throughput
Measures protobuf encode and decode throughput for EnqueueRequest and ConsumeResponse at three payload sizes. Isolates serialization overhead.
| Payload | Encode (MB/s) | Encode (ns/msg) | Decode (ns/msg) |
|---|---|---|---|
| 64B | — | — | — |
| 1KB | — | — | — |
| 64KB | — | — | — |
Reported for both EnqueueRequest (producer path) and ConsumeResponse (consumer path).
DRR scheduler throughput
Measures next_key() + consume_deficit() cycle throughput at varying active key counts. Isolates the scheduling algorithm from storage I/O.
| Active keys | Throughput (sel/s) |
|---|---|
| 10 | — |
| 1,000 | — |
| 10,000 | — |
FIBP round-trip overhead
Measures round-trip latency for a minimal (1-byte payload) Enqueue request. Quantifies the fixed per-call overhead of FIBP framing, separate from message processing.
| Metric | Value | Unit |
|---|---|---|
| p50 latency | — | us |
| p99 latency | — | us |
| p99.9 latency | — | us |
| Throughput | — | ops/s |
Lua execution throughput
Measures on_enqueue hook execution throughput for three script complexity levels, directly against the Lua VM (no server, no FIBP).
| Script | Throughput (exec/s) | p50 | p99 |
|---|---|---|---|
| No-op (return defaults) | — | — | — |
| Header-set (read 2 headers) | — | — | — |
| Complex routing (string ops, conditionals, table insert) | — | — | — |
Competitive comparison
Fila is compared against Kafka, RabbitMQ, and NATS on queue-oriented workloads. All brokers run in Docker containers and are benchmarked using native Rust clients via the bench-competitive binary. See Methodology for details.
How to run competitive benchmarks
```sh
cd bench/competitive
make bench-competitive
```
Results are written to bench/competitive/results/bench-{broker}.json.
Workloads
Each broker is tested with identical workloads using its recommended high-throughput configuration:
| Workload | Description | Batching |
|---|---|---|
| Throughput | Sustained message production rate (64B, 1KB, 64KB payloads) | Each broker’s recommended batching |
| Latency | Produce-consume round-trip (p50/p95/p99) | Unbatched |
| Lifecycle | Full enqueue-consume-ack cycle (1,000 messages) | Unbatched |
| Multi-producer | 3 concurrent producers aggregate throughput | Each broker’s recommended batching |
| Resources | CPU and memory during benchmark | — |
Broker configurations
| Broker | Version | Mode | Throughput batching |
|---|---|---|---|
| Fila | latest | Docker container, DRR scheduler | AccumulatorMode::Auto (4 concurrent producers) |
| Kafka | 3.9 | KRaft (no ZooKeeper), 1 partition | linger.ms=5, batch.num.messages=1000 |
| RabbitMQ | 3.13 | Quorum queues, durable, manual ack | Per-message (no client-side batching) |
| NATS | 2.11 | JetStream, file storage, pull-subscribe | Per-message (no client-side batching) |
All competitors use production-recommended settings, not development defaults. All brokers use native Rust client libraries (rdkafka, lapin, async-nats). Throughput scenarios use each broker’s recommended batching strategy for a fair comparison. Lifecycle scenarios are unbatched for all brokers.
Run make bench-competitive on your hardware to generate comparison tables.
Results
These are reference numbers from a single run. Your results will vary by hardware. All brokers run in Docker containers. Throughput uses each broker’s recommended batching; lifecycle is unbatched.
Throughput (messages/second, batched)
| Payload | Fila | Kafka | RabbitMQ | NATS |
|---|---|---|---|---|
| 64B | — | — | — | — |
| 1KB | — | — | — | — |
| 64KB | — | — | — | — |
Previous unbatched results (Fila 2,637 msg/s vs Kafka 143,278 msg/s at 1KB) were an unfair comparison: Kafka used linger.ms=5 batching while Fila sent one message per RPC. The updated benchmark uses each broker's recommended batching.
End-to-end latency (1KB payload)
| Percentile | Fila | Kafka | RabbitMQ | NATS |
|---|---|---|---|---|
| p50 | 0.92 ms | 101.62 ms | 1.46 ms | 0.29 ms |
| p95 | 2.82 ms | 105.07 ms | 3.32 ms | 0.42 ms |
| p99 | 4.79 ms | 105.30 ms | 5.59 ms | 0.79 ms |
Lifecycle throughput (enqueue + consume + ack, 1KB, unbatched)
| Broker | msg/s |
|---|---|
| NATS | 25,763 |
| Fila | 2,724 |
| RabbitMQ | 658 |
| Kafka | 356 |
Multi-producer throughput (3 producers, 1KB)
| Broker | msg/s |
|---|---|
| Kafka | 186,708 |
| NATS | 150,676 |
| RabbitMQ | 63,660 |
| Fila | 6,769 |
Resource usage
| Broker | CPU | Memory |
|---|---|---|
| NATS | 1.3% | 12 MB |
| Kafka | 2.1% | 1,276 MB |
| Fila | 3.7% | 874 MB |
| RabbitMQ | 56.8% | 654 MB |
Methodology
Measurement parameters
| Parameter | Value |
|---|---|
| Warmup period | 1 second (discarded) |
| Measurement window | 3 seconds |
| Latency samples | 100 per level |
| Runs for CI regression | 3 (median) |
| Competitive runs | 1 (relative comparison) |
Limitations
- Single-node only. All brokers run as single instances. Clustering performance is not tested.
- No network latency. Brokers run on localhost. Real deployments have network overhead.
- Docker containers. All brokers run in Docker containers for a fair comparison.
- Hardware-specific. Results will vary on different hardware. Always include hardware specs when citing numbers.
Reproducing results
Self-benchmarks:
```sh
# Build and run the full benchmark suite
cargo bench -p fila-bench --bench system

# Results are written to crates/fila-bench/bench-results.json
```
Competitive benchmarks:
```sh
cd bench/competitive

# Run all brokers
make bench-competitive

# Or individual brokers
make bench-kafka
make bench-rabbitmq
make bench-nats
make bench-fila

# Clean up Docker containers
make bench-clean
```
See bench/competitive/METHODOLOGY.md for complete methodology documentation including broker configuration details and justifications.
CI regression detection
The bench-regression GitHub Actions workflow runs on every push to main and on pull requests:
- Runs the self-benchmark suite 3 times, takes the median
- On main pushes: saves results as the baseline
- On PRs: compares against the baseline and flags regressions exceeding the threshold (default: 10%)
- Results are uploaded as workflow artifacts for every run
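The comparison step can be sketched as follows. This is a hypothetical check, assuming throughput-style metrics where lower values are worse; the actual logic lives in the bench-regression workflow.

```rust
/// Median of repeated benchmark runs (sorts in place; odd run counts
/// like the suite's 3 runs take the exact middle element).
fn median(samples: &mut [f64]) -> f64 {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    samples[samples.len() / 2]
}

/// For a throughput metric, flag a regression when the current value
/// drops below (1 - threshold) of the stored baseline.
fn is_regression(baseline: f64, current: f64, threshold: f64) -> bool {
    current < baseline * (1.0 - threshold)
}

fn main() {
    let mut runs = [5051.0, 4980.0, 5102.0]; // msg/s across 3 runs
    let current = median(&mut runs);
    let baseline = 5600.0; // stored from the last main push
    println!(
        "median = {current} msg/s, regression at 10% threshold: {}",
        is_regression(baseline, current, 0.10)
    );
}
```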
Traceability
Results in this document are from commit 1e5bb0e (2026-03-26). Run cargo bench -p fila-bench --bench system to generate results for the current version. The JSON output includes the commit hash and timestamp for traceability.