Benchmarks

This page presents Fila’s benchmark results: self-benchmarks measuring single-node performance, and competitive comparisons against Kafka, RabbitMQ, and NATS.

Results from commit 1e5bb0e on 2026-03-26. Run benchmarks on your own hardware for results relevant to your environment. See Reproducing results for instructions.

Self-benchmarks

Self-benchmarks measure Fila’s single-node performance across throughput, latency, scheduling, and resource usage. The benchmark suite is in crates/fila-bench/ and uses the Fila SDK as a blackbox client against a real server instance.

Throughput

| Metric | Value | Unit |
|---|---|---|
| Enqueue throughput (1KB payload) | 5,051 | msg/s |
| Enqueue throughput (1KB payload) | 4.93 | MB/s |

Single producer, sustained over a 3-second measurement window after 1-second warmup.
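
The reported rates are simple window arithmetic: messages completed during the measurement window divided by its length. A minimal sketch, where `window_throughput` is an illustrative helper and not part of fila-bench:

```rust
/// Convert a raw message count observed over the measurement window into
/// msg/s and MB/s. Illustrative only; fila-bench's reporting code may differ.
fn window_throughput(messages: u64, payload_bytes: u64, window_secs: f64) -> (f64, f64) {
    let msg_per_s = messages as f64 / window_secs;
    let mb_per_s = msg_per_s * payload_bytes as f64 / 1_000_000.0;
    (msg_per_s, mb_per_s)
}

fn main() {
    // 15,153 messages of 1,000 bytes over the 3-second window
    // (the 1-second warmup is discarded before counting).
    let (msgs, mbs) = window_throughput(15_153, 1_000, 3.0);
    println!("{msgs:.0} msg/s, {mbs:.2} MB/s");
}
```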

End-to-end latency

Round-trip latency: produce a message, consume it, measure the interval. 100 samples per load level.

| Load level | Producers | p50 | p95 | p99 |
|---|---|---|---|---|
| Light | 1 | 0.00 ms | 0.00 ms | 0.00 ms |
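
The percentile columns can be derived from the 100 collected samples with a nearest-rank estimator, sketched below; this is one common estimator, not necessarily the one fila-bench uses:

```rust
/// Nearest-rank percentile over collected round-trip samples (ms).
/// Illustrative; fila-bench may use a different estimator.
fn percentile(samples: &[f64], p: f64) -> f64 {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    // Nearest rank: smallest index i such that i/n >= p/100.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.max(1) - 1]
}

fn main() {
    // 100 samples of 1..=100 ms: p50 = 50 ms, p95 = 95 ms, p99 = 99 ms.
    let samples: Vec<f64> = (1..=100).map(f64::from).collect();
    for p in [50.0, 95.0, 99.0] {
        println!("p{p} = {} ms", percentile(&samples, p));
    }
}
```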

Fair scheduling overhead

Compares throughput with DRR fair scheduling enabled vs plain FIFO delivery.

| Mode | Throughput (msg/s) |
|---|---|
| FIFO baseline | 1,307 |
| Fair scheduling (DRR) | 1,247 |
| Overhead | 4.6% |

The DRR scheduler adds minimal overhead compared to FIFO delivery (< 5% target).
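
For reference, overhead here means the relative throughput drop versus the FIFO baseline; a one-line sketch (the function name is illustrative):

```rust
/// Fair-scheduling overhead as a percentage of the FIFO baseline
/// throughput. Function name is illustrative, not fila-bench code.
fn overhead_pct(fifo_baseline: f64, fair: f64) -> f64 {
    (fifo_baseline - fair) / fifo_baseline * 100.0
}

fn main() {
    // e.g. a drop from 1,000 to 950 msg/s is a 5.0% overhead.
    println!("{:.1}%", overhead_pct(1000.0, 950.0));
}
```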

Fairness accuracy

Messages enqueued across 5 fairness keys with weights 1:2:3:4:5. 2,000 messages per key (10,000 total), consuming a window of 5,000.

| Key | Weight | Expected share | Actual share | Deviation |
|---|---|---|---|---|
| tenant-1 | 1 | 6.7% | 6.7% | 0.2% |
| tenant-2 | 2 | 13.3% | 13.4% | 0.2% |
| tenant-3 | 3 | 20.0% | 20.0% | 0.1% |
| tenant-4 | 4 | 26.7% | 26.6% | 0.1% |
| tenant-5 | 5 | 33.3% | 33.3% | 0.1% |

The DRR scheduler distributes messages proportionally to weight within any delivery window. Max deviation is < 1%, well within the < 5% NFR target.
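
The expected-share column follows directly from the weights: each key should receive weight_i / sum(weights) of the delivery window. A sketch (function name illustrative):

```rust
/// Expected delivery share per fairness key under weighted fair
/// scheduling: weight_i / sum(weights), in percent. With weights
/// 1:2:3:4:5 this reproduces 6.7%, 13.3%, 20.0%, 26.7%, 33.3%.
fn expected_shares(weights: &[u32]) -> Vec<f64> {
    let total: u32 = weights.iter().sum();
    weights
        .iter()
        .map(|&w| 100.0 * w as f64 / total as f64)
        .collect()
}

fn main() {
    for (w, share) in [1u32, 2, 3, 4, 5].iter().zip(expected_shares(&[1, 2, 3, 4, 5])) {
        println!("weight {w}: {share:.1}%");
    }
}
```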

Lua script overhead

Measures per-message overhead of executing an on_enqueue Lua hook.

| Metric | Value | Unit |
|---|---|---|
| Throughput without Lua | 987 | msg/s |
| Throughput with on_enqueue hook | 943 | msg/s |
| Per-message overhead | 31.7 | us |

The Lua hook adds roughly 32 us of per-message overhead, within the < 50 us NFR target.

Fairness key cardinality scaling

Scheduling throughput as the number of distinct fairness keys increases.

| Key count | Throughput (msg/s) |
|---|---|
| 10 | 1,479 |
| 1,000 | 818 |
| 10,000 | 509 |

Consumer concurrency scaling

Aggregate consume throughput with increasing concurrent consumer streams.

| Consumers | Throughput (msg/s) |
|---|---|
| 1 | 66 |
| 10 | 1,009 |
| 100 | 1,863 |

Memory footprint

| Metric | Value |
|---|---|
| RSS idle | 351 MB |
| RSS under load (10K messages) | 351 MB |

Memory usage is dominated by the RocksDB buffer pool, not message count.

RocksDB compaction impact

| Metric | p99 latency |
|---|---|
| Idle (no compaction) | 0.00 ms |
| Active compaction | 0.00 ms |
| Delta | < 0.39 ms |

Compaction has no measurable negative impact on tail latency in single-node benchmarks.

Batch benchmarks

Batch benchmarks measure throughput and latency of multi-message Enqueue (multiple EnqueueMessage items per EnqueueRequest) and compare it against single-message enqueue. These benchmarks are gated behind FILA_BENCH_BATCH=1 because they exercise batch-specific code paths and take additional time.

Enable with FILA_BENCH_BATCH=1:

FILA_BENCH_BATCH=1 cargo bench -p fila-bench --bench system

Multi-message enqueue throughput

Measures multi-message Enqueue throughput at various batch sizes with 1KB messages. Reports both messages/s and batches/s.

| Batch size | Throughput (msg/s) | Batches/s |
|---|---|---|
| 1 | — | — |
| 10 | — | — |
| 50 | — | — |
| 100 | — | — |
| 500 | — | — |

Batch size scaling

Measures throughput as a function of batch size (1 to 1000) to identify the point of diminishing returns.

| Batch size | Throughput (msg/s) |
|---|---|
| 1 | — |
| 5 | — |
| 10 | — |
| 25 | — |
| 50 | — |
| 100 | — |
| 250 | — |
| 500 | — |
| 1000 | — |

Auto-batching latency

Measures end-to-end latency (multi-message enqueue to consume) at various producer concurrency levels. Simulates client-side auto-batching by accumulating messages and flushing via the Enqueue RPC with 50 messages per request.

| Producers | p50 | p95 | p99 | p99.9 | p99.99 | max |
|---|---|---|---|---|---|---|
| 1 | — | — | — | — | — | — |
| 10 | — | — | — | — | — | — |
| 50 | — | — | — | — | — | — |
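
The client-side auto-batching simulated here can be sketched as a size-triggered accumulator: buffer messages until the batch size is reached, then flush them as one multi-message Enqueue. This is an illustration under that assumption, not the SDK's AccumulatorMode implementation (which also flushes on a timer):

```rust
/// Size-triggered batch accumulator: buffer messages until `max_batch`
/// are pending, then hand back the whole batch for a single multi-message
/// Enqueue RPC. Illustrative sketch only, not the Fila SDK API.
struct Accumulator {
    buf: Vec<Vec<u8>>,
    max_batch: usize,
}

impl Accumulator {
    fn new(max_batch: usize) -> Self {
        Self { buf: Vec::new(), max_batch }
    }

    /// Buffer one message; returns Some(batch) when the size threshold is
    /// reached, None while still accumulating.
    fn push(&mut self, msg: Vec<u8>) -> Option<Vec<Vec<u8>>> {
        self.buf.push(msg);
        (self.buf.len() >= self.max_batch).then(|| std::mem::take(&mut self.buf))
    }
}

fn main() {
    let mut acc = Accumulator::new(50);
    for i in 0..120u32 {
        if let Some(batch) = acc.push(i.to_le_bytes().to_vec()) {
            println!("flushing batch of {}", batch.len());
        }
    }
}
```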

Batched vs unbatched comparison

Runs identical workloads (3,000 messages) with three approaches and reports throughput and speedup ratios.

| Mode | Throughput (msg/s) | Speedup |
|---|---|---|
| Unbatched | — | 1.0x |
| Explicit batch (size 100) | — | — |
| Auto-batch (size 100) | — | — |

Speedup ratios are computed relative to the unbatched baseline.

Delivery batching throughput

Measures consumer throughput with varying concurrent consumer counts. Messages are pre-loaded and continuously produced via multi-message Enqueue.

| Consumers | Throughput (msg/s) |
|---|---|
| 1 | — |
| 10 | — |
| 100 | — |

Concurrent producer batching

Measures aggregate throughput with multiple concurrent producers all using multi-message Enqueue (batch size 100).

| Producers | Throughput (msg/s) |
|---|---|
| 1 | — |
| 5 | — |
| 10 | — |
| 50 | — |

Subsystem benchmarks

Subsystem benchmarks isolate and measure each internal component independently, bypassing the full server stack. This helps identify where time is spent and which component dominates in different workloads.

Enable with FILA_BENCH_SUBSYSTEM=1:

FILA_BENCH_SUBSYSTEM=1 cargo bench -p fila-bench --bench system

RocksDB raw write throughput

Measures raw put_message throughput directly against RocksDB, bypassing scheduler, FIBP, and serialization. Isolates storage engine performance.

| Payload | Throughput (ops/s) | p50 latency | p99 latency |
|---|---|---|---|
| 1KB | — | — | — |
| 64KB | — | — | — |

Protobuf serialization throughput

Measures protobuf encode and decode throughput for EnqueueRequest and ConsumeResponse at three payload sizes. Isolates serialization overhead.

| Payload | Encode (MB/s) | Encode (ns/msg) | Decode (ns/msg) |
|---|---|---|---|
| 64B | — | — | — |
| 1KB | — | — | — |
| 64KB | — | — | — |

Reported for both EnqueueRequest (producer path) and ConsumeResponse (consumer path).

DRR scheduler throughput

Measures next_key() + consume_deficit() cycle throughput at varying active key counts. Isolates the scheduling algorithm from storage I/O.

| Active keys | Throughput (sel/s) |
|---|---|
| 10 | — |
| 1,000 | — |
| 10,000 | — |
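
For intuition, the measured cycle can be sketched as a toy deficit round-robin: each key's deficit grows by its quantum when its turn comes and is spent delivering pending messages, so long-run throughput per key is proportional to its quantum. This is an illustration of the algorithm, not Fila's scheduler code, and the `next_key` signature below is simplified:

```rust
use std::collections::VecDeque;

/// One fairness key in a toy DRR ring.
struct Key {
    name: &'static str,
    quantum: u64,
    deficit: u64,
    pending: u64, // messages waiting for this key
}

struct Drr {
    ring: VecDeque<Key>,
}

impl Drr {
    /// Select the next key with pending work and the number of messages it
    /// may deliver this turn (a simplified next_key/consume_deficit cycle).
    fn next_key(&mut self) -> Option<(&'static str, u64)> {
        for _ in 0..self.ring.len() {
            let mut key = self.ring.pop_front()?;
            if key.pending == 0 {
                self.ring.push_back(key);
                continue;
            }
            key.deficit += key.quantum;
            let burst = key.deficit.min(key.pending);
            key.deficit -= burst;
            key.pending -= burst;
            let picked = (key.name, burst);
            self.ring.push_back(key);
            return Some(picked);
        }
        None // no key has pending messages
    }
}

fn main() {
    let mut drr = Drr {
        ring: VecDeque::from(vec![
            Key { name: "a", quantum: 1, deficit: 0, pending: 6 },
            Key { name: "b", quantum: 2, deficit: 0, pending: 6 },
        ]),
    };
    // Key "b" drains twice as fast as "a", matching its 2:1 quantum.
    while let Some((name, n)) = drr.next_key() {
        println!("{name} delivers {n}");
    }
}
```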

FIBP round-trip overhead

Measures round-trip latency for a minimal (1-byte payload) Enqueue request. Quantifies the fixed per-call overhead of FIBP framing, separate from message processing.

| Metric | Value | Unit |
|---|---|---|
| p50 latency | — | us |
| p99 latency | — | us |
| p99.9 latency | — | us |
| Throughput | — | ops/s |

Lua execution throughput

Measures on_enqueue hook execution throughput for three script complexity levels, directly against the Lua VM (no server, no FIBP).

| Script | Throughput (exec/s) | p50 | p99 |
|---|---|---|---|
| No-op (return defaults) | — | — | — |
| Header-set (read 2 headers) | — | — | — |
| Complex routing (string ops, conditionals, table insert) | — | — | — |

Competitive comparison

Fila is compared against Kafka, RabbitMQ, and NATS on queue-oriented workloads. All brokers run in Docker containers and are benchmarked using native Rust clients via the bench-competitive binary. See Methodology for details.

How to run competitive benchmarks

cd bench/competitive
make bench-competitive

Results are written to bench/competitive/results/bench-{broker}.json.

Workloads

Each broker is tested with identical workloads using its recommended high-throughput configuration:

| Workload | Description | Batching |
|---|---|---|
| Throughput | Sustained message production rate (64B, 1KB, 64KB payloads) | Each broker’s recommended batching |
| Latency | Produce-consume round-trip (p50/p95/p99) | Unbatched |
| Lifecycle | Full enqueue-consume-ack cycle (1,000 messages) | Unbatched |
| Multi-producer | 3 concurrent producers aggregate throughput | Each broker’s recommended batching |
| Resources | CPU and memory during benchmark | — |

Broker configurations

| Broker | Version | Mode | Throughput batching |
|---|---|---|---|
| Fila | latest | Docker container, DRR scheduler | AccumulatorMode::Auto (4 concurrent producers) |
| Kafka | 3.9 | KRaft (no ZooKeeper), 1 partition | linger.ms=5, batch.num.messages=1000 |
| RabbitMQ | 3.13 | Quorum queues, durable, manual ack | Per-message (no client-side batching) |
| NATS | 2.11 | JetStream, file storage, pull-subscribe | Per-message (no client-side batching) |

All competitors use production-recommended settings, not development defaults. All brokers use native Rust client libraries (rdkafka, lapin, async-nats). Throughput scenarios use each broker’s recommended batching strategy for a fair comparison. Lifecycle scenarios are unbatched for all brokers.

Run make bench-competitive on your hardware to generate comparison tables.

Results

These are reference numbers from a single run. Your results will vary by hardware. All brokers run in Docker containers. Throughput uses each broker’s recommended batching; lifecycle is unbatched.

Throughput (messages/second, batched)

| Payload | Fila | Kafka | RabbitMQ | NATS |
|---|---|---|---|---|
| 64B | — | — | — | — |
| 1KB | — | — | — | — |
| 64KB | — | — | — | — |

Previous unbatched results (Fila 2,637 msg/s vs Kafka 143,278 msg/s at 1KB) were an unfair comparison: Kafka used linger.ms=5 batching while Fila sent 1 message per RPC. The updated benchmark uses each broker’s recommended batching.

End-to-end latency (1KB payload)

| Percentile | Fila | Kafka | RabbitMQ | NATS |
|---|---|---|---|---|
| p50 | 0.92 ms | 101.62 ms | 1.46 ms | 0.29 ms |
| p95 | 2.82 ms | 105.07 ms | 3.32 ms | 0.42 ms |
| p99 | 4.79 ms | 105.30 ms | 5.59 ms | 0.79 ms |

Lifecycle throughput (enqueue + consume + ack, 1KB, unbatched)

| Broker | msg/s |
|---|---|
| NATS | 25,763 |
| Fila | 2,724 |
| RabbitMQ | 658 |
| Kafka | 356 |

Multi-producer throughput (3 producers, 1KB)

| Broker | msg/s |
|---|---|
| Kafka | 186,708 |
| NATS | 150,676 |
| RabbitMQ | 63,660 |
| Fila | 6,769 |

Resource usage

| Broker | CPU | Memory |
|---|---|---|
| NATS | 1.3% | 12 MB |
| Kafka | 2.1% | 1,276 MB |
| Fila | 3.7% | 874 MB |
| RabbitMQ | 56.8% | 654 MB |

Methodology

Measurement parameters

| Parameter | Value |
|---|---|
| Warmup period | 1 second (discarded) |
| Measurement window | 3 seconds |
| Latency samples | 100 per level |
| Runs for CI regression | 3 (median) |
| Competitive runs | 1 (relative comparison) |

Limitations

  • Single-node only. All brokers run as single instances. Clustering performance is not tested.
  • No network latency. Brokers run on localhost. Real deployments have network overhead.
  • Docker containers. All brokers run in Docker containers for a fair comparison.
  • Hardware-specific. Results will vary on different hardware. Always include hardware specs when citing numbers.

Reproducing results

Self-benchmarks:

# Build and run the full benchmark suite
cargo bench -p fila-bench --bench system

# Results written to crates/fila-bench/bench-results.json

Competitive benchmarks:

cd bench/competitive

# Run all brokers
make bench-competitive

# Or individual brokers
make bench-kafka
make bench-rabbitmq
make bench-nats
make bench-fila

# Clean up Docker containers
make bench-clean

See bench/competitive/METHODOLOGY.md for complete methodology documentation including broker configuration details and justifications.

CI regression detection

The bench-regression GitHub Actions workflow runs on every push to main and on pull requests:

  • Runs the self-benchmark suite 3 times, takes the median
  • On main pushes: saves results as the baseline
  • On PRs: compares against the baseline and flags regressions exceeding the threshold (default: 10%)
  • Results are uploaded as workflow artifacts for every run
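
The regression check itself reduces to comparing the median result against the stored baseline; a sketch with the default 10% threshold (the helper name is illustrative, not the workflow's actual code):

```rust
/// Flag a regression when the current (median-of-3) result falls more than
/// `threshold_pct` below the stored baseline. Illustrative helper only.
fn is_regression(baseline: f64, current: f64, threshold_pct: f64) -> bool {
    (baseline - current) / baseline * 100.0 > threshold_pct
}

fn main() {
    // An 11% throughput drop trips the default 10% threshold.
    println!("{}", is_regression(5051.0, 4495.0, 10.0));
}
```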

Traceability

Results in this document are from commit 1e5bb0e (2026-03-26). Run cargo bench -p fila-bench --bench system to generate results for the current version. The JSON output includes the commit hash and timestamp for traceability.