Benchmarks

This page presents Fila’s benchmark results: self-benchmarks measuring single-node performance, and competitive comparisons against Kafka, RabbitMQ, and NATS.

Results from commit 1e5bb0e on 2026-03-26. Run benchmarks on your own hardware for results relevant to your environment. See Reproducing results for instructions.

Self-benchmarks

Self-benchmarks measure Fila’s single-node performance across throughput, latency, scheduling, and resource usage. The benchmark suite is in crates/fila-bench/ and uses the Fila SDK as a blackbox client against a real server instance.

Throughput

| Metric | Value | Unit |
|---|---|---|
| Enqueue throughput (1KB payload) | 5,051 | msg/s |
| Enqueue throughput (1KB payload) | 4.93 | MB/s |

Single producer, sustained over a 3-second measurement window after 1-second warmup.
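
The reported rates are simple window arithmetic: messages completed during the measurement window divided by its length. A minimal sketch, where `window_throughput` is an illustrative helper and not part of fila-bench:

```rust
/// Convert a raw message count observed over the measurement window into
/// msg/s and MB/s. Illustrative only; fila-bench's reporting code may differ.
fn window_throughput(messages: u64, payload_bytes: u64, window_secs: f64) -> (f64, f64) {
    let msg_per_s = messages as f64 / window_secs;
    let mb_per_s = msg_per_s * payload_bytes as f64 / 1_000_000.0;
    (msg_per_s, mb_per_s)
}

fn main() {
    // 15,153 messages of 1,000 bytes over the 3-second window
    // (the 1-second warmup is discarded before counting).
    let (msgs, mbs) = window_throughput(15_153, 1_000, 3.0);
    println!("{msgs:.0} msg/s, {mbs:.2} MB/s");
}
```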

End-to-end latency

Round-trip latency: produce a message, consume it, measure the interval. 100 samples per load level.

| Load level | Producers | p50 | p95 | p99 |
|---|---|---|---|---|
| Light | 1 | 0.00 ms | 0.00 ms | 0.00 ms |
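
The percentile columns can be derived from the 100 collected samples with a nearest-rank estimator, sketched below; this is one common estimator, not necessarily the one fila-bench uses:

```rust
/// Nearest-rank percentile over collected round-trip samples (ms).
/// Illustrative; fila-bench may use a different estimator.
fn percentile(samples: &[f64], p: f64) -> f64 {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    // Nearest rank: smallest index i such that i/n >= p/100.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.max(1) - 1]
}

fn main() {
    // 100 samples of 1..=100 ms: p50 = 50 ms, p95 = 95 ms, p99 = 99 ms.
    let samples: Vec<f64> = (1..=100).map(f64::from).collect();
    for p in [50.0, 95.0, 99.0] {
        println!("p{p} = {} ms", percentile(&samples, p));
    }
}
```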

Fair scheduling overhead

Compares throughput with DRR fair scheduling enabled vs plain FIFO delivery.

| Mode | Throughput (msg/s) |
|---|---|
| FIFO baseline | 1,307 |
| Fair scheduling (DRR) | 1,247 |
| Overhead | 4.6% |

The DRR scheduler adds minimal overhead compared to FIFO delivery (< 5% target).
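
For reference, overhead here means the relative throughput drop versus the FIFO baseline; a one-line sketch (the function name is illustrative):

```rust
/// Fair-scheduling overhead as a percentage of the FIFO baseline
/// throughput. Function name is illustrative, not fila-bench code.
fn overhead_pct(fifo_baseline: f64, fair: f64) -> f64 {
    (fifo_baseline - fair) / fifo_baseline * 100.0
}

fn main() {
    // e.g. a drop from 1,000 to 950 msg/s is a 5.0% overhead.
    println!("{:.1}%", overhead_pct(1000.0, 950.0));
}
```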

Fairness accuracy

Messages enqueued across 5 fairness keys with weights 1:2:3:4:5. 2,000 messages per key (10,000 total), consuming a window of 5,000.

| Key | Weight | Expected share | Actual share | Deviation |
|---|---|---|---|---|
| tenant-1 | 1 | 6.7% | 6.7% | 0.2% |
| tenant-2 | 2 | 13.3% | 13.4% | 0.2% |
| tenant-3 | 3 | 20.0% | 20.0% | 0.1% |
| tenant-4 | 4 | 26.7% | 26.6% | 0.1% |
| tenant-5 | 5 | 33.3% | 33.3% | 0.1% |

The DRR scheduler distributes messages proportionally to weight within any delivery window. Max deviation is < 1%, well within the < 5% NFR target.
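
The expected-share column follows directly from the weights: each key should receive weight_i / sum(weights) of the delivery window. A sketch (function name illustrative):

```rust
/// Expected delivery share per fairness key under weighted fair
/// scheduling: weight_i / sum(weights), in percent. With weights
/// 1:2:3:4:5 this reproduces 6.7%, 13.3%, 20.0%, 26.7%, 33.3%.
fn expected_shares(weights: &[u32]) -> Vec<f64> {
    let total: u32 = weights.iter().sum();
    weights
        .iter()
        .map(|&w| 100.0 * w as f64 / total as f64)
        .collect()
}

fn main() {
    for (w, share) in [1u32, 2, 3, 4, 5].iter().zip(expected_shares(&[1, 2, 3, 4, 5])) {
        println!("weight {w}: {share:.1}%");
    }
}
```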

Lua script overhead

Measures per-message overhead of executing an on_enqueue Lua hook.

| Metric | Value | Unit |
|---|---|---|
| Throughput without Lua | 987 | msg/s |
| Throughput with on_enqueue hook | 943 | msg/s |
| Per-message overhead | 31.7 | us |

The Lua hook adds roughly 32 us of per-message overhead, within the < 50 us NFR target.

Fairness key cardinality scaling

Scheduling throughput as the number of distinct fairness keys increases.

| Key count | Throughput (msg/s) |
|---|---|
| 10 | 1,479 |
| 1,000 | 818 |
| 10,000 | 509 |

Consumer concurrency scaling

Aggregate consume throughput with increasing concurrent consumer streams.

| Consumers | Throughput (msg/s) |
|---|---|
| 1 | 66 |
| 10 | 1,009 |
| 100 | 1,863 |

Memory footprint

| Metric | Value |
|---|---|
| RSS idle | 351 MB |
| RSS under load (10K messages) | 351 MB |

Memory usage is dominated by the RocksDB buffer pool, not message count.

RocksDB compaction impact

| Metric | p99 latency |
|---|---|
| Idle (no compaction) | 0.00 ms |
| Active compaction | 0.00 ms |
| Delta | < 0.39 ms |

Compaction has no measurable negative impact on tail latency in single-node benchmarks.

Batch benchmarks

Batch benchmarks measure throughput and latency of multi-message Enqueue (multiple EnqueueMessage items per EnqueueRequest) and compare it against single-message enqueue. These benchmarks are gated behind FILA_BENCH_BATCH=1 because they exercise batch-specific code paths and take additional time.

Enable with FILA_BENCH_BATCH=1:

FILA_BENCH_BATCH=1 cargo bench -p fila-bench --bench system

Multi-message enqueue throughput

Measures multi-message Enqueue throughput at various batch sizes with 1KB messages. Reports both messages/s and batches/s.

| Batch size | Throughput (msg/s) | Batches/s |
|---|---|---|
| 1 | — | — |
| 10 | — | — |
| 50 | — | — |
| 100 | — | — |
| 500 | — | — |

Batch size scaling

Measures throughput as a function of batch size (1 to 1000) to identify the point of diminishing returns.

| Batch size | Throughput (msg/s) |
|---|---|
| 1 | — |
| 5 | — |
| 10 | — |
| 25 | — |
| 50 | — |
| 100 | — |
| 250 | — |
| 500 | — |
| 1000 | — |

Auto-batching latency

Measures end-to-end latency (multi-message enqueue to consume) at various producer concurrency levels. Simulates client-side auto-batching by accumulating messages and flushing via the Enqueue RPC with 50 messages per request.

| Producers | p50 | p95 | p99 | p99.9 | p99.99 | max |
|---|---|---|---|---|---|---|
| 1 | — | — | — | — | — | — |
| 10 | — | — | — | — | — | — |
| 50 | — | — | — | — | — | — |
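
The client-side auto-batching simulated here can be sketched as a size-triggered accumulator: buffer messages until the batch size is reached, then flush them as one multi-message Enqueue. This is an illustration under that assumption, not the SDK's AccumulatorMode implementation (which also flushes on a timer):

```rust
/// Size-triggered batch accumulator: buffer messages until `max_batch`
/// are pending, then hand back the whole batch for a single multi-message
/// Enqueue RPC. Illustrative sketch only, not the Fila SDK API.
struct Accumulator {
    buf: Vec<Vec<u8>>,
    max_batch: usize,
}

impl Accumulator {
    fn new(max_batch: usize) -> Self {
        Self { buf: Vec::new(), max_batch }
    }

    /// Buffer one message; returns Some(batch) when the size threshold is
    /// reached, None while still accumulating.
    fn push(&mut self, msg: Vec<u8>) -> Option<Vec<Vec<u8>>> {
        self.buf.push(msg);
        (self.buf.len() >= self.max_batch).then(|| std::mem::take(&mut self.buf))
    }
}

fn main() {
    let mut acc = Accumulator::new(50);
    for i in 0..120u32 {
        if let Some(batch) = acc.push(i.to_le_bytes().to_vec()) {
            println!("flushing batch of {}", batch.len());
        }
    }
}
```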

Batched vs unbatched comparison

Runs identical workloads (3,000 messages) with three approaches and reports throughput and speedup ratios.

| Mode | Throughput (msg/s) | Speedup |
|---|---|---|
| Unbatched | — | 1.0x |
| Explicit batch (size 100) | — | — |
| Auto-batch (size 100) | — | — |

Speedup ratios are computed relative to the unbatched baseline.

Delivery batching throughput

Measures consumer throughput with varying concurrent consumer counts. Messages are pre-loaded and continuously produced via multi-message Enqueue.

| Consumers | Throughput (msg/s) |
|---|---|
| 1 | — |
| 10 | — |
| 100 | — |

Concurrent producer batching

Measures aggregate throughput with multiple concurrent producers all using multi-message Enqueue (batch size 100).

| Producers | Throughput (msg/s) |
|---|---|
| 1 | — |
| 5 | — |
| 10 | — |
| 50 | — |

Subsystem benchmarks

Subsystem benchmarks isolate and measure each internal component independently, bypassing the full server stack. This helps identify where time is spent and which component dominates in different workloads.

Enable with FILA_BENCH_SUBSYSTEM=1:

FILA_BENCH_SUBSYSTEM=1 cargo bench -p fila-bench --bench system

RocksDB raw write throughput

Measures raw put_message throughput directly against RocksDB, bypassing scheduler, FIBP, and serialization. Isolates storage engine performance.

| Payload | Throughput (ops/s) | p50 latency | p99 latency |
|---|---|---|---|
| 1KB | — | — | — |
| 64KB | — | — | — |

Protobuf serialization throughput

Measures protobuf encode and decode throughput for EnqueueRequest and ConsumeResponse at three payload sizes. Isolates serialization overhead.

| Payload | Encode (MB/s) | Encode (ns/msg) | Decode (ns/msg) |
|---|---|---|---|
| 64B | — | — | — |
| 1KB | — | — | — |
| 64KB | — | — | — |

Reported for both EnqueueRequest (producer path) and ConsumeResponse (consumer path).

DRR scheduler throughput

Measures next_key() + consume_deficit() cycle throughput at varying active key counts. Isolates the scheduling algorithm from storage I/O.

| Active keys | Throughput (sel/s) |
|---|---|
| 10 | — |
| 1,000 | — |
| 10,000 | — |
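
For intuition, the measured cycle can be sketched as a toy deficit round-robin: each key's deficit grows by its quantum when its turn comes and is spent delivering pending messages, so long-run throughput per key is proportional to its quantum. This is an illustration of the algorithm, not Fila's scheduler code, and the `next_key` signature below is simplified:

```rust
use std::collections::VecDeque;

/// One fairness key in a toy DRR ring.
struct Key {
    name: &'static str,
    quantum: u64,
    deficit: u64,
    pending: u64, // messages waiting for this key
}

struct Drr {
    ring: VecDeque<Key>,
}

impl Drr {
    /// Select the next key with pending work and the number of messages it
    /// may deliver this turn (a simplified next_key/consume_deficit cycle).
    fn next_key(&mut self) -> Option<(&'static str, u64)> {
        for _ in 0..self.ring.len() {
            let mut key = self.ring.pop_front()?;
            if key.pending == 0 {
                self.ring.push_back(key);
                continue;
            }
            key.deficit += key.quantum;
            let burst = key.deficit.min(key.pending);
            key.deficit -= burst;
            key.pending -= burst;
            let picked = (key.name, burst);
            self.ring.push_back(key);
            return Some(picked);
        }
        None // no key has pending messages
    }
}

fn main() {
    let mut drr = Drr {
        ring: VecDeque::from(vec![
            Key { name: "a", quantum: 1, deficit: 0, pending: 6 },
            Key { name: "b", quantum: 2, deficit: 0, pending: 6 },
        ]),
    };
    // Key "b" drains twice as fast as "a", matching its 2:1 quantum.
    while let Some((name, n)) = drr.next_key() {
        println!("{name} delivers {n}");
    }
}
```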

FIBP round-trip overhead

Measures round-trip latency for a minimal (1-byte payload) Enqueue request. Quantifies the fixed per-call overhead of FIBP framing, separate from message processing.

| Metric | Value | Unit |
|---|---|---|
| p50 latency | — | us |
| p99 latency | — | us |
| p99.9 latency | — | us |
| Throughput | — | ops/s |

Lua execution throughput

Measures on_enqueue hook execution throughput for three script complexity levels, directly against the Lua VM (no server, no FIBP).

| Script | Throughput (exec/s) | p50 | p99 |
|---|---|---|---|
| No-op (return defaults) | — | — | — |
| Header-set (read 2 headers) | — | — | — |
| Complex routing (string ops, conditionals, table insert) | — | — | — |

Competitive comparison

Fila is compared against Kafka, RabbitMQ, and NATS on queue-oriented workloads. All brokers run in Docker containers and are benchmarked using native Rust clients via the bench-competitive binary. See Methodology for details.

How to run competitive benchmarks

cd bench/competitive
make bench-competitive

Results are written to bench/competitive/results/bench-{broker}.json.

Workloads

Each broker is tested with identical workloads using its recommended high-throughput configuration:

| Workload | Description | Batching |
|---|---|---|
| Throughput | Sustained message production rate (64B, 1KB, 64KB payloads) | Each broker’s recommended batching |
| Latency | Produce-consume round-trip (p50/p95/p99) | Unbatched |
| Lifecycle | Full enqueue-consume-ack cycle (1,000 messages) | Unbatched |
| Multi-producer | 3 concurrent producers aggregate throughput | Each broker’s recommended batching |
| Resources | CPU and memory during benchmark | — |

Broker configurations

| Broker | Version | Mode | Throughput batching |
|---|---|---|---|
| Fila | latest | Docker container, DRR scheduler | AccumulatorMode::Auto (4 concurrent producers) |
| Kafka | 3.9 | KRaft (no ZooKeeper), 1 partition | linger.ms=5, batch.num.messages=1000 |
| RabbitMQ | 3.13 | Quorum queues, durable, manual ack | Per-message (no client-side batching) |
| NATS | 2.11 | JetStream, file storage, pull-subscribe | Per-message (no client-side batching) |

All competitors use production-recommended settings, not development defaults. All brokers use native Rust client libraries (rdkafka, lapin, async-nats). Throughput scenarios use each broker’s recommended batching strategy for a fair comparison. Lifecycle scenarios are unbatched for all brokers.

Run make bench-competitive on your hardware to generate comparison tables.

Results

These are reference numbers from a single run. Your results will vary by hardware. All brokers run in Docker containers. Throughput uses each broker’s recommended batching; lifecycle is unbatched.

Throughput (messages/second, batched)

| Payload | Fila | Kafka | RabbitMQ | NATS |
|---|---|---|---|---|
| 64B | — | — | — | — |
| 1KB | — | — | — | — |
| 64KB | — | — | — | — |

Previous unbatched results (Fila 2,637 msg/s vs Kafka 143,278 msg/s at 1KB) were an unfair comparison: Kafka used linger.ms=5 batching while Fila sent 1 message per RPC. The updated benchmark uses each broker’s recommended batching.

End-to-end latency (1KB payload)

| Percentile | Fila | Kafka | RabbitMQ | NATS |
|---|---|---|---|---|
| p50 | 0.92 ms | 101.62 ms | 1.46 ms | 0.29 ms |
| p95 | 2.82 ms | 105.07 ms | 3.32 ms | 0.42 ms |
| p99 | 4.79 ms | 105.30 ms | 5.59 ms | 0.79 ms |

Lifecycle throughput (enqueue + consume + ack, 1KB, unbatched)

| Broker | msg/s |
|---|---|
| NATS | 25,763 |
| Fila | 2,724 |
| RabbitMQ | 658 |
| Kafka | 356 |

Multi-producer throughput (3 producers, 1KB)

| Broker | msg/s |
|---|---|
| Kafka | 186,708 |
| NATS | 150,676 |
| RabbitMQ | 63,660 |
| Fila | 6,769 |

Resource usage

| Broker | CPU | Memory |
|---|---|---|
| NATS | 1.3% | 12 MB |
| Kafka | 2.1% | 1,276 MB |
| Fila | 3.7% | 874 MB |
| RabbitMQ | 56.8% | 654 MB |

Methodology

Measurement parameters

| Parameter | Value |
|---|---|
| Warmup period | 1 second (discarded) |
| Measurement window | 3 seconds |
| Latency samples | 100 per level |
| Runs for CI regression | 3 (median) |
| Competitive runs | 1 (relative comparison) |

Limitations

  • Single-node only. All brokers run as single instances. Clustering performance is not tested.
  • No network latency. Brokers run on localhost. Real deployments have network overhead.
  • Docker containers. All brokers run in Docker containers for a fair comparison.
  • Hardware-specific. Results will vary on different hardware. Always include hardware specs when citing numbers.

Reproducing results

Self-benchmarks:

# Build and run the full benchmark suite
cargo bench -p fila-bench --bench system

# Results written to crates/fila-bench/bench-results.json

Competitive benchmarks:

cd bench/competitive

# Run all brokers
make bench-competitive

# Or individual brokers
make bench-kafka
make bench-rabbitmq
make bench-nats
make bench-fila

# Clean up Docker containers
make bench-clean

See bench/competitive/METHODOLOGY.md for complete methodology documentation including broker configuration details and justifications.

CI regression detection

The bench-regression GitHub Actions workflow runs on every push to main and on pull requests:

  • Runs the self-benchmark suite 3 times, takes the median
  • On main pushes: saves results as the baseline
  • On PRs: compares against the baseline and flags regressions exceeding the threshold (default: 10%)
  • Results are uploaded as workflow artifacts for every run
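
The regression check itself reduces to comparing the median result against the stored baseline; a sketch with the default 10% threshold (the helper name is illustrative, not the workflow's actual code):

```rust
/// Flag a regression when the current (median-of-3) result falls more than
/// `threshold_pct` below the stored baseline. Illustrative helper only.
fn is_regression(baseline: f64, current: f64, threshold_pct: f64) -> bool {
    (baseline - current) / baseline * 100.0 > threshold_pct
}

fn main() {
    // An 11% throughput drop trips the default 10% threshold.
    println!("{}", is_regression(5051.0, 4495.0, 10.0));
}
```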

Traceability

Results in this document are from commit 1e5bb0e (2026-03-26). Run cargo bench -p fila-bench --bench system to generate results for the current version. The JSON output includes the commit hash and timestamp for traceability.