Fila
A message broker that makes fair scheduling and per-key throttling first-class primitives.
The problem
Most brokers deliver messages in strict FIFO order. When multiple tenants, customers, or workload types share a queue, a single noisy producer can starve everyone else. Rate limiting is pushed to the consumer, which has to fetch a message, check the limit, and re-enqueue it when the limit is exceeded. That wastes work and adds latency.
Fila moves scheduling decisions into the broker:
- Deficit Round Robin (DRR) fair scheduling — each fairness key gets its fair share of delivery bandwidth. No tenant starves another.
- Token bucket throttling — per-key rate limits enforced at the broker, before delivery. Consumers only receive messages that are ready to process.
- Lua rules engine — `on_enqueue` and `on_failure` hooks let you define scheduling policy (assign fairness keys, set weights, decide retry vs. dead-letter) in user-supplied Lua scripts.
- Zero wasted work — consumers never receive a message they can’t act on.
Quickstart
Docker
docker run -p 5555:5555 ghcr.io/faiscadev/fila:dev
Install script
curl -fsSL https://raw.githubusercontent.com/faiscadev/fila/main/install.sh -o install.sh
less install.sh
bash install.sh
fila-server
Cargo
cargo install fila-server fila-cli
fila-server
Try it out
Once the broker is running on localhost:5555:
# Create a queue
fila queue create orders
# Enqueue a message
fila enqueue orders --header tenant=acme --payload hello
# Check queue stats
fila queue inspect orders
Core Concepts
This document explains the key concepts behind Fila’s scheduling and message handling.
Message lifecycle
A message moves through these states:
Producer Broker Consumer
| | |
|-- Enqueue ----------------->| |
| |-- on_enqueue (Lua) --------->|
| | assigns fairness_key, |
| | weight, throttle_keys |
| | |
| |-- Stored (pending) --------->|
| | |
| |-- DRR scheduler picks ------>|
| | checks throttle tokens |
| | |
| |-- Consume (leased) -------->|-- Processing
| | |
| |<-------- Ack ----------------| (success)
| | message deleted |
| | |
| |<-------- Nack ---------------| (failure)
| |-- on_failure (Lua) --------->|
| | retry or dead-letter |
| | |
| |-- Visibility timeout ------->|
| | re-enqueue if not acked |
- Enqueue — the producer sends a message to a queue. If the queue has an `on_enqueue` Lua script, it runs to assign scheduling metadata.
- Pending — the message is persisted in RocksDB and indexed by fairness key.
- Scheduled — the DRR scheduler picks the next fairness key and checks throttle tokens. If tokens are available, the message is delivered to a waiting consumer.
- Leased — the consumer is processing the message. A visibility timeout timer starts.
- Acked — the consumer confirms success. The message is deleted.
- Nacked — the consumer reports failure. The `on_failure` hook decides: retry (re-enqueue) or dead-letter.
- Expired — if the visibility timeout fires before ack/nack, the message is automatically re-enqueued.
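The lifecycle above can be summarized as a small state machine. This is an illustrative sketch (the state names mirror the list above; the enum and transition table are not Fila's internal representation):

```python
from enum import Enum, auto

class State(Enum):
    PENDING = auto()        # persisted, waiting for the scheduler
    LEASED = auto()         # delivered to a consumer, visibility timer running
    ACKED = auto()          # confirmed; message deleted
    DEAD_LETTERED = auto()  # moved to <queue>.dlq by on_failure

# Legal transitions from the lifecycle above. Nack-with-retry and
# visibility-timeout expiry both return a leased message to PENDING.
TRANSITIONS = {
    State.PENDING: {State.LEASED},
    State.LEASED: {State.ACKED, State.PENDING, State.DEAD_LETTERED},
}

def can_transition(a: State, b: State) -> bool:
    return b in TRANSITIONS.get(a, set())
```

Note that ACKED and DEAD_LETTERED are terminal: once a message is deleted or dead-lettered, it never re-enters the queue's scheduling flow (a redrive creates a fresh enqueue).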
Fairness groups
Every message belongs to a fairness group identified by its fairness_key. The key is assigned during enqueue — either by an on_enqueue Lua script or defaulting to "default".
Common fairness key strategies:
- Per-tenant: `msg.headers["tenant_id"]` — prevents one tenant from monopolizing the queue
- Per-customer: `msg.headers["customer_id"]` — fair delivery across customers
- Per-priority: `msg.headers["priority"]` — combined with weights for priority scheduling
Deficit Round Robin (DRR)
Fila uses the DRR algorithm to schedule delivery across fairness groups:
- Each fairness key has a deficit counter (starts at 0) and a weight (default 1).
- In each scheduling round, every key receives `weight * quantum` additional deficit.
- The scheduler delivers messages from a key as long as its deficit is positive, decrementing by 1 per delivery.
- When a key’s deficit reaches 0 or it has no pending messages, the scheduler moves to the next key.
Example: Two tenants with equal weight and quantum=1000. Each gets 1000 deficit per round — the scheduler delivers ~1000 messages from tenant A, then ~1000 from tenant B, then back to A. A noisy tenant sending 100x more messages doesn’t starve the quiet tenant.
Weights: A key with weight=3 gets 3x the deficit of a key with weight=1, so it receives ~3x the delivery bandwidth. Use weights for priority lanes.
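The rounds described above can be simulated in a few lines of Python. This is a sketch of the textbook DRR algorithm with a per-message cost of 1, not Fila's actual scheduler:

```python
from collections import deque

def drr_schedule(queues, weights, quantum, rounds):
    """Yield (key, message) pairs using Deficit Round Robin.

    queues  -- {fairness_key: deque of pending messages}
    weights -- {fairness_key: DRR weight}
    Each delivery costs 1 deficit, matching the per-message decrement above.
    """
    deficit = {k: 0 for k in queues}
    for _ in range(rounds):
        for key in list(queues):
            deficit[key] += weights[key] * quantum
            while deficit[key] > 0 and queues[key]:
                deficit[key] -= 1
                yield key, queues[key].popleft()
            if not queues[key]:
                deficit[key] = 0  # idle keys don't bank credit

# A noisy tenant with 100 pending messages and a quiet one with 5 still
# share every round: with quantum=3 the delivery order interleaves.
queues = {"noisy": deque(f"n{i}" for i in range(100)),
          "quiet": deque(f"q{i}" for i in range(5))}
order = [k for k, _ in drr_schedule(queues, {"noisy": 1, "quiet": 1}, 3, 2)]
```

Resetting an emptied key's deficit is the standard DRR rule that stops idle keys from accumulating unbounded credit.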
Token bucket throttling
Fila supports per-key rate limiting via token bucket throttlers. Each throttle key has:
- rate — tokens refilled per second
- burst — maximum tokens the bucket can hold
When the scheduler is about to deliver a message, it checks all of the message’s throttle_keys. If any bucket is empty, the message is held until tokens refill. The consumer never receives a message it would have to reject for rate limiting.
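The refill-and-take logic can be sketched as follows (an illustrative token bucket driven by explicit timestamps, not Fila's implementation):

```python
class TokenBucket:
    """Minimal token bucket: `rate` tokens/second, capacity `burst`."""

    def __init__(self, rate, burst, now=0.0):
        self.rate, self.burst = rate, burst
        self.tokens = burst  # bucket starts full
        self.last = now

    def try_take(self, now):
        # Refill proportionally to elapsed time, capped at burst.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the broker holds the message instead of delivering

bucket = TokenBucket(rate=10, burst=20)
burst_hits = sum(bucket.try_take(0.0) for _ in range(30))  # 20 succeed at t=0
later_hit = bucket.try_take(0.1)  # 0.1 s later one token has refilled
```

Because the bucket is checked before delivery, a held message stays pending in the broker rather than bouncing between consumer and queue.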
Setting up throttle rates
Throttle rates are managed via runtime configuration:
# Allow 10 requests/second with burst of 20 for the "api" throttle key
fila config set throttle.api 10,20
Messages are assigned throttle keys in the on_enqueue Lua hook:
function on_enqueue(msg)
return {
fairness_key = msg.headers["tenant"],
throttle_keys = { msg.headers["api_endpoint"] }
}
end
Lua hooks
Fila embeds a Lua 5.4 runtime for user-defined scheduling policy. Scripts run inside a sandbox with configurable timeouts and memory limits.
on_enqueue
Runs when a message is enqueued. Returns scheduling metadata:
function on_enqueue(msg)
-- msg.headers — table of string key-value pairs
-- msg.payload_size — byte count of the payload
-- msg.queue — queue name
return {
fairness_key = msg.headers["tenant"] or "default",
weight = tonumber(msg.headers["priority"]) or 1,
throttle_keys = { msg.headers["endpoint"] }
}
end
Return fields:
| Field | Type | Default | Description |
|---|---|---|---|
| `fairness_key` | string | `"default"` | Groups the message for DRR scheduling |
| `weight` | number | `1` | DRR weight for this fairness key |
| `throttle_keys` | list of strings | `[]` | Token bucket keys to check before delivery |
on_failure
Runs when a consumer nacks a message. Decides retry vs. dead-letter:
function on_failure(msg)
-- msg.headers — table of string key-value pairs
-- msg.id — message UUID
-- msg.attempts — current attempt count
-- msg.queue — queue name
-- msg.error — error description from the nack
if msg.attempts >= 3 then
return { action = "dlq" }
end
return { action = "retry", delay_ms = 1000 * msg.attempts }
end
Return fields:
| Field | Type | Description |
|---|---|---|
| `action` | `"retry"` or `"dlq"` | Whether to re-enqueue or dead-letter |
| `delay_ms` | number (optional) | Delay before re-enqueue (retry only) |
Lua API
Scripts can read runtime configuration from the broker:
local limit = fila.get("rate_limit:tenant_a") -- returns string or nil
Safety
| Setting | Default | Description |
|---|---|---|
| `lua.default_timeout_ms` | 10 | Max script execution time (ms) |
| `lua.default_memory_limit_bytes` | 1 MB | Max memory per script |
| `lua.circuit_breaker_threshold` | 3 | Consecutive hook failures before the circuit breaks |
| `lua.circuit_breaker_cooldown_ms` | 10000 | Cooldown period after a circuit break |
When the circuit breaker trips, Lua hooks are bypassed and messages use default scheduling (fairness_key="default", weight=1, no throttle keys). The circuit breaker resets automatically after the cooldown period.
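The trip-and-cooldown behavior can be modeled in a few lines. This is a sketch using the defaults from the table above; the class and method names are illustrative, not Fila's internals:

```python
class CircuitBreaker:
    """Bypass hooks after `threshold` consecutive failures; resume
    automatically once `cooldown_ms` has elapsed."""

    def __init__(self, threshold=3, cooldown_ms=10_000):
        self.threshold, self.cooldown_ms = threshold, cooldown_ms
        self.failures = 0
        self.open_until = None  # None means hooks run normally

    def should_run_hook(self, now_ms):
        if self.open_until is not None:
            if now_ms < self.open_until:
                return False        # tripped: fall back to default scheduling
            self.open_until = None  # cooldown elapsed: reset the breaker
            self.failures = 0
        return True

    def record(self, ok, now_ms):
        self.failures = 0 if ok else self.failures + 1
        if self.failures >= self.threshold:
            self.open_until = now_ms + self.cooldown_ms

cb = CircuitBreaker()
for t in (0, 1, 2):           # three consecutive Lua failures
    cb.record(ok=False, now_ms=t)
# hooks are bypassed during the cooldown, then run again afterwards
```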
Dead letter queue
Messages that exhaust retries (when on_failure returns { action = "dlq" }) are moved to a dead letter queue named <queue>.dlq. For example, messages dead-lettered from orders go to orders.dlq.
Inspecting and redriving
# Check how many messages are in the DLQ
fila queue inspect orders.dlq
# Move 10 messages back to the source queue
fila redrive orders.dlq --count 10
Redrive moves pending (non-leased) messages from the DLQ back to the original source queue, where they go through the normal enqueue flow again.
Runtime configuration
The broker maintains a key-value configuration store that persists across restarts. Values are accessible from Lua scripts via fila.get(key) and managed through the CLI or API.
fila config set feature:new_flow enabled
fila config get feature:new_flow
fila config list --prefix feature:
Common use cases:
- Feature flags: toggle behavior in Lua scripts without redeployment
- Throttle rates: `throttle.<key>` set to `rate,burst`
- Dynamic routing: change fairness key assignment logic based on config values
Visibility timeout
When a consumer receives a message via Consume, the message is “leased” for a configurable duration (set per-queue at creation time via visibility_timeout_ms). During this lease:
- The message is not delivered to other consumers
- A timer tracks the lease expiry
If the consumer does not Ack or Nack the message before the timeout expires, the message is automatically re-enqueued and becomes available for delivery again. This prevents messages from being lost when consumers crash.
The default visibility timeout is set per-queue at creation:
fila queue create orders --visibility-timeout 30000 # 30 seconds
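One way a broker can implement this is a min-heap of lease deadlines plus a periodic sweep. The following is a sketch of that bookkeeping, not Fila's actual code:

```python
import heapq

class LeaseTable:
    """Track leases; expired, un-acked messages go back to pending."""

    def __init__(self, visibility_timeout_ms):
        self.timeout = visibility_timeout_ms
        self.heap = []       # (deadline_ms, msg_id)
        self.leased = set()  # currently outstanding leases

    def lease(self, msg_id, now_ms):
        self.leased.add(msg_id)
        heapq.heappush(self.heap, (now_ms + self.timeout, msg_id))

    def ack(self, msg_id):
        self.leased.discard(msg_id)  # stale heap entry is skipped in sweep()

    def sweep(self, now_ms):
        """Return msg_ids whose lease expired without ack/nack."""
        expired = []
        while self.heap and self.heap[0][0] <= now_ms:
            _, msg_id = heapq.heappop(self.heap)
            if msg_id in self.leased:  # never acked or nacked
                self.leased.discard(msg_id)
                expired.append(msg_id)  # caller re-enqueues these
        return expired

t = LeaseTable(30_000)
t.lease("m1", now_ms=0)
t.lease("m2", now_ms=0)
t.ack("m1")  # m1 completes in time; m2's consumer crashed
```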
Tutorials
Step-by-step guides for common Fila use cases. Each tutorial assumes you have a running broker (see quickstart).
Multi-tenant fair scheduling
Goal: Prevent a noisy tenant from starving other tenants in a shared queue.
1. Create a queue with tenant-aware fairness
fila queue create orders \
--on-enqueue 'function on_enqueue(msg)
return { fairness_key = msg.headers["tenant_id"] or "default" }
end'
The on_enqueue hook extracts a tenant_id header and uses it as the fairness key. Each unique tenant gets its own DRR scheduling group.
2. Produce messages from multiple tenants
# Python SDK
from fila import FilaClient
client = FilaClient("localhost:5555")
# Noisy tenant sends 1000 messages
for i in range(1000):
client.enqueue("orders", {"tenant_id": "noisy-corp"}, f"order-{i}")
# Other tenants send a few each
for tenant in ["acme", "globex", "initech"]:
for i in range(10):
client.enqueue("orders", {"tenant_id": tenant}, f"{tenant}-order-{i}")
3. Consume and observe fairness
stream = client.consume("orders")
for msg in stream:
print(f"tenant={msg.metadata.fairness_key} id={msg.id}")
client.ack("orders", msg.id)
Without Fila, all 1000 noisy-corp messages would be delivered first. With DRR scheduling, each tenant gets interleaved delivery — acme, globex, and initech messages arrive alongside noisy-corp’s, not after.
4. Verify with stats
fila queue inspect orders
The per-key breakdown shows each tenant’s pending count and current DRR deficit.
Weighted fairness
Give premium tenants more bandwidth by setting weights:
fila queue create orders \
--on-enqueue 'function on_enqueue(msg)
local weights = { premium = 3, standard = 1 }
local tier = msg.headers["tier"] or "standard"
return {
fairness_key = msg.headers["tenant_id"] or "default",
weight = weights[tier] or 1
}
end'
A premium tenant with weight=3 gets 3x the delivery bandwidth of a standard tenant with weight=1.
Per-provider throttling
Goal: Rate-limit outgoing API calls per external provider without wasting consumer resources.
1. Create a queue with throttle keys
fila queue create api-calls \
--on-enqueue 'function on_enqueue(msg)
local keys = {}
if msg.headers["provider"] then
table.insert(keys, "provider:" .. msg.headers["provider"])
end
return {
fairness_key = msg.headers["tenant"] or "default",
throttle_keys = keys
}
end'
2. Set throttle rates
# Stripe: 100 requests/second, burst up to 150
fila config set throttle.provider:stripe 100,150
# SendGrid: 10 requests/second, burst up to 20
fila config set throttle.provider:sendgrid 10,20
The format is rate,burst. Rate is tokens per second; burst is the maximum bucket capacity.
3. Produce messages
// Go SDK
client, _ := fila.Connect("localhost:5555")
ctx := context.Background()
// These will be throttled to 100/s
for i := 0; i < 500; i++ {
client.Enqueue(ctx, "api-calls", map[string]string{
"tenant": "acme",
"provider": "stripe",
}, []byte(fmt.Sprintf("charge-%d", i)))
}
4. Consume — the broker does the throttling
stream, _ := client.Consume(ctx, "api-calls")
for msg := range stream {
// Every message received is within the rate limit.
// No need to check limits client-side.
callExternalAPI(msg.Payload)
client.Ack(ctx, "api-calls", msg.ID)
}
Consumers receive messages at the provider’s rate limit. No consumer-side rate checking, no wasted fetches, no re-enqueue loops.
Adjusting rates at runtime
Change rates without restarting the broker:
# Double Stripe's rate
fila config set throttle.provider:stripe 200,300
The token bucket updates immediately.
Exponential backoff retry
Goal: Retry failed messages with increasing delays, then dead-letter after max attempts.
1. Create a queue with retry logic
fila queue create jobs \
--on-enqueue 'function on_enqueue(msg)
return { fairness_key = msg.headers["job_type"] or "default" }
end' \
--on-failure 'function on_failure(msg)
local max_attempts = tonumber(fila.get("max_retries") or "5")
if msg.attempts >= max_attempts then
return { action = "dlq" }
end
-- Exponential backoff: 1s, 2s, 4s, 8s, 16s...
local delay = math.min(1000 * (2 ^ (msg.attempts - 1)), 60000)
return { action = "retry", delay_ms = delay }
end' \
--visibility-timeout 30000
2. Configure max retries at runtime
fila config set max_retries 3
The on_failure hook reads this with fila.get("max_retries"). Change it without redeploying.
3. Process messages with failure handling
// JavaScript SDK
const { FilaClient } = require('fila-client');
async function main() {
const client = await FilaClient.connect('localhost:5555');
const stream = client.consume('jobs');
for await (const msg of stream) {
try {
await processJob(msg.payload);
await client.ack('jobs', msg.id);
} catch (err) {
// Nack triggers on_failure hook — broker handles retry/DLQ
await client.nack('jobs', msg.id, err.message);
}
}
}
main().catch(console.error);
4. Monitor and redrive
# Check how many messages ended up in the DLQ
fila queue inspect jobs.dlq
# After fixing the root cause, redrive them
fila redrive jobs.dlq --count 0 # 0 = all messages
Customizing backoff per job type
Read config to customize behavior per job type:
function on_failure(msg)
-- Different retry strategies per job type
local job_type = msg.headers["job_type"] or "default"
local max = tonumber(fila.get("max_retries:" .. job_type) or "5")
if msg.attempts >= max then
return { action = "dlq" }
end
-- Critical jobs: shorter delays, more retries
-- Batch jobs: longer delays, fewer retries
local base_ms = tonumber(fila.get("retry_base_ms:" .. job_type) or "1000")
local delay = math.min(base_ms * (2 ^ (msg.attempts - 1)), 300000)
return { action = "retry", delay_ms = delay }
end
fila config set max_retries:payment 10
fila config set retry_base_ms:payment 500
fila config set max_retries:report 3
fila config set retry_base_ms:report 5000
Lua Hook Patterns
Copy-paste patterns for common scheduling scenarios. See concepts for hook API details.
Tenant fairness
Assign each tenant its own fairness group so the DRR scheduler gives equal delivery bandwidth:
function on_enqueue(msg)
return {
fairness_key = msg.headers["tenant_id"] or "default"
}
end
With weighted tiers
Premium tenants get more bandwidth:
function on_enqueue(msg)
local tier = msg.headers["tier"] or "standard"
local weight = 1
if tier == "premium" then weight = 3 end
if tier == "enterprise" then weight = 5 end
return {
fairness_key = msg.headers["tenant_id"] or "default",
weight = weight
}
end
With dynamic weights from config
function on_enqueue(msg)
local tenant = msg.headers["tenant_id"] or "default"
local weight = tonumber(fila.get("weight:" .. tenant) or "1")
return {
fairness_key = tenant,
weight = weight
}
end
Set weights at runtime: fila config set weight:acme 5
Provider throttling
Rate-limit outbound calls per external API provider:
function on_enqueue(msg)
local keys = {}
if msg.headers["provider"] then
table.insert(keys, "provider:" .. msg.headers["provider"])
end
return {
fairness_key = msg.headers["tenant"] or "default",
throttle_keys = keys
}
end
Set rates: fila config set throttle.provider:stripe 100,200
Multi-dimensional throttling
Throttle by both provider and tenant (composite key):
function on_enqueue(msg)
local tenant = msg.headers["tenant"] or "default"
local provider = msg.headers["provider"]
local keys = {}
if provider then
-- Global provider limit
table.insert(keys, "provider:" .. provider)
-- Per-tenant-per-provider limit
table.insert(keys, "tenant-provider:" .. tenant .. ":" .. provider)
end
return {
fairness_key = tenant,
throttle_keys = keys
}
end
# Global: Stripe allows 1000 req/s total
fila config set throttle.provider:stripe 1000,1500
# Per-tenant: each tenant gets at most 100 req/s to Stripe
fila config set throttle.tenant-provider:acme:stripe 100,150
fila config set throttle.tenant-provider:globex:stripe 100,150
Exponential backoff retry
Retry with increasing delays, dead-letter after max attempts:
function on_failure(msg)
if msg.attempts >= 5 then
return { action = "dlq" }
end
-- 1s, 2s, 4s, 8s, 16s
local delay = math.min(1000 * (2 ^ (msg.attempts - 1)), 60000)
return { action = "retry", delay_ms = delay }
end
With configurable max retries
function on_failure(msg)
local max = tonumber(fila.get("max_retries") or "5")
if msg.attempts >= max then
return { action = "dlq" }
end
local delay = math.min(1000 * (2 ^ (msg.attempts - 1)), 60000)
return { action = "retry", delay_ms = delay }
end
Change at runtime: fila config set max_retries 10
Linear backoff
function on_failure(msg)
if msg.attempts >= 5 then
return { action = "dlq" }
end
-- 5s, 10s, 15s, 20s, 25s
return { action = "retry", delay_ms = 5000 * msg.attempts }
end
Immediate retry (no delay)
function on_failure(msg)
if msg.attempts >= 3 then
return { action = "dlq" }
end
return { action = "retry", delay_ms = 0 }
end
Header-based routing
Use headers to make dynamic scheduling decisions.
Route by priority
function on_enqueue(msg)
local priority = msg.headers["priority"] or "normal"
local weights = {
critical = 10,
high = 5,
normal = 2,
low = 1
}
return {
fairness_key = "priority:" .. priority,
weight = weights[priority] or 2
}
end
Route by region
function on_enqueue(msg)
local region = msg.headers["region"] or "default"
return {
fairness_key = "region:" .. region,
throttle_keys = { "region:" .. region }
}
end
# Rate limit per region
fila config set throttle.region:us-east 500,750
fila config set throttle.region:eu-west 300,450
Conditional dead-letter by error type
function on_failure(msg)
-- Permanent errors: dead-letter immediately
if msg.error:find("4%d%d") then -- HTTP 4xx
return { action = "dlq" }
end
-- Transient errors: retry with backoff
if msg.attempts >= 5 then
return { action = "dlq" }
end
local delay = 1000 * (2 ^ (msg.attempts - 1))
return { action = "retry", delay_ms = delay }
end
Feature flag gating
function on_enqueue(msg)
local tenant = msg.headers["tenant"] or "default"
local new_flow = fila.get("feature:new_flow:" .. tenant)
if new_flow == "enabled" then
return { fairness_key = tenant .. ":v2", weight = 1 }
end
return { fairness_key = tenant, weight = 1 }
end
# Enable new flow for one tenant
fila config set feature:new_flow:acme enabled
Built-in helpers
Fila provides fila.helpers — a set of convenience functions for common patterns. These are available in all scripts alongside fila.get().
fila.helpers.exponential_backoff(attempts, base_ms, max_ms)
Returns a delay in milliseconds with exponential growth and ±25% jitter.
- `attempts` — current attempt count
- `base_ms` — base delay (first attempt delay)
- `max_ms` — maximum delay cap
function on_failure(msg)
if msg.attempts >= 5 then
return { action = "dlq" }
end
return {
action = "retry",
delay_ms = fila.helpers.exponential_backoff(msg.attempts, 1000, 60000)
}
end
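For intuition, the delay this helper produces can be approximated in Python. This is a model of the documented behavior (doubling per attempt, a cap, ±25% jitter), not the helper's source:

```python
import random

def exponential_backoff(attempts, base_ms, max_ms, rng=random.random):
    """Delay that doubles per attempt, capped at max_ms, with +/-25% jitter."""
    delay = min(base_ms * 2 ** (attempts - 1), max_ms)
    jitter = 1 + 0.5 * (rng() - 0.5)  # uniform factor in [0.75, 1.25]
    return delay * jitter

# With jitter pinned to its midpoint (rng -> 0.5), attempt 3 at base 1000 ms
# yields exactly 4000 ms; by attempt 10 the 60 s cap dominates.
d = exponential_backoff(3, 1000, 60_000, rng=lambda: 0.5)
```

The jitter matters in practice: without it, a batch of messages that failed together retries together, producing synchronized load spikes.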
fila.helpers.tenant_route(msg, header_name)
Extracts a header value as the fairness key. Returns { fairness_key = "default" } if the header is missing.
function on_enqueue(msg)
return fila.helpers.tenant_route(msg, "tenant_id")
end
fila.helpers.rate_limit_keys(msg, patterns)
Generates throttle key strings from patterns. Each {placeholder} is replaced by the corresponding header value. Patterns with missing headers are omitted.
function on_enqueue(msg)
local route = fila.helpers.tenant_route(msg, "tenant_id")
route.throttle_keys = fila.helpers.rate_limit_keys(msg, {
"provider:{provider}",
"region:{region}"
})
return route
end
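The placeholder-expansion contract can be modeled in Python (a sketch matching the description above; `fila.helpers.rate_limit_keys` itself runs in Lua inside the broker):

```python
import re

def rate_limit_keys(headers, patterns):
    """Expand "{placeholder}" patterns from headers; patterns with any
    missing header are omitted, mirroring the contract described above."""
    keys = []
    for pattern in patterns:
        names = re.findall(r"\{(\w+)\}", pattern)
        if all(n in headers for n in names):
            key = pattern
            for n in names:
                key = key.replace("{" + n + "}", headers[n])
            keys.append(key)
    return keys

keys = rate_limit_keys({"provider": "stripe"},
                       ["provider:{provider}", "region:{region}"])
# "region:{region}" is dropped because the region header is missing
```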
fila.helpers.max_retries(attempts, max)
Returns { action = "retry" } if attempts < max, otherwise { action = "dlq" }.
function on_failure(msg)
return fila.helpers.max_retries(msg.attempts, 5)
end
Combining helpers
Helpers compose naturally. Here’s a complete on_failure script using multiple helpers:
function on_failure(msg)
local decision = fila.helpers.max_retries(msg.attempts, 5)
if decision.action == "retry" then
decision.delay_ms = fila.helpers.exponential_backoff(msg.attempts, 1000, 60000)
end
return decision
end
SDK Quick Start
Get up and running with Fila in your language of choice. Each section covers installation, connection, and the core enqueue/consume/ack flow.
Prerequisites
# Start a Fila broker
fila-server &
# Create a queue
fila queue create demo
Rust
cargo add fila-sdk tokio tokio-stream
use fila_sdk::FilaClient;
use std::collections::HashMap;
use tokio_stream::StreamExt;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let client = FilaClient::connect("http://localhost:5555").await?;
// Enqueue a message
let mut headers = HashMap::new();
headers.insert("tenant".to_string(), "acme".to_string());
let id = client.enqueue("demo", headers, b"hello".to_vec()).await?;
println!("enqueued: {id}");
// Consume messages
let mut stream = client.consume("demo").await?;
if let Some(msg) = stream.next().await {
let msg = msg?;
println!("received: {}", String::from_utf8_lossy(&msg.payload));
// Acknowledge
client.ack("demo", &msg.id).await?;
}
Ok(())
}
Go
go get github.com/faiscadev/fila-go
package main
import (
"context"
"fmt"
"log"
fila "github.com/faiscadev/fila-go"
)
func main() {
client, err := fila.Connect("localhost:5555")
if err != nil {
log.Fatal(err)
}
defer client.Close()
ctx := context.Background()
// Enqueue
id, err := client.Enqueue(ctx, "demo",
map[string]string{"tenant": "acme"}, []byte("hello"))
if err != nil {
log.Fatal(err)
}
fmt.Println("enqueued:", id)
// Consume
stream, err := client.Consume(ctx, "demo")
if err != nil {
log.Fatal(err)
}
msg, err := stream.Recv()
if err != nil {
log.Fatal(err)
}
fmt.Printf("received: %s\n", msg.Payload)
// Ack
if err := client.Ack(ctx, "demo", msg.ID); err != nil {
log.Fatal(err)
}
}
Python
pip install fila
import asyncio
from fila import FilaClient
async def main():
client = await FilaClient.connect("localhost:5555")
# Enqueue
msg_id = await client.enqueue("demo",
headers={"tenant": "acme"}, payload=b"hello")
print(f"enqueued: {msg_id}")
# Consume
async for msg in client.consume("demo"):
print(f"received: {msg.payload}")
await client.ack("demo", msg.id)
break
asyncio.run(main())
JavaScript / Node.js
npm install fila-client
const { FilaClient } = require('fila-client');
async function main() {
const client = await FilaClient.connect('localhost:5555');
// Enqueue
const id = await client.enqueue('demo',
{ tenant: 'acme' }, Buffer.from('hello'));
console.log('enqueued:', id);
// Consume
const stream = client.consume('demo');
for await (const msg of stream) {
console.log('received:', msg.payload.toString());
await client.ack('demo', msg.id);
break;
}
}
main().catch(console.error);
Ruby
gem install fila-client
require 'fila'
client = Fila::Client.new('localhost:5555')
# Enqueue
id = client.enqueue('demo',
headers: { 'tenant' => 'acme' }, payload: 'hello')
puts "enqueued: #{id}"
# Consume
client.consume('demo') do |msg|
puts "received: #{msg.payload}"
client.ack('demo', msg.id)
break
end
Java
<dependency>
<groupId>dev.faisca</groupId>
<artifactId>fila-client</artifactId>
<version>0.1.0</version>
</dependency>
import dev.faisca.fila.FilaClient;
import java.util.Map;
public class Demo {
public static void main(String[] args) throws Exception {
var client = FilaClient.connect("localhost:5555");
// Enqueue
var id = client.enqueue("demo",
Map.of("tenant", "acme"), "hello".getBytes());
System.out.println("enqueued: " + id);
// Consume
var stream = client.consume("demo");
var msg = stream.next();
System.out.println("received: " + new String(msg.getPayload()));
// Ack
client.ack("demo", msg.getId());
}
}
Security
TLS
Connect to a TLS-enabled broker by specifying the CA certificate or using system trust store.
| SDK | TLS Connection |
|---|---|
| Rust | FilaClient::builder("https://host:5555").with_tls().connect().await? |
| Go | fila.Connect("host:5555", fila.WithTLS()) |
| Python | FilaClient.connect("host:5555", tls=True) |
| JavaScript | FilaClient.connect('host:5555', { tls: true }) |
| Ruby | Fila::Client.new('host:5555', tls: true) |
| Java | FilaClient.connect("host:5555", FilaClient.withTLS()) |
For custom CA certificates:
| SDK | Custom CA |
|---|---|
| Rust | .with_tls_ca("path/to/ca.crt") |
| Go | fila.WithTLSCA("path/to/ca.crt") |
| Python | tls_ca="path/to/ca.crt" |
| JavaScript | { tlsCa: 'path/to/ca.crt' } |
| Ruby | tls_ca: 'path/to/ca.crt' |
| Java | FilaClient.withTLSCA("path/to/ca.crt") |
API Key Authentication
| SDK | API Key |
|---|---|
| Rust | .with_api_key("your-key") |
| Go | fila.WithAPIKey("your-key") |
| Python | api_key="your-key" |
| JavaScript | { apiKey: 'your-key' } |
| Ruby | api_key: 'your-key' |
| Java | FilaClient.withAPIKey("your-key") |
SDK Examples
Working code for every SDK showing the core enqueue -> consume -> ack flow.
Setup
All examples assume a running broker with a queue:
fila-server &
fila queue create demo
Rust
use std::collections::HashMap;
use std::time::Duration;
use fila_sdk::FilaClient;
use tokio_stream::StreamExt;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let client = FilaClient::connect("http://localhost:5555").await?;
// Enqueue
let mut headers = HashMap::new();
headers.insert("tenant".to_string(), "acme".to_string());
let id = client.enqueue("demo", headers, b"hello".to_vec()).await?;
println!("enqueued: {id}");
// Consume
let mut stream = client.consume("demo").await?;
let msg = tokio::time::timeout(Duration::from_secs(5), stream.next())
.await??
.expect("stream error");
println!("received: {} ({})", msg.id, String::from_utf8_lossy(&msg.payload));
// Ack
client.ack("demo", &msg.id).await?;
println!("acked: {}", msg.id);
Ok(())
}
Add to Cargo.toml:
[dependencies]
fila-sdk = "0.1"
tokio = { version = "1", features = ["full"] }
tokio-stream = "0.1"
Rust with API key authentication
let opts = ConnectOptions::new("127.0.0.1:5555")
    .with_api_key("my-secret-key");
let client = FilaClient::connect_with_options(opts).await?;
Go
package main
import (
"context"
"fmt"
"log"
fila "github.com/faiscadev/fila-go"
)
func main() {
ctx := context.Background()
client, err := fila.Connect("localhost:5555")
if err != nil {
log.Fatal(err)
}
defer client.Close()
// Enqueue
id, err := client.Enqueue(ctx, "demo", map[string]string{
"tenant": "acme",
}, []byte("hello"))
if err != nil {
log.Fatal(err)
}
fmt.Println("enqueued:", id)
// Consume
stream, err := client.Consume(ctx, "demo")
if err != nil {
log.Fatal(err)
}
msg, err := stream.Next()
if err != nil {
log.Fatal(err)
}
fmt.Printf("received: %s (%s)\n", msg.ID, string(msg.Payload))
// Ack
if err := client.Ack(ctx, "demo", msg.ID); err != nil {
log.Fatal(err)
}
fmt.Println("acked:", msg.ID)
}
Python
from fila import FilaClient
client = FilaClient("localhost:5555")
# Enqueue
msg_id = client.enqueue("demo", {"tenant": "acme"}, b"hello")
print(f"enqueued: {msg_id}")
# Consume
stream = client.consume("demo")
msg = next(stream)
print(f"received: {msg.id} ({msg.payload.decode()})")
# Ack
client.ack("demo", msg.id)
print(f"acked: {msg.id}")
JavaScript / Node.js
const { FilaClient } = require('fila-client');
async function main() {
const client = await FilaClient.connect('localhost:5555');
// Enqueue
const id = await client.enqueue('demo', { tenant: 'acme' }, Buffer.from('hello'));
console.log('enqueued:', id);
// Consume
const stream = client.consume('demo');
for await (const msg of stream) {
console.log(`received: ${msg.id} (${msg.payload.toString()})`);
// Ack
await client.ack('demo', msg.id);
console.log('acked:', msg.id);
break; // just one message for the demo
}
}
main().catch(console.error);
Ruby
require 'fila'
client = Fila::Client.new('localhost:5555')
# Enqueue
id = client.enqueue('demo', { 'tenant' => 'acme' }, 'hello')
puts "enqueued: #{id}"
# Consume
client.consume('demo') do |msg|
puts "received: #{msg.id} (#{msg.payload})"
# Ack
client.ack('demo', msg.id)
puts "acked: #{msg.id}"
break
end
Java
import dev.faisca.fila.FilaClient;
import dev.faisca.fila.Message;
public class Demo {
public static void main(String[] args) throws Exception {
FilaClient client = FilaClient.connect("localhost:5555");
// Enqueue
String id = client.enqueue("demo",
java.util.Map.of("tenant", "acme"),
"hello".getBytes());
System.out.println("enqueued: " + id);
// Consume
var stream = client.consume("demo");
Message msg = stream.next();
System.out.printf("received: %s (%s)%n", msg.getId(),
new String(msg.getPayload()));
// Ack
client.ack("demo", msg.getId());
System.out.println("acked: " + msg.getId());
client.close();
}
}
Integration Patterns
Common messaging patterns with Fila.
Producer / Consumer
The most basic pattern: one service produces messages, another consumes them.
┌──────────┐ enqueue ┌──────┐ consume ┌──────────┐
│ Producer │ ────────────► │ Fila │ ────────────► │ Consumer │
└──────────┘ └──────┘ └──────────┘
Use when: You want to decouple a producer from a consumer — the producer doesn’t need to wait for processing to complete.
# producer.py
import asyncio
from fila import FilaClient
async def produce():
client = await FilaClient.connect("localhost:5555")
for i in range(100):
await client.enqueue("orders",
headers={"tenant": "acme"},
payload=f"order-{i}".encode())
asyncio.run(produce())
# consumer.py
import asyncio
from fila import FilaClient
async def consume():
client = await FilaClient.connect("localhost:5555")
async for msg in client.consume("orders"):
print(f"processing: {msg.payload}")
# ... do work ...
await client.ack("orders", msg.id)
asyncio.run(consume())
Fan-out
Multiple consumers process different types of work from a shared queue. Fila’s fairness keys ensure each workload type gets its fair share.
┌──────────────┐
┌──► │ Consumer (A) │
┌──────────┐ enqueue │ └──────────────┘
│ Producer │ ──────► ┌──┴──┐
└──────────┘ │ Fila │ fairness_key routing
└──┬──┘
└──► ┌──────────────┐
│ Consumer (B) │
└──────────────┘
Use when: Different message types need fair scheduling. Without fairness keys, a burst of type-A messages would starve type-B consumers.
# Producer assigns fairness keys via Lua on_enqueue
# Queue created with:
# fila queue create work --on-enqueue '
# function on_enqueue(msg)
# return { fairness_key = msg.headers["type"] or "default" }
# end
# '
# producer.py
async def produce():
client = await FilaClient.connect("localhost:5555")
# Type A messages (high volume)
for i in range(1000):
await client.enqueue("work",
headers={"type": "email"}, payload=f"email-{i}".encode())
# Type B messages (low volume but equally important)
for i in range(10):
await client.enqueue("work",
headers={"type": "sms"}, payload=f"sms-{i}".encode())
# Fila's DRR scheduler ensures SMS messages aren't starved by emails
Request-Reply
Implement synchronous-style request/reply over async messaging using a correlation ID and a reply queue.
┌─────────┐ enqueue(request) ┌──────┐ consume ┌─────────┐
│ Client │ ─────────────────► │ Fila │ ────────► │ Service │
│ │ ◄───────────────── │ │ ◄──────── │ │
└─────────┘ consume(reply) └──────┘ enqueue └─────────┘
(reply)
Use when: A service needs a response but you still want the decoupling and reliability benefits of message queues.
// client.go — sends request, waits for reply
package main
import (
    "context"
    "fmt"

    "github.com/google/uuid"

    fila "github.com/faiscadev/fila-go"
)
func main() {
client, _ := fila.Connect("localhost:5555")
ctx := context.Background()
correlationID := uuid.New().String()
replyQueue := "replies-" + correlationID
// Create a temporary reply queue
client.CreateQueue(ctx, replyQueue)
defer client.DeleteQueue(ctx, replyQueue)
// Send request with correlation ID
client.Enqueue(ctx, "requests",
map[string]string{
"correlation_id": correlationID,
"reply_to": replyQueue,
},
[]byte("what is 2+2?"))
// Wait for reply
stream, _ := client.Consume(ctx, replyQueue)
reply, _ := stream.Recv()
fmt.Printf("reply: %s\n", reply.Payload)
client.Ack(ctx, replyQueue, reply.ID)
}
// service.go — processes requests, sends replies
package main
import (
"context"
"fmt"
"log"
fila "github.com/faiscadev/fila-go"
)
func main() {
client, _ := fila.Connect("localhost:5555")
ctx := context.Background()
stream, _ := client.Consume(ctx, "requests")
for {
msg, err := stream.Recv()
if err != nil {
log.Fatal(err)
}
// Process request
answer := fmt.Sprintf("answer: 4 (to: %s)", string(msg.Payload))
// Send reply to the caller's reply queue
replyTo := msg.Headers["reply_to"]
client.Enqueue(ctx, replyTo,
map[string]string{
"correlation_id": msg.Headers["correlation_id"],
},
[]byte(answer))
client.Ack(ctx, "requests", msg.ID)
}
}
Configuration Reference
Fila reads configuration from a TOML file. It searches for:
- `fila.toml` in the current working directory
- `/etc/fila/fila.toml`
If no file is found, all defaults are used. The broker runs with zero configuration.
Environment variables
| Variable | Default | Description |
|---|---|---|
FILA_DATA_DIR | data | Path to the RocksDB data directory |
FILA_FIBP_PORT_FILE | (none) | When set, the server writes the actual FIBP listen address to this file after binding (useful for test harnesses with port 0) |
Full configuration
[server]
listen_addr = "0.0.0.0:5555" # FIBP listen address
[scheduler]
command_channel_capacity = 10000 # internal command channel buffer size
idle_timeout_ms = 100 # scheduler idle timeout between rounds (ms)
quantum = 1000 # DRR quantum per fairness key per round
[lua]
default_timeout_ms = 10 # max script execution time (ms)
default_memory_limit_bytes = 1048576 # max memory per script (1 MB)
circuit_breaker_threshold = 3 # consecutive failures before circuit break
circuit_breaker_cooldown_ms = 10000 # cooldown period after circuit break (ms)
[telemetry]
otlp_endpoint = "http://localhost:4317" # OTLP endpoint (omit to disable)
service_name = "fila" # OTel service name
metrics_interval_ms = 10000 # metrics export interval (ms)
# Uncomment to enable the web management GUI
# [gui]
# listen_addr = "0.0.0.0:8080" # HTTP port for the dashboard
# FIBP transport tuning (defaults are optimized — override only if needed)
# [fibp]
# listen_addr = "0.0.0.0:5555" # TCP listen address
# max_frame_size = 16777216 # max frame size in bytes (16 MB)
# keepalive_interval_secs = 15 # heartbeat ping interval
# keepalive_timeout_secs = 10 # timeout waiting for pong
Section reference
[server]
| Key | Type | Default | Description |
|---|---|---|---|
listen_addr | string | "0.0.0.0:5555" | Address and port for the FIBP server |
[scheduler]
| Key | Type | Default | Description |
|---|---|---|---|
command_channel_capacity | integer | 10000 | Size of the bounded channel between FIBP connection handlers and the scheduler loop. Increase if you see backpressure under high load. |
idle_timeout_ms | integer | 100 | How long the scheduler waits when there’s no work before checking again. Lower values reduce latency at the cost of CPU. |
quantum | integer | 1000 | DRR quantum. Each fairness key gets weight * quantum deficit per scheduling round. Higher values mean more messages delivered per key per round (coarser interleaving). |
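To make the `quantum`/`weight` interaction concrete, here is a minimal DRR round in Python — an illustrative sketch of the algorithm, not Fila's implementation; a flat per-message cost is assumed:

```python
from collections import deque

def drr_round(queues, deficits, weights, quantum, cost=1):
    """One DRR round: each fairness key's deficit grows by
    weight * quantum, then messages are delivered while the
    deficit covers their cost (flat cost of 1 here)."""
    delivered = []
    for key, pending in queues.items():
        deficits[key] += weights.get(key, 1) * quantum
        while pending and deficits[key] >= cost:
            delivered.append(pending.popleft())
            deficits[key] -= cost
        if not pending:
            deficits[key] = 0  # idle keys don't accumulate credit
    return delivered

queues = {"a": deque(f"a{i}" for i in range(5)),
          "b": deque(f"b{i}" for i in range(5))}
deficits = {"a": 0, "b": 0}
out = drr_round(queues, deficits, {"a": 2, "b": 1}, quantum=2)
# with weight 2 vs 1 and quantum 2, key "a" gets 4 deliveries, "b" gets 2
```

A larger quantum delivers longer runs per key before switching (coarser interleaving), which is exactly the trade-off the table row describes.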
[lua]
| Key | Type | Default | Description |
|---|---|---|---|
default_timeout_ms | integer | 10 | Maximum execution time for Lua scripts. Enforced via instruction count hook (approximate). |
default_memory_limit_bytes | integer | 1048576 (1 MB) | Maximum memory a Lua script can allocate. |
circuit_breaker_threshold | integer | 3 | Number of consecutive Lua execution failures before the circuit breaker trips. When tripped, Lua hooks are bypassed and default scheduling is used. |
circuit_breaker_cooldown_ms | integer | 10000 | How long to wait after circuit breaker trips before retrying Lua execution. |
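The threshold/cooldown behaviour can be sketched as a small state machine (illustrative Python; Fila's actual breaker is internal to the broker):

```python
import time

class LuaCircuitBreaker:
    """Trips after `threshold` consecutive failures; while tripped,
    Lua hooks are bypassed until `cooldown_ms` elapses."""
    def __init__(self, threshold=3, cooldown_ms=10_000):
        self.threshold = threshold
        self.cooldown_ms = cooldown_ms
        self.failures = 0
        self.tripped_at = None

    def _now(self):
        return time.monotonic() * 1000

    def allow(self, now_ms=None):
        if self.tripped_at is None:
            return True
        now_ms = self._now() if now_ms is None else now_ms
        if now_ms - self.tripped_at >= self.cooldown_ms:
            self.tripped_at = None  # cooldown over: retry Lua execution
            self.failures = 0
            return True
        return False  # bypassed: default scheduling is used

    def record_failure(self, now_ms=None):
        self.failures += 1
        if self.failures >= self.threshold:
            self.tripped_at = self._now() if now_ms is None else now_ms

    def record_success(self):
        self.failures = 0

breaker = LuaCircuitBreaker()
for _ in range(3):
    breaker.record_failure(now_ms=0)
# breaker.allow(now_ms=5_000)  -> False (still cooling down)
# breaker.allow(now_ms=10_000) -> True  (cooldown elapsed)
```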
[telemetry]
Telemetry export is optional. When otlp_endpoint is omitted, the broker uses plain tracing-subscriber logging only.
| Key | Type | Default | Description |
|---|---|---|---|
otlp_endpoint | string | (none) | OTLP endpoint for exporting traces and metrics. Example: "http://localhost:4317". |
service_name | string | "fila" | Service name reported in OTel traces and metrics. |
metrics_interval_ms | integer | 10000 | How often metrics are exported to the OTLP endpoint. |
[gui]
Web management GUI. Disabled by default. When enabled, serves a read-only dashboard on a separate HTTP port.
| Key | Type | Default | Description |
|---|---|---|---|
listen_addr | string | "0.0.0.0:8080" | Address and port for the web dashboard HTTP server |
[fibp]
FIBP (Fila Binary Protocol) transport configuration. FIBP is Fila’s binary TCP protocol, using length-prefixed frames with a 6-byte header (flags, op code, correlation ID).
When [tls] is configured, FIBP connections are TLS-wrapped. When [auth] is configured, FIBP clients must send an OP_AUTH frame (containing the API key) as the first frame after handshake. Per-queue ACL checks apply to all data and admin operations.
FIBP supports all admin operations (create/delete queue, list queues, queue stats, redrive) using protobuf-encoded payloads for schema evolution.
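As an illustration of length-prefixed framing, here is a sketch in Python. The field widths and byte order are assumptions — the document only names the three header fields (flags, op code, correlation ID) in a 6-byte header — so treat this as a shape, not the wire spec:

```python
import struct

# Assumed layout: a 4-byte big-endian length prefix, then the 6-byte
# header as flags (u8), op code (u8), correlation ID (u32). Widths
# and byte order here are guesses for illustration.
def encode_frame(flags, op, correlation_id, payload):
    header = struct.pack(">BBI", flags, op, correlation_id)
    return struct.pack(">I", len(header) + len(payload)) + header + payload

def decode_frame(buf):
    (length,) = struct.unpack_from(">I", buf, 0)
    flags, op, corr = struct.unpack_from(">BBI", buf, 4)
    payload = bytes(buf[10:4 + length])
    return flags, op, corr, payload

frame = encode_frame(0, 1, 42, b"hello")
```

The correlation ID is what lets a client multiplex many in-flight requests over one TCP connection and match each response frame to its request.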
| Key | Type | Default | Description |
|---|---|---|---|
listen_addr | string | "0.0.0.0:5555" | TCP address for the FIBP transport |
max_frame_size | integer | 16777216 (16 MB) | Maximum frame size in bytes. Frames exceeding this limit are rejected. |
keepalive_interval_secs | integer | 15 | Interval between keepalive heartbeat pings. |
keepalive_timeout_secs | integer | 10 | Timeout waiting for a keepalive pong before closing the connection. |
SDK connection
The Rust SDK connects via FIBP:
let client = FilaClient::connect("127.0.0.1:5555").await?;

// With TLS and API key
let opts = ConnectOptions::new("127.0.0.1:5555")
    .with_tls()
    .with_api_key("my-key");
let client = FilaClient::connect_with_options(opts).await?;
Notes:
- The address is a raw TCP address (not a URL with `http://`).
- TLS and API key options are set on `ConnectOptions`.
- The FIBP wire format supports multi-message enqueue in a single frame.
OpenTelemetry metrics
When telemetry is enabled, Fila exports the following metrics:
| Metric | Type | Description |
|---|---|---|
fila.messages.enqueued | Counter | Messages enqueued |
fila.messages.delivered | Counter | Messages delivered to consumers |
fila.messages.acked | Counter | Messages acknowledged |
fila.messages.nacked | Counter | Messages rejected |
fila.messages.expired | Counter | Messages expired (visibility timeout) |
fila.messages.dead_lettered | Counter | Messages moved to DLQ |
fila.messages.redriven | Counter | Messages redriven from DLQ |
fila.queue.depth | Gauge | Pending messages per queue |
fila.queue.in_flight | Gauge | Leased messages per queue |
fila.queue.consumers | Gauge | Active consumers per queue |
fila.queue.fairness_keys | Gauge | Active fairness keys per queue |
fila.delivery.latency | Histogram | Time from enqueue to consumer delivery |
fila.lua.executions | Counter | Lua script executions (by hook type and outcome) |
fila.throttle.limited | Counter | Messages held due to throttle limits |
All queue-scoped metrics include a queue attribute.
Deployment Guide
This guide covers deploying Fila in production environments.
Docker (single node)
The quickest way to run Fila:
docker run -d \
--name fila \
-p 5555:5555 \
-v fila-data:/var/lib/fila \
ghcr.io/faiscadev/fila:latest
With a custom config file:
docker run -d \
--name fila \
-p 5555:5555 \
-v fila-data:/var/lib/fila \
-v ./fila.toml:/etc/fila/fila.toml:ro \
ghcr.io/faiscadev/fila:latest \
fila-server --config /etc/fila/fila.toml
systemd
For bare-metal or VM deployments on Linux.
1. Install the binary
curl -fsSL https://raw.githubusercontent.com/faiscadev/fila/main/install.sh -o install.sh
bash install.sh
2. Create system user and directories
sudo useradd --system --no-create-home --shell /usr/sbin/nologin fila
sudo mkdir -p /etc/fila /var/lib/fila
sudo chown fila:fila /var/lib/fila
3. Create configuration
Copy the example config and customize:
sudo cp deploy/fila.toml /etc/fila/fila.toml
sudo editor /etc/fila/fila.toml
See Configuration Reference for all options.
4. Install and start the service
sudo cp deploy/fila.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now fila
sudo systemctl status fila
5. Verify
fila queue create test-queue
fila queue inspect test-queue
Docker Compose cluster
Run a 3-node cluster locally using Docker Compose.
1. Start the cluster
docker compose -f deploy/docker-compose.cluster.yml up -d
This starts three Fila nodes:
- node1 — bootstrap node, port 5555
- node2 — port 5565
- node3 — port 5575
2. Create a queue and test
# Connect to any node
fila --addr localhost:5555 queue create orders
# Enqueue via node1, consume via node2 (transparent routing)
fila --addr localhost:5555 enqueue orders --payload "hello"
fila --addr localhost:5565 queue inspect orders
3. Stop the cluster
docker compose -f deploy/docker-compose.cluster.yml down
# Add -v to also remove data volumes
Kubernetes with Helm
Single-node deployment
helm install fila deploy/helm/fila/
Clustered deployment
helm install fila deploy/helm/fila/ \
--set replicaCount=3 \
--set cluster.enabled=true \
--set cluster.replicationFactor=3 \
--set persistence.enabled=true \
--set persistence.size=20Gi
With TLS and authentication
# Create TLS secret first
kubectl create secret tls fila-tls \
--cert=server.crt --key=server.key
helm install fila deploy/helm/fila/ \
--set tls.enabled=true \
--set tls.certSecret=fila-tls \
--set auth.bootstrapApiKey=your-secret-key
With OpenTelemetry
helm install fila deploy/helm/fila/ \
--set telemetry.enabled=true \
--set telemetry.otlpEndpoint=http://otel-collector:4317
Custom values file
Create a my-values.yaml:
replicaCount: 3
cluster:
enabled: true
replicationFactor: 3
persistence:
enabled: true
size: 50Gi
storageClass: gp3
tls:
enabled: true
certSecret: fila-tls
auth:
bootstrapApiKey: "change-me-in-production"
telemetry:
enabled: true
otlpEndpoint: "http://otel-collector.monitoring:4317"
resources:
requests:
cpu: 500m
memory: 512Mi
limits:
cpu: 2000m
memory: 2Gi
helm install fila deploy/helm/fila/ -f my-values.yaml
Production checklist
Before going to production, verify these items:
- Data persistence — data directory is on a persistent volume (not ephemeral container storage)
- TLS enabled — all client connections encrypted; consider mTLS for service-to-service
- API key auth — bootstrap API key set; per-queue ACLs configured for multi-tenant
- Backups — data directory backup strategy in place (RocksDB is crash-consistent, point-in-time snapshots work)
- Monitoring — OTel exporter configured, scraping endpoint or collector receiving metrics
- Log level — set `RUST_LOG=info` (default); use `debug` only for troubleshooting
- File descriptors — `LimitNOFILE=65536` or higher for high-connection workloads
- Cluster sizing — for HA, run 3+ nodes with `replication_factor = 3`
- Resource limits — set CPU/memory limits appropriate for workload
- Lua safety — review script timeout and memory limits per queue
Cluster Scaling Benchmark Methodology
This document describes how to measure Fila’s horizontal scaling characteristics
using the fila-bench harness against a multi-node cluster.
Prerequisites
- Fila server binary (3 copies, or 1 binary run 3 times with different configs)
- `fila-bench` binary (from `crates/fila-bench`)
- `fila` CLI binary (from `crates/fila-cli`)
Setting Up a 3-Node Local Cluster
Create three configuration files, one per node.
node1.toml
[server]
listen_addr = "127.0.0.1:5555"
[cluster]
enabled = true
node_id = 1
bind_addr = "127.0.0.1:5556"
bootstrap = true
peers = []
node2.toml
[server]
listen_addr = "127.0.0.1:5565"
[cluster]
enabled = true
node_id = 2
bind_addr = "127.0.0.1:5566"
bootstrap = false
peers = ["127.0.0.1:5556"]
node3.toml
[server]
listen_addr = "127.0.0.1:5575"
[cluster]
enabled = true
node_id = 3
bind_addr = "127.0.0.1:5576"
bootstrap = false
peers = ["127.0.0.1:5556"]
Starting the Cluster
# Terminal 1
fila-server --config node1.toml
# Terminal 2
fila-server --config node2.toml
# Terminal 3
fila-server --config node3.toml
Verify the cluster is healthy:
fila --addr 127.0.0.1:5555 queue list
# Should show "Cluster nodes: 3" at the bottom
Queue-to-Node Assignment
When a queue is created in cluster mode, Fila automatically distributes queue leadership across nodes for balanced load:
- Preferred leader selection: each new queue’s initial leader is the node with the fewest current queue leaderships (tie-break: lowest node ID).
- Node subset selection: when the cluster has more nodes than `replication_factor` (default: 3), only the N least-loaded nodes participate in each queue's Raft group. This spreads I/O across more nodes.
- No manual placement: operators do not need to specify which nodes a queue runs on. The cluster handles assignment automatically.
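The assignment rule above can be sketched as follows (illustrative Python; using the current leadership count as the load metric is an assumption about how "least-loaded" is measured):

```python
def pick_leader_and_group(leadership_counts, replication_factor=3):
    """Order nodes by current leadership count (tie-break: lowest
    node ID), take the N least-loaded as the Raft group, and make
    the least-loaded node the preferred leader."""
    ordered = sorted(leadership_counts,
                     key=lambda node: (leadership_counts[node], node))
    group = ordered[:replication_factor]
    return group[0], group

leader, group = pick_leader_and_group({1: 2, 2: 1, 3: 1, 4: 5})
# nodes 2 and 3 tie on load; node 2 leads by lower ID; node 4 is excluded
```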
Configure replication_factor in the [cluster] section:
[cluster]
enabled = true
node_id = 1
replication_factor = 3 # default: 3 nodes per queue Raft group
Inspect queue leadership via fila queue inspect <name> — the “Raft leader”
line shows which node currently leads each queue.
Running Scaling Benchmarks
Step 1: Baseline (Single Node)
Run fila-bench against a single standalone node to establish baseline throughput:
fila-bench --addr 127.0.0.1:5555 --category enqueue_throughput
fila-bench --addr 127.0.0.1:5555 --category consume_throughput
fila-bench --addr 127.0.0.1:5555 --category enqueue_consume_mixed
Record the messages/second for each category.
Step 2: Cluster Throughput
Create queues distributed across the cluster, then run the same benchmarks:
# Create test queues (they'll be replicated across all 3 nodes)
fila --addr 127.0.0.1:5555 queue create bench-q1
fila --addr 127.0.0.1:5555 queue create bench-q2
fila --addr 127.0.0.1:5555 queue create bench-q3
# Check leadership distribution
fila --addr 127.0.0.1:5555 queue list
Run benchmarks targeting different nodes to exercise the full cluster:
# Enqueue through node 1 (may forward to queue leaders on other nodes)
fila-bench --addr 127.0.0.1:5555 --category enqueue_throughput
# Enqueue through node 2
fila-bench --addr 127.0.0.1:5565 --category enqueue_throughput
# Mixed workload across all nodes (run in parallel)
fila-bench --addr 127.0.0.1:5555 --category enqueue_consume_mixed &
fila-bench --addr 127.0.0.1:5565 --category enqueue_consume_mixed &
fila-bench --addr 127.0.0.1:5575 --category enqueue_consume_mixed &
wait
Step 3: Measure Scaling Factor
Compare cluster throughput to single-node baseline:
scaling_factor = cluster_total_msgs_per_sec / single_node_msgs_per_sec
For a well-distributed workload across 3 queues on 3 nodes, the ideal scaling factor is 3.0x. In practice, expect 2.0x-2.5x due to Raft consensus overhead (log replication, leader forwarding).
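For example, with hypothetical throughput numbers (not measured results), the formula works out as:

```python
# Hypothetical numbers, just to make the formula concrete:
single_node = 5_000        # msg/s, single-node baseline
cluster_total = 11_500     # summed msg/s across the three parallel runs
scaling_factor = cluster_total / single_node
print(f"scaling factor: {scaling_factor:.2f}x")  # inside the 2.0x-2.5x band
```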
What to Measure
| Metric | How | Tool |
|---|---|---|
| Enqueue throughput (msg/s) | fila-bench --category enqueue_throughput | fila-bench |
| Consume throughput (msg/s) | fila-bench --category consume_throughput | fila-bench |
| P99 enqueue latency | fila-bench --category enqueue_latency | fila-bench |
| Raft consensus overhead | Compare single-node vs cluster latency | fila-bench |
| Leadership distribution | fila queue list LEADER column | fila CLI |
| Failover time | Kill leader, measure time to new leader election | manual |
Interpreting Results
- Linear scaling means the scaling factor approaches N (number of nodes). Fila achieves this when queues are distributed across different leaders.
- Sub-linear scaling is expected for single-queue workloads since all writes go through one Raft leader regardless of cluster size.
- Raft overhead adds ~1-3ms per write (log replication to majority). This is the cost of durability and fault tolerance.
- Forwarding overhead adds latency when a client connects to a non-leader node. The request is transparently forwarded to the leader via the internal cluster protocol.
Notes
- The `fila-bench` harness runs single-node benchmarks by default. To benchmark a cluster, point it at any cluster node's client address.
- Queue creation in cluster mode goes through the meta Raft group. All nodes in the cluster will create the queue locally.
- Consumer streams are served by the queue’s Raft leader. If the leader changes (failover), consumers automatically reconnect to the new leader.
Troubleshooting
Common issues and their solutions.
Connection refused
Symptom: transport error: connection refused or failed to connect
Causes:
- Broker not running. Start it: `fila-server` or `docker run -p 5555:5555 ghcr.io/faiscadev/fila:latest`
- Wrong address. Default is `localhost:5555`. Check your connection string.
- Firewall blocking port 5555. Verify: `nc -zv localhost 5555`
- In Docker: use `host.docker.internal:5555` from containers, not `localhost`
TLS errors
Symptom: tls handshake error, certificate verify failed, or unknown ca
Solutions:
- Self-signed cert: Provide the CA certificate to the SDK (`with_tls_ca("ca.crt")`)
- System trust store: Use `with_tls()` (no CA path) if the broker's cert is issued by a public CA
- Expired cert: Check cert expiry: `openssl x509 -in server.crt -noout -enddate`
- Wrong hostname: The cert's CN/SAN must match the hostname you're connecting to
- mTLS required: If the broker requires client certs, provide both client cert and key
Authentication failures
Symptom: UNAUTHENTICATED error status
Solutions:
- Missing API key: Set `api_key` / `with_api_key()` on your client connection
- Wrong key: Verify the key matches what's configured on the broker
- Key revoked: Check if the key was deleted. List keys: `fila auth list`
- Permission denied (`PERMISSION_DENIED`): The key exists but lacks permission for the queue. Check ACLs: `fila auth acl list`
Queue not found
Symptom: NOT_FOUND error status on enqueue or consume
Solutions:
- Queue doesn't exist. Create it: `fila queue create <name>`
- Typo in queue name. List queues: `fila queue list`
- In clustered mode: if you get `UNAVAILABLE` instead of `NOT_FOUND`, the node is still starting up — retry after a moment
Consumer timeout
Symptom: Consumer stream hangs with no messages delivered
Causes:
- Queue is empty. Check depth: `fila queue inspect <name>`
- All messages are leased to other consumers. Check pending count in queue stats.
- Visibility timeout expired and messages were re-queued. Increase timeout or process faster.
- Throttle rate limit hit. Check throttle config: `fila config list`
Message redelivery
Symptom: Same message delivered multiple times
Causes:
- Visibility timeout expiry: If a consumer takes too long to ack, the message is re-delivered. Solution: ack promptly, or increase the visibility timeout.
- Nack without DLQ: If `on_failure` returns `{ action = "retry" }`, the message goes back to the queue. After enough retries, consider dead-lettering.
- Consumer crash: Leased messages are re-delivered after visibility timeout. This is by design — at-least-once delivery.
Best practice: Make consumers idempotent. Use msg.id as a deduplication key.
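A minimal sketch of that dedup pattern (an in-process set for illustration; production code would back it with a shared store such as Redis or a database unique constraint so dedup survives restarts):

```python
# Idempotent handling keyed on the message ID.
processed = set()
side_effects = []

def handle(msg_id, payload):
    if msg_id in processed:
        return "duplicate"          # redelivery: skip the side effect
    side_effects.append(payload)    # stand-in for real business logic
    processed.add(msg_id)
    return "processed"

handle("m1", "charge $5")
handle("m1", "charge $5")   # redelivered copy of the same message
# side_effects == ["charge $5"]: the charge ran exactly once
```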
Lua script errors
Symptom: Warning logs about script failures, circuit breaker tripping
Solutions:
- Syntax error: Validate before setting: `fila queue create test --on-enqueue 'function on_enqueue(msg) return {} end'`
- Runtime error: Check logs for the specific error. Common: accessing nil headers, type mismatches.
- Timeout: Script took too long. Default limit is 10 ms. Simplify the script or increase `lua.default_timeout_ms`.
- Memory limit: Script uses too much memory. Default is 1 MB. Check for unbounded table growth.
- Circuit breaker: After 3 consecutive failures, scripts are bypassed for 10 seconds. Fix the script to reset the breaker.
High memory usage
Solutions:
- Check queue depths: `fila queue inspect <name>`. Large queues consume memory.
- RocksDB block cache can be tuned via configuration.
- In clustered mode, each node replicates data for its assigned queues.
Cluster issues
Symptom: Node not joining cluster, split-brain, leader election failures
Solutions:
- Node not joining: Check `cluster.peers` config — all nodes must list each other's cluster addresses (port 5556, not 5555).
- Bootstrap: Exactly one node should have `bootstrap = true`. Start it first.
- Network: Verify nodes can reach each other on the cluster port: `nc -zv node2 5556`
- Node ID: Each node must have a unique `node_id`. Duplicates cause undefined behavior.
API Reference
Fila exposes two service groups on the same port (default 5555) via FIBP (Fila Binary Protocol). Protobuf message definitions are in proto/fila/v1/.
Hot-path service (fila.v1.FilaService)
Used by producers and consumers for message operations.
Enqueue
Enqueue one or more messages. Single-message enqueue is a batch of one.
rpc Enqueue(EnqueueRequest) returns (EnqueueResponse)
Request (EnqueueRequest):
| Field | Type | Description |
|---|---|---|
messages | repeated EnqueueMessage | One or more messages to enqueue |
EnqueueMessage:
| Field | Type | Description |
|---|---|---|
queue | string | Queue name |
headers | map<string, string> | Arbitrary key-value headers (accessible in Lua hooks) |
payload | bytes | Message body |
Response (EnqueueResponse):
| Field | Type | Description |
|---|---|---|
results | repeated EnqueueResult | One result per input message (same order) |
EnqueueResult:
| Field | Type | Description |
|---|---|---|
message_id | string | UUID assigned (on success) |
error | EnqueueError | Error details (on failure) |
Only one of message_id or error is set (protobuf oneof).
EnqueueErrorCode:
| Code | Description |
|---|---|
ENQUEUE_ERROR_CODE_QUEUE_NOT_FOUND | Queue does not exist |
ENQUEUE_ERROR_CODE_STORAGE | Storage layer error |
ENQUEUE_ERROR_CODE_LUA | Lua hook rejected the message |
ENQUEUE_ERROR_CODE_PERMISSION_DENIED | Caller lacks permission |
StreamEnqueue
Bidirectional streaming enqueue with sequence tracking. The client sends batches of messages on the request stream and receives per-batch results on the response stream. Sequence numbers allow the client to correlate responses with requests.
rpc StreamEnqueue(stream StreamEnqueueRequest) returns (stream StreamEnqueueResponse)
Request (stream, StreamEnqueueRequest):
| Field | Type | Description |
|---|---|---|
messages | repeated EnqueueMessage | Messages to enqueue in this batch |
sequence_number | uint64 | Client-assigned sequence number for correlation |
Response (stream, StreamEnqueueResponse):
| Field | Type | Description |
|---|---|---|
sequence_number | uint64 | Echoed sequence number from the request |
results | repeated EnqueueResult | One result per input message |
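Client-side, the correlation described above can be sketched like this (illustrative Python; the stream plumbing is elided and no Fila SDK API is implied):

```python
import asyncio

# Bookkeeping for sequence correlation: each batch gets a sequence
# number mapped to a future, and the response stream resolves the
# matching future when the echoed sequence number arrives.
class SequenceTracker:
    def __init__(self):
        self.next_seq = 0
        self.pending = {}  # sequence_number -> Future of results

    def register(self):
        seq, self.next_seq = self.next_seq, self.next_seq + 1
        fut = asyncio.get_running_loop().create_future()
        self.pending[seq] = fut
        return seq, fut

    def resolve(self, seq, results):
        self.pending.pop(seq).set_result(results)

async def demo():
    tracker = SequenceTracker()
    seq, fut = tracker.register()
    # ... send the batch with sequence_number=seq on the request stream ...
    # ... later, the response stream echoes the same sequence number:
    tracker.resolve(seq, ["msg-id-1"])
    return await fut

results = asyncio.run(demo())
```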
Consume
Open a server-streaming connection to receive messages. The broker delivers messages according to the DRR scheduler, respecting fairness groups and throttle limits.
rpc Consume(ConsumeRequest) returns (stream ConsumeResponse)
Request:
| Field | Type | Description |
|---|---|---|
queue | string | Queue name to consume from |
Response (stream, ConsumeResponse):
| Field | Type | Description |
|---|---|---|
messages | repeated Message | One or more delivered messages (see Message below) |
The stream stays open until the client disconnects. Messages are delivered as they become available — the stream blocks when no messages are ready.
Errors:
| Error Status | Condition |
|---|---|
NOT_FOUND | Queue does not exist |
Ack
Acknowledge one or more messages. Removes acknowledged messages from the broker.
rpc Ack(AckRequest) returns (AckResponse)
Request (AckRequest):
| Field | Type | Description |
|---|---|---|
messages | repeated AckMessage | One or more messages to acknowledge |
AckMessage:
| Field | Type | Description |
|---|---|---|
queue | string | Queue name |
message_id | string | ID of the message to acknowledge |
Response (AckResponse):
| Field | Type | Description |
|---|---|---|
results | repeated AckResult | One result per input message |
AckResult:
| Field | Type | Description |
|---|---|---|
success | AckSuccess | Empty message (on success) |
error | AckError | Error details (on failure) |
Only one of success or error is set (protobuf oneof).
AckErrorCode:
| Code | Description |
|---|---|
ACK_ERROR_CODE_MESSAGE_NOT_FOUND | Message does not exist or is not leased |
ACK_ERROR_CODE_STORAGE | Storage layer error |
ACK_ERROR_CODE_PERMISSION_DENIED | Caller lacks permission |
Nack
Reject one or more messages. Triggers the on_failure Lua hook (if configured) to decide retry vs. dead-letter.
rpc Nack(NackRequest) returns (NackResponse)
Request (NackRequest):
| Field | Type | Description |
|---|---|---|
messages | repeated NackMessage | One or more messages to reject |
NackMessage:
| Field | Type | Description |
|---|---|---|
queue | string | Queue name |
message_id | string | ID of the message to reject |
error | string | Error description (passed to on_failure hook as msg.error) |
Response (NackResponse):
| Field | Type | Description |
|---|---|---|
results | repeated NackResult | One result per input message |
NackResult:
| Field | Type | Description |
|---|---|---|
success | NackSuccess | Empty message (on success) |
error | NackError | Error details (on failure) |
Only one of success or error is set (protobuf oneof).
NackErrorCode:
| Code | Description |
|---|---|
NACK_ERROR_CODE_MESSAGE_NOT_FOUND | Message does not exist or is not leased |
NACK_ERROR_CODE_STORAGE | Storage layer error |
NACK_ERROR_CODE_PERMISSION_DENIED | Caller lacks permission |
Admin service (fila.v1.FilaAdmin)
Used by operators and the fila CLI for queue management, configuration, and diagnostics.
CreateQueue
Create a new queue with optional Lua hooks and visibility timeout.
rpc CreateQueue(CreateQueueRequest) returns (CreateQueueResponse)
Request:
| Field | Type | Description |
|---|---|---|
name | string | Queue name |
config | QueueConfig | Optional configuration (see below) |
QueueConfig:
| Field | Type | Default | Description |
|---|---|---|---|
on_enqueue_script | string | (none) | Lua script run on every enqueue |
on_failure_script | string | (none) | Lua script run on every nack |
visibility_timeout_ms | uint64 | 30000 | Lease duration in milliseconds |
Response:
| Field | Type | Description |
|---|---|---|
queue_id | string | Queue identifier |
Errors:
| Error Status | Condition |
|---|---|
ALREADY_EXISTS | Queue with that name already exists |
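The `visibility_timeout_ms` lease semantics can be reduced to a simple predicate (an illustrative sketch of the rule, not broker code):

```python
# A delivered message is invisible to other consumers until it is
# acked or the visibility timeout elapses, at which point the broker
# redelivers it.
def is_redeliverable(leased_at_ms, now_ms, acked,
                     visibility_timeout_ms=30_000):
    if acked:
        return False  # acked messages are removed from the broker
    return now_ms - leased_at_ms >= visibility_timeout_ms

checks = [
    is_redeliverable(0, 10_000, acked=False),   # still leased
    is_redeliverable(0, 30_000, acked=False),   # lease expired
    is_redeliverable(0, 60_000, acked=True),    # acked: never redelivered
]
```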
DeleteQueue
Delete a queue and all its messages.
rpc DeleteQueue(DeleteQueueRequest) returns (DeleteQueueResponse)
Request:
| Field | Type | Description |
|---|---|---|
queue | string | Queue name |
Errors:
| Error Status | Condition |
|---|---|
NOT_FOUND | Queue does not exist |
ListQueues
List all queues with summary statistics.
rpc ListQueues(ListQueuesRequest) returns (ListQueuesResponse)
Response:
| Field | Type | Description |
|---|---|---|
queues | repeated QueueInfo | List of queues |
QueueInfo:
| Field | Type | Description |
|---|---|---|
name | string | Queue name |
depth | uint64 | Number of pending messages |
in_flight | uint64 | Number of leased (in-flight) messages |
active_consumers | uint32 | Number of connected consumers |
SetConfig
Set a runtime configuration key-value pair. Persisted across restarts.
rpc SetConfig(SetConfigRequest) returns (SetConfigResponse)
Request:
| Field | Type | Description |
|---|---|---|
key | string | Configuration key |
value | string | Configuration value |
GetConfig
Retrieve a configuration value by key.
rpc GetConfig(GetConfigRequest) returns (GetConfigResponse)
Request:
| Field | Type | Description |
|---|---|---|
key | string | Configuration key |
Response:
| Field | Type | Description |
|---|---|---|
value | string | Configuration value |
Errors:
| Error Status | Condition |
|---|---|
NOT_FOUND | Key does not exist |
ListConfig
List configuration entries, optionally filtered by prefix.
rpc ListConfig(ListConfigRequest) returns (ListConfigResponse)
Request:
| Field | Type | Description |
|---|---|---|
prefix | string | Filter entries by key prefix (empty = all) |
Response:
| Field | Type | Description |
|---|---|---|
entries | repeated ConfigEntry | Key-value pairs |
total_count | uint32 | Total number of matching entries |
ConfigEntry:
| Field | Type | Description |
|---|---|---|
key | string | Configuration key |
value | string | Configuration value |
GetStats
Get detailed statistics for a queue, including per-fairness-key and per-throttle-key breakdowns.
rpc GetStats(GetStatsRequest) returns (GetStatsResponse)
Request:
| Field | Type | Description |
|---|---|---|
queue | string | Queue name |
Response:
| Field | Type | Description |
|---|---|---|
depth | uint64 | Total pending messages |
in_flight | uint64 | Messages currently leased |
active_fairness_keys | uint64 | Number of fairness keys with pending messages |
active_consumers | uint32 | Connected consumers |
quantum | uint32 | DRR quantum value |
per_key_stats | repeated PerFairnessKeyStats | Per-fairness-key breakdown |
per_throttle_stats | repeated PerThrottleKeyStats | Per-throttle-key breakdown |
PerFairnessKeyStats:
| Field | Type | Description |
|---|---|---|
key | string | Fairness key |
pending_count | uint64 | Pending messages for this key |
current_deficit | int64 | Current DRR deficit |
weight | uint32 | DRR weight |
PerThrottleKeyStats:
| Field | Type | Description |
|---|---|---|
key | string | Throttle key |
tokens | double | Current available tokens |
rate_per_second | double | Token refill rate |
burst | double | Maximum bucket capacity |
Errors:
| Error Status | Condition |
|---|---|
NOT_FOUND | Queue does not exist |
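The `tokens` / `rate_per_second` / `burst` fields in `PerFairnessKeyStats`'s sibling `PerThrottleKeyStats` describe a standard token bucket; a minimal sketch of the refill-and-consume rule:

```python
class TokenBucket:
    """Mirrors the stats fields above: `tokens` (current balance),
    `rate_per_second` (refill rate), `burst` (maximum capacity)."""
    def __init__(self, rate_per_second, burst):
        self.rate = rate_per_second
        self.burst = burst
        self.tokens = burst
        self.last = 0.0

    def try_consume(self, now, cost=1.0):
        # Refill in proportion to elapsed time, capped at burst.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # held back; would count toward fila.throttle.limited

bucket = TokenBucket(rate_per_second=2.0, burst=2.0)
burst_ok = [bucket.try_consume(0.0), bucket.try_consume(0.0)]  # both True
empty = bucket.try_consume(0.0)       # False: bucket drained
refilled = bucket.try_consume(0.5)    # True: 0.5 s * 2/s = 1 token back
```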
Redrive
Move pending messages from a dead letter queue back to the source queue.
rpc Redrive(RedriveRequest) returns (RedriveResponse)
Request:
| Field | Type | Description |
|---|---|---|
dlq_queue | string | DLQ name (e.g., orders.dlq) |
count | uint64 | Maximum number of messages to redrive |
Response:
| Field | Type | Description |
|---|---|---|
redriven | uint64 | Number of messages actually moved |
Only pending (non-leased) messages are redriven. Leased messages in the DLQ are skipped to avoid interfering with active consumers.
Errors:
| Error Status | Condition |
|---|---|
NOT_FOUND | DLQ does not exist |
Message types
Message
The core message envelope returned by Consume.
message Message {
string id = 1;
map<string, string> headers = 2;
bytes payload = 3;
MessageMetadata metadata = 4;
MessageTimestamps timestamps = 5;
}
| Field | Type | Description |
|---|---|---|
id | string | UUID assigned at enqueue |
headers | map<string, string> | Headers set by the producer |
payload | bytes | Message body |
metadata | MessageMetadata | Broker-assigned scheduling metadata |
timestamps | MessageTimestamps | Lifecycle timestamps |
MessageMetadata
Scheduling metadata assigned by the broker (via Lua on_enqueue or defaults).
message MessageMetadata {
string fairness_key = 1;
uint32 weight = 2;
repeated string throttle_keys = 3;
uint32 attempt_count = 4;
string queue_id = 5;
}
| Field | Type | Description |
|---|---|---|
fairness_key | string | DRR fairness group key |
weight | uint32 | DRR weight for this key |
throttle_keys | repeated string | Token bucket keys checked before delivery |
attempt_count | uint32 | Number of delivery attempts |
queue_id | string | Queue this message belongs to |
MessageTimestamps
message MessageTimestamps {
google.protobuf.Timestamp enqueued_at = 1;
google.protobuf.Timestamp leased_at = 2;
}
| Field | Type | Description |
|---|---|---|
enqueued_at | Timestamp | When the message was first enqueued |
leased_at | Timestamp | When the message was last delivered to a consumer |
Benchmarks
This page presents Fila’s benchmark results: self-benchmarks measuring single-node performance, and competitive comparisons against Kafka, RabbitMQ, and NATS.
Results from commit 1e5bb0e on 2026-03-26. Run benchmarks on your own hardware for results relevant to your environment. See Reproducing results for instructions.
Self-benchmarks
Self-benchmarks measure Fila's single-node performance across throughput, latency, scheduling, and resource usage. The benchmark suite is in crates/fila-bench/ and uses the Fila SDK as a black-box client against a real server instance.
Throughput
| Metric | Value | Unit |
|---|---|---|
| Enqueue throughput (1KB payload) | 5,051 | msg/s |
| Enqueue throughput (1KB payload) | 4.93 | MB/s |
Single producer, sustained over a 3-second measurement window after 1-second warmup.
End-to-end latency
Round-trip latency: produce a message, consume it, measure the interval. 100 samples per load level.
| Load level | Producers | p50 | p95 | p99 |
|---|---|---|---|---|
| Light | 1 | 0.00 ms | 0.00 ms | 0.00 ms |
Fair scheduling overhead
Compares throughput with DRR fair scheduling enabled vs plain FIFO delivery.
| Mode | Throughput (msg/s) |
|---|---|
| FIFO baseline | 1,307 |
| Fair scheduling (DRR) | 1,247 |
| Overhead | 3.1% |
The DRR scheduler adds minimal overhead compared to FIFO delivery (< 5% target).
Fairness accuracy
Messages enqueued across 5 fairness keys with weights 1:2:3:4:5. 2,000 messages per key (10,000 total), consuming a window of 5,000.
| Key | Weight | Expected share | Actual share | Deviation |
|---|---|---|---|---|
| tenant-1 | 1 | 6.7% | 6.7% | 0.2% |
| tenant-2 | 2 | 13.3% | 13.4% | 0.2% |
| tenant-3 | 3 | 20.0% | 20.0% | 0.1% |
| tenant-4 | 4 | 26.7% | 26.6% | 0.1% |
| tenant-5 | 5 | 33.3% | 33.3% | 0.1% |
The DRR scheduler distributes messages proportionally to weight within any delivery window. Max deviation is < 1%, well within the < 5% NFR target.
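The weighted shares in the table fall out of the DRR algorithm itself: each key accumulates a deficit proportional to its weight each round and spends it on deliveries. A minimal illustrative sketch (not Fila's implementation — message cost is fixed at 1 and names are hypothetical) reproduces the 1:2:3:4:5 split over a 5,000-message window:

```python
from collections import deque

# Minimal Deficit Round Robin sketch (illustrative, not Fila's code).
# Each round, a key's deficit grows by its weight; each delivery costs 1.
# Over a window, deliveries approach the weighted shares in the table above.
def drr_window(queues: dict, weights: dict, window: int) -> dict:
    deficit = {k: 0 for k in queues}
    served = {k: 0 for k in queues}
    delivered = 0
    while delivered < window and any(queues.values()):
        for key in list(queues):
            if not queues[key]:
                continue
            deficit[key] += weights[key]  # add this key's quantum
            while deficit[key] >= 1 and queues[key] and delivered < window:
                queues[key].popleft()     # deliver one message (cost 1)
                deficit[key] -= 1
                served[key] += 1
                delivered += 1
    return served

weights = {f"tenant-{i}": i for i in range(1, 6)}
queues = {k: deque(range(2000)) for k in weights}
served = drr_window(queues, weights, 5000)
shares = {k: n / 5000 for k, n in served.items()}
```

With total weight 15, tenant-3's expected share is 3/15 = 20%, and the simulated share lands within a fraction of a percent of that, mirroring the measured deviations above.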
Lua script overhead
Measures per-message overhead of executing an on_enqueue Lua hook.
| Metric | Value | Unit |
|---|---|---|
| Throughput without Lua | 987 | msg/s |
| Throughput with on_enqueue hook | 943 | msg/s |
| Per-message overhead | 31.7 | us |
The Lua hook adds roughly 32 us of per-message overhead, within the < 50 us NFR target.
Fairness key cardinality scaling
Scheduling throughput as the number of distinct fairness keys increases.
| Key count | Throughput (msg/s) |
|---|---|
| 10 | 1,479 |
| 1,000 | 818 |
| 10,000 | 509 |
Consumer concurrency scaling
Aggregate consume throughput with increasing concurrent consumer streams.
| Consumers | Throughput (msg/s) |
|---|---|
| 1 | 66 |
| 10 | 1,009 |
| 100 | 1,863 |
Memory footprint
| Metric | Value |
|---|---|
| RSS idle | 351 MB |
| RSS under load (10K messages) | 351 MB |
Memory usage is dominated by the RocksDB buffer pool, not message count.
RocksDB compaction impact
| Metric | p99 latency |
|---|---|
| Idle (no compaction) | 0.00 ms |
| Active compaction | 0.00 ms |
| Delta | < 0.39 ms |
Compaction has no measurable negative impact on tail latency in single-node benchmarks.
Batch benchmarks
Batch benchmarks measure throughput and latency of multi-message Enqueue (multiple EnqueueMessage items per EnqueueRequest) and compare it against single-message enqueue. These benchmarks are gated behind FILA_BENCH_BATCH=1 because they exercise batch-specific code paths and take additional time.
Enable with FILA_BENCH_BATCH=1:
FILA_BENCH_BATCH=1 cargo bench -p fila-bench --bench system
Multi-message enqueue throughput
Measures multi-message Enqueue throughput at various batch sizes with 1KB messages. Reports both messages/s and batches/s.
| Batch size | Throughput (msg/s) | Batches/s |
|---|---|---|
| 1 | — | — |
| 10 | — | — |
| 50 | — | — |
| 100 | — | — |
| 500 | — | — |
Batch size scaling
Measures throughput as a function of batch size (1 to 1000) to identify the point of diminishing returns.
| Batch size | Throughput (msg/s) |
|---|---|
| 1 | — |
| 5 | — |
| 10 | — |
| 25 | — |
| 50 | — |
| 100 | — |
| 250 | — |
| 500 | — |
| 1000 | — |
Auto-batching latency
Measures end-to-end latency (multi-message enqueue to consume) at various producer concurrency levels. Simulates client-side auto-batching by accumulating messages and flushing via the Enqueue RPC with 50 messages per request.
| Producers | p50 | p95 | p99 | p99.9 | p99.99 | max |
|---|---|---|---|---|---|---|
| 1 | — | — | — | — | — | — |
| 10 | — | — | — | — | — | — |
| 50 | — | — | — | — | — | — |
Batched vs unbatched comparison
Runs identical workloads (3,000 messages) with three approaches and reports throughput and speedup ratios.
| Mode | Throughput (msg/s) | Speedup |
|---|---|---|
| Unbatched | — | 1.0x |
| Explicit batch (size 100) | — | —x |
| Auto-batch (size 100) | — | —x |
Speedup ratios are computed relative to the unbatched baseline.
Delivery batching throughput
Measures consumer throughput with varying concurrent consumer counts. Messages are pre-loaded and continuously produced via multi-message Enqueue.
| Consumers | Throughput (msg/s) |
|---|---|
| 1 | — |
| 10 | — |
| 100 | — |
Concurrent producer batching
Measures aggregate throughput with multiple concurrent producers all using multi-message Enqueue (batch size 100).
| Producers | Throughput (msg/s) |
|---|---|
| 1 | — |
| 5 | — |
| 10 | — |
| 50 | — |
Subsystem benchmarks
Subsystem benchmarks isolate and measure each internal component independently, bypassing the full server stack. This helps identify where time is spent and which component dominates in different workloads.
Enable with FILA_BENCH_SUBSYSTEM=1:
FILA_BENCH_SUBSYSTEM=1 cargo bench -p fila-bench --bench system
RocksDB raw write throughput
Measures raw put_message throughput directly against RocksDB, bypassing scheduler, FIBP, and serialization. Isolates storage engine performance.
| Payload | Throughput (ops/s) | p50 latency | p99 latency |
|---|---|---|---|
| 1KB | — | — | — |
| 64KB | — | — | — |
Protobuf serialization throughput
Measures protobuf encode and decode throughput for EnqueueRequest and ConsumeResponse at three payload sizes. Isolates serialization overhead.
| Payload | Encode (MB/s) | Encode (ns/msg) | Decode (ns/msg) |
|---|---|---|---|
| 64B | — | — | — |
| 1KB | — | — | — |
| 64KB | — | — | — |
Reported for both EnqueueRequest (producer path) and ConsumeResponse (consumer path).
DRR scheduler throughput
Measures next_key() + consume_deficit() cycle throughput at varying active key counts. Isolates the scheduling algorithm from storage I/O.
| Active keys | Throughput (sel/s) |
|---|---|
| 10 | — |
| 1,000 | — |
| 10,000 | — |
FIBP round-trip overhead
Measures round-trip latency for a minimal (1-byte payload) Enqueue request. Quantifies the fixed per-call overhead of FIBP framing, separate from message processing.
| Metric | Value | Unit |
|---|---|---|
| p50 latency | — | us |
| p99 latency | — | us |
| p99.9 latency | — | us |
| Throughput | — | ops/s |
Lua execution throughput
Measures on_enqueue hook execution throughput for three script complexity levels, directly against the Lua VM (no server, no FIBP).
| Script | Throughput (exec/s) | p50 | p99 |
|---|---|---|---|
| No-op (return defaults) | — | — | — |
| Header-set (read 2 headers) | — | — | — |
| Complex routing (string ops, conditionals, table insert) | — | — | — |
Competitive comparison
Fila is compared against Kafka, RabbitMQ, and NATS on queue-oriented workloads. All brokers run in Docker containers and are benchmarked using native Rust clients via the bench-competitive binary. See Methodology for details.
How to run competitive benchmarks
cd bench/competitive
make bench-competitive
Results are written to bench/competitive/results/bench-{broker}.json.
Workloads
Each broker is tested with identical workloads using its recommended high-throughput configuration:
| Workload | Description | Batching |
|---|---|---|
| Throughput | Sustained message production rate (64B, 1KB, 64KB payloads) | Each broker’s recommended batching |
| Latency | Produce-consume round-trip (p50/p95/p99) | Unbatched |
| Lifecycle | Full enqueue-consume-ack cycle (1,000 messages) | Unbatched |
| Multi-producer | 3 concurrent producers aggregate throughput | Each broker’s recommended batching |
| Resources | CPU and memory during benchmark | — |
Broker configurations
| Broker | Version | Mode | Throughput batching |
|---|---|---|---|
| Fila | latest | Docker container, DRR scheduler | AccumulatorMode::Auto (4 concurrent producers) |
| Kafka | 3.9 | KRaft (no ZooKeeper), 1 partition | linger.ms=5, batch.num.messages=1000 |
| RabbitMQ | 3.13 | Quorum queues, durable, manual ack | Per-message (no client-side batching) |
| NATS | 2.11 | JetStream, file storage, pull-subscribe | Per-message (no client-side batching) |
All competitors use production-recommended settings, not development defaults. All brokers use native Rust client libraries (rdkafka, lapin, async-nats). Throughput scenarios use each broker’s recommended batching strategy for a fair comparison. Lifecycle scenarios are unbatched for all brokers.
Run make bench-competitive on your hardware to generate comparison tables.
Results
These are reference numbers from a single run. Your results will vary by hardware. All brokers run in Docker containers. Throughput uses each broker’s recommended batching; lifecycle is unbatched.
Throughput (messages/second, batched)
| Payload | Fila | Kafka | RabbitMQ | NATS |
|---|---|---|---|---|
| 64B | — | — | — | — |
| 1KB | — | — | — | — |
| 64KB | — | — | — | — |
Previous unbatched results (Fila 2,637 msg/s vs Kafka 143,278 msg/s at 1KB) were an unfair comparison: Kafka used linger.ms=5 batching while Fila sent 1 message per RPC. The updated benchmark uses each broker's recommended batching.
End-to-end latency (1KB payload)
| Percentile | Fila | Kafka | RabbitMQ | NATS |
|---|---|---|---|---|
| p50 | 0.92 ms | 101.62 ms | 1.46 ms | 0.29 ms |
| p95 | 2.82 ms | 105.07 ms | 3.32 ms | 0.42 ms |
| p99 | 4.79 ms | 105.30 ms | 5.59 ms | 0.79 ms |
Lifecycle throughput (enqueue + consume + ack, 1KB, unbatched)
| Broker | msg/s |
|---|---|
| NATS | 25,763 |
| Fila | 2,724 |
| RabbitMQ | 658 |
| Kafka | 356 |
Multi-producer throughput (3 producers, 1KB)
| Broker | msg/s |
|---|---|
| Kafka | 186,708 |
| NATS | 150,676 |
| RabbitMQ | 63,660 |
| Fila | 6,769 |
Resource usage
| Broker | CPU | Memory |
|---|---|---|
| NATS | 1.3% | 12 MB |
| Kafka | 2.1% | 1,276 MB |
| Fila | 3.7% | 874 MB |
| RabbitMQ | 56.8% | 654 MB |
Methodology
Measurement parameters
| Parameter | Value |
|---|---|
| Warmup period | 1 second (discarded) |
| Measurement window | 3 seconds |
| Latency samples | 100 per level |
| Runs for CI regression | 3 (median) |
| Competitive runs | 1 (relative comparison) |
Limitations
- Single-node only. All brokers run as single instances. Clustering performance is not tested.
- No network latency. Brokers run on localhost. Real deployments have network overhead.
- Docker containers. All brokers run in Docker containers for a fair comparison.
- Hardware-specific. Results will vary on different hardware. Always include hardware specs when citing numbers.
Reproducing results
Self-benchmarks:
# Build and run the full benchmark suite
cargo bench -p fila-bench --bench system
# Results written to crates/fila-bench/bench-results.json
Competitive benchmarks:
cd bench/competitive
# Run all brokers
make bench-competitive
# Or individual brokers
make bench-kafka
make bench-rabbitmq
make bench-nats
make bench-fila
# Clean up Docker containers
make bench-clean
See bench/competitive/METHODOLOGY.md for complete methodology documentation including broker configuration details and justifications.
CI regression detection
The bench-regression GitHub Actions workflow runs on every push to main and on pull requests:
- Runs the self-benchmark suite 3 times, takes the median
- On main pushes: saves results as the baseline
- On PRs: compares against the baseline and flags regressions exceeding the threshold (default: 10%)
- Results are uploaded as workflow artifacts for every run
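The median-of-3 comparison described above can be sketched as a simple predicate. This is an illustrative model of the rule, not the workflow's actual implementation; `is_regression` and its arguments are hypothetical names:

```python
from statistics import median

# Illustrative sketch of the CI regression rule: take the median of
# 3 runs and flag a throughput metric if it dropped more than the
# threshold (default 10%) below the saved baseline.
def is_regression(baseline: float, runs: list, threshold: float = 0.10) -> bool:
    return median(runs) < baseline * (1.0 - threshold)

print(is_regression(5051.0, [4400.0, 4450.0, 4500.0]))  # True: ~12% drop
print(is_regression(5051.0, [4900.0, 4950.0, 5000.0]))  # False: within 10%
```

Using the median rather than the mean keeps a single noisy run (e.g. a CI runner hiccup) from triggering a false regression.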
Traceability
Results in this document are from commit 1e5bb0e (2026-03-26). Run cargo bench -p fila-bench --bench system to generate results for the current version. The JSON output includes the commit hash and timestamp for traceability.
Versioning & Compatibility Policy
Versioning
Fila uses semantic versioning (semver) for all releases:
- MAJOR — Breaking proto/API changes. Existing SDK versions may not work.
- MINOR — New features, backward compatible. Existing SDKs continue to work; new features require SDK update.
- PATCH — Bug fixes only. No behavior changes.
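One reading of this policy is that an SDK works against any server sharing its MAJOR version, and features introduced in a later MINOR require a server at least that new. A hedged sketch of that compatibility check (the `sdk_compatible` function is hypothetical, for illustration only):

```python
# Illustrative reading of the semver policy above: same MAJOR means
# compatible; a server's MINOR must be >= the SDK's for the SDK's
# newest features to be available.
def sdk_compatible(sdk: str, server: str) -> bool:
    sdk_major, sdk_minor, _ = (int(x) for x in sdk.split("."))
    srv_major, srv_minor, _ = (int(x) for x in server.split("."))
    return sdk_major == srv_major and srv_minor >= sdk_minor

print(sdk_compatible("1.2.0", "1.3.1"))  # True: same MAJOR, newer MINOR
print(sdk_compatible("1.2.0", "2.0.0"))  # False: MAJOR bump may break
```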
Proto Backward Compatibility
The fila.v1 proto package follows strict additive-only rules within a MAJOR version:
- New RPCs may be added
- New fields may be added to existing messages
- Existing fields are never removed, renamed, or retyped
- Field numbers are never reused
A MAJOR version bump (e.g., v1 → v2) would introduce a new proto package (fila.v2) and may remove deprecated RPCs or fields.
Deprecation Policy
- Features are deprecated with at least 1 MINOR version warning before removal
- Deprecated RPCs and fields are documented in release notes and proto comments
- Removal only occurs in a MAJOR version bump