Fila
A message broker that makes fair scheduling and per-key throttling first-class primitives.
The problem
Most brokers deliver messages in strict FIFO order. When multiple tenants, customers, or workload types share a queue, a single noisy producer can starve everyone else. Rate limiting is pushed to the consumer, which has to fetch a message, check the limit, and re-enqueue it when the limit is exceeded. That wastes work and adds latency.
Fila moves scheduling decisions into the broker:
- Deficit Round Robin (DRR) fair scheduling — each fairness key gets its fair share of delivery bandwidth. No tenant starves another.
- Token bucket throttling — per-key rate limits enforced at the broker, before delivery. Consumers only receive messages that are ready to process.
- Lua rules engine — `on_enqueue` and `on_failure` hooks let you define scheduling policy (assign fairness keys, set weights, decide retry vs. dead-letter) in user-supplied Lua scripts.
- Zero wasted work — consumers never receive a message they can’t act on.
Quickstart
Docker
docker run -p 5555:5555 ghcr.io/faiscadev/fila:dev
Install script
curl -fsSL https://raw.githubusercontent.com/faiscadev/fila/main/install.sh -o install.sh
less install.sh
bash install.sh
fila-server
Cargo
cargo install fila-server fila-cli
fila-server
Try it out
Once the broker is running on localhost:5555:
# Create a queue
fila queue create orders
# Enqueue a message
fila enqueue orders --header tenant=acme --payload hello
# Check queue stats
fila queue inspect orders
Core Concepts
This document explains the key concepts behind Fila’s scheduling and message handling.
Message lifecycle
A message moves through these states:
Producer Broker Consumer
| | |
|-- Enqueue ----------------->| |
| |-- on_enqueue (Lua) --------->|
| | assigns fairness_key, |
| | weight, throttle_keys |
| | |
| |-- Stored (pending) --------->|
| | |
| |-- DRR scheduler picks ------>|
| | checks throttle tokens |
| | |
| |-- Consume (leased) -------->|-- Processing
| | |
| |<-------- Ack ----------------| (success)
| | message deleted |
| | |
| |<-------- Nack ---------------| (failure)
| |-- on_failure (Lua) --------->|
| | retry or dead-letter |
| | |
| |-- Visibility timeout ------->|
| | re-enqueue if not acked |
- Enqueue — the producer sends a message to a queue. If the queue has an `on_enqueue` Lua script, it runs to assign scheduling metadata.
- Pending — the message is persisted in RocksDB and indexed by fairness key.
- Scheduled — the DRR scheduler picks the next fairness key and checks throttle tokens. If tokens are available, the message is delivered to a waiting consumer.
- Leased — the consumer is processing the message. A visibility timeout timer starts.
- Acked — the consumer confirms success. The message is deleted.
- Nacked — the consumer reports failure. The `on_failure` hook decides: retry (re-enqueue) or dead-letter.
- Expired — if the visibility timeout fires before ack/nack, the message is automatically re-enqueued.
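The lifecycle above can be summarized as a small state machine. This is an illustrative sketch (the state names mirror the list above; the enum and transition table are not Fila's internal representation):

```python
from enum import Enum, auto

class State(Enum):
    PENDING = auto()        # persisted, waiting for the scheduler
    LEASED = auto()         # delivered to a consumer, visibility timer running
    ACKED = auto()          # confirmed; message deleted
    DEAD_LETTERED = auto()  # moved to <queue>.dlq by on_failure

# Legal transitions from the lifecycle above. Nack-with-retry and
# visibility-timeout expiry both return a leased message to PENDING.
TRANSITIONS = {
    State.PENDING: {State.LEASED},
    State.LEASED: {State.ACKED, State.PENDING, State.DEAD_LETTERED},
}

def can_transition(a: State, b: State) -> bool:
    return b in TRANSITIONS.get(a, set())
```

Note that ACKED and DEAD_LETTERED are terminal: once a message is deleted or dead-lettered, it never re-enters the queue's scheduling flow (a redrive creates a fresh enqueue).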
Fairness groups
Every message belongs to a fairness group identified by its fairness_key. The key is assigned during enqueue — either by an on_enqueue Lua script or defaulting to "default".
Common fairness key strategies:
- Per-tenant: `msg.headers["tenant_id"]` — prevents one tenant from monopolizing the queue
- Per-customer: `msg.headers["customer_id"]` — fair delivery across customers
- Per-priority: `msg.headers["priority"]` — combined with weights for priority scheduling
Deficit Round Robin (DRR)
Fila uses the DRR algorithm to schedule delivery across fairness groups:
- Each fairness key has a deficit counter (starts at 0) and a weight (default 1).
- In each scheduling round, every key receives `weight * quantum` additional deficit.
- The scheduler delivers messages from a key as long as its deficit is positive, decrementing by 1 per delivery.
- When a key’s deficit reaches 0 or it has no pending messages, the scheduler moves to the next key.
Example: Two tenants with equal weight and quantum=1000. Each gets 1000 deficit per round — the scheduler delivers ~1000 messages from tenant A, then ~1000 from tenant B, then back to A. A noisy tenant sending 100x more messages doesn’t starve the quiet tenant.
Weights: A key with weight=3 gets 3x the deficit of a key with weight=1, so it receives ~3x the delivery bandwidth. Use weights for priority lanes.
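The rounds described above can be simulated in a few lines of Python. This is a sketch of the textbook DRR algorithm with a per-message cost of 1, not Fila's actual scheduler:

```python
from collections import deque

def drr_schedule(queues, weights, quantum, rounds):
    """Yield (key, message) pairs using Deficit Round Robin.

    queues  -- {fairness_key: deque of pending messages}
    weights -- {fairness_key: DRR weight}
    Each delivery costs 1 deficit, matching the per-message decrement above.
    """
    deficit = {k: 0 for k in queues}
    for _ in range(rounds):
        for key in list(queues):
            deficit[key] += weights[key] * quantum
            while deficit[key] > 0 and queues[key]:
                deficit[key] -= 1
                yield key, queues[key].popleft()
            if not queues[key]:
                deficit[key] = 0  # idle keys don't bank credit

# A noisy tenant with 100 pending messages and a quiet one with 5 still
# share every round: with quantum=3 the delivery order interleaves.
queues = {"noisy": deque(f"n{i}" for i in range(100)),
          "quiet": deque(f"q{i}" for i in range(5))}
order = [k for k, _ in drr_schedule(queues, {"noisy": 1, "quiet": 1}, 3, 2)]
```

Resetting an emptied key's deficit is the standard DRR rule that stops idle keys from accumulating unbounded credit.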
Token bucket throttling
Fila supports per-key rate limiting via token bucket throttlers. Each throttle key has:
- rate — tokens refilled per second
- burst — maximum tokens the bucket can hold
When the scheduler is about to deliver a message, it checks all of the message’s throttle_keys. If any bucket is empty, the message is held until tokens refill. The consumer never receives a message it would have to reject for rate limiting.
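The refill-and-take logic can be sketched as follows (an illustrative token bucket driven by explicit timestamps, not Fila's implementation):

```python
class TokenBucket:
    """Minimal token bucket: `rate` tokens/second, capacity `burst`."""

    def __init__(self, rate, burst, now=0.0):
        self.rate, self.burst = rate, burst
        self.tokens = burst  # bucket starts full
        self.last = now

    def try_take(self, now):
        # Refill proportionally to elapsed time, capped at burst.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the broker holds the message instead of delivering

bucket = TokenBucket(rate=10, burst=20)
burst_hits = sum(bucket.try_take(0.0) for _ in range(30))  # 20 succeed at t=0
later_hit = bucket.try_take(0.1)  # 0.1 s later one token has refilled
```

Because the bucket is checked before delivery, a held message stays pending in the broker rather than bouncing between consumer and queue.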
Setting up throttle rates
Throttle rates are managed via runtime configuration:
# Allow 10 requests/second with burst of 20 for the "api" throttle key
fila config set throttle.api 10,20
Messages are assigned throttle keys in the on_enqueue Lua hook:
function on_enqueue(msg)
return {
fairness_key = msg.headers["tenant"],
throttle_keys = { msg.headers["api_endpoint"] }
}
end
Lua hooks
Fila embeds a Lua 5.4 runtime for user-defined scheduling policy. Scripts run inside a sandbox with configurable timeouts and memory limits.
on_enqueue
Runs when a message is enqueued. Returns scheduling metadata:
function on_enqueue(msg)
-- msg.headers — table of string key-value pairs
-- msg.payload_size — byte count of the payload
-- msg.queue — queue name
return {
fairness_key = msg.headers["tenant"] or "default",
weight = tonumber(msg.headers["priority"]) or 1,
throttle_keys = { msg.headers["endpoint"] }
}
end
Return fields:
| Field | Type | Default | Description |
|---|---|---|---|
| `fairness_key` | string | `"default"` | Groups the message for DRR scheduling |
| `weight` | number | `1` | DRR weight for this fairness key |
| `throttle_keys` | list of strings | `[]` | Token bucket keys to check before delivery |
on_failure
Runs when a consumer nacks a message. Decides retry vs. dead-letter:
function on_failure(msg)
-- msg.headers — table of string key-value pairs
-- msg.id — message UUID
-- msg.attempts — current attempt count
-- msg.queue — queue name
-- msg.error — error description from the nack
if msg.attempts >= 3 then
return { action = "dlq" }
end
return { action = "retry", delay_ms = 1000 * msg.attempts }
end
Return fields:
| Field | Type | Description |
|---|---|---|
| `action` | `"retry"` or `"dlq"` | Whether to re-enqueue or dead-letter |
| `delay_ms` | number (optional) | Delay before re-enqueue (retry only) |
Lua API
Scripts can read runtime configuration from the broker:
local limit = fila.get("rate_limit:tenant_a") -- returns string or nil
Safety
| Setting | Default | Description |
|---|---|---|
| `lua.default_timeout_ms` | 10 | Max script execution time (ms) |
| `lua.default_memory_limit_bytes` | 1 MB | Max memory per script |
| `lua.circuit_breaker_threshold` | 3 | Consecutive hook failures before the circuit breaks |
| `lua.circuit_breaker_cooldown_ms` | 10000 | Cooldown period after a circuit break |
When the circuit breaker trips, Lua hooks are bypassed and messages use default scheduling (fairness_key="default", weight=1, no throttle keys). The circuit breaker resets automatically after the cooldown period.
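The trip-and-cooldown behavior can be modeled in a few lines. This is a sketch using the defaults from the table above; the class and method names are illustrative, not Fila's internals:

```python
class CircuitBreaker:
    """Bypass hooks after `threshold` consecutive failures; resume
    automatically once `cooldown_ms` has elapsed."""

    def __init__(self, threshold=3, cooldown_ms=10_000):
        self.threshold, self.cooldown_ms = threshold, cooldown_ms
        self.failures = 0
        self.open_until = None  # None means hooks run normally

    def should_run_hook(self, now_ms):
        if self.open_until is not None:
            if now_ms < self.open_until:
                return False        # tripped: fall back to default scheduling
            self.open_until = None  # cooldown elapsed: reset the breaker
            self.failures = 0
        return True

    def record(self, ok, now_ms):
        self.failures = 0 if ok else self.failures + 1
        if self.failures >= self.threshold:
            self.open_until = now_ms + self.cooldown_ms

cb = CircuitBreaker()
for t in (0, 1, 2):           # three consecutive Lua failures
    cb.record(ok=False, now_ms=t)
# hooks are bypassed during the cooldown, then run again afterwards
```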
Dead letter queue
Messages that exhaust retries (when on_failure returns { action = "dlq" }) are moved to a dead letter queue named <queue>.dlq. For example, messages dead-lettered from orders go to orders.dlq.
Inspecting and redriving
# Check how many messages are in the DLQ
fila queue inspect orders.dlq
# Move 10 messages back to the source queue
fila redrive orders.dlq --count 10
Redrive moves pending (non-leased) messages from the DLQ back to the original source queue, where they go through the normal enqueue flow again.
Runtime configuration
The broker maintains a key-value configuration store that persists across restarts. Values are accessible from Lua scripts via fila.get(key) and managed through the CLI or API.
fila config set feature:new_flow enabled
fila config get feature:new_flow
fila config list --prefix feature:
Common use cases:
- Feature flags: toggle behavior in Lua scripts without redeployment
- Throttle rates: `throttle.<key>` set to `rate,burst`
- Dynamic routing: change fairness key assignment logic based on config values
Visibility timeout
When a consumer receives a message via Consume, the message is “leased” for a configurable duration (set per-queue at creation time via visibility_timeout_ms). During this lease:
- The message is not delivered to other consumers
- A timer tracks the lease expiry
If the consumer does not Ack or Nack the message before the timeout expires, the message is automatically re-enqueued and becomes available for delivery again. This prevents messages from being lost when consumers crash.
The default visibility timeout is set per-queue at creation:
fila queue create orders --visibility-timeout 30000 # 30 seconds
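One way a broker can implement this is a min-heap of lease deadlines plus a periodic sweep. The following is a sketch of that bookkeeping, not Fila's actual code:

```python
import heapq

class LeaseTable:
    """Track leases; expired, un-acked messages go back to pending."""

    def __init__(self, visibility_timeout_ms):
        self.timeout = visibility_timeout_ms
        self.heap = []       # (deadline_ms, msg_id)
        self.leased = set()  # currently outstanding leases

    def lease(self, msg_id, now_ms):
        self.leased.add(msg_id)
        heapq.heappush(self.heap, (now_ms + self.timeout, msg_id))

    def ack(self, msg_id):
        self.leased.discard(msg_id)  # stale heap entry is skipped in sweep()

    def sweep(self, now_ms):
        """Return msg_ids whose lease expired without ack/nack."""
        expired = []
        while self.heap and self.heap[0][0] <= now_ms:
            _, msg_id = heapq.heappop(self.heap)
            if msg_id in self.leased:  # never acked or nacked
                self.leased.discard(msg_id)
                expired.append(msg_id)  # caller re-enqueues these
        return expired

t = LeaseTable(30_000)
t.lease("m1", now_ms=0)
t.lease("m2", now_ms=0)
t.ack("m1")  # m1 completes in time; m2's consumer crashed
```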
Tutorials
Step-by-step guides for common Fila use cases. Each tutorial assumes you have a running broker (see quickstart).
Multi-tenant fair scheduling
Goal: Prevent a noisy tenant from starving other tenants in a shared queue.
1. Create a queue with tenant-aware fairness
fila queue create orders \
--on-enqueue 'function on_enqueue(msg)
return { fairness_key = msg.headers["tenant_id"] or "default" }
end'
The on_enqueue hook extracts a tenant_id header and uses it as the fairness key. Each unique tenant gets its own DRR scheduling group.
2. Produce messages from multiple tenants
# Python SDK
from fila import FilaClient
client = FilaClient("localhost:5555")
# Noisy tenant sends 1000 messages
for i in range(1000):
client.enqueue("orders", {"tenant_id": "noisy-corp"}, f"order-{i}")
# Other tenants send a few each
for tenant in ["acme", "globex", "initech"]:
for i in range(10):
client.enqueue("orders", {"tenant_id": tenant}, f"{tenant}-order-{i}")
3. Consume and observe fairness
stream = client.consume("orders")
for msg in stream:
print(f"tenant={msg.metadata.fairness_key} id={msg.id}")
client.ack("orders", msg.id)
Without Fila, all 1000 noisy-corp messages would be delivered first. With DRR scheduling, each tenant gets interleaved delivery — acme, globex, and initech messages arrive alongside noisy-corp’s, not after.
4. Verify with stats
fila queue inspect orders
The per-key breakdown shows each tenant’s pending count and current DRR deficit.
Weighted fairness
Give premium tenants more bandwidth by setting weights:
fila queue create orders \
--on-enqueue 'function on_enqueue(msg)
local weights = { premium = 3, standard = 1 }
local tier = msg.headers["tier"] or "standard"
return {
fairness_key = msg.headers["tenant_id"] or "default",
weight = weights[tier] or 1
}
end'
A premium tenant with weight=3 gets 3x the delivery bandwidth of a standard tenant with weight=1.
Per-provider throttling
Goal: Rate-limit outgoing API calls per external provider without wasting consumer resources.
1. Create a queue with throttle keys
fila queue create api-calls \
--on-enqueue 'function on_enqueue(msg)
local keys = {}
if msg.headers["provider"] then
table.insert(keys, "provider:" .. msg.headers["provider"])
end
return {
fairness_key = msg.headers["tenant"] or "default",
throttle_keys = keys
}
end'
2. Set throttle rates
# Stripe: 100 requests/second, burst up to 150
fila config set throttle.provider:stripe 100,150
# SendGrid: 10 requests/second, burst up to 20
fila config set throttle.provider:sendgrid 10,20
The format is rate,burst. Rate is tokens per second; burst is the maximum bucket capacity.
3. Produce messages
// Go SDK
client, _ := fila.Connect("localhost:5555")
ctx := context.Background()
// These will be throttled to 100/s
for i := 0; i < 500; i++ {
client.Enqueue(ctx, "api-calls", map[string]string{
"tenant": "acme",
"provider": "stripe",
}, []byte(fmt.Sprintf("charge-%d", i)))
}
4. Consume — the broker does the throttling
stream, _ := client.Consume(ctx, "api-calls")
for msg := range stream {
// Every message received is within the rate limit.
// No need to check limits client-side.
callExternalAPI(msg.Payload)
client.Ack(ctx, "api-calls", msg.ID)
}
Consumers receive messages at the provider’s rate limit. No consumer-side rate checking, no wasted fetches, no re-enqueue loops.
Adjusting rates at runtime
Change rates without restarting the broker:
# Double Stripe's rate
fila config set throttle.provider:stripe 200,300
The token bucket updates immediately.
Exponential backoff retry
Goal: Retry failed messages with increasing delays, then dead-letter after max attempts.
1. Create a queue with retry logic
fila queue create jobs \
--on-enqueue 'function on_enqueue(msg)
return { fairness_key = msg.headers["job_type"] or "default" }
end' \
--on-failure 'function on_failure(msg)
local max_attempts = tonumber(fila.get("max_retries") or "5")
if msg.attempts >= max_attempts then
return { action = "dlq" }
end
-- Exponential backoff: 1s, 2s, 4s, 8s, 16s...
local delay = math.min(1000 * (2 ^ (msg.attempts - 1)), 60000)
return { action = "retry", delay_ms = delay }
end' \
--visibility-timeout 30000
2. Configure max retries at runtime
fila config set max_retries 3
The on_failure hook reads this with fila.get("max_retries"). Change it without redeploying.
3. Process messages with failure handling
// JavaScript SDK
const { FilaClient } = require('fila-client');
async function main() {
const client = await FilaClient.connect('localhost:5555');
const stream = client.consume('jobs');
for await (const msg of stream) {
try {
await processJob(msg.payload);
await client.ack('jobs', msg.id);
} catch (err) {
// Nack triggers on_failure hook — broker handles retry/DLQ
await client.nack('jobs', msg.id, err.message);
}
}
}
main().catch(console.error);
4. Monitor and redrive
# Check how many messages ended up in the DLQ
fila queue inspect jobs.dlq
# After fixing the root cause, redrive them
fila redrive jobs.dlq --count 0 # 0 = all messages
Customizing backoff per job type
Read config to customize behavior per job type:
function on_failure(msg)
-- Different retry strategies per job type
local job_type = msg.headers["job_type"] or "default"
local max = tonumber(fila.get("max_retries:" .. job_type) or "5")
if msg.attempts >= max then
return { action = "dlq" }
end
-- Critical jobs: shorter delays, more retries
-- Batch jobs: longer delays, fewer retries
local base_ms = tonumber(fila.get("retry_base_ms:" .. job_type) or "1000")
local delay = math.min(base_ms * (2 ^ (msg.attempts - 1)), 300000)
return { action = "retry", delay_ms = delay }
end
fila config set max_retries:payment 10
fila config set retry_base_ms:payment 500
fila config set max_retries:report 3
fila config set retry_base_ms:report 5000
Lua Hook Patterns
Copy-paste patterns for common scheduling scenarios. See concepts for hook API details.
Tenant fairness
Assign each tenant its own fairness group so the DRR scheduler gives equal delivery bandwidth:
function on_enqueue(msg)
return {
fairness_key = msg.headers["tenant_id"] or "default"
}
end
With weighted tiers
Premium tenants get more bandwidth:
function on_enqueue(msg)
local tier = msg.headers["tier"] or "standard"
local weight = 1
if tier == "premium" then weight = 3 end
if tier == "enterprise" then weight = 5 end
return {
fairness_key = msg.headers["tenant_id"] or "default",
weight = weight
}
end
With dynamic weights from config
function on_enqueue(msg)
local tenant = msg.headers["tenant_id"] or "default"
local weight = tonumber(fila.get("weight:" .. tenant) or "1")
return {
fairness_key = tenant,
weight = weight
}
end
Set weights at runtime: fila config set weight:acme 5
Provider throttling
Rate-limit outbound calls per external API provider:
function on_enqueue(msg)
local keys = {}
if msg.headers["provider"] then
table.insert(keys, "provider:" .. msg.headers["provider"])
end
return {
fairness_key = msg.headers["tenant"] or "default",
throttle_keys = keys
}
end
Set rates: fila config set throttle.provider:stripe 100,200
Multi-dimensional throttling
Throttle by both provider and tenant (composite key):
function on_enqueue(msg)
local tenant = msg.headers["tenant"] or "default"
local provider = msg.headers["provider"]
local keys = {}
if provider then
-- Global provider limit
table.insert(keys, "provider:" .. provider)
-- Per-tenant-per-provider limit
table.insert(keys, "tenant-provider:" .. tenant .. ":" .. provider)
end
return {
fairness_key = tenant,
throttle_keys = keys
}
end
# Global: Stripe allows 1000 req/s total
fila config set throttle.provider:stripe 1000,1500
# Per-tenant: each tenant gets at most 100 req/s to Stripe
fila config set throttle.tenant-provider:acme:stripe 100,150
fila config set throttle.tenant-provider:globex:stripe 100,150
Exponential backoff retry
Retry with increasing delays, dead-letter after max attempts:
function on_failure(msg)
if msg.attempts >= 5 then
return { action = "dlq" }
end
-- 1s, 2s, 4s, 8s, 16s
local delay = math.min(1000 * (2 ^ (msg.attempts - 1)), 60000)
return { action = "retry", delay_ms = delay }
end
With configurable max retries
function on_failure(msg)
local max = tonumber(fila.get("max_retries") or "5")
if msg.attempts >= max then
return { action = "dlq" }
end
local delay = math.min(1000 * (2 ^ (msg.attempts - 1)), 60000)
return { action = "retry", delay_ms = delay }
end
Change at runtime: fila config set max_retries 10
Linear backoff
function on_failure(msg)
if msg.attempts >= 5 then
return { action = "dlq" }
end
-- 5s, 10s, 15s, 20s, 25s
return { action = "retry", delay_ms = 5000 * msg.attempts }
end
Immediate retry (no delay)
function on_failure(msg)
if msg.attempts >= 3 then
return { action = "dlq" }
end
return { action = "retry", delay_ms = 0 }
end
Header-based routing
Use headers to make dynamic scheduling decisions.
Route by priority
function on_enqueue(msg)
local priority = msg.headers["priority"] or "normal"
local weights = {
critical = 10,
high = 5,
normal = 2,
low = 1
}
return {
fairness_key = "priority:" .. priority,
weight = weights[priority] or 2
}
end
Route by region
function on_enqueue(msg)
local region = msg.headers["region"] or "default"
return {
fairness_key = "region:" .. region,
throttle_keys = { "region:" .. region }
}
end
# Rate limit per region
fila config set throttle.region:us-east 500,750
fila config set throttle.region:eu-west 300,450
Conditional dead-letter by error type
function on_failure(msg)
-- Permanent errors: dead-letter immediately
if msg.error:find("4%d%d") then -- HTTP 4xx
return { action = "dlq" }
end
-- Transient errors: retry with backoff
if msg.attempts >= 5 then
return { action = "dlq" }
end
local delay = 1000 * (2 ^ (msg.attempts - 1))
return { action = "retry", delay_ms = delay }
end
Feature flag gating
function on_enqueue(msg)
local tenant = msg.headers["tenant"] or "default"
local new_flow = fila.get("feature:new_flow:" .. tenant)
if new_flow == "enabled" then
return { fairness_key = tenant .. ":v2", weight = 1 }
end
return { fairness_key = tenant, weight = 1 }
end
# Enable new flow for one tenant
fila config set feature:new_flow:acme enabled
Built-in helpers
Fila provides fila.helpers — a set of convenience functions for common patterns. These are available in all scripts alongside fila.get().
fila.helpers.exponential_backoff(attempts, base_ms, max_ms)
Returns a delay in milliseconds with exponential growth and ±25% jitter.
- `attempts` — current attempt count
- `base_ms` — base delay (first attempt delay)
- `max_ms` — maximum delay cap
function on_failure(msg)
if msg.attempts >= 5 then
return { action = "dlq" }
end
return {
action = "retry",
delay_ms = fila.helpers.exponential_backoff(msg.attempts, 1000, 60000)
}
end
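For intuition, the delay this helper produces can be approximated in Python. This is a model of the documented behavior (doubling per attempt, a cap, ±25% jitter), not the helper's source:

```python
import random

def exponential_backoff(attempts, base_ms, max_ms, rng=random.random):
    """Delay that doubles per attempt, capped at max_ms, with +/-25% jitter."""
    delay = min(base_ms * 2 ** (attempts - 1), max_ms)
    jitter = 1 + 0.5 * (rng() - 0.5)  # uniform factor in [0.75, 1.25]
    return delay * jitter

# With jitter pinned to its midpoint (rng -> 0.5), attempt 3 at base 1000 ms
# yields exactly 4000 ms; by attempt 10 the 60 s cap dominates.
d = exponential_backoff(3, 1000, 60_000, rng=lambda: 0.5)
```

The jitter matters in practice: without it, a batch of messages that failed together retries together, producing synchronized load spikes.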
fila.helpers.tenant_route(msg, header_name)
Extracts a header value as the fairness key. Returns { fairness_key = "default" } if the header is missing.
function on_enqueue(msg)
return fila.helpers.tenant_route(msg, "tenant_id")
end
fila.helpers.rate_limit_keys(msg, patterns)
Generates throttle key strings from patterns. Each {placeholder} is replaced by the corresponding header value. Patterns with missing headers are omitted.
function on_enqueue(msg)
local route = fila.helpers.tenant_route(msg, "tenant_id")
route.throttle_keys = fila.helpers.rate_limit_keys(msg, {
"provider:{provider}",
"region:{region}"
})
return route
end
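The placeholder-expansion contract can be modeled in Python (a sketch matching the description above; `fila.helpers.rate_limit_keys` itself runs in Lua inside the broker):

```python
import re

def rate_limit_keys(headers, patterns):
    """Expand "{placeholder}" patterns from headers; patterns with any
    missing header are omitted, mirroring the contract described above."""
    keys = []
    for pattern in patterns:
        names = re.findall(r"\{(\w+)\}", pattern)
        if all(n in headers for n in names):
            key = pattern
            for n in names:
                key = key.replace("{" + n + "}", headers[n])
            keys.append(key)
    return keys

keys = rate_limit_keys({"provider": "stripe"},
                       ["provider:{provider}", "region:{region}"])
# "region:{region}" is dropped because the region header is missing
```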
fila.helpers.max_retries(attempts, max)
Returns { action = "retry" } if attempts < max, otherwise { action = "dlq" }.
function on_failure(msg)
return fila.helpers.max_retries(msg.attempts, 5)
end
Combining helpers
Helpers compose naturally. Here’s a complete on_failure script using multiple helpers:
function on_failure(msg)
local decision = fila.helpers.max_retries(msg.attempts, 5)
if decision.action == "retry" then
decision.delay_ms = fila.helpers.exponential_backoff(msg.attempts, 1000, 60000)
end
return decision
end
SDK Quick Start
Get up and running with Fila in your language of choice. Each section covers installation, connection, and the core enqueue/consume/ack flow.
Prerequisites
# Start a Fila broker
fila-server &
# Create a queue
fila queue create demo
Rust
cargo add fila-sdk tokio tokio-stream
use fila_sdk::FilaClient;
use std::collections::HashMap;
use tokio_stream::StreamExt;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let client = FilaClient::connect("http://localhost:5555").await?;
// Enqueue a message
let mut headers = HashMap::new();
headers.insert("tenant".to_string(), "acme".to_string());
let id = client.enqueue("demo", headers, b"hello".to_vec()).await?;
println!("enqueued: {id}");
// Consume messages
let mut stream = client.consume("demo").await?;
if let Some(msg) = stream.next().await {
let msg = msg?;
println!("received: {}", String::from_utf8_lossy(&msg.payload));
// Acknowledge
client.ack("demo", &msg.id).await?;
}
Ok(())
}
Go
go get github.com/faiscadev/fila-go
package main
import (
"context"
"fmt"
"log"
fila "github.com/faiscadev/fila-go"
)
func main() {
client, err := fila.Connect("localhost:5555")
if err != nil {
log.Fatal(err)
}
defer client.Close()
ctx := context.Background()
// Enqueue
id, err := client.Enqueue(ctx, "demo",
map[string]string{"tenant": "acme"}, []byte("hello"))
if err != nil {
log.Fatal(err)
}
fmt.Println("enqueued:", id)
// Consume
stream, err := client.Consume(ctx, "demo")
if err != nil {
log.Fatal(err)
}
msg, err := stream.Recv()
if err != nil {
log.Fatal(err)
}
fmt.Printf("received: %s\n", msg.Payload)
// Ack
if err := client.Ack(ctx, "demo", msg.ID); err != nil {
log.Fatal(err)
}
}
Python
pip install fila
import asyncio
from fila import FilaClient
async def main():
client = await FilaClient.connect("localhost:5555")
# Enqueue
msg_id = await client.enqueue("demo",
headers={"tenant": "acme"}, payload=b"hello")
print(f"enqueued: {msg_id}")
# Consume
async for msg in client.consume("demo"):
print(f"received: {msg.payload}")
await client.ack("demo", msg.id)
break
asyncio.run(main())
JavaScript / Node.js
npm install fila-client
const { FilaClient } = require('fila-client');
async function main() {
const client = await FilaClient.connect('localhost:5555');
// Enqueue
const id = await client.enqueue('demo',
{ tenant: 'acme' }, Buffer.from('hello'));
console.log('enqueued:', id);
// Consume
const stream = client.consume('demo');
for await (const msg of stream) {
console.log('received:', msg.payload.toString());
await client.ack('demo', msg.id);
break;
}
}
main().catch(console.error);
Ruby
gem install fila-client
require 'fila'
client = Fila::Client.new('localhost:5555')
# Enqueue
id = client.enqueue('demo',
headers: { 'tenant' => 'acme' }, payload: 'hello')
puts "enqueued: #{id}"
# Consume
client.consume('demo') do |msg|
puts "received: #{msg.payload}"
client.ack('demo', msg.id)
break
end
Java
<dependency>
<groupId>dev.faisca</groupId>
<artifactId>fila-client</artifactId>
<version>0.1.0</version>
</dependency>
import dev.faisca.fila.FilaClient;
import java.util.Map;
public class Demo {
public static void main(String[] args) throws Exception {
var client = FilaClient.connect("localhost:5555");
// Enqueue
var id = client.enqueue("demo",
Map.of("tenant", "acme"), "hello".getBytes());
System.out.println("enqueued: " + id);
// Consume
var stream = client.consume("demo");
var msg = stream.next();
System.out.println("received: " + new String(msg.getPayload()));
// Ack
client.ack("demo", msg.getId());
}
}
Security
TLS
Connect to a TLS-enabled broker by specifying the CA certificate or using system trust store.
| SDK | TLS Connection |
|---|---|
| Rust | FilaClient::builder("https://host:5555").with_tls().connect().await? |
| Go | fila.Connect("host:5555", fila.WithTLS()) |
| Python | FilaClient.connect("host:5555", tls=True) |
| JavaScript | FilaClient.connect('host:5555', { tls: true }) |
| Ruby | Fila::Client.new('host:5555', tls: true) |
| Java | FilaClient.connect("host:5555", FilaClient.withTLS()) |
For custom CA certificates:
| SDK | Custom CA |
|---|---|
| Rust | .with_tls_ca("path/to/ca.crt") |
| Go | fila.WithTLSCA("path/to/ca.crt") |
| Python | tls_ca="path/to/ca.crt" |
| JavaScript | { tlsCa: 'path/to/ca.crt' } |
| Ruby | tls_ca: 'path/to/ca.crt' |
| Java | FilaClient.withTLSCA("path/to/ca.crt") |
API Key Authentication
| SDK | API Key |
|---|---|
| Rust | .with_api_key("your-key") |
| Go | fila.WithAPIKey("your-key") |
| Python | api_key="your-key" |
| JavaScript | { apiKey: 'your-key' } |
| Ruby | api_key: 'your-key' |
| Java | FilaClient.withAPIKey("your-key") |
SDK Examples
Working code for every SDK showing the core enqueue -> consume -> ack flow.
Setup
All examples assume a running broker with a queue:
fila-server &
fila queue create demo
Rust
use std::collections::HashMap;
use std::time::Duration;
use fila_sdk::FilaClient;
use tokio_stream::StreamExt;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let client = FilaClient::connect("http://localhost:5555").await?;
// Enqueue
let mut headers = HashMap::new();
headers.insert("tenant".to_string(), "acme".to_string());
let id = client.enqueue("demo", headers, b"hello".to_vec()).await?;
println!("enqueued: {id}");
// Consume
let mut stream = client.consume("demo").await?;
let msg = tokio::time::timeout(Duration::from_secs(5), stream.next())
.await??
.expect("stream error");
println!("received: {} ({})", msg.id, String::from_utf8_lossy(&msg.payload));
// Ack
client.ack("demo", &msg.id).await?;
println!("acked: {}", msg.id);
Ok(())
}
Add to Cargo.toml:
[dependencies]
fila-sdk = "0.1"
tokio = { version = "1", features = ["full"] }
tokio-stream = "0.1"
Rust with API key authentication
let opts = ConnectOptions::new("127.0.0.1:5555")
    .with_api_key("my-secret-key");
let client = FilaClient::connect_with_options(opts).await?;
Go
package main
import (
"context"
"fmt"
"log"
fila "github.com/faiscadev/fila-go"
)
func main() {
ctx := context.Background()
client, err := fila.Connect("localhost:5555")
if err != nil {
log.Fatal(err)
}
defer client.Close()
// Enqueue
id, err := client.Enqueue(ctx, "demo", map[string]string{
"tenant": "acme",
}, []byte("hello"))
if err != nil {
log.Fatal(err)
}
fmt.Println("enqueued:", id)
// Consume
stream, err := client.Consume(ctx, "demo")
if err != nil {
log.Fatal(err)
}
msg, err := stream.Next()
if err != nil {
log.Fatal(err)
}
fmt.Printf("received: %s (%s)\n", msg.ID, string(msg.Payload))
// Ack
if err := client.Ack(ctx, "demo", msg.ID); err != nil {
log.Fatal(err)
}
fmt.Println("acked:", msg.ID)
}
Python
from fila import FilaClient
client = FilaClient("localhost:5555")
# Enqueue
msg_id = client.enqueue("demo", {"tenant": "acme"}, b"hello")
print(f"enqueued: {msg_id}")
# Consume
stream = client.consume("demo")
msg = next(stream)
print(f"received: {msg.id} ({msg.payload.decode()})")
# Ack
client.ack("demo", msg.id)
print(f"acked: {msg.id}")
JavaScript / Node.js
const { FilaClient } = require('fila-client');
async function main() {
const client = await FilaClient.connect('localhost:5555');
// Enqueue
const id = await client.enqueue('demo', { tenant: 'acme' }, Buffer.from('hello'));
console.log('enqueued:', id);
// Consume
const stream = client.consume('demo');
for await (const msg of stream) {
console.log(`received: ${msg.id} (${msg.payload.toString()})`);
// Ack
await client.ack('demo', msg.id);
console.log('acked:', msg.id);
break; // just one message for the demo
}
}
main().catch(console.error);
Ruby
require 'fila'
client = Fila::Client.new('localhost:5555')
# Enqueue
id = client.enqueue('demo', { 'tenant' => 'acme' }, 'hello')
puts "enqueued: #{id}"
# Consume
client.consume('demo') do |msg|
puts "received: #{msg.id} (#{msg.payload})"
# Ack
client.ack('demo', msg.id)
puts "acked: #{msg.id}"
break
end
Java
import dev.faisca.fila.FilaClient;
import dev.faisca.fila.Message;
public class Demo {
public static void main(String[] args) throws Exception {
FilaClient client = FilaClient.connect("localhost:5555");
// Enqueue
String id = client.enqueue("demo",
java.util.Map.of("tenant", "acme"),
"hello".getBytes());
System.out.println("enqueued: " + id);
// Consume
var stream = client.consume("demo");
Message msg = stream.next();
System.out.printf("received: %s (%s)%n", msg.getId(),
new String(msg.getPayload()));
// Ack
client.ack("demo", msg.getId());
System.out.println("acked: " + msg.getId());
client.close();
}
}
Integration Patterns
Common messaging patterns with Fila.
Producer / Consumer
The most basic pattern: one service produces messages, another consumes them.
┌──────────┐ enqueue ┌──────┐ consume ┌──────────┐
│ Producer │ ────────────► │ Fila │ ────────────► │ Consumer │
└──────────┘ └──────┘ └──────────┘
Use when: You want to decouple a producer from a consumer — the producer doesn’t need to wait for processing to complete.
# producer.py
import asyncio
from fila import FilaClient
async def produce():
client = await FilaClient.connect("localhost:5555")
for i in range(100):
await client.enqueue("orders",
headers={"tenant": "acme"},
payload=f"order-{i}".encode())
asyncio.run(produce())
# consumer.py
import asyncio
from fila import FilaClient
async def consume():
client = await FilaClient.connect("localhost:5555")
async for msg in client.consume("orders"):
print(f"processing: {msg.payload}")
# ... do work ...
await client.ack("orders", msg.id)
asyncio.run(consume())
Fan-out
Multiple consumers process different types of work from a shared queue. Fila’s fairness keys ensure each workload type gets its fair share.
┌──────────────┐
┌──► │ Consumer (A) │
┌──────────┐ enqueue │ └──────────────┘
│ Producer │ ──────► ┌──┴──┐
└──────────┘ │ Fila │ fairness_key routing
└──┬──┘
└──► ┌──────────────┐
│ Consumer (B) │
└──────────────┘
Use when: Different message types need fair scheduling. Without fairness keys, a burst of type-A messages would starve type-B consumers.
# Producer assigns fairness keys via Lua on_enqueue
# Queue created with:
# fila queue create work --on-enqueue '
# function on_enqueue(msg)
# return { fairness_key = msg.headers["type"] or "default" }
# end
# '
# producer.py
async def produce():
client = await FilaClient.connect("localhost:5555")
# Type A messages (high volume)
for i in range(1000):
await client.enqueue("work",
headers={"type": "email"}, payload=f"email-{i}".encode())
# Type B messages (low volume but equally important)
for i in range(10):
await client.enqueue("work",
headers={"type": "sms"}, payload=f"sms-{i}".encode())
# Fila's DRR scheduler ensures SMS messages aren't starved by emails
Request-Reply
Implement synchronous-style request/reply over async messaging using a correlation ID and a reply queue.
┌─────────┐ enqueue(request) ┌──────┐ consume ┌─────────┐
│ Client │ ─────────────────► │ Fila │ ────────► │ Service │
│ │ ◄───────────────── │ │ ◄──────── │ │
└─────────┘ consume(reply) └──────┘ enqueue └─────────┘
(reply)
Use when: A service needs a response but you still want the decoupling and reliability benefits of message queues.
// client.go — sends request, waits for reply
package main
import (
    "context"
    "fmt"

    "github.com/google/uuid"

    fila "github.com/faiscadev/fila-go"
)
func main() {
client, _ := fila.Connect("localhost:5555")
ctx := context.Background()
correlationID := uuid.New().String()
replyQueue := "replies-" + correlationID
// Create a temporary reply queue
client.CreateQueue(ctx, replyQueue)
defer client.DeleteQueue(ctx, replyQueue)
// Send request with correlation ID
client.Enqueue(ctx, "requests",
map[string]string{
"correlation_id": correlationID,
"reply_to": replyQueue,
},
[]byte("what is 2+2?"))
// Wait for reply
stream, _ := client.Consume(ctx, replyQueue)
reply, _ := stream.Recv()
fmt.Printf("reply: %s\n", reply.Payload)
client.Ack(ctx, replyQueue, reply.ID)
}
// service.go — processes requests, sends replies
package main
import (
"context"
"fmt"
"log"
fila "github.com/faiscadev/fila-go"
)
func main() {
client, _ := fila.Connect("localhost:5555")
ctx := context.Background()
stream, _ := client.Consume(ctx, "requests")
for {
msg, err := stream.Recv()
if err != nil {
log.Fatal(err)
}
// Process request
answer := fmt.Sprintf("answer: 4 (to: %s)", string(msg.Payload))
// Send reply to the caller's reply queue
replyTo := msg.Headers["reply_to"]
client.Enqueue(ctx, replyTo,
map[string]string{
"correlation_id": msg.Headers["correlation_id"],
},
[]byte(answer))
client.Ack(ctx, "requests", msg.ID)
}
}
Configuration Reference
Fila reads configuration from a TOML file. It searches for:
- `fila.toml` in the current working directory
- `/etc/fila/fila.toml`
If no file is found, all defaults are used. The broker runs with zero configuration.
Environment variables
| Variable | Default | Description |
|---|---|---|
FILA_DATA_DIR | data | Path to the RocksDB data directory |
FILA_FIBP_PORT_FILE | (none) | When set, the server writes the actual FIBP listen address to this file after binding (useful for test harnesses with port 0) |
Full configuration
[server]
listen_addr = "0.0.0.0:5555" # FIBP listen address
[scheduler]
command_channel_capacity = 10000 # internal command channel buffer size
idle_timeout_ms = 100 # scheduler idle timeout between rounds (ms)
quantum = 1000 # DRR quantum per fairness key per round
[lua]
default_timeout_ms = 10 # max script execution time (ms)
default_memory_limit_bytes = 1048576 # max memory per script (1 MB)
circuit_breaker_threshold = 3 # consecutive failures before circuit break
circuit_breaker_cooldown_ms = 10000 # cooldown period after circuit break (ms)
[telemetry]
otlp_endpoint = "http://localhost:4317" # OTLP endpoint (omit to disable)
service_name = "fila" # OTel service name
metrics_interval_ms = 10000 # metrics export interval (ms)
# Uncomment to enable the web management GUI
# [gui]
# listen_addr = "0.0.0.0:8080" # HTTP port for the dashboard
# FIBP transport tuning (defaults are optimized — override only if needed)
# [fibp]
# listen_addr = "0.0.0.0:5555" # TCP listen address
# max_frame_size = 16777216 # max frame size in bytes (16 MB)
# keepalive_interval_secs = 15 # heartbeat ping interval
# keepalive_timeout_secs = 10 # timeout waiting for pong
Section reference
[server]
| Key | Type | Default | Description |
|---|---|---|---|
listen_addr | string | "0.0.0.0:5555" | Address and port for the FIBP server |
[scheduler]
| Key | Type | Default | Description |
|---|---|---|---|
command_channel_capacity | integer | 10000 | Size of the bounded channel between FIBP connection handlers and the scheduler loop. Increase if you see backpressure under high load. |
idle_timeout_ms | integer | 100 | How long the scheduler waits when there’s no work before checking again. Lower values reduce latency at the cost of CPU. |
quantum | integer | 1000 | DRR quantum. Each fairness key gets weight * quantum deficit per scheduling round. Higher values mean more messages delivered per key per round (coarser interleaving). |
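To make the `quantum`/`weight` interaction concrete, here is a minimal DRR round in Python — an illustrative sketch of the algorithm, not Fila's implementation; a flat per-message cost is assumed:

```python
from collections import deque

def drr_round(queues, deficits, weights, quantum, cost=1):
    """One DRR round: each fairness key's deficit grows by
    weight * quantum, then messages are delivered while the
    deficit covers their cost (flat cost of 1 here)."""
    delivered = []
    for key, pending in queues.items():
        deficits[key] += weights.get(key, 1) * quantum
        while pending and deficits[key] >= cost:
            delivered.append(pending.popleft())
            deficits[key] -= cost
        if not pending:
            deficits[key] = 0  # idle keys don't accumulate credit
    return delivered

queues = {"a": deque(f"a{i}" for i in range(5)),
          "b": deque(f"b{i}" for i in range(5))}
deficits = {"a": 0, "b": 0}
out = drr_round(queues, deficits, {"a": 2, "b": 1}, quantum=2)
# with weight 2 vs 1 and quantum 2, key "a" gets 4 deliveries, "b" gets 2
```

A larger quantum delivers longer runs per key before switching (coarser interleaving), which is exactly the trade-off the table row describes.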
[lua]
| Key | Type | Default | Description |
|---|---|---|---|
default_timeout_ms | integer | 10 | Maximum execution time for Lua scripts. Enforced via instruction count hook (approximate). |
default_memory_limit_bytes | integer | 1048576 (1 MB) | Maximum memory a Lua script can allocate. |
circuit_breaker_threshold | integer | 3 | Number of consecutive Lua execution failures before the circuit breaker trips. When tripped, Lua hooks are bypassed and default scheduling is used. |
circuit_breaker_cooldown_ms | integer | 10000 | How long to wait after circuit breaker trips before retrying Lua execution. |
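The threshold/cooldown behaviour can be sketched as a small state machine (illustrative Python; Fila's actual breaker is internal to the broker):

```python
import time

class LuaCircuitBreaker:
    """Trips after `threshold` consecutive failures; while tripped,
    Lua hooks are bypassed until `cooldown_ms` elapses."""
    def __init__(self, threshold=3, cooldown_ms=10_000):
        self.threshold = threshold
        self.cooldown_ms = cooldown_ms
        self.failures = 0
        self.tripped_at = None

    def _now(self):
        return time.monotonic() * 1000

    def allow(self, now_ms=None):
        if self.tripped_at is None:
            return True
        now_ms = self._now() if now_ms is None else now_ms
        if now_ms - self.tripped_at >= self.cooldown_ms:
            self.tripped_at = None  # cooldown over: retry Lua execution
            self.failures = 0
            return True
        return False  # bypassed: default scheduling is used

    def record_failure(self, now_ms=None):
        self.failures += 1
        if self.failures >= self.threshold:
            self.tripped_at = self._now() if now_ms is None else now_ms

    def record_success(self):
        self.failures = 0

breaker = LuaCircuitBreaker()
for _ in range(3):
    breaker.record_failure(now_ms=0)
# breaker.allow(now_ms=5_000)  -> False (still cooling down)
# breaker.allow(now_ms=10_000) -> True  (cooldown elapsed)
```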
[telemetry]
Telemetry export is optional. When otlp_endpoint is omitted, the broker uses plain tracing-subscriber logging only.
| Key | Type | Default | Description |
|---|---|---|---|
otlp_endpoint | string | (none) | OTLP endpoint for exporting traces and metrics. Example: "http://localhost:4317". |
service_name | string | "fila" | Service name reported in OTel traces and metrics. |
metrics_interval_ms | integer | 10000 | How often metrics are exported to the OTLP endpoint. |
[gui]
Web management GUI. Disabled by default. When enabled, serves a read-only dashboard on a separate HTTP port.
| Key | Type | Default | Description |
|---|---|---|---|
listen_addr | string | "0.0.0.0:8080" | Address and port for the web dashboard HTTP server |
[fibp]
FIBP (Fila Binary Protocol) transport configuration. FIBP is Fila’s binary TCP protocol, using length-prefixed frames with a 6-byte header (flags, op code, correlation ID).
When [tls] is configured, FIBP connections are TLS-wrapped. When [auth] is configured, FIBP clients must send an OP_AUTH frame (containing the API key) as the first frame after handshake. Per-queue ACL checks apply to all data and admin operations.
FIBP supports all admin operations (create/delete queue, list queues, queue stats, redrive) using protobuf-encoded payloads for schema evolution.
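As an illustration of length-prefixed framing, here is a sketch in Python. The field widths and byte order are assumptions — the document only names the three header fields (flags, op code, correlation ID) in a 6-byte header — so treat this as a shape, not the wire spec:

```python
import struct

# Assumed layout: a 4-byte big-endian length prefix, then the 6-byte
# header as flags (u8), op code (u8), correlation ID (u32). Widths
# and byte order here are guesses for illustration.
def encode_frame(flags, op, correlation_id, payload):
    header = struct.pack(">BBI", flags, op, correlation_id)
    return struct.pack(">I", len(header) + len(payload)) + header + payload

def decode_frame(buf):
    (length,) = struct.unpack_from(">I", buf, 0)
    flags, op, corr = struct.unpack_from(">BBI", buf, 4)
    payload = bytes(buf[10:4 + length])
    return flags, op, corr, payload

frame = encode_frame(0, 1, 42, b"hello")
```

The correlation ID is what lets a client multiplex many in-flight requests over one TCP connection and match each response frame to its request.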
| Key | Type | Default | Description |
|---|---|---|---|
listen_addr | string | "0.0.0.0:5555" | TCP address for the FIBP transport |
max_frame_size | integer | 16777216 (16 MB) | Maximum frame size in bytes. Frames exceeding this limit are rejected. |
keepalive_interval_secs | integer | 15 | Interval between keepalive heartbeat pings. |
keepalive_timeout_secs | integer | 10 | Timeout waiting for a keepalive pong before closing the connection. |
SDK connection
The Rust SDK connects via FIBP:
let client = FilaClient::connect("127.0.0.1:5555").await?;

// With TLS and API key
let opts = ConnectOptions::new("127.0.0.1:5555")
    .with_tls()
    .with_api_key("my-key");
let client = FilaClient::connect_with_options(opts).await?;
Notes:
- The address is a raw TCP address (not a URL with `http://`).
- TLS and API key options are set on `ConnectOptions`.
- The FIBP wire format supports multi-message enqueue in a single frame.
OpenTelemetry metrics
When telemetry is enabled, Fila exports the following metrics:
| Metric | Type | Description |
|---|---|---|
fila.messages.enqueued | Counter | Messages enqueued |
fila.messages.delivered | Counter | Messages delivered to consumers |
fila.messages.acked | Counter | Messages acknowledged |
fila.messages.nacked | Counter | Messages rejected |
fila.messages.expired | Counter | Messages expired (visibility timeout) |
fila.messages.dead_lettered | Counter | Messages moved to DLQ |
fila.messages.redriven | Counter | Messages redriven from DLQ |
fila.queue.depth | Gauge | Pending messages per queue |
fila.queue.in_flight | Gauge | Leased messages per queue |
fila.queue.consumers | Gauge | Active consumers per queue |
fila.queue.fairness_keys | Gauge | Active fairness keys per queue |
fila.delivery.latency | Histogram | Time from enqueue to consumer delivery |
fila.lua.executions | Counter | Lua script executions (by hook type and outcome) |
fila.throttle.limited | Counter | Messages held due to throttle limits |
All queue-scoped metrics include a queue attribute.
Deployment Guide
This guide covers deploying Fila in production environments.
Docker (single node)
The quickest way to run Fila:
docker run -d \
--name fila \
-p 5555:5555 \
-v fila-data:/var/lib/fila \
ghcr.io/faiscadev/fila:latest
With a custom config file:
docker run -d \
--name fila \
-p 5555:5555 \
-v fila-data:/var/lib/fila \
-v ./fila.toml:/etc/fila/fila.toml:ro \
ghcr.io/faiscadev/fila:latest \
fila-server --config /etc/fila/fila.toml
systemd
For bare-metal or VM deployments on Linux.
1. Install the binary
curl -fsSL https://raw.githubusercontent.com/faiscadev/fila/main/install.sh -o install.sh
bash install.sh
2. Create system user and directories
sudo useradd --system --no-create-home --shell /usr/sbin/nologin fila
sudo mkdir -p /etc/fila /var/lib/fila
sudo chown fila:fila /var/lib/fila
3. Create configuration
Copy the example config and customize:
sudo cp deploy/fila.toml /etc/fila/fila.toml
sudo editor /etc/fila/fila.toml
See Configuration Reference for all options.
4. Install and start the service
sudo cp deploy/fila.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now fila
sudo systemctl status fila
5. Verify
fila queue create test-queue
fila queue inspect test-queue
Docker Compose cluster
Run a 3-node cluster locally using Docker Compose.
1. Start the cluster
docker compose -f deploy/docker-compose.cluster.yml up -d
This starts three Fila nodes:
- node1 — bootstrap node, port 5555
- node2 — port 5565
- node3 — port 5575
2. Create a queue and test
# Connect to any node
fila --addr localhost:5555 queue create orders
# Enqueue via node1, consume via node2 (transparent routing)
fila --addr localhost:5555 enqueue orders --payload "hello"
fila --addr localhost:5565 queue inspect orders
3. Stop the cluster
docker compose -f deploy/docker-compose.cluster.yml down
# Add -v to also remove data volumes
Kubernetes with Helm
Single-node deployment
helm install fila deploy/helm/fila/
Clustered deployment
helm install fila deploy/helm/fila/ \
--set replicaCount=3 \
--set cluster.enabled=true \
--set cluster.replicationFactor=3 \
--set persistence.enabled=true \
--set persistence.size=20Gi
With TLS and authentication
# Create TLS secret first
kubectl create secret tls fila-tls \
--cert=server.crt --key=server.key
helm install fila deploy/helm/fila/ \
--set tls.enabled=true \
--set tls.certSecret=fila-tls \
--set auth.bootstrapApiKey=your-secret-key
With OpenTelemetry
helm install fila deploy/helm/fila/ \
--set telemetry.enabled=true \
--set telemetry.otlpEndpoint=http://otel-collector:4317
Custom values file
Create a my-values.yaml:
replicaCount: 3
cluster:
enabled: true
replicationFactor: 3
persistence:
enabled: true
size: 50Gi
storageClass: gp3
tls:
enabled: true
certSecret: fila-tls
auth:
bootstrapApiKey: "change-me-in-production"
telemetry:
enabled: true
otlpEndpoint: "http://otel-collector.monitoring:4317"
resources:
requests:
cpu: 500m
memory: 512Mi
limits:
cpu: 2000m
memory: 2Gi
helm install fila deploy/helm/fila/ -f my-values.yaml
Production checklist
Before going to production, verify these items:
- Data persistence — data directory is on a persistent volume (not ephemeral container storage)
- TLS enabled — all client connections encrypted; consider mTLS for service-to-service
- API key auth — bootstrap API key set; per-queue ACLs configured for multi-tenant
- Backups — data directory backup strategy in place (RocksDB is crash-consistent, point-in-time snapshots work)
- Monitoring — OTel exporter configured, scraping endpoint or collector receiving metrics
- Log level — set `RUST_LOG=info` (default); use `debug` only for troubleshooting
- File descriptors — `LimitNOFILE=65536` or higher for high-connection workloads
- Cluster sizing — for HA, run 3+ nodes with `replication_factor = 3`
- Resource limits — set CPU/memory limits appropriate for workload
- Lua safety — review script timeout and memory limits per queue
Cluster Scaling Benchmark Methodology
This document describes how to measure Fila’s horizontal scaling characteristics
using the fila-bench harness against a multi-node cluster.
Prerequisites
- Fila server binary (3 copies, or 1 binary run 3 times with different configs)
- `fila-bench` binary (from `crates/fila-bench`)
- `fila` CLI binary (from `crates/fila-cli`)
Setting Up a 3-Node Local Cluster
Create three configuration files, one per node.
node1.toml
[server]
listen_addr = "127.0.0.1:5555"
[cluster]
enabled = true
node_id = 1
bind_addr = "127.0.0.1:5556"
bootstrap = true
peers = []
node2.toml
[server]
listen_addr = "127.0.0.1:5565"
[cluster]
enabled = true
node_id = 2
bind_addr = "127.0.0.1:5566"
bootstrap = false
peers = ["127.0.0.1:5556"]
node3.toml
[server]
listen_addr = "127.0.0.1:5575"
[cluster]
enabled = true
node_id = 3
bind_addr = "127.0.0.1:5576"
bootstrap = false
peers = ["127.0.0.1:5556"]
Starting the Cluster
# Terminal 1
fila-server --config node1.toml
# Terminal 2
fila-server --config node2.toml
# Terminal 3
fila-server --config node3.toml
Verify the cluster is healthy:
fila --addr 127.0.0.1:5555 queue list
# Should show "Cluster nodes: 3" at the bottom
Queue-to-Node Assignment
When a queue is created in cluster mode, Fila automatically distributes queue leadership across nodes for balanced load:
- Preferred leader selection: each new queue’s initial leader is the node with the fewest current queue leaderships (tie-break: lowest node ID).
- Node subset selection: when the cluster has more nodes than `replication_factor` (default: 3), only the N least-loaded nodes participate in each queue's Raft group. This spreads I/O across more nodes.
- No manual placement: operators do not need to specify which nodes a queue runs on. The cluster handles assignment automatically.
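The assignment rule above can be sketched as follows (illustrative Python; using the current leadership count as the load metric is an assumption about how "least-loaded" is measured):

```python
def pick_leader_and_group(leadership_counts, replication_factor=3):
    """Order nodes by current leadership count (tie-break: lowest
    node ID), take the N least-loaded as the Raft group, and make
    the least-loaded node the preferred leader."""
    ordered = sorted(leadership_counts,
                     key=lambda node: (leadership_counts[node], node))
    group = ordered[:replication_factor]
    return group[0], group

leader, group = pick_leader_and_group({1: 2, 2: 1, 3: 1, 4: 5})
# nodes 2 and 3 tie on load; node 2 leads by lower ID; node 4 is excluded
```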
Configure replication_factor in the [cluster] section:
[cluster]
enabled = true
node_id = 1
replication_factor = 3 # default: 3 nodes per queue Raft group
Inspect queue leadership via fila queue inspect <name> — the “Raft leader”
line shows which node currently leads each queue.
Running Scaling Benchmarks
Step 1: Baseline (Single Node)
Run fila-bench against a single standalone node to establish baseline throughput:
fila-bench --addr 127.0.0.1:5555 --category enqueue_throughput
fila-bench --addr 127.0.0.1:5555 --category consume_throughput
fila-bench --addr 127.0.0.1:5555 --category enqueue_consume_mixed
Record the messages/second for each category.
Step 2: Cluster Throughput
Create queues distributed across the cluster, then run the same benchmarks:
# Create test queues (they'll be replicated across all 3 nodes)
fila --addr 127.0.0.1:5555 queue create bench-q1
fila --addr 127.0.0.1:5555 queue create bench-q2
fila --addr 127.0.0.1:5555 queue create bench-q3
# Check leadership distribution
fila --addr 127.0.0.1:5555 queue list
Run benchmarks targeting different nodes to exercise the full cluster:
# Enqueue through node 1 (may forward to queue leaders on other nodes)
fila-bench --addr 127.0.0.1:5555 --category enqueue_throughput
# Enqueue through node 2
fila-bench --addr 127.0.0.1:5565 --category enqueue_throughput
# Mixed workload across all nodes (run in parallel)
fila-bench --addr 127.0.0.1:5555 --category enqueue_consume_mixed &
fila-bench --addr 127.0.0.1:5565 --category enqueue_consume_mixed &
fila-bench --addr 127.0.0.1:5575 --category enqueue_consume_mixed &
wait
Step 3: Measure Scaling Factor
Compare cluster throughput to single-node baseline:
scaling_factor = cluster_total_msgs_per_sec / single_node_msgs_per_sec
For a well-distributed workload across 3 queues on 3 nodes, the ideal scaling factor is 3.0x. In practice, expect 2.0x-2.5x due to Raft consensus overhead (log replication, leader forwarding).
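For example, with hypothetical throughput numbers (not measured results), the formula works out as:

```python
# Hypothetical numbers, just to make the formula concrete:
single_node = 5_000        # msg/s, single-node baseline
cluster_total = 11_500     # summed msg/s across the three parallel runs
scaling_factor = cluster_total / single_node
print(f"scaling factor: {scaling_factor:.2f}x")  # inside the 2.0x-2.5x band
```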
What to Measure
| Metric | How | Tool |
|---|---|---|
| Enqueue throughput (msg/s) | fila-bench --category enqueue_throughput | fila-bench |
| Consume throughput (msg/s) | fila-bench --category consume_throughput | fila-bench |
| P99 enqueue latency | fila-bench --category enqueue_latency | fila-bench |
| Raft consensus overhead | Compare single-node vs cluster latency | fila-bench |
| Leadership distribution | fila queue list LEADER column | fila CLI |
| Failover time | Kill leader, measure time to new leader election | manual |
Interpreting Results
- Linear scaling means the scaling factor approaches N (number of nodes). Fila achieves this when queues are distributed across different leaders.
- Sub-linear scaling is expected for single-queue workloads since all writes go through one Raft leader regardless of cluster size.
- Raft overhead adds ~1-3ms per write (log replication to majority). This is the cost of durability and fault tolerance.
- Forwarding overhead adds latency when a client connects to a non-leader node. The request is transparently forwarded to the leader via the internal cluster protocol.
Notes
- The `fila-bench` harness runs single-node benchmarks by default. To benchmark a cluster, point it at any cluster node's client address.
- Queue creation in cluster mode goes through the meta Raft group. All nodes in the cluster will create the queue locally.
- Consumer streams are served by the queue’s Raft leader. If the leader changes (failover), consumers automatically reconnect to the new leader.
Troubleshooting
Common issues and their solutions.
Connection refused
Symptom: transport error: connection refused or failed to connect
Causes:
- Broker not running. Start it: `fila-server` or `docker run -p 5555:5555 ghcr.io/faiscadev/fila:latest`
- Wrong address. Default is `localhost:5555`. Check your connection string.
- Firewall blocking port 5555. Verify: `nc -zv localhost 5555`
- In Docker: use `host.docker.internal:5555` from containers, not `localhost`
TLS errors
Symptom: tls handshake error, certificate verify failed, or unknown ca
Solutions:
- Self-signed cert: Provide the CA certificate to the SDK (`with_tls_ca("ca.crt")`)
- System trust store: Use `with_tls()` (no CA path) if the broker's cert is issued by a public CA
- Expired cert: Check cert expiry: `openssl x509 -in server.crt -noout -enddate`
- Wrong hostname: The cert's CN/SAN must match the hostname you're connecting to
- mTLS required: If the broker requires client certs, provide both client cert and key
Authentication failures
Symptom: UNAUTHENTICATED error status
Solutions:
- Missing API key: Set `api_key` / `with_api_key()` on your client connection
- Wrong key: Verify the key matches what's configured on the broker
- Key revoked: Check if the key was deleted. List keys: `fila auth list`
- Permission denied (`PERMISSION_DENIED`): The key exists but lacks permission for the queue. Check ACLs: `fila auth acl list`
Queue not found
Symptom: NOT_FOUND error status on enqueue or consume
Solutions:
- Queue doesn't exist. Create it: `fila queue create <name>`
- Typo in queue name. List queues: `fila queue list`
- In clustered mode: if you get `UNAVAILABLE` instead of `NOT_FOUND`, the node is still starting up — retry after a moment
Consumer timeout
Symptom: Consumer stream hangs with no messages delivered
Causes:
- Queue is empty. Check depth: `fila queue inspect <name>`
- All messages are leased to other consumers. Check pending count in queue stats.
- Visibility timeout expired and messages were re-queued. Increase timeout or process faster.
- Throttle rate limit hit. Check throttle config: `fila config list`
Message redelivery
Symptom: Same message delivered multiple times
Causes:
- Visibility timeout expiry: If a consumer takes too long to ack, the message is re-delivered. Solution: ack promptly, or increase the visibility timeout.
- Nack without DLQ: If `on_failure` returns `{ action = "retry" }`, the message goes back to the queue. After enough retries, consider dead-lettering.
- Consumer crash: Leased messages are re-delivered after visibility timeout. This is by design — at-least-once delivery.
Best practice: Make consumers idempotent. Use msg.id as a deduplication key.
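A minimal sketch of that dedup pattern (an in-process set for illustration; production code would back it with a shared store such as Redis or a database unique constraint so dedup survives restarts):

```python
# Idempotent handling keyed on the message ID.
processed = set()
side_effects = []

def handle(msg_id, payload):
    if msg_id in processed:
        return "duplicate"          # redelivery: skip the side effect
    side_effects.append(payload)    # stand-in for real business logic
    processed.add(msg_id)
    return "processed"

handle("m1", "charge $5")
handle("m1", "charge $5")   # redelivered copy of the same message
# side_effects == ["charge $5"]: the charge ran exactly once
```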
Lua script errors
Symptom: Warning logs about script failures, circuit breaker tripping
Solutions:
- Syntax error: Validate before setting: `fila queue create test --on-enqueue 'function on_enqueue(msg) return {} end'`
- Runtime error: Check logs for the specific error. Common: accessing nil headers, type mismatches.
- Timeout: Script took too long. Default limit is 10 ms. Simplify the script or increase `lua.default_timeout_ms`.
- Memory limit: Script uses too much memory. Default is 1 MB. Check for unbounded table growth.
- Circuit breaker: After 3 consecutive failures, scripts are bypassed for 10 seconds. Fix the script to reset the breaker.
High memory usage
Solutions:
- Check queue depths: `fila queue inspect <name>`. Large queues consume memory.
- RocksDB block cache can be tuned via configuration.
- In clustered mode, each node replicates data for its assigned queues.
Cluster issues
Symptom: Node not joining cluster, split-brain, leader election failures
Solutions:
- Node not joining: Check `cluster.peers` config — all nodes must list each other's cluster addresses (port 5556, not 5555).
- Bootstrap: Exactly one node should have `bootstrap = true`. Start it first.
- Network: Verify nodes can reach each other on the cluster port: `nc -zv node2 5556`
- Node ID: Each node must have a unique `node_id`. Duplicates cause undefined behavior.
API Reference
Fila exposes two service groups on the same port (default 5555) via FIBP (Fila Binary Protocol). Protobuf message definitions are in proto/fila/v1/.
Hot-path service (fila.v1.FilaService)
Used by producers and consumers for message operations.
Enqueue
Enqueue one or more messages. Single-message enqueue is a batch of one.
rpc Enqueue(EnqueueRequest) returns (EnqueueResponse)
Request (EnqueueRequest):
| Field | Type | Description |
|---|---|---|
messages | repeated EnqueueMessage | One or more messages to enqueue |
EnqueueMessage:
| Field | Type | Description |
|---|---|---|
queue | string | Queue name |
headers | map<string, string> | Arbitrary key-value headers (accessible in Lua hooks) |
payload | bytes | Message body |
Response (EnqueueResponse):
| Field | Type | Description |
|---|---|---|
results | repeated EnqueueResult | One result per input message (same order) |
EnqueueResult:
| Field | Type | Description |
|---|---|---|
message_id | string | UUID assigned (on success) |
error | EnqueueError | Error details (on failure) |
Only one of message_id or error is set (protobuf oneof).
EnqueueErrorCode:
| Code | Description |
|---|---|
ENQUEUE_ERROR_CODE_QUEUE_NOT_FOUND | Queue does not exist |
ENQUEUE_ERROR_CODE_STORAGE | Storage layer error |
ENQUEUE_ERROR_CODE_LUA | Lua hook rejected the message |
ENQUEUE_ERROR_CODE_PERMISSION_DENIED | Caller lacks permission |
StreamEnqueue
Bidirectional streaming enqueue with sequence tracking. The client sends batches of messages on the request stream and receives per-batch results on the response stream. Sequence numbers allow the client to correlate responses with requests.
rpc StreamEnqueue(stream StreamEnqueueRequest) returns (stream StreamEnqueueResponse)
Request (stream, StreamEnqueueRequest):
| Field | Type | Description |
|---|---|---|
messages | repeated EnqueueMessage | Messages to enqueue in this batch |
sequence_number | uint64 | Client-assigned sequence number for correlation |
Response (stream, StreamEnqueueResponse):
| Field | Type | Description |
|---|---|---|
sequence_number | uint64 | Echoed sequence number from the request |
results | repeated EnqueueResult | One result per input message |
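Client-side, the correlation described above can be sketched like this (illustrative Python; the stream plumbing is elided and no Fila SDK API is implied):

```python
import asyncio

# Bookkeeping for sequence correlation: each batch gets a sequence
# number mapped to a future, and the response stream resolves the
# matching future when the echoed sequence number arrives.
class SequenceTracker:
    def __init__(self):
        self.next_seq = 0
        self.pending = {}  # sequence_number -> Future of results

    def register(self):
        seq, self.next_seq = self.next_seq, self.next_seq + 1
        fut = asyncio.get_running_loop().create_future()
        self.pending[seq] = fut
        return seq, fut

    def resolve(self, seq, results):
        self.pending.pop(seq).set_result(results)

async def demo():
    tracker = SequenceTracker()
    seq, fut = tracker.register()
    # ... send the batch with sequence_number=seq on the request stream ...
    # ... later, the response stream echoes the same sequence number:
    tracker.resolve(seq, ["msg-id-1"])
    return await fut

results = asyncio.run(demo())
```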
Consume
Open a server-streaming connection to receive messages. The broker delivers messages according to the DRR scheduler, respecting fairness groups and throttle limits.
rpc Consume(ConsumeRequest) returns (stream ConsumeResponse)
Request:
| Field | Type | Description |
|---|---|---|
queue | string | Queue name to consume from |
Response (stream, ConsumeResponse):
| Field | Type | Description |
|---|---|---|
messages | repeated Message | One or more delivered messages (see Message below) |
The stream stays open until the client disconnects. Messages are delivered as they become available — the stream blocks when no messages are ready.
Errors:
| Error Status | Condition |
|---|---|
NOT_FOUND | Queue does not exist |
Ack
Acknowledge one or more messages. Removes acknowledged messages from the broker.
rpc Ack(AckRequest) returns (AckResponse)
Request (AckRequest):
| Field | Type | Description |
|---|---|---|
messages | repeated AckMessage | One or more messages to acknowledge |
AckMessage:
| Field | Type | Description |
|---|---|---|
queue | string | Queue name |
message_id | string | ID of the message to acknowledge |
Response (AckResponse):
| Field | Type | Description |
|---|---|---|
results | repeated AckResult | One result per input message |
AckResult:
| Field | Type | Description |
|---|---|---|
success | AckSuccess | Empty message (on success) |
error | AckError | Error details (on failure) |
Only one of success or error is set (protobuf oneof).
AckErrorCode:
| Code | Description |
|---|---|
ACK_ERROR_CODE_MESSAGE_NOT_FOUND | Message does not exist or is not leased |
ACK_ERROR_CODE_STORAGE | Storage layer error |
ACK_ERROR_CODE_PERMISSION_DENIED | Caller lacks permission |
Nack
Reject one or more messages. Triggers the on_failure Lua hook (if configured) to decide retry vs. dead-letter.
rpc Nack(NackRequest) returns (NackResponse)
Request (NackRequest):
| Field | Type | Description |
|---|---|---|
messages | repeated NackMessage | One or more messages to reject |
NackMessage:
| Field | Type | Description |
|---|---|---|
queue | string | Queue name |
message_id | string | ID of the message to reject |
error | string | Error description (passed to on_failure hook as msg.error) |
Response (NackResponse):
| Field | Type | Description |
|---|---|---|
results | repeated NackResult | One result per input message |
NackResult:
| Field | Type | Description |
|---|---|---|
success | NackSuccess | Empty message (on success) |
error | NackError | Error details (on failure) |
Only one of success or error is set (protobuf oneof).
NackErrorCode:
| Code | Description |
|---|---|
NACK_ERROR_CODE_MESSAGE_NOT_FOUND | Message does not exist or is not leased |
NACK_ERROR_CODE_STORAGE | Storage layer error |
NACK_ERROR_CODE_PERMISSION_DENIED | Caller lacks permission |
Admin service (fila.v1.FilaAdmin)
Used by operators and the fila CLI for queue management, configuration, and diagnostics.
CreateQueue
Create a new queue with optional Lua hooks and visibility timeout.
rpc CreateQueue(CreateQueueRequest) returns (CreateQueueResponse)
Request:
| Field | Type | Description |
|---|---|---|
name | string | Queue name |
config | QueueConfig | Optional configuration (see below) |
QueueConfig:
| Field | Type | Default | Description |
|---|---|---|---|
on_enqueue_script | string | (none) | Lua script run on every enqueue |
on_failure_script | string | (none) | Lua script run on every nack |
visibility_timeout_ms | uint64 | 30000 | Lease duration in milliseconds |
Response:
| Field | Type | Description |
|---|---|---|
queue_id | string | Queue identifier |
Errors:
| Error Status | Condition |
|---|---|
ALREADY_EXISTS | Queue with that name already exists |
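The `visibility_timeout_ms` lease semantics can be reduced to a simple predicate (an illustrative sketch of the rule, not broker code):

```python
# A delivered message is invisible to other consumers until it is
# acked or the visibility timeout elapses, at which point the broker
# redelivers it.
def is_redeliverable(leased_at_ms, now_ms, acked,
                     visibility_timeout_ms=30_000):
    if acked:
        return False  # acked messages are removed from the broker
    return now_ms - leased_at_ms >= visibility_timeout_ms

checks = [
    is_redeliverable(0, 10_000, acked=False),   # still leased
    is_redeliverable(0, 30_000, acked=False),   # lease expired
    is_redeliverable(0, 60_000, acked=True),    # acked: never redelivered
]
```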
DeleteQueue
Delete a queue and all its messages.
rpc DeleteQueue(DeleteQueueRequest) returns (DeleteQueueResponse)
Request:
| Field | Type | Description |
|---|---|---|
queue | string | Queue name |
Errors:
| Error Status | Condition |
|---|---|
NOT_FOUND | Queue does not exist |
ListQueues
List all queues with summary statistics.
rpc ListQueues(ListQueuesRequest) returns (ListQueuesResponse)
Response:
| Field | Type | Description |
|---|---|---|
queues | repeated QueueInfo | List of queues |
QueueInfo:
| Field | Type | Description |
|---|---|---|
name | string | Queue name |
depth | uint64 | Number of pending messages |
in_flight | uint64 | Number of leased (in-flight) messages |
active_consumers | uint32 | Number of connected consumers |
SetConfig
Set a runtime configuration key-value pair. Persisted across restarts.
rpc SetConfig(SetConfigRequest) returns (SetConfigResponse)
Request:
| Field | Type | Description |
|---|---|---|
key | string | Configuration key |
value | string | Configuration value |
GetConfig
Retrieve a configuration value by key.
rpc GetConfig(GetConfigRequest) returns (GetConfigResponse)
Request:
| Field | Type | Description |
|---|---|---|
key | string | Configuration key |
Response:
| Field | Type | Description |
|---|---|---|
value | string | Configuration value |
Errors:
| Error Status | Condition |
|---|---|
NOT_FOUND | Key does not exist |
ListConfig
List configuration entries, optionally filtered by prefix.
rpc ListConfig(ListConfigRequest) returns (ListConfigResponse)
Request:
| Field | Type | Description |
|---|---|---|
prefix | string | Filter entries by key prefix (empty = all) |
Response:
| Field | Type | Description |
|---|---|---|
entries | repeated ConfigEntry | Key-value pairs |
total_count | uint32 | Total number of matching entries |
ConfigEntry:
| Field | Type | Description |
|---|---|---|
key | string | Configuration key |
value | string | Configuration value |
GetStats
Get detailed statistics for a queue, including per-fairness-key and per-throttle-key breakdowns.
rpc GetStats(GetStatsRequest) returns (GetStatsResponse)
Request:
| Field | Type | Description |
|---|---|---|
queue | string | Queue name |
Response:
| Field | Type | Description |
|---|---|---|
depth | uint64 | Total pending messages |
in_flight | uint64 | Messages currently leased |
active_fairness_keys | uint64 | Number of fairness keys with pending messages |
active_consumers | uint32 | Connected consumers |
quantum | uint32 | DRR quantum value |
per_key_stats | repeated PerFairnessKeyStats | Per-fairness-key breakdown |
per_throttle_stats | repeated PerThrottleKeyStats | Per-throttle-key breakdown |
PerFairnessKeyStats:
| Field | Type | Description |
|---|---|---|
key | string | Fairness key |
pending_count | uint64 | Pending messages for this key |
current_deficit | int64 | Current DRR deficit |
weight | uint32 | DRR weight |
PerThrottleKeyStats:
| Field | Type | Description |
|---|---|---|
key | string | Throttle key |
tokens | double | Current available tokens |
rate_per_second | double | Token refill rate |
burst | double | Maximum bucket capacity |
Errors:
| Error Status | Condition |
|---|---|
NOT_FOUND | Queue does not exist |
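The `tokens` / `rate_per_second` / `burst` fields in `PerFairnessKeyStats`'s sibling `PerThrottleKeyStats` describe a standard token bucket; a minimal sketch of the refill-and-consume rule:

```python
class TokenBucket:
    """Mirrors the stats fields above: `tokens` (current balance),
    `rate_per_second` (refill rate), `burst` (maximum capacity)."""
    def __init__(self, rate_per_second, burst):
        self.rate = rate_per_second
        self.burst = burst
        self.tokens = burst
        self.last = 0.0

    def try_consume(self, now, cost=1.0):
        # Refill in proportion to elapsed time, capped at burst.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # held back; would count toward fila.throttle.limited

bucket = TokenBucket(rate_per_second=2.0, burst=2.0)
burst_ok = [bucket.try_consume(0.0), bucket.try_consume(0.0)]  # both True
empty = bucket.try_consume(0.0)       # False: bucket drained
refilled = bucket.try_consume(0.5)    # True: 0.5 s * 2/s = 1 token back
```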
Redrive
Move pending messages from a dead letter queue back to the source queue.
rpc Redrive(RedriveRequest) returns (RedriveResponse)
Request:
| Field | Type | Description |
|---|---|---|
dlq_queue | string | DLQ name (e.g., orders.dlq) |
count | uint64 | Maximum number of messages to redrive |
Response:
| Field | Type | Description |
|---|---|---|
redriven | uint64 | Number of messages actually moved |
Only pending (non-leased) messages are redriven. Leased messages in the DLQ are skipped to avoid interfering with active consumers.
Errors:
| Error Status | Condition |
|---|---|
NOT_FOUND | DLQ does not exist |
Message types
Message
The core message envelope returned by Consume.
message Message {
string id = 1;
map<string, string> headers = 2;
bytes payload = 3;
MessageMetadata metadata = 4;
MessageTimestamps timestamps = 5;
}
| Field | Type | Description |
|---|---|---|
id | string | UUID assigned at enqueue |
headers | map<string, string> | Headers set by the producer |
payload | bytes | Message body |
metadata | MessageMetadata | Broker-assigned scheduling metadata |
timestamps | MessageTimestamps | Lifecycle timestamps |
MessageMetadata
Scheduling metadata assigned by the broker (via Lua on_enqueue or defaults).
message MessageMetadata {
string fairness_key = 1;
uint32 weight = 2;
repeated string throttle_keys = 3;
uint32 attempt_count = 4;
string queue_id = 5;
}
| Field | Type | Description |
|---|---|---|
fairness_key | string | DRR fairness group key |
weight | uint32 | DRR weight for this key |
throttle_keys | repeated string | Token bucket keys checked before delivery |
attempt_count | uint32 | Number of delivery attempts |
queue_id | string | Queue this message belongs to |
MessageTimestamps
message MessageTimestamps {
google.protobuf.Timestamp enqueued_at = 1;
google.protobuf.Timestamp leased_at = 2;
}
| Field | Type | Description |
|---|---|---|
enqueued_at | Timestamp | When the message was first enqueued |
leased_at | Timestamp | When the message was last delivered to a consumer |
Benchmarks
This page presents Fila’s benchmark results: self-benchmarks measuring single-node performance, and competitive comparisons against Kafka, RabbitMQ, and NATS.
Results from commit 1e5bb0e on 2026-03-26. Run benchmarks on your own hardware for results relevant to your environment. See Reproducing results for instructions.
Self-benchmarks
Self-benchmarks measure Fila's single-node performance across throughput, latency, scheduling, and resource usage. The benchmark suite is in crates/fila-bench/ and uses the Fila SDK as a black-box client against a real server instance.
Throughput
| Metric | Value | Unit |
|---|---|---|
| Enqueue throughput (1KB payload) | 5,051 | msg/s |
| Enqueue throughput (1KB payload) | 4.93 | MB/s |
Single producer, sustained over a 3-second measurement window after 1-second warmup.
End-to-end latency
Round-trip latency: produce a message, consume it, measure the interval. 100 samples per load level.
| Load level | Producers | p50 | p95 | p99 |
|---|---|---|---|---|
| Light | 1 | 0.00 ms | 0.00 ms | 0.00 ms |
Fair scheduling overhead
Compares throughput with DRR fair scheduling enabled vs plain FIFO delivery.
| Mode | Throughput (msg/s) |
|---|---|
| FIFO baseline | 1,307 |
| Fair scheduling (DRR) | 1,247 |
| Overhead | 3.1% |
The DRR scheduler adds minimal overhead compared to FIFO delivery (< 5% target).
Fairness accuracy
Messages enqueued across 5 fairness keys with weights 1:2:3:4:5. 2,000 messages per key (10,000 total), consuming a window of 5,000.
| Key | Weight | Expected share | Actual share | Deviation |
|---|---|---|---|---|
| tenant-1 | 1 | 6.7% | 6.7% | 0.2% |
| tenant-2 | 2 | 13.3% | 13.4% | 0.2% |
| tenant-3 | 3 | 20.0% | 20.0% | 0.1% |
| tenant-4 | 4 | 26.7% | 26.6% | 0.1% |
| tenant-5 | 5 | 33.3% | 33.3% | 0.1% |
The DRR scheduler distributes messages proportionally to weight within any delivery window. Max deviation is < 1%, well within the < 5% NFR target.
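The weighted shares in the table fall out of the DRR algorithm itself: each key accumulates a deficit proportional to its weight each round and spends it on deliveries. A minimal illustrative sketch (not Fila's implementation — message cost is fixed at 1 and names are hypothetical) reproduces the 1:2:3:4:5 split over a 5,000-message window:

```python
from collections import deque

# Minimal Deficit Round Robin sketch (illustrative, not Fila's code).
# Each round, a key's deficit grows by its weight; each delivery costs 1.
# Over a window, deliveries approach the weighted shares in the table above.
def drr_window(queues: dict, weights: dict, window: int) -> dict:
    deficit = {k: 0 for k in queues}
    served = {k: 0 for k in queues}
    delivered = 0
    while delivered < window and any(queues.values()):
        for key in list(queues):
            if not queues[key]:
                continue
            deficit[key] += weights[key]  # add this key's quantum
            while deficit[key] >= 1 and queues[key] and delivered < window:
                queues[key].popleft()     # deliver one message (cost 1)
                deficit[key] -= 1
                served[key] += 1
                delivered += 1
    return served

weights = {f"tenant-{i}": i for i in range(1, 6)}
queues = {k: deque(range(2000)) for k in weights}
served = drr_window(queues, weights, 5000)
shares = {k: n / 5000 for k, n in served.items()}
```

With total weight 15, tenant-3's expected share is 3/15 = 20%, and the simulated share lands within a fraction of a percent of that, mirroring the measured deviations above.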
Lua script overhead
Measures per-message overhead of executing an on_enqueue Lua hook.
| Metric | Value | Unit |
|---|---|---|
| Throughput without Lua | 987 | msg/s |
| Throughput with on_enqueue hook | 943 | msg/s |
| Per-message overhead | 31.7 | us |
The Lua hook adds roughly 32 us of per-message overhead, within the < 50 us NFR target.
Fairness key cardinality scaling
Scheduling throughput as the number of distinct fairness keys increases.
| Key count | Throughput (msg/s) |
|---|---|
| 10 | 1,479 |
| 1,000 | 818 |
| 10,000 | 509 |
Consumer concurrency scaling
Aggregate consume throughput with increasing concurrent consumer streams.
| Consumers | Throughput (msg/s) |
|---|---|
| 1 | 66 |
| 10 | 1,009 |
| 100 | 1,863 |
Memory footprint
| Metric | Value |
|---|---|
| RSS idle | 351 MB |
| RSS under load (10K messages) | 351 MB |
Memory usage is dominated by the RocksDB buffer pool, not message count.
RocksDB compaction impact
| Metric | p99 latency |
|---|---|
| Idle (no compaction) | 0.00 ms |
| Active compaction | 0.00 ms |
| Delta | < 0.39 ms |
Compaction has no measurable negative impact on tail latency in single-node benchmarks.
Batch benchmarks
Batch benchmarks measure throughput and latency of multi-message Enqueue (multiple EnqueueMessage items per EnqueueRequest) and compare it against single-message enqueue. These benchmarks are gated behind FILA_BENCH_BATCH=1 because they exercise batch-specific code paths and take additional time.
Enable with FILA_BENCH_BATCH=1:
FILA_BENCH_BATCH=1 cargo bench -p fila-bench --bench system
Multi-message enqueue throughput
Measures multi-message Enqueue throughput at various batch sizes with 1KB messages. Reports both messages/s and batches/s.
| Batch size | Throughput (msg/s) | Batches/s |
|---|---|---|
| 1 | — | — |
| 10 | — | — |
| 50 | — | — |
| 100 | — | — |
| 500 | — | — |
Batch size scaling
Measures throughput as a function of batch size (1 to 1000) to identify the point of diminishing returns.
| Batch size | Throughput (msg/s) |
|---|---|
| 1 | — |
| 5 | — |
| 10 | — |
| 25 | — |
| 50 | — |
| 100 | — |
| 250 | — |
| 500 | — |
| 1000 | — |
Auto-batching latency
Measures end-to-end latency (multi-message enqueue to consume) at various producer concurrency levels. Simulates client-side auto-batching by accumulating messages and flushing via the Enqueue RPC with 50 messages per request.
| Producers | p50 | p95 | p99 | p99.9 | p99.99 | max |
|---|---|---|---|---|---|---|
| 1 | — | — | — | — | — | — |
| 10 | — | — | — | — | — | — |
| 50 | — | — | — | — | — | — |
Batched vs unbatched comparison
Runs identical workloads (3,000 messages) with three approaches and reports throughput and speedup ratios.
| Mode | Throughput (msg/s) | Speedup |
|---|---|---|
| Unbatched | — | 1.0x |
| Explicit batch (size 100) | — | —x |
| Auto-batch (size 100) | — | —x |
Speedup ratios are computed relative to the unbatched baseline.
Delivery batching throughput
Measures consumer throughput with varying concurrent consumer counts. Messages are pre-loaded and continuously produced via multi-message Enqueue.
| Consumers | Throughput (msg/s) |
|---|---|
| 1 | — |
| 10 | — |
| 100 | — |
Concurrent producer batching
Measures aggregate throughput with multiple concurrent producers all using multi-message Enqueue (batch size 100).
| Producers | Throughput (msg/s) |
|---|---|
| 1 | — |
| 5 | — |
| 10 | — |
| 50 | — |
Subsystem benchmarks
Subsystem benchmarks isolate and measure each internal component independently, bypassing the full server stack. This helps identify where time is spent and which component dominates in different workloads.
Enable with FILA_BENCH_SUBSYSTEM=1:
FILA_BENCH_SUBSYSTEM=1 cargo bench -p fila-bench --bench system
RocksDB raw write throughput
Measures raw put_message throughput directly against RocksDB, bypassing scheduler, FIBP, and serialization. Isolates storage engine performance.
| Payload | Throughput (ops/s) | p50 latency | p99 latency |
|---|---|---|---|
| 1KB | — | — | — |
| 64KB | — | — | — |
Protobuf serialization throughput
Measures protobuf encode and decode throughput for EnqueueRequest and ConsumeResponse at three payload sizes. Isolates serialization overhead.
| Payload | Encode (MB/s) | Encode (ns/msg) | Decode (ns/msg) |
|---|---|---|---|
| 64B | — | — | — |
| 1KB | — | — | — |
| 64KB | — | — | — |
Reported for both EnqueueRequest (producer path) and ConsumeResponse (consumer path).
DRR scheduler throughput
Measures next_key() + consume_deficit() cycle throughput at varying active key counts. Isolates the scheduling algorithm from storage I/O.
| Active keys | Throughput (sel/s) |
|---|---|
| 10 | — |
| 1,000 | — |
| 10,000 | — |
FIBP round-trip overhead
Measures round-trip latency for a minimal (1-byte payload) Enqueue request. Quantifies the fixed per-call overhead of FIBP framing, separate from message processing.
| Metric | Value | Unit |
|---|---|---|
| p50 latency | — | us |
| p99 latency | — | us |
| p99.9 latency | — | us |
| Throughput | — | ops/s |
Lua execution throughput
Measures on_enqueue hook execution throughput for three script complexity levels, directly against the Lua VM (no server, no FIBP).
| Script | Throughput (exec/s) | p50 | p99 |
|---|---|---|---|
| No-op (return defaults) | — | — | — |
| Header-set (read 2 headers) | — | — | — |
| Complex routing (string ops, conditionals, table insert) | — | — | — |
Competitive comparison
Fila is compared against Kafka, RabbitMQ, and NATS on queue-oriented workloads. All brokers run in Docker containers and are benchmarked using native Rust clients via the bench-competitive binary. See Methodology for details.
How to run competitive benchmarks
cd bench/competitive
make bench-competitive
Results are written to bench/competitive/results/bench-{broker}.json.
Workloads
Each broker is tested with identical workloads using its recommended high-throughput configuration:
| Workload | Description | Batching |
|---|---|---|
| Throughput | Sustained message production rate (64B, 1KB, 64KB payloads) | Each broker’s recommended batching |
| Latency | Produce-consume round-trip (p50/p95/p99) | Unbatched |
| Lifecycle | Full enqueue-consume-ack cycle (1,000 messages) | Unbatched |
| Multi-producer | 3 concurrent producers aggregate throughput | Each broker’s recommended batching |
| Resources | CPU and memory during benchmark | — |
Broker configurations
| Broker | Version | Mode | Throughput batching |
|---|---|---|---|
| Fila | latest | Docker container, DRR scheduler | AccumulatorMode::Auto (4 concurrent producers) |
| Kafka | 3.9 | KRaft (no ZooKeeper), 1 partition | linger.ms=5, batch.num.messages=1000 |
| RabbitMQ | 3.13 | Quorum queues, durable, manual ack | Per-message (no client-side batching) |
| NATS | 2.11 | JetStream, file storage, pull-subscribe | Per-message (no client-side batching) |
All competitors use production-recommended settings, not development defaults. All brokers use native Rust client libraries (rdkafka, lapin, async-nats). Throughput scenarios use each broker’s recommended batching strategy for a fair comparison. Lifecycle scenarios are unbatched for all brokers.
Run make bench-competitive on your hardware to generate comparison tables.
Results
These are reference numbers from a single run. Your results will vary by hardware. All brokers run in Docker containers. Throughput uses each broker’s recommended batching; lifecycle is unbatched.
Throughput (messages/second, batched)
| Payload | Fila | Kafka | RabbitMQ | NATS |
|---|---|---|---|---|
| 64B | — | — | — | — |
| 1KB | — | — | — | — |
| 64KB | — | — | — | — |
Previous unbatched results (Fila 2,637 msg/s vs Kafka 143,278 msg/s at 1KB) were an unfair comparison: Kafka used linger.ms=5 batching while Fila sent 1 message per RPC. The updated benchmark uses each broker's recommended batching.
End-to-end latency (1KB payload)
| Percentile | Fila | Kafka | RabbitMQ | NATS |
|---|---|---|---|---|
| p50 | 0.92 ms | 101.62 ms | 1.46 ms | 0.29 ms |
| p95 | 2.82 ms | 105.07 ms | 3.32 ms | 0.42 ms |
| p99 | 4.79 ms | 105.30 ms | 5.59 ms | 0.79 ms |
Lifecycle throughput (enqueue + consume + ack, 1KB, unbatched)
| Broker | msg/s |
|---|---|
| NATS | 25,763 |
| Fila | 2,724 |
| RabbitMQ | 658 |
| Kafka | 356 |
Multi-producer throughput (3 producers, 1KB)
| Broker | msg/s |
|---|---|
| Kafka | 186,708 |
| NATS | 150,676 |
| RabbitMQ | 63,660 |
| Fila | 6,769 |
Resource usage
| Broker | CPU | Memory |
|---|---|---|
| NATS | 1.3% | 12 MB |
| Kafka | 2.1% | 1,276 MB |
| Fila | 3.7% | 874 MB |
| RabbitMQ | 56.8% | 654 MB |
Methodology
Measurement parameters
| Parameter | Value |
|---|---|
| Warmup period | 1 second (discarded) |
| Measurement window | 3 seconds |
| Latency samples | 100 per level |
| Runs for CI regression | 3 (median) |
| Competitive runs | 1 (relative comparison) |
Limitations
- Single-node only. All brokers run as single instances. Clustering performance is not tested.
- No network latency. Brokers run on localhost. Real deployments have network overhead.
- Docker containers. All brokers run in Docker containers for a fair comparison.
- Hardware-specific. Results will vary on different hardware. Always include hardware specs when citing numbers.
Reproducing results
Self-benchmarks:
# Build and run the full benchmark suite
cargo bench -p fila-bench --bench system
# Results written to crates/fila-bench/bench-results.json
Competitive benchmarks:
cd bench/competitive
# Run all brokers
make bench-competitive
# Or individual brokers
make bench-kafka
make bench-rabbitmq
make bench-nats
make bench-fila
# Clean up Docker containers
make bench-clean
See bench/competitive/METHODOLOGY.md for complete methodology documentation including broker configuration details and justifications.
CI regression detection
The bench-regression GitHub Actions workflow runs on every push to main and on pull requests:
- Runs the self-benchmark suite 3 times, takes the median
- On main pushes: saves results as the baseline
- On PRs: compares against the baseline and flags regressions exceeding the threshold (default: 10%)
- Results are uploaded as workflow artifacts for every run
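The median-of-3 comparison described above can be sketched as a simple predicate. This is an illustrative model of the rule, not the workflow's actual implementation; `is_regression` and its arguments are hypothetical names:

```python
from statistics import median

# Illustrative sketch of the CI regression rule: take the median of
# 3 runs and flag a throughput metric if it dropped more than the
# threshold (default 10%) below the saved baseline.
def is_regression(baseline: float, runs: list, threshold: float = 0.10) -> bool:
    return median(runs) < baseline * (1.0 - threshold)

print(is_regression(5051.0, [4400.0, 4450.0, 4500.0]))  # True: ~12% drop
print(is_regression(5051.0, [4900.0, 4950.0, 5000.0]))  # False: within 10%
```

Using the median rather than the mean keeps a single noisy run (e.g. a CI runner hiccup) from triggering a false regression.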
Traceability
Results in this document are from commit 1e5bb0e (2026-03-26). Run cargo bench -p fila-bench --bench system to generate results for the current version. The JSON output includes the commit hash and timestamp for traceability.
Versioning & Compatibility Policy
Versioning
Fila uses semantic versioning (semver) for all releases:
- MAJOR — Breaking proto/API changes. Existing SDK versions may not work.
- MINOR — New features, backward compatible. Existing SDKs continue to work; new features require SDK update.
- PATCH — Bug fixes only. No behavior changes.
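One reading of this policy is that an SDK works against any server sharing its MAJOR version, and features introduced in a later MINOR require a server at least that new. A hedged sketch of that compatibility check (the `sdk_compatible` function is hypothetical, for illustration only):

```python
# Illustrative reading of the semver policy above: same MAJOR means
# compatible; a server's MINOR must be >= the SDK's for the SDK's
# newest features to be available.
def sdk_compatible(sdk: str, server: str) -> bool:
    sdk_major, sdk_minor, _ = (int(x) for x in sdk.split("."))
    srv_major, srv_minor, _ = (int(x) for x in server.split("."))
    return sdk_major == srv_major and srv_minor >= sdk_minor

print(sdk_compatible("1.2.0", "1.3.1"))  # True: same MAJOR, newer MINOR
print(sdk_compatible("1.2.0", "2.0.0"))  # False: MAJOR bump may break
```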
Proto Backward Compatibility
The fila.v1 proto package follows strict additive-only rules within a MAJOR version:
- New RPCs may be added
- New fields may be added to existing messages
- Existing fields are never removed, renamed, or retyped
- Field numbers are never reused
A MAJOR version bump (e.g., v1 → v2) would introduce a new proto package (fila.v2) and may remove deprecated RPCs or fields.
Deprecation Policy
- Features are deprecated with at least 1 MINOR version warning before removal
- Deprecated RPCs and fields are documented in release notes and proto comments
- Removal only occurs in a MAJOR version bump