Benchmarking
This API is available since Fedify 2.3.0.
Fedify can run as a cooperative benchmark target by enabling benchmarkMode. This mode exposes local benchmark endpoints under /.well-known/fedify/bench/ and configures an in-process OpenTelemetry metrics reader so benchmark clients can collect server-side measurements without a separate metrics backend.
WARNING
Do not enable benchmarkMode in production. It is intended for benchmark targets that you control.
Enabling benchmark mode
Enable benchmarkMode when creating the Federation object. If you use the benchmark trigger endpoint, configure the sink inboxes on the server:
import { createFederation } from "@fedify/fedify";
const federation = createFederation<void>({
  benchmarkMode: {
    triggerSinks: ["https://sink.example/inbox"],
  },
});

When enabled, Fedify changes only benchmark-target defaults:
- allowPrivateAddress defaults to true, unless a custom document loader factory is configured.
- signatureTimeWindow defaults to false.
- Explicit allowPrivateAddress and signatureTimeWindow values still win.
- Inbox idempotency is unchanged. Benchmark clients that need repeated deliveries should mint unique activity IDs.
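The precedence rules above can be sketched as a small resolver. This is an illustration only, not Fedify's internal code: the `BenchmarkDefaultsInput` shape, the `hasCustomDocumentLoaderFactory` flag, and `resolveBenchmarkDefaults` are hypothetical names, and `undefined` stands for "leave the regular library default in place".

```typescript
// Hypothetical shapes for illustration; only the option names
// allowPrivateAddress and signatureTimeWindow come from the text above.
type TimeWindow = { hours?: number; minutes?: number } | false;

interface BenchmarkDefaultsInput {
  allowPrivateAddress?: boolean;
  signatureTimeWindow?: TimeWindow;
  hasCustomDocumentLoaderFactory?: boolean; // assumed flag, not a Fedify option
}

interface ResolvedDefaults {
  allowPrivateAddress: boolean | undefined; // undefined = regular default applies
  signatureTimeWindow: TimeWindow | undefined;
}

function resolveBenchmarkDefaults(
  input: BenchmarkDefaultsInput,
  benchmarkMode: boolean,
): ResolvedDefaults {
  if (!benchmarkMode) {
    // Outside benchmark mode nothing is changed.
    return {
      allowPrivateAddress: input.allowPrivateAddress,
      signatureTimeWindow: input.signatureTimeWindow,
    };
  }
  return {
    // Explicit values win; otherwise default to true unless a custom
    // document loader factory is configured.
    allowPrivateAddress: input.allowPrivateAddress ??
      (input.hasCustomDocumentLoaderFactory ? undefined : true),
    // Explicit values win; otherwise the time-window check is disabled.
    signatureTimeWindow: input.signatureTimeWindow ?? false,
  };
}
```

Because `??` only falls back on `null` or `undefined`, an explicit `allowPrivateAddress: false` is preserved rather than overwritten.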
If you provide meterProvider together with benchmarkMode, Fedify throws a TypeError. OpenTelemetry metric readers have to be attached when a MeterProvider is constructed, so benchmark mode owns its in-process provider.
If the same application code sometimes runs with benchmark mode and sometimes runs with your normal OpenTelemetry pipeline, pass your application meterProvider only when benchmark mode is off:
import type { KvStore } from "@fedify/fedify";
import type { MeterProvider } from "@opentelemetry/api";
import { createFederation } from "@fedify/fedify";

// Supplied by the surrounding application.
declare const kv: KvStore;
declare const meterProvider: MeterProvider;

const benchmarkEnabled = process.env.FEDIFY_BENCHMARK === "1";

const federation = createFederation<void>({
  kv,
  benchmarkMode: benchmarkEnabled
    ? { triggerSinks: ["https://sink.example/inbox"] }
    : false,
  // Benchmark mode owns its in-process provider, so pass yours only when it is off.
  meterProvider: benchmarkEnabled ? undefined : meterProvider,
});

The fedify bench command
This command is available since Fedify 2.3.0.
Once a target runs in benchmark mode, the fedify bench command drives ActivityPub-specific load against it and reports latency, throughput, success rate, and errors. It acts as a synthetic remote actor: it generates keys, serves its own actor and key documents over loopback, and signs every inbox delivery with the same @fedify/fedify signer a real peer uses, so the measured crypto cost is real.
NOTE
This version runs the inbox, webfinger, actor, object, fanout, failure, and mixed scenario types. The collection scenario type is reserved by the suite format but is not executed yet. Within the runnable types, a few options the format accepts are also not implemented yet and are rejected up front with a clear message.
A scenario suite
A benchmark is described by a suite file in YAML (JSON works too, since YAML is a superset). The suite declares the target, shared defaults, the actors to sign as, and a list of scenarios, each with an optional expect block of pass/fail thresholds:
# yaml-language-server: $schema=https://json-schema.fedify.dev/bench/scenario-v2.json
version: 1
target: http://localhost:3000
defaults:
  duration: 30s
  warmup: 5s # excluded from results; also warms the key cache
  load:
    rate: 200/s # open-loop; or closed-loop with `concurrency: 50`
actors:
  - count: 3
    signatureStandards: [draft-cavage-http-signatures-12, ld-signatures]
scenarios:
  - name: inbox-shared
    type: inbox
    recipient: "http://${{ target.host }}/users/alice"
    inbox: shared
    activity:
      type: Create
      object:
        type: Note
        content: { generate: lorem, size: 2KB }
    expect:
      successRate: ">= 99%"
      latency.p95: "< 100ms"

Run it against the target and read the terminal report:

fedify bench scenario.yaml

The # yaml-language-server: line gives editors autocomplete and validation against the published schema. Override the file's target with --target, choose the output with --format/--output, and inspect a run without sending anything with --dry-run. A dry run still probes the target's benchmark stats endpoint and resolves scenario discovery, such as WebFinger and actor inbox lookup, so the printed plan shows the concrete destinations a real run would use. It does not send benchmark load.
An inbox scenario's recipient may be a single value or a list. With a list, deliveries are rotated across the recipients (and across the synthetic actors signing them), modeling a server that receives from many peers into many local inboxes.
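A minimal sketch of that rotation, assuming plain round-robin over both lists. The text does not specify the exact rotation order, so the indexing below is an assumption, and `planDeliveries` and `PlannedDelivery` are hypothetical names.

```typescript
interface PlannedDelivery {
  recipient: string; // local inbox owner on the target
  actor: string; // synthetic remote actor that signs the delivery
}

// Assign each delivery a recipient and a signing actor by cycling
// through both lists independently (assumed round-robin).
function planDeliveries(
  recipients: readonly string[],
  actors: readonly string[],
  count: number,
): PlannedDelivery[] {
  if (recipients.length === 0 || actors.length === 0) {
    throw new Error("at least one recipient and one actor are required");
  }
  const plan: PlannedDelivery[] = [];
  for (let i = 0; i < count; i++) {
    plan.push({
      recipient: recipients[i % recipients.length] as string,
      actor: actors[i % actors.length] as string,
    });
  }
  return plan;
}
```

With two recipients and three actors, the pairing only repeats after six deliveries, which spreads load across more (recipient, signer) combinations than either list alone.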
Scenario types
The runnable scenario types cover the main benchmark surfaces:
- inbox: discovers recipient inboxes and sends signed Create(Note) deliveries through the target's inbound ActivityPub path.
- webfinger: drives direct /.well-known/webfinger lookups on the target.
- actor: resolves actor URLs from the scenario recipients and fetches actor documents. Set authenticated: true to sign those GET requests.
- object: fetches object URLs from source. Set authenticated: true to sign those GET requests.
- fanout: posts to /.well-known/fedify/bench/trigger so the target calls sendActivity() and drains its fanout/outbox queue to benchmark-owned sink inboxes. The command starts those sink inboxes locally. A non-loopback target therefore needs --advertise-host unless the scenario sets sinkBase to a reachable http://host:port/ URL. The target must either allow the generated sink inboxes through triggerSinks or run with allowUnsafeTriggerRecipients in a controlled benchmark environment. Use sinkBase when you want those inboxes to be deterministic, for example http://127.0.0.1:9090/inbox/0 through http://127.0.0.1:9090/inbox/4 for followers: 5. fedify bench does not switch the target's queue backend; run the same suite against targets configured with the queue implementations you want to compare. Fanout triggers are serialized while the runner observes queue drain, so client latency includes time spent waiting for earlier fanout drains under high-rate or concurrent load. Use deliveryThroughput and queueDrain expectations for delivery performance, and keep request latency expectations conservative for this scenario type.
- failure: records expected fault outcomes as successes. For this scenario type, successRate means “the expected failure was observed,” not “the HTTP request succeeded.” The invalid-signature and missing-actor faults send malformed signed deliveries to a recipient inbox. The remote-404, remote-410, slow-inbox, and network-error faults post to the benchmark trigger endpoint with sender, so the target uses its normal outbound delivery path against controlled benchmark-owned sink inboxes. Like fanout, these remote failure faults need --advertise-host for a non-loopback target unless sinkBase gives a reachable, fixed sink base URL that the target's triggerSinks can preconfigure. Remote failure deliveries are also serialized while the runner waits for the target's queue to observe the expected failure or retry signal, so request latency can include earlier wait time when the configured load is concurrent or high-rate.
- mixed: runs referenced child scenarios concurrently, splitting the mixed scenario's load by each entry's weight. The referenced scenarios are named scenarios in the same suite and are still run as normal suite entries when listed. The mixed result merges client-side request, throughput, delivery throughput, latency, and error measurements; server-side metric snapshots are not merged across child runners.
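To make the weight-based split of a mixed scenario concrete, here is a rough sketch that divides a total request rate proportionally. The `MixEntry` shape and `splitRate` are hypothetical; only the `weight` field name comes from the text, and how the real runner rounds or schedules the shares is not specified here.

```typescript
interface MixEntry {
  scenario: string; // name of a scenario in the same suite (assumed field name)
  weight: number;
}

// Split a total rate across child scenarios in proportion to their weights.
function splitRate(
  totalRatePerSec: number,
  entries: readonly MixEntry[],
): Map<string, number> {
  const totalWeight = entries.reduce((sum, entry) => sum + entry.weight, 0);
  if (totalWeight <= 0) {
    throw new Error("weights must sum to a positive number");
  }
  const shares = new Map<string, number>();
  for (const entry of entries) {
    shares.set(entry.scenario, (totalRatePerSec * entry.weight) / totalWeight);
  }
  return shares;
}
```

For example, a 200/s mixed scenario with weights 3 and 1 would give the first child three quarters of the load and the second child one quarter.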
Actors
You pick signature standards, not key algorithms; the key set is derived, because a Fedify actor is inherently multi-key. An actor uses exactly one HTTP request signature scheme, plus any document signature schemes:
| Standard | Layer | Algorithm |
|---|---|---|
| draft-cavage-http-signatures-12 | HTTP request | RSA |
| rfc9421 | HTTP request | RSA |
| ld-signatures | document | RSA (RsaSignature2017) |
| fep8b32 | document | Ed25519 (eddsa-jcs-2022) |
draft-cavage-http-signatures-12 and rfc9421 are mutually exclusive (one HTTP scheme per actor). Several actor groups with different standard sets model a heterogeneous fleet, which is what a server actually receives.
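The table and the exclusivity rule can be expressed as a short validation sketch. This is not the tool's implementation: `deriveKeyFamilies` and the `KeyFamily` labels are hypothetical, and the key families are reduced to the two algorithm names shown in the table.

```typescript
const HTTP_STANDARDS: readonly string[] = [
  "draft-cavage-http-signatures-12",
  "rfc9421",
];
const RSA_STANDARDS: readonly string[] = [
  "draft-cavage-http-signatures-12",
  "rfc9421",
  "ld-signatures",
];
const ED25519_STANDARDS: readonly string[] = ["fep8b32"];

type KeyFamily = "RSA" | "Ed25519";

// Validate a signatureStandards list and derive which key families it needs.
function deriveKeyFamilies(standards: readonly string[]): KeyFamily[] {
  for (const standard of standards) {
    if (!RSA_STANDARDS.includes(standard) && !ED25519_STANDARDS.includes(standard)) {
      throw new Error(`Unknown signature standard: ${standard}`);
    }
  }
  // Exactly one HTTP request signature scheme per actor.
  const httpCount = standards.filter((s) => HTTP_STANDARDS.includes(s)).length;
  if (httpCount !== 1) {
    throw new Error(`Expected exactly one HTTP request scheme, got ${httpCount}`);
  }
  const families: KeyFamily[] = [];
  if (standards.some((s) => RSA_STANDARDS.includes(s))) families.push("RSA");
  if (standards.some((s) => ED25519_STANDARDS.includes(s))) families.push("Ed25519");
  return families;
}
```

Under these assumptions, an actor group with `[rfc9421, fep8b32]` needs both an RSA and an Ed25519 key, while `[draft-cavage-http-signatures-12, ld-signatures]` needs RSA only.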
Templating
Values support GitHub-Actions-style ${{ … }} templating, kept logic-less (references and whitelisted helper calls only). For example ${{ target.host }} expands to the target's host. Generated payloads use typed directives such as content: { generate: lorem, size: 2KB } rather than string templates. The tool owns actor URLs and activity ids, so each request gets a unique activity id automatically (which Fedify's always-on inbox idempotency requires).
Load generation and signing
Open-loop (rate) is the default and the realistic model for incoming federation traffic: requests are launched on schedule regardless of when earlier responses return, and each request's latency is measured from its scheduled time (the coordinated-omission correction), so a stalled target shows up as latency instead of being hidden. Closed-loop (concurrency) runs a fixed number of virtual users. Arrival is constant (default) or poisson, and maxInFlight caps concurrent in-flight requests.
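The coordinated-omission correction is easiest to see with numbers. The sketch below assumes a constant arrival schedule and a hypothetical `Sample` record; it only illustrates the idea of measuring from the scheduled time and is not the runner's actual code.

```typescript
interface Sample {
  scheduledAtMs: number; // when the request was due to start
  sentAtMs: number; // when the client actually sent it
  completedAtMs: number; // when the response finished
}

// Constant-rate schedule: one start time every 1000 / rate milliseconds.
function scheduleConstant(ratePerSec: number, durationMs: number): number[] {
  const intervalMs = 1000 / ratePerSec;
  const times: number[] = [];
  for (let i = 0; i * intervalMs < durationMs; i++) {
    times.push(i * intervalMs);
  }
  return times;
}

// Naive latency hides any delay between the schedule and the actual send.
function naiveLatencyMs(sample: Sample): number {
  return sample.completedAtMs - sample.sentAtMs;
}

// Corrected latency is measured from the scheduled time, so a stall that
// delayed the send still shows up in the result.
function correctedLatencyMs(sample: Sample): number {
  return sample.completedAtMs - sample.scheduledAtMs;
}
```

A request scheduled at 250 ms that could only be sent at 900 ms and completed at 920 ms has a naive latency of 20 ms but a corrected latency of 670 ms, which is the delay a real remote peer would have experienced.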
Signing is kept off the send critical path by default. The signing mode is set per scenario with signing:
- pipeline (default): background signers keep a bounded buffer filled, and buffer starvation surfaces the client as the bottleneck.
- jit: sign in the send path, for a strict signature-time-window target.
- presign: pre-sign an estimated open-loop run before the timed window (open-loop only; Poisson arrivals may still sign a few extra during the run).
Repeated runs
Each scenario runs three times by default. Set runs in defaults to change the whole suite, or set runs on one scenario to override the default for that scenario:
defaults:
  runs: 5
scenarios:
  - name: ci-smoke
    type: webfinger
    runs: 1
    recipient: acct:alice@localhost

Repeated runs are aggregated for stable CI gates. Latency and throughput metrics use the median run, request totals and error buckets are summed, queue depth uses the worst observed maximum, and successRate uses the worst run so one bad run is not hidden by clean neighbors. The JSON report records runCount for every scenario and includes per-run measurements in runs when the scenario ran more than once.
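The aggregation rules can be sketched as follows. The `RunResult` fields are hypothetical simplifications of a per-run report, and averaging the two middle values for an even number of runs is an assumption, since the text only says "the median run".

```typescript
interface RunResult {
  p95Ms: number;
  throughputPerSec: number;
  requests: number;
  errors: number;
  successRate: number; // 0..1
  maxQueueDepth: number;
}

function median(values: readonly number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Assumption: average the two middle values when the count is even.
  return sorted.length % 2 === 1
    ? (sorted[mid] as number)
    : ((sorted[mid - 1] as number) + (sorted[mid] as number)) / 2;
}

function aggregateRuns(runs: readonly RunResult[]) {
  return {
    runCount: runs.length,
    p95Ms: median(runs.map((r) => r.p95Ms)), // median across runs
    throughputPerSec: median(runs.map((r) => r.throughputPerSec)),
    requests: runs.reduce((sum, r) => sum + r.requests, 0), // summed
    errors: runs.reduce((sum, r) => sum + r.errors, 0), // summed
    successRate: Math.min(...runs.map((r) => r.successRate)), // worst run
    maxQueueDepth: Math.max(...runs.map((r) => r.maxQueueDepth)), // worst maximum
  };
}
```

Taking the minimum success rate means a gate such as `successRate: ">= 99%"` fails if any single run dips below the threshold, even when the other runs are clean.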
Output
Choose the format with --format text (default), json, or markdown; --output only chooses the destination (a file instead of standard output) and does not infer the format, so pass both (for example --format json --output report.json). JSON is the canonical machine form: it validates against the report schema and carries its own $schema; the text and Markdown renderers derive from the same model, keeping client-measured and server-reported numbers distinct. Both sides are scoped to a measured window: client latency excludes warm-up samples, and the server-reported numbers are the difference between a stats snapshot taken when the measured window opens and one taken when it closes, so they exclude every earlier scenario in the suite and the scenario's own warm-up traffic (apart from warm-up requests still in flight at the boundary, a residue no larger than the number of requests in flight at that moment). In GitHub Actions, append the Markdown report to the job summary:
fedify bench scenario.yaml --format markdown >> "$GITHUB_STEP_SUMMARY"

An expect gate that fails exits the command non-zero, so a suite doubles as a CI check. Keep CI gates on robust signals such as success rate, error counts, and gross throughput or latency floors; precise latency-percentile regression belongs in a controlled environment, not a shared CI runner.
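The measured-window scoping of server-reported numbers described in this section amounts to subtracting two snapshots. The sketch below flattens a snapshot into named cumulative counters, which is a simplification: real snapshots carry serialized OpenTelemetry scope metrics, and `Counters` and `windowDelta` are hypothetical names.

```typescript
// Simplified view of a stats snapshot: cumulative counter name -> value.
type Counters = Record<string, number>;

// Server-side numbers for the measured window are the closing snapshot
// minus the opening snapshot; counters absent at open start from zero.
function windowDelta(open: Counters, close: Counters): Counters {
  const delta: Counters = {};
  for (const [name, closeValue] of Object.entries(close)) {
    delta[name] = closeValue - (open[name] ?? 0);
  }
  return delta;
}
```

Anything counted before the opening snapshot, such as earlier scenarios or warm-up traffic, cancels out in the subtraction.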
Comparing two revisions
Use fedify bench compare when a CI job should compare a change against a base revision on the same runner instead of relying on an absolute threshold:
fedify bench compare \
  --base origin/main \
  --head HEAD \
  --file scenario.yaml \
  --start-command "pnpm dev" \
  --ready-url http://127.0.0.1:3000/health \
  --max-regression 15%

The command creates temporary detached worktrees for the base and head refs, starts the target command inside each worktree, waits for --ready-url, then runs the same suite from the current checkout against that target. The two targets run sequentially, so they can use the same port. Dependencies are not installed automatically; either prepare both refs in the job before comparing or make --start-command perform the needed build/start steps.
If --target is omitted, the benchmark target defaults to the origin of --ready-url. Pass --target when readiness and benchmark traffic use different URLs. The comparison report can be written as text, JSON, or Markdown with the same --format and --output options; JSON validates against the comparison report schema.
--max-regression accepts either a ratio such as 0.15 or a percentage such as 15%. For each scenario, fedify bench compare compares performance metrics from the scenario's expect block when they are latency or rate metrics; if no such metric is present, it compares latency.p95 and throughputPerSec. A head result passes when the measured regression is within --max-regression plus the observed per-run noise band. The command exits with status 1 when the head run fails its own expect gate or a comparison exceeds that allowance; configuration and orchestration failures exit with status 2.
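As a rough illustration of the pass rule, the sketch below parses a `--max-regression` value and checks one metric. How the per-run noise band is computed is not described here, so it is taken as an input; `parseMaxRegression`, `relativeRegression`, and `withinAllowance` are hypothetical helpers, not part of the command.

```typescript
// Accept either a ratio ("0.15") or a percentage ("15%").
function parseMaxRegression(value: string): number {
  const text = value.trim();
  const ratio = text.endsWith("%")
    ? Number(text.slice(0, -1)) / 100
    : Number(text);
  if (text === "" || !Number.isFinite(ratio) || ratio < 0) {
    throw new Error(`Invalid max regression: ${value}`);
  }
  return ratio;
}

// Relative worsening of head versus base. For latency, higher is worse;
// for rates such as throughput, lower is worse.
function relativeRegression(
  base: number,
  head: number,
  higherIsBetter: boolean,
): number {
  return higherIsBetter ? (base - head) / base : (head - base) / base;
}

// A head result passes when its regression stays within the allowed
// ratio plus the observed noise band (assumed to be a ratio as well).
function withinAllowance(
  base: number,
  head: number,
  higherIsBetter: boolean,
  maxRegression: number,
  noiseBand: number,
): boolean {
  return relativeRegression(base, head, higherIsBetter) <= maxRegression + noiseBand;
}
```

For instance, with `--max-regression 15%` and no noise band, a p95 latency that moves from 100 ms to 112 ms passes, while throughput that drops from 250/s to 200/s (a 20% regression) does not.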
Use short, broad suites in shared CI:
defaults:
  runs: 3
  duration: 20s
  warmup: 5s
scenarios:
  - name: inbox-ci
    type: inbox
    # ...
    expect:
      successRate: ">= 99%"
      latency.p95: "< 500ms"

Use a controlled performance runner for narrower regression checks:
defaults:
  runs: 7
  duration: 2m
  warmup: 20s
scenarios:
  - name: inbox-lab
    type: inbox
    # ...
    expect:
      successRate: ">= 99.9%"
      latency.p95: "< 120ms"
      throughputPerSec: "> 250/s"

Safety
fedify bench runs without friction against a loopback or private target, or any target that advertises benchmark mode. Hostnames are classified from their resolved addresses when possible, and DNS failures are treated as public so the gate stays conservative. A public target that does not advertise benchmark mode is refused unless you pass --allow-unsafe-target, which is mandatory (never prompted) in CI and any non-interactive context.
The unsafe override is deliberately narrow. It must be paired with an explicit --target on the command line, and every scenario must set its load (rate or concurrency) and duration explicitly, either in the scenario or in suite defaults. This prevents a public run from falling back to built-in defaults by accident.
Use --dry-run as the first step against an unfamiliar target. It performs the benchmark-mode probe and discovery requests needed to print the planned WebFinger resources and inbox destinations, but it does not send signed inbox deliveries or other benchmark load.
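The gate described in this section can be approximated with the sketch below. It is deliberately simplified: only IPv4 literals are classified, every resolved address is required to be loopback or private (an assumption), and `TargetInfo`, `RunFlags`, and `checkTarget` are hypothetical names rather than the command's internals.

```typescript
interface TargetInfo {
  resolvedAddresses: string[] | null; // null models a DNS failure
  advertisesBenchmarkMode: boolean;
}

interface RunFlags {
  allowUnsafeTarget: boolean; // --allow-unsafe-target
  explicitTarget: boolean; // --target given on the command line
  explicitLoadAndDuration: boolean; // set in every scenario or in defaults
}

// Loopback (127/8) or private (10/8, 172.16/12, 192.168/16) IPv4 address.
function isLocalIPv4(address: string): boolean {
  const parts = address.split(".");
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p))) return false;
  const octets = parts.map(Number);
  if (octets.some((o) => o > 255)) return false;
  const a = octets[0] ?? -1;
  const b = octets[1] ?? -1;
  return a === 127 || a === 10 || (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168);
}

function checkTarget(target: TargetInfo, flags: RunFlags): "allowed" | "refused" {
  const addresses = target.resolvedAddresses;
  // DNS failures are treated as public so the gate stays conservative.
  const isLocal = addresses !== null && addresses.length > 0 &&
    addresses.every((address) => isLocalIPv4(address));
  if (isLocal || target.advertisesBenchmarkMode) return "allowed";
  // The unsafe override is narrow: all three conditions must hold.
  const overrideOk = flags.allowUnsafeTarget && flags.explicitTarget &&
    flags.explicitLoadAndDuration;
  return overrideOk ? "allowed" : "refused";
}
```

Treating an unresolved hostname as public means a typo in the target never silently downgrades the check.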
Local targets over HTTP
An inbox recipient given as an acct: handle is resolved through WebFinger, which goes over HTTPS, so against a plain-HTTP loopback target give the recipient as the actor's URI (for example http://localhost:3000/users/alice) instead. The webfinger scenario is unaffected: it requests /.well-known/webfinger on the target directly, so it can benchmark acct: lookups over plain HTTP.
Signed scenarios such as inbox make the target dereference the benchmark's synthetic actor server while verifying signatures, so that server must be reachable from the target. A loopback target reaches it automatically (both run on the same machine). For a non-loopback target, pass --advertise-host with an address the target can reach (for example the client's LAN IP); the synthetic server then binds every interface and advertises that host in the actor and key URLs. Without it, a non-loopback signed scenario is refused (use a read scenario such as webfinger, which needs no synthetic server).
Benchmark stats endpoint
GET /.well-known/fedify/bench/stats returns a versioned JSON snapshot of the server-side metrics collected by the benchmark mode reader:
{
  "version": 1,
  "source": "server",
  "generatedAt": "2026-06-02T00:00:00.000Z",
  "scopeMetrics": [],
  "errors": []
}

The scopeMetrics field contains serialized OpenTelemetry scope metrics. Observable queue depth is included when configured queues implement MessageQueue.getDepth().
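A benchmark client that reads this endpoint may want to check the response shape before using it. The guard below is a sketch based only on the fields shown in the sample; the `BenchStatsSnapshot` interface is a hypothetical name, and the contents of `scopeMetrics` and `errors` are left opaque because their element shapes are not described here.

```typescript
interface BenchStatsSnapshot {
  version: number;
  source: string;
  generatedAt: string; // ISO 8601 timestamp
  scopeMetrics: unknown[];
  errors: unknown[];
}

// Runtime check that a parsed JSON value has the documented top-level fields.
function isBenchStatsSnapshot(value: unknown): value is BenchStatsSnapshot {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const generatedAt = record["generatedAt"];
  return typeof record["version"] === "number" &&
    typeof record["source"] === "string" &&
    typeof generatedAt === "string" &&
    !Number.isNaN(Date.parse(generatedAt)) &&
    Array.isArray(record["scopeMetrics"]) &&
    Array.isArray(record["errors"]);
}
```

In practice the input would be the parsed JSON body of a GET request to /.well-known/fedify/bench/stats; checking `version` first leaves room to handle future snapshot versions separately.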
Benchmark trigger endpoint
POST /.well-known/fedify/bench/trigger asks the target application to call Context.sendActivity() with an explicit sender, recipients, and activity. This exercises the target's normal outbox and queue path.
The request body has this shape:
{
  "sender": { "identifier": "alice" },
  "recipients": [
    {
      "@context": "https://www.w3.org/ns/activitystreams",
      "type": "Service",
      "id": "https://sink.example/actors/bob",
      "inbox": "https://sink.example/inbox"
    }
  ],
  "activity": {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Create",
    "id": "https://example.com/activities/bench-1",
    "actor": "https://example.com/users/alice",
    "object": {
      "type": "Note",
      "id": "https://example.com/notes/bench-1",
      "content": "benchmark"
    }
  }
}

The sender must be either { "identifier": string } or { "username": string }. Recipients are parsed as ActivityPub actors and must have id and inbox properties. The activity is parsed as an ActivityPub Activity.
By default, every recipient inbox must appear in the server-configured triggerSinks list. This keeps benchmark traffic pointed at benchmark sink inboxes and prevents callers from choosing their own allowlist. To bypass this guard for a controlled run, set allowUnsafeTriggerRecipients to true in the application configuration.
For fanout and remote failure scenarios, set a sinkBase value such as http://host:port/ in the scenario when the target keeps the safe default and you need stable sink URLs for triggerSinks. With followers: 5, the runner generates /inbox/0 through /inbox/4 under that base.
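The relationship between sinkBase, followers, and triggerSinks can be sketched like this. The URL layout follows the /inbox/0 through /inbox/4 example above; `sinkInboxes` and `allRecipientsAllowed` are hypothetical helpers, and exact string matching against the allowlist is an assumption about how the guard compares URLs.

```typescript
// Generate the deterministic sink inbox URLs under a sinkBase.
function sinkInboxes(sinkBase: string, followers: number): string[] {
  const base = sinkBase.replace(/\/+$/, ""); // drop trailing slashes
  return Array.from({ length: followers }, (_, i) => `${base}/inbox/${i}`);
}

// Safe default: every recipient inbox must be listed in triggerSinks.
function allRecipientsAllowed(
  recipientInboxes: readonly string[],
  triggerSinks: readonly string[],
): boolean {
  const allowed = new Set(triggerSinks);
  return recipientInboxes.every((inbox) => allowed.has(inbox));
}
```

Generating the list once, for example with `sinkInboxes("http://127.0.0.1:9090/", 5)`, and pasting it into the target's triggerSinks keeps the safe default in place while still letting the runner's sink inboxes through.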
A successful trigger returns 202 Accepted:
{
  "version": 1,
  "activityId": "https://example.com/activities/bench-1",
  "queueCorrelationId": "https://example.com/activities/bench-1",
  "recipientCount": 1,
  "inboxCount": 1
}

The queueCorrelationId is the activity ID preserved on the queued fanout or outbox work.
Metrics
Benchmark mode uses the same Fedify metrics documented in OpenTelemetry, including queue task metrics, queue depth, HTTP server metrics, and signature verification histograms. The benchmark endpoints themselves are classified as fedify.endpoint=benchmark in fedify.http.server.request.* metrics.