Overview

The Agents domain is Backside’s in-process agent executor. You register one agent_config per (tenant, kind), attach your own Anthropic API key, and POST to /api/v1/agents/{kind}/runs to enqueue work. A worker daemon picks the run up via Postgres LISTEN/NOTIFY, drives the LLM ↔ tool loop, and records every step. See the Agents guide for the narrative overview. This page is the reference.

Data Model

Entities

| Entity | Description |
| --- | --- |
| Agent Config | Per-(tenant, kind) configuration: system prompt overrides, tool set, budget, deadline, guardrail set, disable state. One row per kind per tenant. |
| Agent Run | One row per enqueue. Carries status, claim state, cost rollup, failure category. |
| Agent Step | Content-hashed step in a run: either an LLM turn or a tool call. Records token usage and cost per step. |
| Agent Event | Structured observability row. Records guardrail decisions, auto-disable flips, waiting transitions. |

Agent kinds

Today, one kind is GA:
| Kind | Purpose |
| --- | --- |
| contact_deduper | Scans contacts for likely duplicates and proposes merges. |
New kinds ship as they graduate from the internal catalog. Each kind declares its own default system prompt, default tool set, default budget, and default guardrail set.

Key Concepts

BYOK (Bring Your Own Key)

Every agent run authenticates to Anthropic with a key you provide. Backside encrypts it with your tenant’s DEK (AES-256-GCM) and loads it only at the worker boundary. Rotate with PUT /api/v1/agent-configs/{kind}/byok-key. Revoke with DELETE. There is no shared fallback key in production: if the tenant key is missing or revoked, runs fail with auth_failed.

Probe a key with POST /api/v1/agent-configs/validate-key before wiring it into a config. The probe is a one-shot live call to Anthropic. The response is either {"status": "valid"} or {"status": "invalid", "category": ..., "message": ...}, where category is one of:
  • invalid_key, revoked_or_expired, insufficient_permissions, quota_exhausted: terminal; these require operator action
  • provider_unavailable, rate_limited: transient and retry-safe
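A client can turn the probe response into a next step by checking the category against the two groups above. The following is a minimal sketch, not Backside's client code: the function name next_action and its return labels are hypothetical, and only the status and category values come from this page.

```python
# Category names are taken from this page; everything else is illustrative.
TERMINAL = {
    "invalid_key",
    "revoked_or_expired",
    "insufficient_permissions",
    "quota_exhausted",
}
TRANSIENT = {"provider_unavailable", "rate_limited"}


def next_action(probe: dict) -> str:
    """Map a validate-key response body to a (hypothetical) next step."""
    if probe.get("status") == "valid":
        return "use_key"
    category = probe.get("category")
    if category in TRANSIENT:
        return "retry_later"       # safe to probe again after a pause
    if category in TERMINAL:
        return "operator_action"   # rotate or fix the key first
    return "unknown"               # category not documented on this page


print(next_action({"status": "invalid", "category": "rate_limited", "message": "slow down"}))
```

Treating an undocumented category as "unknown" rather than retrying is a conservative choice made for this sketch.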

Run state machine

queued → running ⇄ waiting
           │
           └──► succeeded | failed | cancelled
| State | Who owns the row | Notes |
| --- | --- | --- |
| queued | No worker | Freshly inserted by enqueue_run |
| running | A worker (holds lease) | worker_id + lease_expires_at set; renewed every 10s |
| waiting | No worker | An approval_gate guardrail paused the run. Lease cleared |
| succeeded | No worker (terminal) | Final output available; cost_usd_cents rolled up |
| failed | No worker (terminal) | failure_category tells you why |
| cancelled | No worker (terminal) | Operator or API caller killed it |
Transitions are enforced in the database: the worker issues a conditional SQL UPDATE ... WHERE status = $old, so a transition succeeds only if the row is still in the expected state, and no two processes can ever own the same run.
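The conditional update is a compare-and-swap: whichever worker's UPDATE matches the old status wins, and every other worker sees zero affected rows. The sketch below illustrates the pattern with Python's built-in sqlite3 so it runs anywhere. The production store is Postgres, and the table layout, the id column, and the claim_run helper are assumptions made for illustration.

```python
import sqlite3


def claim_run(conn: sqlite3.Connection, run_id: int, worker_id: str) -> bool:
    """Try to move a run from queued to running; True only for the winner."""
    cur = conn.execute(
        "UPDATE agent_runs SET status = 'running', worker_id = ? "
        "WHERE id = ? AND status = 'queued'",   # guard on the old status
        (worker_id, run_id),
    )
    conn.commit()
    return cur.rowcount == 1                    # 0 rows means someone else got it


# In-memory stand-in for the real table (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent_runs (id INTEGER PRIMARY KEY, status TEXT, worker_id TEXT)")
conn.execute("INSERT INTO agent_runs (id, status) VALUES (1, 'queued')")

first = claim_run(conn, 1, "worker-a")
second = claim_run(conn, 1, "worker-b")
print(first, second)
```

The second call finds no row with status = 'queued', so it changes nothing and reports failure without needing an explicit lock.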

Failure categories

Every terminal failure carries a failure_category:
| Category | Meaning |
| --- | --- |
| auth_failed | BYOK key was rejected by Anthropic |
| tool_failed | A tool call returned an unrecoverable error |
| guardrail_blocked | An enforced guardrail rule blocked a call |
| config_error | The config is malformed (missing tool, unknown kind, etc.) |
| timeout | The deadline elapsed before the run could finish |
| budget_exhausted | Rolled-up cost exceeded the configured budget cap |

Guardrails

Every config carries a GuardrailSet: a list of rules evaluated before every tool call. There are five primitive types (allowlist and denylist are counted together):
  • allowlist / denylist — name-based tool gating
  • rate_limit — sliding window per-(run, tool) via Dragonfly
  • approval_gate — pauses the run to waiting on match
  • io_validation — JSON Schema check against tool input
  • quiet_hours — tenant-local time window block
Rules run in shadow (log only) or enforce (block) mode. A run with no guardrail set at all falls back to default-deny: no tools allowed. You can’t accidentally ship a brand-new config with mutating power.
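The interaction between shadow mode, enforce mode, and default-deny can be sketched for the two name-based primitives. This is an illustrative evaluator, not Backside's implementation: check_tool_call is a hypothetical name, the denylist rule shape is assumed to mirror the allowlist shape used in the config example on this page, and treating an empty list like a missing guardrail set is also an assumption.

```python
from typing import List, Optional, Tuple


def check_tool_call(guardrails: Optional[List[dict]], tool_name: str) -> Tuple[bool, List[str]]:
    """Return (allowed, log lines) for one tool call."""
    if not guardrails:
        # No guardrail set: default-deny, no tools allowed.
        return False, ["default-deny: no guardrail set"]

    allowed = True
    log: List[str] = []
    for rule in guardrails:
        names = rule.get("names", [])
        if rule["kind"] == "allowlist":
            violated = tool_name not in names
        elif rule["kind"] == "denylist":
            violated = tool_name in names
        else:
            continue  # other primitives are out of scope for this sketch
        if violated:
            log.append(f"{rule['kind']} violated by {tool_name} (mode={rule['mode']})")
            if rule["mode"] == "enforce":
                allowed = False  # shadow-mode rules only log
    return allowed, log


rules = [{"kind": "allowlist", "names": ["contacts_search", "contacts_merge"], "mode": "enforce"}]
print(check_tool_call(rules, "contacts_search"))
print(check_tool_call(rules, "contacts_delete"))
```

Switching the rule's mode to shadow keeps the log line but lets the call through, which is useful for trialling a rule before it blocks anything.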

Durable replay

Every step writes to agent_steps keyed by (run_id, seq) with a UNIQUE (tenant_id, run_id, content_hash) constraint. If a worker crashes mid-run, the next worker replays the journal deterministically, skipping steps that are already recorded rather than re-executing them. The hash is computed over canonical JSON of the step inputs, so replays are bit-identical. Large tool payloads spill out-of-line to agent_step_payloads: the main step row stays small, and the payload table stores the bulk.
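The idea of hashing canonical JSON and skipping journaled steps can be sketched as follows. The page does not name the hash function or the canonicalization rules, so SHA-256 and sorted-key compact JSON are assumptions here, and a plain dict stands in for the agent_steps table. The names content_hash and run_step are hypothetical.

```python
import hashlib
import json
from typing import Callable


def content_hash(step_inputs: dict) -> str:
    """Hash a canonical JSON encoding (sorted keys, no extra whitespace)."""
    canonical = json.dumps(
        step_inputs, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()  # assumed algorithm


def run_step(journal: dict, step_inputs: dict, execute: Callable[[dict], dict]) -> dict:
    """Execute a step once; on replay, return the journaled result instead."""
    key = content_hash(step_inputs)
    if key in journal:
        return journal[key]          # already recorded: skip re-execution
    result = execute(step_inputs)
    journal[key] = result            # stands in for the INSERT into agent_steps
    return result


# Key order does not affect the hash, so logically equal inputs collide on purpose.
h1 = content_hash({"tool": "contacts_search", "args": {"q": "ann"}})
h2 = content_hash({"args": {"q": "ann"}, "tool": "contacts_search"})
print(h1 == h2)
```

Sorted-key serialization is only an approximation of a full canonicalization scheme; number formatting and Unicode normalization would need the same care in a real system.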

Cost tracking

Per-step cost is estimated from Anthropic’s published rates (input, output, cache-read, cache-write) and the model used. When a run reaches a terminal state, its cost_usd_cents is set to the sum over all steps. The agent_runs_billing_daily view rolls totals up per (tenant_id, agent_kind) per day; use it for your own billing dashboards.
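The per-step estimate and the run-level sum amount to a small piece of arithmetic. In the sketch below the rate table is a placeholder with made-up numbers and a made-up model name, not Anthropic's actual pricing, and the StepUsage type and both function names are hypothetical. How Backside rounds to whole cents is not stated on this page, so the result is left as a float.

```python
from dataclasses import dataclass
from typing import Iterable

# PLACEHOLDER rates in USD cents per million tokens. These are NOT real prices.
RATES = {
    "example-model": {"input": 300, "output": 1500, "cache_read": 30, "cache_write": 375},
}


@dataclass
class StepUsage:
    model: str
    input: int = 0
    output: int = 0
    cache_read: int = 0
    cache_write: int = 0


def step_cost_cents(u: StepUsage) -> float:
    """Estimate one step's cost from its token counts and the model's rates."""
    r = RATES[u.model]
    total = (
        u.input * r["input"]
        + u.output * r["output"]
        + u.cache_read * r["cache_read"]
        + u.cache_write * r["cache_write"]
    )
    return total / 1_000_000


def run_cost_cents(steps: Iterable[StepUsage]) -> float:
    """Run-level rollup: the sum over all steps."""
    return sum(step_cost_cents(s) for s in steps)


steps = [
    StepUsage("example-model", input=2000, output=1000),  # an LLM turn
    StepUsage("example-model"),                           # a tool call, no tokens
]
print(run_cost_cents(steps))
```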

Auto-disable circuit breaker

If a (tenant, agent_kind) pair accumulates 10 failed runs in 48 hours, the worker stamps agent_configs.disabled_at = now() and writes a disable_reason. New POST /api/v1/agents/{kind}/runs calls return 409 Conflict until an operator nulls both columns. There is no automatic recovery: the circuit stays open until human action re-closes it.
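The trip condition is a count over a sliding window. A minimal sketch of that check, assuming the window is measured backwards from the current time and that exactly 10 failures is enough to trip; the function name should_disable is hypothetical and the real worker presumably runs this as a query rather than in memory.

```python
from datetime import datetime, timedelta, timezone
from typing import List

FAILURE_THRESHOLD = 10           # from this page
WINDOW = timedelta(hours=48)     # from this page


def should_disable(failed_at: List[datetime], now: datetime) -> bool:
    """True when enough runs failed inside the trailing 48-hour window."""
    recent = [t for t in failed_at if now - WINDOW <= t <= now]
    return len(recent) >= FAILURE_THRESHOLD


now = datetime(2026, 4, 1, 12, 0, tzinfo=timezone.utc)
failures = [now - timedelta(hours=h) for h in range(1, 11)]  # 10 failures, 1h to 10h ago
print(should_disable(failures, now))
```

Because there is no automatic recovery, a check like this only ever opens the circuit; clearing disabled_at and disable_reason remains a manual step.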

Required fields

Minimum config:
{
  "kind": "contact_deduper",
  "budget_usd_cents": 25,
  "deadline_secs": 300,
  "guardrails": [
    { "kind": "allowlist", "names": ["contacts_search", "contacts_merge"], "mode": "enforce" }
  ]
}
Minimum enqueue:
{ "input": { "max_candidates": 50 } }
Per-run overrides (budget, deadline) are optional; an override that exceeds the config's value is clipped to it.
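An enqueue call built from the minimum body might look like the sketch below. It uses only the Python standard library and builds the request without sending it. The host name and the bearer-token header are assumptions, since this page specifies neither; only the path and the body shape come from the reference.

```python
import json
import urllib.request

# Hypothetical host; substitute your own deployment's base URL.
BASE_URL = "https://backside.example.com"


def build_enqueue_request(kind: str, run_input: dict, api_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that enqueues one run for `kind`."""
    body = json.dumps({"input": run_input}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/agents/{kind}/runs",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",  # assumed auth scheme
        },
    )


req = build_enqueue_request("contact_deduper", {"max_candidates": 50}, "YOUR_TOKEN")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```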

Multi-tenancy

Every agent table has RLS. Every worker query runs inside a transaction with SET LOCAL app.tenant_id = ... plus a belt-and-suspenders (run_id, tenant_id) check before acting. A bug in the tenancy layer cannot leak rows across tenants — the database enforces it.

Operational notes

  • Runs survive API and worker restarts. queued runs sit in the table until a worker wakes; running runs with expired leases get released to queued by the orphan recovery job
  • The worker exposes an internal health endpoint on :4500 for container healthchecks; it is not reachable over the public internet
  • Deployment is decoupled from the API — the worker is a separate Docker container (backside-agent-worker), and you can scale it horizontally without touching the API tier
  • Every run emits OpenTelemetry GenAI semconv spans: agent.run, gen_ai.chat, gen_ai.tool. Ship them to the observability stack of your choice

Limits

  • One GA kind (contact_deduper) as of April 2026 — more land as they graduate
  • Approval resume endpoint is not exposed in Phase 2. waiting runs need operator unblocking; the client-facing resume call lands in Phase 3
  • Mid-run budget action hooks are not exposed; the cap fires only at terminal rollup
  • Model routing is static per-run; automatic multi-model failover is a design-phase item