Agents

Overview

The Agents domain is Backside’s in-process agent executor. You register one agent_config per (tenant, kind), attach your own Anthropic API key, and POST to /api/v1/agents/{kind}/runs to enqueue work. A worker daemon picks the run up via Postgres LISTEN/NOTIFY, drives the LLM ↔ tool loop, and records every step. See the Agents guide for the narrative overview. This page is the reference.
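To make the enqueue call concrete, here is a minimal sketch that builds (but does not send) a POST to /api/v1/agents/{kind}/runs using only the standard library. The host name and the Bearer authorization scheme are hypothetical placeholders. This page does not specify how API callers authenticate.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host; substitute your Backside endpoint


def build_enqueue_request(kind: str, payload: dict, api_token: str,
                          base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /api/v1/agents/{kind}/runs request with a JSON body."""
    url = f"{base_url}/api/v1/agents/{kind}/runs"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Hypothetical auth header; check your deployment's API auth scheme.
            "Authorization": f"Bearer {api_token}",
        },
    )


req = build_enqueue_request("contact_deduper", {"input": {"max_candidates": 50}}, "YOUR_TOKEN")
# Send with urllib.request.urlopen(req) once the host and auth scheme are filled in.
```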

Data Model

Entities

Entity	Description
Agent Config	Per-`(tenant, kind)` configuration: system prompt overrides, tool set, budget, deadline, guardrail set, disable state. Exactly one row per kind for each tenant.
Agent Run	One row per enqueue. Carries status, claim state, cost rollup, failure category.
Agent Step	Content-hashed step in a run — either an LLM turn or a tool call. Token usage and cost per step.
Agent Event	Structured observability row. Guardrail decisions, auto-disable flips, waiting transitions.

Agent kinds

Today, one kind is GA:

Kind	Purpose
`contact_deduper`	Scans contacts for likely duplicates and proposes merges.

New kinds ship as they graduate the internal catalog. Each kind declares its own default system prompt, default tool set, default budget, and default guardrail set.

Key Concepts

BYOK (Bring Your Own Key)

Every agent run authenticates to Anthropic with a key you provide. Backside encrypts it with your tenant’s DEK (AES-256-GCM) and loads it only at the worker boundary. Rotate the key with PUT /api/v1/agent-configs/{kind}/byok-key; revoke it with DELETE on the same path. There is no shared fallback key in production: if the tenant key is missing or revoked, runs fail with auth_failed.

Probe a key with POST /api/v1/agent-configs/validate-key before wiring it into a config. The probe makes one live call to Anthropic and returns either {"status": "valid"} or {"status": "invalid", "category": ..., "message": ...}, where category is one of:

invalid_key, revoked_or_expired, insufficient_permissions, quota_exhausted: terminal; an operator must act before the key will work
provider_unavailable, rate_limited: transient; safe to retry
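The terminal/transient split above maps naturally onto client-side retry logic. The sketch below classifies a validate-key response body and retries only transient categories with exponential backoff. The `probe` callable (whatever performs the HTTP call), the attempt count, and the delay schedule are illustrative assumptions, not documented behavior.

```python
import time
from typing import Callable

# Categories as listed in the validate-key response documentation.
TERMINAL = frozenset({"invalid_key", "revoked_or_expired",
                      "insufficient_permissions", "quota_exhausted"})
TRANSIENT = frozenset({"provider_unavailable", "rate_limited"})


def classify_probe(resp: dict) -> str:
    """Map a validate-key response body to 'valid', 'terminal', or 'retry'."""
    if resp.get("status") == "valid":
        return "valid"
    category = resp.get("category")
    if category in TRANSIENT:
        return "retry"
    if category in TERMINAL:
        return "terminal"
    raise ValueError(f"unknown category: {category!r}")


def probe_with_retry(probe: Callable[[], dict], attempts: int = 3,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Call a (hypothetical) probe function, retrying only transient failures."""
    for attempt in range(attempts):
        outcome = classify_probe(probe())
        if outcome != "retry":
            return outcome  # 'valid' or 'terminal': retrying will not help
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff between tries
    return "retry"  # still transient after all attempts
```

Terminal outcomes return immediately because no amount of retrying fixes a bad or revoked key.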

Run state machine

queued → running ⇄ waiting
           │
           └──► succeeded | failed | cancelled

State	Who owns the row	Notes
`queued`	No worker	Freshly inserted by `enqueue_run`
`running`	A worker (holds lease)	`worker_id` + `lease_expires_at` set; renewed every 10s
`waiting`	No worker	An `approval_gate` guardrail paused the run. Lease cleared
`succeeded`	No worker (terminal)	Final output available; `cost_usd_cents` rolled up
`failed`	No worker (terminal)	`failure_category` tells you why
`cancelled`	No worker (terminal)	Operator or API caller killed it

Transitions are enforced at the database — the worker uses SQL UPDATE ... WHERE status = $old to ensure no two processes ever own the same run.

Failure categories

Every terminal failure carries a failure_category:

Category	Meaning
`auth_failed`	BYOK key was rejected by Anthropic
`tool_failed`	A tool call returned an unrecoverable error
`guardrail_blocked`	An enforced guardrail rule blocked a call
`config_error`	The config is malformed (missing tool, unknown kind, etc.)
`timeout`	The deadline elapsed before the run could finish
`budget_exhausted`	Rolled cost exceeded the configured budget cap

Guardrails

Every config carries a GuardrailSet — a list of rules enforced before every tool call. Five primitives:

allowlist / denylist — name-based tool gating
rate_limit — sliding window per-(run, tool) via Dragonfly
approval_gate — pauses the run to waiting on match
io_validation — JSON Schema check against tool input
quiet_hours — tenant-local time window block

Rules run in shadow (log only) or enforce (block) mode. A run with no guardrail set at all falls back to default-deny: no tools allowed. You can’t accidentally ship a brand-new config with mutating power.
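As an illustration of how the name-based rules, the shadow/enforce modes, and default-deny interact, here is a minimal evaluator covering only allowlist and denylist. The NameRule class, the log format, and the choice that every enforced rule must pass (AND composition) are assumptions made for this sketch; rate_limit, approval_gate, io_validation, and quiet_hours are not modeled.

```python
from dataclasses import dataclass
from typing import List, Literal


@dataclass
class NameRule:
    """Hypothetical model of an allowlist/denylist rule in a GuardrailSet."""
    kind: Literal["allowlist", "denylist"]
    names: List[str]
    mode: Literal["shadow", "enforce"] = "enforce"

    def violates(self, tool: str) -> bool:
        if self.kind == "allowlist":
            return tool not in self.names
        return tool in self.names


def check_tool_call(rules: List[NameRule], tool: str, log: List[str]) -> bool:
    """Return True if the tool call may proceed; append each decision to log."""
    if not rules:
        # Default-deny: with no guardrail set, no tool is allowed.
        log.append(f"default_deny blocked {tool}")
        return False
    allowed = True
    for rule in rules:
        if not rule.violates(tool):
            continue
        if rule.mode == "enforce":
            log.append(f"{rule.kind} blocked {tool}")
            allowed = False
        else:
            # Shadow mode: record what would have happened, but let the call through.
            log.append(f"{rule.kind} (shadow) would block {tool}")
    return allowed


# The allowlist from the minimum config example.
rules = [NameRule("allowlist", ["contacts_search", "contacts_merge"])]
```

With these rules, contacts_merge passes, while a tool outside the list (for example, a hypothetical contacts_delete) is blocked.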

Durable replay

Every step writes to agent_steps keyed by (run_id, seq) with a UNIQUE (tenant_id, run_id, content_hash) constraint. If a worker crashes mid-run, the next one replays every journaled step deterministically, skipping rather than re-executing. The hash is computed over canonical JSON of the step inputs, so replays are bit-identical. Large tool payloads spill out-of-line to agent_step_payloads — the main step row stays small, the payload table stores the bulk.
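The replay mechanism relies on equal inputs always producing the same hash. The sketch below canonicalizes with sorted keys and compact separators, hashes with SHA-256, and uses an in-memory dict as a stand-in for the (tenant_id, run_id, content_hash) uniqueness constraint. The specific canonicalization and hash function are assumptions; this page does not say which ones Backside uses (RFC 8785 JCS, for instance, differs in number formatting).

```python
import hashlib
import json
from typing import Any, Callable, Dict, Tuple


def canonical_json(obj: Any) -> bytes:
    """Sorted keys, no insignificant whitespace, UTF-8: one byte string per value."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def content_hash(step_inputs: dict) -> str:
    return hashlib.sha256(canonical_json(step_inputs)).hexdigest()


class Journal:
    """In-memory stand-in for agent_steps, unique on (tenant_id, run_id, content_hash)."""

    def __init__(self) -> None:
        self._steps: Dict[Tuple[str, str, str], Any] = {}

    def run_step(self, tenant_id: str, run_id: str, step_inputs: dict,
                 execute: Callable[[dict], Any]) -> Any:
        key = (tenant_id, run_id, content_hash(step_inputs))
        if key in self._steps:
            return self._steps[key]  # replay: reuse the journaled result, skip execution
        result = execute(step_inputs)
        self._steps[key] = result
        return result
```

Because key order does not affect the hash, a worker that rebuilds the same step inputs after a crash hits the journal instead of re-running the tool.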

Cost tracking

Per-step cost is estimated from Anthropic’s published rates (input, output, cache-read, cache-write) for the model used. When a run reaches a terminal state, its cost_usd_cents is the sum over all of its steps. The agent_runs_billing_daily view rolls up totals per (tenant_id, agent_kind) per day; use it for your own billing dashboards.
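The arithmetic behind the per-step estimate is a weighted sum of the four token counts. The sketch below uses Decimal to avoid float drift, sums exact step costs, and rounds once to whole cents. The model name and rate values are hypothetical placeholders, not Anthropic's actual prices, and the round-half-up rule is an assumption about how the rollup converts to cents.

```python
from dataclasses import dataclass
from decimal import ROUND_HALF_UP, Decimal
from typing import List

# Hypothetical USD rates per million tokens; replace with Anthropic's
# published rates for the model you actually run.
RATES_PER_MTOK = {
    "example-model": {
        "input": Decimal("3.00"),
        "output": Decimal("15.00"),
        "cache_read": Decimal("0.30"),
        "cache_write": Decimal("3.75"),
    },
}


@dataclass
class StepUsage:
    model: str
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0


def step_cost_usd(step: StepUsage) -> Decimal:
    """Exact (unrounded) USD cost of one step."""
    rates = RATES_PER_MTOK[step.model]
    total = (step.input_tokens * rates["input"]
             + step.output_tokens * rates["output"]
             + step.cache_read_tokens * rates["cache_read"]
             + step.cache_write_tokens * rates["cache_write"])
    return total / Decimal(1_000_000)


def run_cost_usd_cents(steps: List[StepUsage]) -> int:
    """Sum exact step costs, then round once to whole cents."""
    total_usd = sum((step_cost_usd(s) for s in steps), Decimal(0))
    return int((total_usd * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
```

Rounding once at the end, rather than per step, keeps many sub-cent steps from each rounding down to zero.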

Auto-disable circuit breaker

If a (tenant, agent_kind) pair accumulates 10 failed runs in 48 hours, the worker stamps agent_configs.disabled_at = now() and writes a disable_reason. New POST /runs calls return 409 Conflict until an operator nulls both columns. There is no automatic recovery — the circuit stays open until human action re-closes it.
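The breaker is a sliding-window counter keyed by (tenant, agent_kind). The in-memory sketch below mirrors the documented threshold (10 failures) and window (48 hours). Everything else is an assumption: the class and method names, the eviction boundary, and the non-decreasing timestamps. Backside tracks this state in agent_configs, not in process memory.

```python
from collections import deque
from typing import Deque, Dict, Tuple

FAILURE_THRESHOLD = 10        # documented: 10 failed runs...
WINDOW_SECS = 48 * 60 * 60    # ...within 48 hours

Key = Tuple[str, str]  # (tenant_id, agent_kind)


class AutoDisable:
    """In-memory sketch of the auto-disable breaker (hypothetical API)."""

    def __init__(self) -> None:
        self._failures: Dict[Key, Deque[float]] = {}
        self.disabled_at: Dict[Key, float] = {}

    def record_failure(self, tenant_id: str, kind: str, ts: float) -> None:
        """Record a failed run at epoch seconds ts (assumed non-decreasing per key)."""
        key = (tenant_id, kind)
        window = self._failures.setdefault(key, deque())
        window.append(ts)
        # Evict failures that have aged out of the 48-hour window.
        while window and window[0] <= ts - WINDOW_SECS:
            window.popleft()
        if len(window) >= FAILURE_THRESHOLD and key not in self.disabled_at:
            self.disabled_at[key] = ts  # analogous to stamping disabled_at = now()

    def can_enqueue(self, tenant_id: str, kind: str) -> bool:
        """False corresponds to POST /runs returning 409 Conflict."""
        return (tenant_id, kind) not in self.disabled_at

    def operator_reset(self, tenant_id: str, kind: str) -> None:
        """The only way back: a human clears the disable state."""
        self.disabled_at.pop((tenant_id, kind), None)
```

This sketch keeps the failure history on reset, so one more failure inside the window re-opens the breaker. This page does not say whether Backside behaves the same way.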

Required fields

Minimum config:

{
  "kind": "contact_deduper",
  "budget_usd_cents": 25,
  "deadline_secs": 300,
  "guardrails": [
    { "kind": "allowlist", "names": ["contacts_search", "contacts_merge"], "mode": "enforce" }
  ]
}

Minimum enqueue:

{ "input": { "max_candidates": 50 } }

Per-run overrides (budget, deadline) are optional and clip to the config defaults.
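One reading of "clip to the config defaults" is that a per-run override can tighten a limit but never loosen it. The sketch below encodes that interpretation with min(). Both the interpretation and the function name are assumptions, and clipping presumably happens server-side rather than in client code.

```python
from typing import Optional, Tuple


def effective_limits(config: dict,
                     budget_override: Optional[int] = None,
                     deadline_override: Optional[int] = None) -> Tuple[int, int]:
    """Return (budget_usd_cents, deadline_secs) after clipping overrides to the config."""
    budget = config["budget_usd_cents"]
    deadline = config["deadline_secs"]
    if budget_override is not None:
        budget = min(budget_override, budget)      # an override may lower, never raise
    if deadline_override is not None:
        deadline = min(deadline_override, deadline)
    return budget, deadline


# The minimum config from above.
config = {"kind": "contact_deduper", "budget_usd_cents": 25, "deadline_secs": 300}
```

Under this reading, asking for a 100-cent budget on this config still yields 25 cents, while a 60-second deadline is honored.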

Multi-tenancy

Every agent table has RLS. Every worker query runs inside a transaction with SET LOCAL app.tenant_id = ... plus a belt-and-suspenders (run_id, tenant_id) check before acting. A bug in the tenancy layer cannot leak rows across tenants — the database enforces it.

Operational notes

Runs survive API and worker restarts: queued runs stay in the table until a worker claims them, and running runs with expired leases are released back to queued by the orphan recovery job
The worker exposes an internal health endpoint on :4500 for container healthchecks; it is not reachable over the public internet
Deployment is decoupled from the API — the worker is a separate Docker container (backside-agent-worker), and you can scale it horizontally without touching the API tier
Every run emits OpenTelemetry GenAI semconv spans: agent.run, gen_ai.chat, gen_ai.tool. Ship them to the observability stack of your choice
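Orphan recovery amounts to one guarded UPDATE over expired leases. The sketch below shows it with the standard-library sqlite3 module, storing lease_expires_at as epoch seconds. The table shape, the function name, and the lease lengths in the demo data are illustrative assumptions. The page states only that running leases are renewed every 10s, not how long a lease lasts.

```python
import sqlite3


def release_orphans(conn: sqlite3.Connection, now: float) -> int:
    """Move running runs whose lease has lapsed back to queued; return the count."""
    cur = conn.execute(
        "UPDATE agent_runs "
        "SET status = 'queued', worker_id = NULL, lease_expires_at = NULL "
        "WHERE status = 'running' AND lease_expires_at < ?",
        (now,),
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agent_runs (id INTEGER PRIMARY KEY, status TEXT NOT NULL, "
    "worker_id TEXT, lease_expires_at REAL)"
)
now = 1_000_000.0
conn.executemany(
    "INSERT INTO agent_runs VALUES (?, ?, ?, ?)",
    [
        (1, "running", "worker-a", now - 30),  # worker crashed; lease lapsed
        (2, "running", "worker-b", now + 5),   # healthy worker; lease still live
    ],
)
released = release_orphans(conn, now)
```

Only run 1 is released; run 2 keeps its owner because its lease has not yet expired.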

Limits

One GA kind (contact_deduper) as of April 2026 — more land as they graduate
Approval resume endpoint is not exposed in Phase 2. waiting runs need operator unblocking; the client-facing resume call lands in Phase 3
Mid-run budget action hooks are not exposed; the cap fires only at terminal rollup
Model routing is static per-run; automatic multi-model failover is a design-phase item
