# Model registry
## What it does

A single abstraction, `ModelRegistryService`, answers:
- “Given this role and this tenant, which model should I call?”
- “Which provider? Which key? Which base URL?”
- “Is the budget cap exhausted for this role this month?”
Every LLM call in the backend goes through `resolveModel()` → `createModelClient()` → `client.chat()`. Nobody imports provider SDKs directly.
## Resolve flow

```ts
const { modelId, provider, apiKeyEnc, baseUrl } = await modelRegistry.resolveModel({
  tenantId,
  role: "planner",
  preferredProvider: "anthropic",
});
```

Under the hood:
- Load the tenant’s `AgentRole` row for the given slug.
- Check `monthlyBudgetTokens` — if exhausted, return null (the caller falls back to a deterministic response).
- Pick a provider by priority: explicit `preferredProvider` → tenant-configured default → first healthy AI Council member.
- Pick a model: env override (`MODEL_PLANNER_<PROVIDER>`) → tenant-configured default → registry default for (provider, role).
- Fetch the provider’s API key from the `ai_providers` table (decrypted on demand).
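The two priority chains above can be sketched as plain fallback expressions. This is an illustrative sketch, not the registry's real API — the names `pickProvider`, `pickModel`, and `RoleResolveInput` are invented for the example.

```typescript
// Hypothetical shapes for the resolve inputs described above.
type Provider = "anthropic" | "openai" | "google" | "custom";

interface RoleResolveInput {
  preferredProvider?: Provider;      // explicit caller preference
  tenantDefaultProvider?: Provider;  // tenant-configured default
  healthyCouncil: Provider[];        // healthy AI Council members, in priority order
  envOverride?: string;              // e.g. the MODEL_PLANNER_ANTHROPIC value
  tenantDefaultModel?: string;
  registryDefault: string;           // registry default for (provider, role)
}

// Provider priority: explicit preference → tenant default → first healthy member.
function pickProvider(input: RoleResolveInput): Provider | null {
  return input.preferredProvider ?? input.tenantDefaultProvider ?? input.healthyCouncil[0] ?? null;
}

// Model priority: env override → tenant default → registry default.
function pickModel(input: RoleResolveInput): string {
  return input.envOverride ?? input.tenantDefaultModel ?? input.registryDefault;
}
```

Expressing each step as `??` fallbacks keeps the priority order auditable at a glance.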
## The ModelClient interface

Every provider client exposes:

```ts
interface ModelClient {
  chat(req: {
    model: string;
    systemPrompt?: string;
    messages: Message[];
    tools?: ToolDef[];
    temperature?: number;
    maxTokens?: number;
  }): Promise<{
    content: string;
    toolCalls?: ToolCall[];
    tokenUsage: { input: number; output: number; cacheHit?: number };
    stopReason: string;
  }>;
}
```

Implementations in `backend/src/services/agent-runtime/clients/`:
- `anthropic-client.ts` — native Anthropic SDK.
- `openai-client.ts` — OpenAI + Azure + custom OpenAI-compatible.
- `google-client.ts` — Gemini native SDK.
- `custom-client.ts` — for local Ollama / vLLM. Same shape as OpenAI.
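Because every client shares the shape above, a stub implementation is enough to exercise registry code without a real provider. A minimal sketch, with the interface restated using named request/response types for self-containment (the `EchoClient`, `Message`, `ToolDef`, and `ToolCall` definitions here are illustrative, not the repo's):

```typescript
interface Message { role: "system" | "user" | "assistant"; content: string }
interface ToolDef { name: string; description?: string }
interface ToolCall { name: string; args: unknown }

interface ChatRequest {
  model: string;
  systemPrompt?: string;
  messages: Message[];
  tools?: ToolDef[];
  temperature?: number;
  maxTokens?: number;
}

interface ChatResponse {
  content: string;
  toolCalls?: ToolCall[];
  tokenUsage: { input: number; output: number; cacheHit?: number };
  stopReason: string;
}

interface ModelClient {
  chat(req: ChatRequest): Promise<ChatResponse>;
}

// Stub client: echoes the last message back and approximates token usage
// by whitespace-separated word count. Useful for tests, not production.
class EchoClient implements ModelClient {
  async chat(req: ChatRequest): Promise<ChatResponse> {
    const last = req.messages[req.messages.length - 1]?.content ?? "";
    const words = (s: string) => (s.trim() ? s.trim().split(/\s+/).length : 0);
    return {
      content: last,
      tokenUsage: { input: words(req.systemPrompt ?? "") + words(last), output: words(last) },
      stopReason: "end_turn",
    };
  }
}
```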
## Prompt caching

Anthropic’s prompt cache reduces cost on repeated system prompts. Our chief-of-staff prompt is stable across plans — caching it saves 80–90% of the input-token cost on the hot path.
Implementation:

- `CacheBreakpoint` tags in the system prompt at safe boundaries.
- Cache key tied to the tenant (so one tenant’s cache can’t be read by another).
- Minimum 5-minute TTL per cache entry.
OpenAI and Gemini have their own caching models; we use them when available.
Tests: `backend/src/services/model-registry/__tests__/prompt-caching.test.ts`.
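One way the breakpoint tagging could look: split the system prompt at marker boundaries and tag the stable prefix segments so the provider can reuse its prefix cache. This is a sketch under assumptions — the `buildSystemBlocks` helper and the `<!--cache-->` marker are invented for illustration; the block shape with `cache_control: { type: "ephemeral" }` follows Anthropic's documented prompt-caching format.

```typescript
interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Split the prompt at breakpoint markers; tag every segment except the last
// (the trailing, per-request part) so its prefix is cacheable. Anthropic
// permits only a few cache breakpoints per request, hence the i < 4 guard.
function buildSystemBlocks(prompt: string, breakpoint = "<!--cache-->"): SystemBlock[] {
  return prompt.split(breakpoint).map((text, i, parts) => ({
    type: "text" as const,
    text: text.trim(),
    ...(i < parts.length - 1 && i < 4 ? { cache_control: { type: "ephemeral" as const } } : {}),
  }));
}
```

Tying the cache key to the tenant then happens outside this helper, in how the prompt itself is scoped.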
## Per-role model overrides

Tenants tune model choice per role without restarting: Settings → Roles → (pick role) → Model preference. This writes to the `AgentRole` row’s `preferredModel`. The registry re-reads on each call (caches for 30 s).
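The "re-read on each call, cache for 30 s" behavior amounts to a tiny TTL wrapper. A minimal sketch — `cached` is an invented helper, and the commented `loadPreferredModel` call stands in for the real `AgentRole` row lookup:

```typescript
// Wrap a loader so repeated calls within ttlMs reuse the last value.
function cached<T>(ttlMs: number, load: () => T): () => T {
  let value: T | undefined;
  let fetchedAt = -Infinity; // force a load on first call
  return () => {
    if (Date.now() - fetchedAt > ttlMs) {
      value = load();
      fetchedAt = Date.now();
    }
    return value as T;
  };
}

// Hypothetical usage: hits the database at most once per 30 s window,
// so a tenant's preference change is picked up without a restart.
// const getPlannerModel = cached(30_000, () => loadPreferredModel("planner"));
```

The trade-off is the usual one: a preference edit can take up to one TTL window to be observed, in exchange for not querying the row on every call.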
## AI Council integration

The registry is aware of the AI Council:

- `resolveCouncil()` returns an array of (`ModelClient`, modelId) tuples, one per healthy provider.
- Budget caps apply cumulatively across the Council members for the same role.
- Degraded-provider exclusion happens here; the Council caller never sees unhealthy providers.
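The shape of that contract can be sketched as follows; `ProviderState`, `CouncilMember`, and this standalone `resolveCouncil` are illustrative stand-ins for the real service method:

```typescript
interface ProviderState { provider: string; modelId: string; healthy: boolean }
interface CouncilMember { provider: string; modelId: string }

// One tuple per healthy provider; unhealthy ones are filtered out here,
// so the Council caller never has to check provider health itself.
function resolveCouncil(states: ProviderState[]): CouncilMember[] {
  return states
    .filter((s) => s.healthy)
    .map(({ provider, modelId }) => ({ provider, modelId }));
}
```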
## Observability

Every model call emits:

- Prometheus counter `wf0_ai_call_total{provider,model}`.
- Prometheus histogram `wf0_ai_call_duration_seconds{provider}`.
- Prometheus counter `wf0_ai_tokens_total{provider,direction,cache}`.
- One `api_usage` row in Postgres (with tenant / project / role attribution).
- One structured log line at `info`.
This means you can slice spend by role, by project, by provider at any time.
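A sketch of the per-call metric emission, using a tiny in-memory counter in place of a real Prometheus client so the example is self-contained. The metric names match the list above; `recordCall`, the label rendering, and the `cache: "miss"` label value are assumptions for illustration:

```typescript
// In-memory stand-in for Prometheus counters, keyed by rendered label set.
const counters = new Map<string, number>();

function inc(name: string, labels: Record<string, string>, by = 1): void {
  const rendered = Object.entries(labels).map(([k, v]) => `${k}="${v}"`).join(",");
  const key = `${name}{${rendered}}`;
  counters.set(key, (counters.get(key) ?? 0) + by);
}

// One call's worth of counters: a call count plus token counts split by
// direction, as in wf0_ai_tokens_total{provider,direction,cache}.
function recordCall(provider: string, model: string, inputTokens: number, outputTokens: number): void {
  inc("wf0_ai_call_total", { provider, model });
  inc("wf0_ai_tokens_total", { provider, direction: "input", cache: "miss" }, inputTokens);
  inc("wf0_ai_tokens_total", { provider, direction: "output", cache: "miss" }, outputTokens);
}
```

Because the `provider`, `model`, and `direction` labels are attached per call, the slicing described above falls out of standard PromQL aggregation.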
## Failure handling

- Network / 5xx — provider marked `degraded` for 60 s. Excluded from future `resolveModel` calls within that window.
- 429 / rate-limited — same as degraded, with a logged reason.
- Invalid JSON / tool-call mismatch — not a client-layer problem; handled by the caller (chief-of-staff retries / structured re-ask).

Recovery probe runs every 60 s per degraded provider. On success, `degraded` → healthy.
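The 60-second exclusion window can be sketched as a per-provider deadline; `markDegraded` and `isHealthy` are invented names, and the real recovery probe (not shown) is what flips a provider back early-to-healthy on a successful probe:

```typescript
const DEGRADED_MS = 60_000;

// provider → timestamp (ms) until which it is excluded from resolution.
const degradedUntil = new Map<string, number>();

function markDegraded(provider: string, now = Date.now()): void {
  degradedUntil.set(provider, now + DEGRADED_MS);
}

// Healthy once the window has elapsed (or if it was never marked).
function isHealthy(provider: string, now = Date.now()): boolean {
  return (degradedUntil.get(provider) ?? 0) <= now;
}
```

Passing `now` explicitly keeps the window logic deterministic under test, without mocking the clock.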
## Swapping providers

When Anthropic ships a new Claude (Sonnet 4.7, say):

- Add the model id to `backend/src/services/model-registry/model-catalog.ts`.
- Optionally set `MODEL_PLANNER_ANTHROPIC=claude-sonnet-4-7` in env for an immediate swap.
- Restart. Tenants still on the old model continue unchanged until they update their preference.
## Why this abstraction

- BYOK tangle — without a registry, every service either repeats “which provider, which model, which key” or hardcodes one. Both are bad.
- Multi-provider routing — the AI Council can’t live without it.
- Cost caps — centralizing the gate means one place to enforce budgets.
- Swap model families cleanly — new SDKs drop in as new clients; no grep-and-replace.
## Implementation pointers

- `backend/src/services/model-registry/model-registry.service.ts` — the service class.
- `backend/src/services/model-registry/model-catalog.ts` — model id → capabilities mapping.
- `backend/src/services/agent-runtime/clients/*.ts` — provider clients.
- `backend/src/services/model-registry/__tests__/` — 40+ tests covering edge cases (Azure, custom endpoints, rate-limit handling).