Cost caps
Why caps matter
Every AI-driven product has a “stuck in a loop” failure mode. Ours is
the replan cycle: a plan fails → replan → fails again → replan again,
burning tokens on every attempt. Workforce0 mitigates this with an
attempt cap (`PLAN_ATTEMPT_CAP=3` by default); caps at the
provider-cost layer add defense in depth.
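To make the attempt cap concrete, here is a minimal sketch of a capped replan loop. Only `PLAN_ATTEMPT_CAP` and its default of 3 come from this page; `runWithAttemptCap`, `PlanResult`, and the callback shape are hypothetical names used for illustration, not Workforce0's actual API.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not Workforce0's real API.
type PlanResult = { ok: boolean; tokensUsed: number };

const PLAN_ATTEMPT_CAP = 3; // default value documented above

function runWithAttemptCap(
  executePlan: (attempt: number) => PlanResult,
  attemptCap: number = PLAN_ATTEMPT_CAP,
): { succeeded: boolean; attempts: number; tokensSpent: number } {
  let tokensSpent = 0;
  for (let attempt = 1; attempt <= attemptCap; attempt++) {
    const result = executePlan(attempt);
    tokensSpent += result.tokensUsed; // every attempt costs tokens, even failed ones
    if (result.ok) {
      return { succeeded: true, attempts: attempt, tokensSpent };
    }
  }
  // Cap reached: stop replanning instead of looping forever.
  return { succeeded: false, attempts: attemptCap, tokensSpent };
}

// A plan that always fails stops after three attempts.
const alwaysFailing = runWithAttemptCap(() => ({ ok: false, tokensUsed: 1000 }));
console.log(alwaysFailing);
```

The attempt cap bounds the cost of a single brief. It does not bound the total across many briefs, which is what the monthly caps are for.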
Cap variables
| Var | Applies to |
|---|---|
| `PLANNER_MONTHLY_BUDGET_TOKENS` | chief-of-staff plan + critique + revise |
| `AGENT_MONTHLY_BUDGET_TOKENS` | each specialist (BA, architect, dev, QA) |
| `VOICE_MONTHLY_MINUTES` | Twilio + Gemini Live voice minutes |
| `TRANSCRIPTION_MONTHLY_MINUTES` | Whisper calls |
All are soft defaults: set them in .env or override per-role in
the Settings → Roles → Budget panel.
What happens when you hit a cap
The budget gate lives in `LLMPlanner.budgetGate` and runs before each
LLM call. When the month-to-date spend exceeds the cap:
- Planner: falls back to a deterministic single-step plan. Every brief becomes “Step 1: review and decide.” The exec is informed via Slack with a Budget exceeded badge.
- Specialists: refuse to claim new tickets; existing tickets finish but new ones stay in the queue. Team gets a Slack alert.
- Voice: dial-in answers with “we’ve hit this month’s quota; please upload a recording instead.”
No silent cost blow-outs. Your wallet survives the accident.
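The gate's decision can be sketched roughly as follows. This page does not show the real signature of `LLMPlanner.budgetGate`, so the function below, its parameters, and the `GateDecision` type are assumptions. The fallback strings paraphrase the behaviors listed above.

```typescript
// Hypothetical sketch: the real LLMPlanner.budgetGate signature is not shown in these docs.
type Consumer = "planner" | "specialist" | "voice";

type GateDecision =
  | { allowed: true }
  | { allowed: false; fallback: string };

// Degraded behavior per consumer, paraphrased from the list above.
const FALLBACKS: Record<Consumer, string> = {
  planner: "deterministic single-step plan: 'Step 1: review and decide.'",
  specialist: "stop claiming new tickets; existing tickets finish",
  voice: "ask the caller to upload a recording instead",
};

function budgetGate(
  consumer: Consumer,
  monthToDateUsage: number,
  cap: number | undefined, // undefined models an unset variable, i.e. no cap
): GateDecision {
  // The gate trips only when usage exceeds the cap.
  if (cap === undefined || monthToDateUsage <= cap) {
    return { allowed: true };
  }
  return { allowed: false, fallback: FALLBACKS[consumer] };
}

console.log(budgetGate("planner", 520000, 500000));
console.log(budgetGate("voice", 120, undefined));
```

Returning a fallback description rather than throwing mirrors the documented behavior: a tripped cap degrades the product instead of breaking it.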
Setting caps
```bash
# Conservative: single team, ~$50/mo
PLANNER_MONTHLY_BUDGET_TOKENS=500000
AGENT_MONTHLY_BUDGET_TOKENS=2000000
VOICE_MONTHLY_MINUTES=300

# Generous: power user, ~$200/mo
PLANNER_MONTHLY_BUDGET_TOKENS=2000000
AGENT_MONTHLY_BUDGET_TOKENS=10000000
VOICE_MONTHLY_MINUTES=1000

# No cap (you know what you're doing)
# Unset any of the above.
```

Caps reset automatically on the 1st of each month.
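One simple way to get a monthly reset is to bucket usage by calendar month, so a new month starts from zero without any cleanup job. The sketch below illustrates that idea only. How Workforce0 implements the reset is not described here, and `monthKey`, `MonthlyUsage`, and the UTC rollover are assumptions.

```typescript
// Hypothetical sketch: Workforce0's actual reset mechanism is not described in these docs.
// Assumption: months roll over at 00:00 UTC on the 1st.
function monthKey(date: Date): string {
  const month = date.getUTCMonth() + 1;
  return `${date.getUTCFullYear()}-${month < 10 ? "0" : ""}${month}`;
}

class MonthlyUsage {
  private totals = new Map<string, number>();

  record(amount: number, at: Date): void {
    const key = monthKey(at);
    this.totals.set(key, (this.totals.get(key) ?? 0) + amount);
  }

  // Usage for the month containing `at`; other months never count against it.
  monthToDate(at: Date): number {
    return this.totals.get(monthKey(at)) ?? 0;
  }
}

const plannerUsage = new MonthlyUsage();
plannerUsage.record(400000, new Date("2025-01-31T23:59:00Z"));
plannerUsage.record(1000, new Date("2025-02-01T00:01:00Z"));
console.log(plannerUsage.monthToDate(new Date("2025-02-15T12:00:00Z"))); // 1000
```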
Per-role caps (finer grain)
For more control, cap individual roles:
Settings → Roles → (pick role) → Budget. Each role has its own
`monthlyBudgetTokens`. A brief that hits the `dev_agent` cap doesn’t
prevent `ba_agent` from running on other briefs.
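The independence of per-role caps can be illustrated with a small lookup. Only `monthlyBudgetTokens`, `dev_agent`, `ba_agent`, and `AGENT_MONTHLY_BUDGET_TOKENS` appear in these docs. The `RoleBudget` shape, the `usedThisMonth` field, and `canRun` are hypothetical.

```typescript
// Hypothetical sketch: RoleBudget, usedThisMonth and canRun are illustrative names.
type RoleBudget = { monthlyBudgetTokens?: number; usedThisMonth: number };

function canRun(
  role: string,
  roleBudgets: Record<string, RoleBudget>,
  envDefault?: number,
): boolean {
  const budget = roleBudgets[role];
  const used = budget?.usedThisMonth ?? 0;
  // Per-role setting wins; otherwise use the .env default; undefined means no cap.
  const cap = budget?.monthlyBudgetTokens ?? envDefault;
  return cap === undefined || used <= cap;
}

const roleBudgets: Record<string, RoleBudget> = {
  dev_agent: { monthlyBudgetTokens: 1000000, usedThisMonth: 1200000 },
  ba_agent: { usedThisMonth: 300000 },
};
const AGENT_MONTHLY_BUDGET_TOKENS = 2000000;

console.log(canRun("dev_agent", roleBudgets, AGENT_MONTHLY_BUDGET_TOKENS)); // false
console.log(canRun("ba_agent", roleBudgets, AGENT_MONTHLY_BUDGET_TOKENS)); // true
```

Because each role is checked against its own counter, `dev_agent` going over its cap has no effect on `ba_agent`.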
Monitoring spend
Three places:
- Provider dashboards: Anthropic / OpenAI / Google all show current-month spend. Set billing alerts in their UIs too.
- Workforce0 analytics: Analytics → AI spend. Same data, per-project and per-role rollup.
- Prometheus metrics: `wf0_ai_tokens_total` / `wf0_ai_call_total`.
Spend anomalies — what to look for
- Sudden spike in `claude-sonnet-4-6` tokens. Often a replan loop on a specific brief. Audit in the Activity log.
- Sustained high `wf0_critique_score` failures. Critique is rejecting draft plans too often → more revisions → more tokens. Tune the planner prompt or switch model.
- Voice minutes climbing without meetings. Possible inbound spam on the Twilio number. Add an allowlist.
Cost guard patterns
Pattern 1: “charge the exec once per brief”
Size `PLANNER_MONTHLY_BUDGET_TOKENS` for the expected brief
volume plus a 20% buffer. The cap then kicks in only when volume jumps unexpectedly.
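As a worked example of that sizing rule, the helper below multiplies expected volume by tokens per brief and adds the buffer. The figures (40 briefs per month, about 10,000 planner tokens per brief) are made-up inputs for illustration; measure your own in Analytics → AI spend.

```typescript
// Hypothetical helper; the input figures below are illustrative, not measured values.
function sizePlannerBudget(
  briefsPerMonth: number,
  tokensPerBrief: number,
  bufferPercent: number = 20,
): number {
  // Integer arithmetic avoids floating-point drift from multiplying by 1.2.
  return Math.ceil((briefsPerMonth * tokensPerBrief * (100 + bufferPercent)) / 100);
}

// 40 briefs/month at ~10,000 planner tokens each, plus the 20% buffer.
console.log(sizePlannerBudget(40, 10000)); // 480000
```

With these inputs, `PLANNER_MONTHLY_BUDGET_TOKENS=480000` would absorb normal variation and trip only on a genuine jump in volume.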
Pattern 2: “unlimited exploration, capped exploitation”
No cap on the planner, but a low `AGENT_MONTHLY_BUDGET_TOKENS`. Execs
can draft ten wild briefs; the team only ships what fits the cap.
Pattern 3: “hard stop”
Set all caps. Workforce0 falls back to deterministic mode when a cap is hit. Best for demos and regulated environments where overspend is worse than degraded UX.
Provider-side guardrails
In addition to Workforce0 caps, set provider-side budgets:
- Anthropic: console.anthropic.com/settings/limits.
- OpenAI: platform.openai.com/account/billing/limits.
- Google: console.cloud.google.com/billing/…/budgets.
Provider budgets hard-stop at the API; they’re your last line of defense.
Debugging “I hit my cap unexpectedly”
- Check Analytics → AI spend → by role for the month. Find the role with the unexpected burn.
- Click the role to see per-ticket attribution.
- Find the ticket that consumed the most tokens.
- Look at its `ExecutionPlan.attempt` field. Values of 2 or 3 indicate replans, which are probably the culprit.
- Read the failure messages. Fix the underlying issue (often a bad prompt or a missing integration); token spend should then stabilize.
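If you export the per-ticket attribution, the same triage can be scripted. The sketch below assumes a flat list of records. The `TicketSpend` shape and `findTopSpender` are hypothetical, and only the `attempt` field (from `ExecutionPlan.attempt`) is named in these docs.

```typescript
// Hypothetical sketch: the export format and TicketSpend shape are assumptions.
type TicketSpend = { ticketId: string; role: string; tokens: number; attempt: number };

function findTopSpender(spend: TicketSpend[], role: string): TicketSpend | undefined {
  const forRole = spend.filter((t) => t.role === role);
  if (forRole.length === 0) return undefined;
  // Pick the ticket that consumed the most tokens for this role.
  return forRole.reduce((max, t) => (t.tokens > max.tokens ? t : max));
}

const exportedSpend: TicketSpend[] = [
  { ticketId: "T-101", role: "dev_agent", tokens: 40000, attempt: 1 },
  { ticketId: "T-102", role: "dev_agent", tokens: 310000, attempt: 3 },
  { ticketId: "T-103", role: "ba_agent", tokens: 500000, attempt: 1 },
];

const culprit = findTopSpender(exportedSpend, "dev_agent");
if (culprit && culprit.attempt >= 2) {
  // An attempt of 2 or 3 means the plan was replanned at least once.
  console.log(`${culprit.ticketId} was replanned (attempt ${culprit.attempt})`);
}
```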
Zero-cost modes
If you’re set up entirely on:
- Gemini free tier (planner + specialists), AND
- Local models (specialists), AND
- Free-tier Twilio trial (voice)
…you can run Workforce0 at zero dollars. The free Gemini tier’s 1,500 req/day is enough for a small team.
The moment you cross a free-tier line, a cap is the friend that saves you from the unexpected invoice.