Ticket orchestration
The ticket lifecycle
pending → in_progress → done
                      ↘ failed → (retry | replan | escalate)

Every plan step materialises as a Ticket row. Tickets transition
via the TicketService, a single abstraction that handles queue
writes, RLS scope, audit logging, and event broadcasting.
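The lifecycle above can be read as a small state machine. The following is a minimal sketch, assuming the transition table is exactly what the diagram shows; ALLOWED and canTransition are hypothetical names for illustration and are not part of TicketService.

```typescript
type TicketStatus = "pending" | "in_progress" | "done" | "failed";

// Transition table read off the lifecycle diagram. This is an assumption
// about what gets enforced, not the actual TicketService implementation.
const ALLOWED: Record<TicketStatus, readonly TicketStatus[]> = {
  pending: ["in_progress"],
  in_progress: ["done", "failed"],
  done: [],
  failed: [], // retry / replan / escalate are follow-up actions, not statuses
};

// Returns true when moving from `from` to `to` appears in the diagram.
function canTransition(from: TicketStatus, to: TicketStatus): boolean {
  return ALLOWED[from].indexOf(to) !== -1;
}
```

A guard like this would typically run before any write, so that an out-of-order update (for example done back to pending) is rejected in one place.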
Ticket shape
- id: stable.
- tenantId, projectId, goalId: scope.
- roleSlug: which queue consumer owns this.
- title, description: what to do.
- status: one of pending | in_progress | done | failed.
- parentTicketId: up-link to the chief-of-staff ticket.
- dependsOn: upstream tickets that must finish first.
- payload: role-specific input (skills, subagent slug, plan id).
- claimedBy: the consumer currently working on it.
- result: role-specific output (diff, review text, test plan).
- error: populated on failed.
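The field list above maps naturally onto a type. This is only a sketch: the field names come from the list, but every TypeScript type, the nullability choices, and the example values are assumptions.

```typescript
type TicketStatus = "pending" | "in_progress" | "done" | "failed";

// Field names follow the list above; the types are guesses for illustration.
interface Ticket {
  id: string;
  tenantId: string;
  projectId: string;
  goalId: string;
  roleSlug: string;
  title: string;
  description: string;
  status: TicketStatus;
  parentTicketId: string | null; // up-link to the chief-of-staff ticket
  dependsOn: string[];           // ids of upstream tickets
  payload: Record<string, unknown>;
  claimedBy: string | null;      // null until a consumer claims the ticket
  result: unknown;               // role-specific output
  error: string | null;          // populated on failed
}

// A hypothetical, freshly created ticket with no upstream dependencies.
const exampleTicket: Ticket = {
  id: "tk_example",
  tenantId: "tenant_example",
  projectId: "project_example",
  goalId: "goal_example",
  roleSlug: "dev",
  title: "Implement login form",
  description: "Build the form described in the plan step.",
  status: "pending",
  parentTicketId: null,
  dependsOn: [],
  payload: {},
  claimedBy: null,
  result: null,
  error: null,
};
```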
The queue
BullMQ (Redis-backed). One queue per roleSlug:
- queue:chief_of_staff
- queue:ba
- queue:architect
- queue:dev
- queue:qa
- queue:memory
Consumers are specialist agents: either server-side, for roles that
talk to the LLM directly, or the AgentHub daemon, for dev /
qa, which generate code locally.
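The role-to-queue mapping and the consumer split can be sketched as two small lookups. The function names and the ConsumerKind labels are hypothetical, and treating every role other than dev and qa as server-side is an inference from the paragraph above.

```typescript
type RoleSlug = "chief_of_staff" | "ba" | "architect" | "dev" | "qa" | "memory";
type ConsumerKind = "server" | "agenthub_daemon";

// Builds the queue name in the same form as the list above.
function queueNameFor(role: RoleSlug): string {
  return "queue:" + role;
}

// dev and qa are consumed by the AgentHub daemon; the rest are assumed
// to be consumed server-side.
function consumerKindFor(role: RoleSlug): ConsumerKind {
  return role === "dev" || role === "qa" ? "agenthub_daemon" : "server";
}
```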
Dispatching
When a plan is approved:
- ChiefOfStaffService.planTicket() writes N child ticket rows.
- Each ticket’s dependsOn list is set from the plan’s step.dependsOn field.
- Tickets with empty dependsOn are enqueued immediately. The rest stay pending with dependsOn populated.
- As upstream tickets transition to done, their downstream dependents get enqueued.
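The dispatch rule above amounts to "enqueue a pending ticket once all of its upstream tickets are done". A minimal sketch of that check follows; PlannedTicket and readyToEnqueue are hypothetical names, and tracking already-enqueued ids in a plain array is an assumption made to keep the example self-contained.

```typescript
interface PlannedTicket {
  id: string;
  dependsOn: string[];
  status: "pending" | "in_progress" | "done" | "failed";
}

// Returns the ids of pending tickets whose upstream tickets are all done
// and which have not been enqueued yet.
function readyToEnqueue(tickets: PlannedTicket[], alreadyEnqueued: string[]): string[] {
  const doneIds = tickets.filter((t) => t.status === "done").map((t) => t.id);
  return tickets
    .filter((t) => t.status === "pending")
    .filter((t) => alreadyEnqueued.indexOf(t.id) === -1)
    .filter((t) => t.dependsOn.every((dep) => doneIds.indexOf(dep) !== -1))
    .map((t) => t.id);
}
```

Calling this once at plan approval picks up the tickets with an empty dependsOn list (every() is true for an empty array); calling it again after each transition to done picks up newly unblocked dependents.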
Retry vs replan vs escalate
On ticket.failed:
- Retry: if the failure looks transient (network, 5xx from provider), the queue’s BullMQ retry policy handles it (3 attempts, exponential backoff). Status stays in_progress across retries.
- Replan: if the failure is structural (bad prompt, missing context), the chief-of-staff subscribes to the event and creates a new plan (attempt 2 or 3). Old plan + tickets transition to superseded.
- Escalate: same failure signature repeating → DM the exec with details. No further automatic action.
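The three outcomes above can be sketched as a single decision function. This is one possible ordering of the checks, not the actual chief-of-staff logic: the Failure shape, decideAction, and the idea of comparing against a list of earlier signatures are all assumptions. The retryOptions object uses the attempts and backoff fields of BullMQ job options; the 1000 ms base delay is an assumed value.

```typescript
type FailureAction = "retry" | "replan" | "escalate";

interface Failure {
  signature: string;  // hypothetical normalised error fingerprint
  transient: boolean; // e.g. network error or 5xx from the provider
}

// 3 attempts with exponential backoff, as described above.
// The base delay is an assumption; the text does not state one.
const retryOptions = {
  attempts: 3,
  backoff: { type: "exponential", delay: 1000 },
};

// Transient failures are left to the queue's retry policy; a structural
// failure triggers a replan unless the same signature was already seen.
function decideAction(failure: Failure, previousSignatures: string[]): FailureAction {
  if (failure.transient) return "retry";
  if (previousSignatures.indexOf(failure.signature) !== -1) return "escalate";
  return "replan";
}
```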
Parallelism
Tickets in the same plan without dependsOn relationships run in
parallel. Max concurrency per role is set via queue concurrency
(BULLMQ_CONCURRENCY_<ROLE>=N, default 5).
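Resolving the per-role concurrency might look like the sketch below. It assumes that the ROLE part of the variable name is the upper-cased role slug and that invalid or missing values fall back to the default of 5; concurrencyFor is a hypothetical helper, and the environment is passed in as a plain object to keep the example self-contained.

```typescript
type Env = Record<string, string | undefined>;

const DEFAULT_CONCURRENCY = 5;

// Reads BULLMQ_CONCURRENCY_<ROLE> from the given environment object.
// Falls back to the default when the variable is unset or not a positive integer.
function concurrencyFor(role: string, env: Env): number {
  const raw = env["BULLMQ_CONCURRENCY_" + role.toUpperCase()];
  const parsed = raw === undefined ? NaN : parseInt(raw, 10);
  return isNaN(parsed) || parsed < 1 ? DEFAULT_CONCURRENCY : parsed;
}
```

The resulting number would then be passed as the concurrency option when the worker for that role is created.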
Audit hooks
Every transition writes to audit_log:
- ticket.created
- ticket.claimed
- ticket.transitioned (with before/after status)
- ticket.failed (with error signature)
- ticket.replanned

Queryable from the Activity page.
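The audit events above could be modelled as a discriminated union. The event names come from the list; the payload field names (ticketId, claimedBy, before, after, errorSignature) and the summarize helper are assumptions for illustration.

```typescript
// Event names follow the list above; payload fields are hypothetical.
type AuditEvent =
  | { type: "ticket.created"; ticketId: string }
  | { type: "ticket.claimed"; ticketId: string; claimedBy: string }
  | { type: "ticket.transitioned"; ticketId: string; before: string; after: string }
  | { type: "ticket.failed"; ticketId: string; errorSignature: string }
  | { type: "ticket.replanned"; ticketId: string };

// Produces a one-line summary such as an activity feed might display.
function summarize(event: AuditEvent): string {
  switch (event.type) {
    case "ticket.transitioned":
      return `${event.ticketId}: ${event.before} -> ${event.after}`;
    case "ticket.failed":
      return `${event.ticketId}: failed (${event.errorSignature})`;
    default:
      return `${event.ticketId}: ${event.type}`;
  }
}
```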
Legacy AgentTask table
Earlier versions had a separate AgentTask table that is gradually
being retired (see
DEFERRED.md).
Tickets are the canonical rows; AgentTask rows are maintained as reverse mirrors of tickets
for back-compat until BAAgentService is rewritten.
Do NOT introduce new callers that write AgentTask directly.
Priorities
Tickets can be given a priority (low / normal / high / urgent).
BullMQ’s priority queueing ensures urgent tickets cut the line.
Rarely used; most work is priority normal.
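In BullMQ, a lower numeric priority value means the job is picked up sooner, so the four labels need to be mapped to numbers. The mapping below is a sketch with assumed values; BULLMQ_PRIORITY and jobOptionsFor are hypothetical names.

```typescript
type TicketPriority = "low" | "normal" | "high" | "urgent";

// Lower number = picked up earlier. The specific numbers are assumptions;
// only their relative order matters.
const BULLMQ_PRIORITY: Record<TicketPriority, number> = {
  urgent: 1,
  high: 2,
  normal: 3,
  low: 4,
};

// Builds the priority part of the job options, defaulting to normal.
function jobOptionsFor(priority: TicketPriority = "normal"): { priority: number } {
  return { priority: BULLMQ_PRIORITY[priority] };
}
```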
Timeouts
Per-role timeouts via TICKET_TIMEOUT_<ROLE>_SECONDS:
- chief_of_staff: 120 s (planner calls).
- ba, architect: 300 s (LLM-bound).
- dev, qa: 1800 s (30 min; AgentHub daemon does real work).
- memory: 60 s.
Timed-out tickets transition to failed with error = "timeout" and
are subject to the normal retry / replan path.
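A timeout lookup along these lines might look like the following sketch. It assumes the listed values act as defaults when the variable is unset, and that the ROLE part of the variable name is the upper-cased role slug; timeoutSecondsFor and hasTimedOut are hypothetical helpers.

```typescript
type Env = Record<string, string | undefined>;

// Values from the list above, treated here as defaults (an assumption).
const DEFAULT_TIMEOUT_SECONDS: Record<string, number> = {
  chief_of_staff: 120,
  ba: 300,
  architect: 300,
  dev: 1800,
  qa: 1800,
  memory: 60,
};

// Reads TICKET_TIMEOUT_<ROLE>_SECONDS, falling back to the table above.
// Returns undefined for a role that has neither an override nor a default.
function timeoutSecondsFor(role: string, env: Env): number | undefined {
  const raw = env["TICKET_TIMEOUT_" + role.toUpperCase() + "_SECONDS"];
  const parsed = raw === undefined ? NaN : parseInt(raw, 10);
  if (!isNaN(parsed) && parsed > 0) return parsed;
  return DEFAULT_TIMEOUT_SECONDS[role];
}

// True when a ticket claimed at claimedAtMs has exceeded its role's timeout.
function hasTimedOut(role: string, claimedAtMs: number, nowMs: number, env: Env): boolean {
  const limit = timeoutSecondsFor(role, env);
  return limit !== undefined && nowMs - claimedAtMs > limit * 1000;
}
```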
Querying
# Tickets for this brief
GET /api/tickets?parentTicketId=tk_…

# Tickets for this role, pending
GET /api/tickets?roleSlug=dev_agent&status=pending

# Activity feed (all ticket transitions)
GET /api/activity?limit=50

RLS scopes everything to the calling user’s tenant and active project.
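For callers that build these requests in code, a small path builder keeps the query parameters consistent. This is a sketch: TicketFilters and ticketsPath are hypothetical, and only the three parameters shown in the examples above are covered.

```typescript
interface TicketFilters {
  parentTicketId?: string;
  roleSlug?: string;
  status?: string;
}

// Builds the /api/tickets path with URL-encoded query parameters.
function ticketsPath(filters: TicketFilters): string {
  const pairs: string[] = [];
  if (filters.parentTicketId !== undefined) {
    pairs.push("parentTicketId=" + encodeURIComponent(filters.parentTicketId));
  }
  if (filters.roleSlug !== undefined) {
    pairs.push("roleSlug=" + encodeURIComponent(filters.roleSlug));
  }
  if (filters.status !== undefined) {
    pairs.push("status=" + encodeURIComponent(filters.status));
  }
  return pairs.length === 0 ? "/api/tickets" : "/api/tickets?" + pairs.join("&");
}
```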
Debugging a stuck ticket
- Activity → filter by this ticket’s parentTicketId.
- Check status.
  - pending with no dependsOn → queue consumer hasn’t claimed it. Is the consumer alive?
  - in_progress with old claimedAt → consumer crashed mid-flight. BullMQ’s stalled-job recovery puts it back on the queue after 30 s; if it doesn’t, the ticket is truly orphaned: manually set it to pending to re-dispatch.
  - failed with an error → read the error; usually obvious.
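The status checks above can be condensed into a small triage helper. This is a sketch only: StuckTicket and diagnose are hypothetical, claimedAt is assumed to be an epoch timestamp in milliseconds, using the 30 s stalled-job window as the cut-off for an "old" claimedAt is an assumption, and the "waiting on upstream" branch is an inference from the dispatch rules rather than something the checklist states.

```typescript
interface StuckTicket {
  status: "pending" | "in_progress" | "done" | "failed";
  dependsOn: string[];
  claimedAt?: number; // epoch milliseconds (assumed representation)
  error?: string;
}

// Assumed threshold, taken from the 30 s stalled-job recovery window.
const STALLED_AFTER_MS = 30000;

// Maps a ticket's state to the next debugging step from the checklist.
function diagnose(t: StuckTicket, nowMs: number): string {
  if (t.status === "pending" && t.dependsOn.length === 0) {
    return "unclaimed: check that the queue consumer is alive";
  }
  if (t.status === "pending") {
    return "waiting on upstream tickets";
  }
  if (t.status === "in_progress" && t.claimedAt !== undefined && nowMs - t.claimedAt > STALLED_AFTER_MS) {
    return "possibly orphaned: if stalled-job recovery did not requeue it, set it to pending manually";
  }
  if (t.status === "failed") {
    return "failed: " + (t.error ?? "no error recorded");
  }
  return "no obvious problem";
}
```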