
Project Graph

A project graph is a directed graph whose nodes are files / classes / interfaces / functions / methods and whose edges are contains, imports, calls, extends, implements.

It gives the planner structural context about the codebase it’s operating on without us shipping the full repo to the LLM.
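
The shape described above can be sketched as a pair of TypeScript types. This is an illustration only: the type names (`GraphNode`, `GraphEdge`, `ProjectGraphData`), the `id` format, and the `outgoing` helper are assumptions made for the example, not the Workforce0 schema. Only the node kinds and edge kinds come from the definition.

```typescript
// Hypothetical type names; only the kinds listed in the definition are taken from the docs.
type NodeKind = "file" | "class" | "interface" | "function" | "method";
type EdgeKind = "contains" | "imports" | "calls" | "extends" | "implements";

interface GraphNode {
  id: string; // illustrative unique slug, e.g. "src/task.ts#TaskRepository"
  kind: NodeKind;
  label: string;
}

interface GraphEdge {
  from: string; // id of the source node
  to: string;   // id of the target node
  kind: EdgeKind;
}

interface ProjectGraphData {
  nodes: GraphNode[];
  edges: GraphEdge[];
}

// A tiny example: one file containing one class that has one method.
const exampleGraph: ProjectGraphData = {
  nodes: [
    { id: "src/task.ts", kind: "file", label: "task.ts" },
    { id: "src/task.ts#TaskRepository", kind: "class", label: "TaskRepository" },
    { id: "src/task.ts#TaskRepository.save", kind: "method", label: "save" },
  ],
  edges: [
    { from: "src/task.ts", to: "src/task.ts#TaskRepository", kind: "contains" },
    { from: "src/task.ts#TaskRepository", to: "src/task.ts#TaskRepository.save", kind: "contains" },
  ],
};

// Outgoing edges of a node, optionally restricted to one edge kind.
function outgoing(graph: ProjectGraphData, nodeId: string, kind?: EdgeKind): GraphEdge[] {
  return graph.edges.filter((e) => e.from === nodeId && (kind === undefined || e.kind === kind));
}
```

A structure like this is small enough to summarise in a prompt, which is the point of the graph: the planner sees names and relationships, not file contents.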

Credit: the overall shape, the EXTRACTED vs INFERRED edge confidence tag, and the god-nodes concept come from safishamsi/graphify. The Workforce0 implementation is a native TypeScript rewrite with zero Python runtime dependency.

  • TypeScript / JavaScript — full TS compiler API. Files, classes, interfaces, functions, methods, enums, type aliases, imports, call expressions, extends / implements.
  • Python — regex-based. Files, classes, functions, methods (one level), from X import Y, bare-name call expressions. Nested classes are deliberately skipped.
  • Go, Rust, Java (deferred). Each could follow the Python regex pattern, or use first-class tree-sitter or language-native parsers for higher precision.
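
To make the regex-based approach concrete, here is a minimal sketch of extracting top-level classes, top-level functions, and one level of methods from Python source. It is not the Workforce0 extractor: the function name `extractPythonSymbols`, the `PySymbol` shape, and the indentation heuristic are assumptions for illustration, and imports and call expressions are left out.

```typescript
// Hypothetical names throughout; a sketch of line-based regex extraction.
interface PySymbol {
  kind: "class" | "function" | "method";
  name: string;
  parent?: string; // enclosing top-level class, for methods
  line: number;    // 1-based line number
}

function extractPythonSymbols(source: string): PySymbol[] {
  const symbols: PySymbol[] = [];
  let currentClass: string | undefined; // top-level class we are inside, if any
  let bodyIndent: number | undefined;   // indentation of that class's direct members

  source.split("\n").forEach((text, index) => {
    const trimmed = text.trim();
    if (trimmed === "" || trimmed.charAt(0) === "#") return; // skip blanks and comments

    const indent = text.length - text.replace(/^\s+/, "").length;
    if (indent === 0) {
      currentClass = undefined; // any top-level statement closes the open class
      bodyIndent = undefined;
    } else if (currentClass !== undefined && bodyIndent === undefined) {
      bodyIndent = indent; // the first member line fixes the class body indentation
    }

    const classMatch = /^class\s+([A-Za-z_]\w*)/.exec(trimmed);
    const defMatch = /^(?:async\s+)?def\s+([A-Za-z_]\w*)/.exec(trimmed);

    if (classMatch && indent === 0) {
      currentClass = classMatch[1];
      symbols.push({ kind: "class", name: classMatch[1], line: index + 1 });
    } else if (defMatch && indent === 0) {
      symbols.push({ kind: "function", name: defMatch[1], line: index + 1 });
    } else if (defMatch && currentClass !== undefined && indent === bodyIndent) {
      symbols.push({ kind: "method", name: defMatch[1], parent: currentClass, line: index + 1 });
    }
    // Nested classes and more deeply nested defs match none of the branches and are skipped.
  });

  return symbols;
}
```

A line-based matcher like this cannot see dynamically created classes or anything built at runtime, which is the trade-off accepted in exchange for not needing a Python parser.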

God-nodes are the nodes with the highest degree (incoming + outgoing edges): the symbols with the most dependencies in and out of them. Most changes ripple through the repo via these nodes.

Accessed via:

  • UI: Code Graph page.
  • API: GET /api/project-graph/:projectId/god-nodes?limit=10.

The top-5 god-nodes are piped into every chief-of-staff plan prompt as “Project landmarks.” The planner uses them to decompose around core abstractions rather than inventing new ones.
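
The degree ranking behind god-nodes can be sketched in a few lines. The names `topGodNodes` and `DegreeEdge` are hypothetical, and the tie-break by id is an assumption added so the output is deterministic; the docs only specify "highest degree (incoming + outgoing edges)".

```typescript
// Hypothetical edge shape: only the two endpoints matter for degree.
interface DegreeEdge {
  from: string;
  to: string;
}

// Rank nodes by total degree (incoming + outgoing edges) and keep the top `limit`.
function topGodNodes(edges: DegreeEdge[], limit: number): { id: string; degree: number }[] {
  const degree = new Map<string, number>();
  for (const e of edges) {
    degree.set(e.from, (degree.get(e.from) ?? 0) + 1); // outgoing edge
    degree.set(e.to, (degree.get(e.to) ?? 0) + 1);     // incoming edge
  }
  return Array.from(degree, ([id, d]) => ({ id, degree: d }))
    // Highest degree first; ties broken alphabetically (an assumption, for stable output).
    .sort((a, b) => b.degree - a.degree || a.id.localeCompare(b.id))
    .slice(0, limit);
}

// Usage: the planner prompt would take the top 5.
const landmarkCandidates = topGodNodes(
  [
    { from: "a", to: "core" },
    { from: "b", to: "core" },
    { from: "core", to: "db" },
    { from: "a", to: "b" },
  ],
  5,
);
```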

Every edge carries a confidence tag:

  • EXTRACTED — direct evidence from AST parsing. A contains edge between a file and a top-level class is EXTRACTED.
  • INFERRED — heuristic. A calls edge from foo() to bar() based on bare-name matching is INFERRED, because we don’t do full symbol resolution.

The UI renders this visually; the planner prompt relies on EXTRACTED edges for claims that need firmer evidence.
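
As a sketch of how the two tags might arise and be consumed: bare-name matching links a call to every symbol with that name and tags the result INFERRED, while a consumer that wants firmer evidence filters to EXTRACTED. The names `inferCallEdges`, `extractedOnly`, `SymbolNode`, and `TaggedEdge` are assumptions for illustration, not the project's API.

```typescript
type Confidence = "EXTRACTED" | "INFERRED";

// Hypothetical shapes for the example.
interface SymbolNode {
  id: string;
  label: string; // bare name, e.g. "save"
}

interface TaggedEdge {
  from: string;
  to: string;
  kind: "contains" | "imports" | "calls" | "extends" | "implements";
  confidence: Confidence;
}

// Bare-name matching: link the caller to every symbol with the called name.
// Without symbol resolution we cannot tell which one is meant, so every edge is INFERRED.
function inferCallEdges(callerId: string, calleeName: string, symbols: SymbolNode[]): TaggedEdge[] {
  return symbols
    .filter((s) => s.label === calleeName)
    .map((s): TaggedEdge => ({ from: callerId, to: s.id, kind: "calls", confidence: "INFERRED" }));
}

// Keep only edges backed by direct AST evidence.
function extractedOnly(edges: TaggedEdge[]): TaggedEdge[] {
  return edges.filter((e) => e.confidence === "EXTRACTED");
}

const sampleSymbols: SymbolNode[] = [
  { id: "repo.ts#TaskRepository.save", label: "save" },
  { id: "repo.ts#UserRepository.save", label: "save" },
  { id: "repo.ts#TaskRepository.load", label: "load" },
];

const sampleEdges: TaggedEdge[] = [
  { from: "repo.ts", to: "repo.ts#TaskRepository", kind: "contains", confidence: "EXTRACTED" },
  ...inferCallEdges("main.ts#run", "save", sampleSymbols), // adds two INFERRED calls edges
];
```

Here the call to `save` produces two edges, one per class that defines it, which is the ambiguity the confidence tag is meant to surface.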

Louvain community detection runs on the graph after build. Nodes are assigned to communities; tightly-coupled symbols cluster together.

The UI’s Community lookup in Code Graph shows you every symbol in the same cluster as the one you entered.
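
The community lookup can be illustrated independently of Louvain itself. The sketch below assumes the detection step has already produced a map from symbol id to community id; that map shape and the name `symbolsInSameCommunity` are hypothetical.

```typescript
// Hypothetical shape: symbol id -> community id assigned by community detection.
type CommunityAssignment = Map<string, number>;

// Return every symbol in the same community as `symbol` (including itself),
// or an empty list if the symbol is not in the graph.
function symbolsInSameCommunity(assignment: CommunityAssignment, symbol: string): string[] {
  const community = assignment.get(symbol);
  if (community === undefined) return [];
  const members: string[] = [];
  assignment.forEach((c, id) => {
    if (c === community) members.push(id);
  });
  return members.sort();
}

const sampleAssignment: CommunityAssignment = new Map([
  ["TaskRepository", 0],
  ["TaskService", 0],
  ["AuthMiddleware", 1],
  ["SessionStore", 1],
  ["TaskController", 0],
]);

const taskCluster = symbolsInSameCommunity(sampleAssignment, "TaskService");
```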

  1. Code Graph → Build graph.
  2. Paste a filesystem path on the server (/srv/repos/acme-app) and a repo label (acme/app).
  3. Build.

First build of ~10k files takes ~10 seconds. Incremental builds (contentHash match) are no-ops.

curl -X POST https://workforce0/api/project-graph/<projectId>/build \
-H "X-Project-Id: <projectId>" \
-H "Content-Type: application/json" \
-d '{ "repoPath": "/srv/repos/acme-app", "repoLabel": "acme/app" }'

After the first build, pushes to the default branch auto-refresh the graph — see GitHub integration.

Six REST endpoints:

| Endpoint | Purpose |
| --- | --- |
| GET /api/project-graph/:projectId | Summary (counts, languages, timestamp) |
| GET /api/project-graph/:projectId/god-nodes?limit=N | Top-N by degree |
| GET /api/project-graph/:projectId/callers/:symbol | Symbols that call the target |
| GET /api/project-graph/:projectId/path?from=X&to=Y | Shortest path in the dependency graph |
| GET /api/project-graph/:projectId/community/:symbol | All symbols in the same Louvain community |
| POST /api/project-graph/:projectId/build | Build / refresh |
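
For the path query, a breadth-first search is one straightforward way to find a shortest path in an unweighted directed graph. The sketch below assumes edges are followed in their stored direction and that every edge has the same cost; the docs do not say which algorithm or direction the endpoint uses, and the names `shortestPath` and `PathEdge` are hypothetical.

```typescript
interface PathEdge {
  from: string;
  to: string;
}

// Breadth-first search: returns the node ids along a shortest path, or null if unreachable.
function shortestPath(edges: PathEdge[], from: string, to: string): string[] | null {
  // Build an adjacency list keyed by source node.
  const adjacency = new Map<string, string[]>();
  for (const e of edges) {
    const list = adjacency.get(e.from);
    if (list) list.push(e.to);
    else adjacency.set(e.from, [e.to]);
  }

  const previous = new Map<string, string>(); // node -> node it was reached from
  const visited = new Set<string>([from]);
  const queue: string[] = [from];

  for (let head = 0; head < queue.length; head++) {
    const current = queue[head];
    if (current === to) {
      // Walk the predecessor chain back to the start, then reverse it.
      const path = [to];
      let node = to;
      while (node !== from) {
        node = previous.get(node)!;
        path.push(node);
      }
      return path.reverse();
    }
    for (const next of adjacency.get(current) ?? []) {
      if (!visited.has(next)) {
        visited.add(next);
        previous.set(next, current);
        queue.push(next);
      }
    }
  }
  return null;
}

const sampleDeps: PathEdge[] = [
  { from: "api", to: "service" },
  { from: "service", to: "db" },
  { from: "api", to: "cache" },
  { from: "cache", to: "serializer" },
  { from: "serializer", to: "db" },
];
const apiToDb = shortestPath(sampleDeps, "api", "db");
```

In this sample the two-hop route through `service` wins over the three-hop route through `cache`, and asking for a path from `db` back to `api` yields `null` because the edges are directed.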

Each ProjectGraph row stores:

  • contentHash (SHA-256 of the input file set). Incremental builds check this first.
  • graphJson — serialized full graph (nodes, edges, communities).
  • Denormalised counts (nodeCount, edgeCount, communityCount) for fast dashboard reads.
  • godNodeSlugs — cached top-N for the planner prompt.
  • languages — extractor coverage, for the audit UI (“TS + Python”).
  • repoPath — filesystem path for auto-refresh.
  • repoLabel — GitHub owner/name, for webhook matching.

ExecutionPlan rows snapshot the contentHash from the graph that fed the prompt (graphContentHash). When the graph rebuilds, older plans get flagged graphStale in the library UI. Stale plans may reference renamed / removed symbols.

See Graph-staleness badge.
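
The staleness rule reduces to a hash comparison. In this sketch the field names `graphContentHash` and `graphStale` come from the description above, while the `PlanSnapshot` shape and the `flagStalePlans` function are assumptions for illustration.

```typescript
// Hypothetical minimal view of an ExecutionPlan row.
interface PlanSnapshot {
  id: string;
  graphContentHash: string; // contentHash of the graph that fed the plan's prompt
}

// A plan is stale when the graph it was planned against is no longer the current one.
function flagStalePlans(
  plans: PlanSnapshot[],
  currentContentHash: string,
): { id: string; graphStale: boolean }[] {
  return plans.map((p) => ({
    id: p.id,
    graphStale: p.graphContentHash !== currentContentHash,
  }));
}

const flaggedPlans = flagStalePlans(
  [
    { id: "plan-1", graphContentHash: "aaa111" },
    { id: "plan-2", graphContentHash: "bbb222" },
  ],
  "bbb222", // hash of the graph after the latest rebuild (illustrative value)
);
```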

Approved briefs (PRDs) are scanned for code-symbol mentions (“we’ll change the TaskRepository”) and links are persisted to PRDSymbolLink. The UI can then jump from brief text to the symbol’s file + line.
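
One simple way to scan a brief for symbol mentions is a whole-word match against the graph's known symbol names. This is a sketch under that assumption; the docs do not describe the actual matching rules, and `findSymbolMentions` and `SymbolMention` are hypothetical names.

```typescript
interface SymbolMention {
  symbol: string;
  offset: number; // character offset of the mention in the brief text
}

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Find whole-word occurrences of each known symbol name in the brief.
function findSymbolMentions(briefText: string, knownSymbols: string[]): SymbolMention[] {
  const mentions: SymbolMention[] = [];
  for (const symbol of knownSymbols) {
    if (symbol === "") continue; // an empty pattern would match everywhere
    const pattern = new RegExp("\\b" + escapeRegExp(symbol) + "\\b", "g");
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(briefText)) !== null) {
      mentions.push({ symbol, offset: match.index });
    }
  }
  return mentions.sort((a, b) => a.offset - b.offset);
}

const sampleBrief = "we'll change the TaskRepository and keep TaskRepositoryFactory as is";
const sampleMentions = findSymbolMentions(sampleBrief, ["TaskRepository", "UserService"]);
```

The word-boundary check keeps `TaskRepository` from matching inside `TaskRepositoryFactory`; each resulting mention could then be joined to the symbol's node to find its file and line.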

  • Extraction rate: ~5k files / second for TS (TS compiler dominates); ~20k files / second for Python (regex-only).
  • Louvain community detection: linear-ish in edge count.
  • Memory: ~1 GB transient during extraction of a 100k-file repo.
  • Regex-based Python misses nested classes, dynamic class creation, metaclasses.
  • No full symbol resolution: calls edges are matched by bare name. An ambiguous name (two save() methods in different classes) resolves to all matches; the planner tolerates this.
  • No cross-repo. Each graph is one repo. Multi-repo projects aggregate god-nodes at read time.

See DEFERRED.md for the language extractor backlog.