Project Graph
The idea
A project graph is a directed graph whose nodes are files /
classes / interfaces / functions / methods and whose edges are
contains, imports, calls, extends, implements.
It gives the planner structural context about the codebase it’s operating on without us shipping the full repo to the LLM.
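As a rough illustration of that shape, the sketch below models nodes and edges as plain TypeScript types. All type and field names here (GraphNode, GraphEdge, id, from, to) are assumptions made for the example, not the actual Workforce0 schema; only the node and edge kinds come from the description above.

```typescript
// Hypothetical type names; only the node and edge kinds come from the text.
type NodeKind = "file" | "class" | "interface" | "function" | "method";
type EdgeKind = "contains" | "imports" | "calls" | "extends" | "implements";

interface GraphNode {
  id: string; // assumed unique slug, e.g. "src/repo.ts#TaskRepository"
  kind: NodeKind;
  name: string;
}

interface GraphEdge {
  from: string; // source node id
  to: string; // target node id
  kind: EdgeKind;
}

interface ProjectGraphData {
  nodes: GraphNode[];
  edges: GraphEdge[];
}

// A two-node example: a file that contains one top-level class.
const exampleGraph: ProjectGraphData = {
  nodes: [
    { id: "src/repo.ts", kind: "file", name: "repo.ts" },
    { id: "src/repo.ts#TaskRepository", kind: "class", name: "TaskRepository" },
  ],
  edges: [
    { from: "src/repo.ts", to: "src/repo.ts#TaskRepository", kind: "contains" },
  ],
};
```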
Credit: the overall shape, the EXTRACTED vs INFERRED edge
confidence tag, and the god-nodes concept come from
safishamsi/graphify. The
Workforce0 implementation is a native TypeScript rewrite with zero
Python runtime dependency.
What it extracts
Supported today
- TypeScript / JavaScript: full TS compiler API. Files, classes, interfaces, functions, methods, enums, type aliases, imports, call expressions, extends / implements.
- Python: regex-based. Files, classes, functions, methods (one level), from X import Y, bare-name call expressions. Nested classes are deliberately skipped.
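To make "regex-based, one level of methods, nested classes skipped" concrete, here is a minimal sketch of a line-oriented extractor. The function name, the patterns, and the 4-space indentation assumption are all illustrative; the real extractor's regexes are not shown in this document and may differ.

```typescript
// Hypothetical sketch: patterns are illustrative, not the project's actual regexes.
interface PySymbol {
  kind: "class" | "function" | "method";
  name: string;
  parent?: string; // enclosing class, set for methods only
}

function extractPythonSymbols(source: string): PySymbol[] {
  const symbols: PySymbol[] = [];
  let currentClass: string | undefined;

  for (const line of source.split("\n")) {
    // Top-level class: "class" starts at column 0.
    const topClass = /^class\s+([A-Za-z_]\w*)/.exec(line);
    if (topClass) {
      currentClass = topClass[1];
      symbols.push({ kind: "class", name: topClass[1] });
      continue;
    }
    // Top-level function: "def" starts at column 0.
    const topDef = /^(?:async\s+)?def\s+([A-Za-z_]\w*)/.exec(line);
    if (topDef) {
      currentClass = undefined;
      symbols.push({ kind: "function", name: topDef[1] });
      continue;
    }
    // Method: exactly one indentation level (4 spaces assumed) inside a class.
    const method = /^ {4}(?:async\s+)?def\s+([A-Za-z_]\w*)/.exec(line);
    if (method && currentClass) {
      symbols.push({ kind: "method", name: method[1], parent: currentClass });
      continue;
    }
    // Any other unindented, non-comment line ends the current class body.
    if (/^[^\s#]/.test(line)) currentClass = undefined;
  }
  return symbols;
}
```

Because nested classes and anything indented deeper than one level never match, they are silently ignored, which mirrors the "deliberately skipped" behaviour described above.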
Not yet supported
- Go, Rust, Java. Each could follow the regex-based Python pattern as a first step, with tree-sitter or language-native parsers as the first-class route to higher precision.
Key concepts
God-nodes
Nodes with the highest degree (incoming + outgoing edges): the symbols with the most dependencies in and out of them. These are where most changes ripple through the repo.
Accessed via:
- UI: Code Graph page.
- API: GET /api/project-graph/:projectId/god-nodes?limit=10.
The top-5 god-nodes are piped into every chief-of-staff plan prompt as “Project landmarks.” The planner uses them to decompose around core abstractions rather than inventing new ones.
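The degree ranking behind god-nodes can be sketched in a few lines. This is a minimal illustration assuming edges are simple from/to pairs; the function name topGodNodes and the tie-break by id are assumptions, not the documented implementation.

```typescript
// Hypothetical edge shape: only the endpoints matter for degree counting.
interface DegreeEdge {
  from: string;
  to: string;
}

// Rank nodes by total degree (incoming + outgoing) and keep the top `limit`.
function topGodNodes(
  edges: DegreeEdge[],
  limit: number
): { id: string; degree: number }[] {
  const degree = new Map<string, number>();
  for (const edge of edges) {
    degree.set(edge.from, (degree.get(edge.from) ?? 0) + 1); // outgoing
    degree.set(edge.to, (degree.get(edge.to) ?? 0) + 1); // incoming
  }
  return Array.from(degree.entries())
    .map(([id, d]) => ({ id, degree: d }))
    // Highest degree first; ties broken by id so the output is deterministic.
    .sort((a, b) => b.degree - a.degree || a.id.localeCompare(b.id))
    .slice(0, limit);
}
```

With limit set to 5, the result corresponds to the "Project landmarks" list that feeds the planner prompt.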
EXTRACTED vs INFERRED
Every edge carries a confidence tag:
- EXTRACTED: direct evidence from AST parsing. A contains edge between a file and a top-level class is EXTRACTED.
- INFERRED: heuristic. A calls edge from foo() to bar() based on bare-name matching is INFERRED, because we don’t do full symbol resolution.
The UI renders the tag visually; the planner prompt relies on EXTRACTED
edges for its firmer claims.
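A consumer that wants to treat the two tags differently could partition edges as in the sketch below. The TaggedEdge shape and the confidence field name are assumptions for illustration; only the two tag values come from this page.

```typescript
type Confidence = "EXTRACTED" | "INFERRED";

// Hypothetical edge shape carrying the confidence tag.
interface TaggedEdge {
  from: string;
  to: string;
  kind: string;
  confidence: Confidence;
}

// Split edges so callers can rely on EXTRACTED ones and treat INFERRED ones as hints.
function splitByConfidence(edges: TaggedEdge[]): {
  extracted: TaggedEdge[];
  inferred: TaggedEdge[];
} {
  return {
    extracted: edges.filter((e) => e.confidence === "EXTRACTED"),
    inferred: edges.filter((e) => e.confidence === "INFERRED"),
  };
}
```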
Communities
Louvain community detection runs on the graph after build. Nodes are assigned to communities; tightly-coupled symbols cluster together.
The UI’s Community lookup in Code Graph shows you every symbol in the same cluster as the one you entered.
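Once community assignments exist, the lookup itself is a simple filter. The sketch below assumes assignments are available as a map from symbol id to a numeric community id; that representation and the function name are hypothetical, and the Louvain step itself is not implemented here.

```typescript
// Return every symbol that shares a community with `symbol`, sorted by id.
// `assignments` is an assumed representation: symbol id -> community id.
function sameCommunity(
  assignments: Map<string, number>,
  symbol: string
): string[] {
  const target = assignments.get(symbol);
  if (target === undefined) return []; // unknown symbol: nothing to show

  const members: string[] = [];
  assignments.forEach((community, id) => {
    if (community === target) members.push(id);
  });
  return members.sort();
}
```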
Building a graph
From the UI
- Code Graph → Build graph.
- Paste a filesystem path on the server (/srv/repos/acme-app) and a repo label (acme/app).
- Build.
First build of ~10k files takes ~10 seconds. Incremental builds
(contentHash match) are no-ops.
From the API
    curl -X POST https://workforce0/api/project-graph/<projectId>/build \
      -H "X-Project-Id: <projectId>" \
      -H "Content-Type: application/json" \
      -d '{ "repoPath": "/srv/repos/acme-app", "repoLabel": "acme/app" }'
Auto-refresh
After the first build, pushes to the default branch auto-refresh the graph; see GitHub integration.
Querying the graph
Six REST endpoints:
| Endpoint | Purpose |
|---|---|
| GET /api/project-graph/:projectId | Summary (counts, languages, timestamp) |
| GET /api/project-graph/:projectId/god-nodes?limit=N | Top-N by degree |
| GET /api/project-graph/:projectId/callers/:symbol | Symbols that call the target |
| GET /api/project-graph/:projectId/path?from=X&to=Y | Shortest path in the dep graph |
| GET /api/project-graph/:projectId/community/:symbol | All symbols in the same Louvain community |
| POST /api/project-graph/:projectId/build | Build / refresh |
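A client would typically wrap these routes in small URL builders so that symbols and ids are encoded consistently. The helpers below are a hypothetical sketch: the function names are invented for the example, and the base URL is passed in rather than assumed. Only the route shapes come from the table.

```typescript
// Hypothetical URL builders for three of the read endpoints.
function godNodesUrl(baseUrl: string, projectId: string, limit: number): string {
  return `${baseUrl}/api/project-graph/${encodeURIComponent(projectId)}/god-nodes?limit=${limit}`;
}

function callersUrl(baseUrl: string, projectId: string, symbol: string): string {
  // Symbols may contain characters that are not URL-safe, so encode them.
  return `${baseUrl}/api/project-graph/${encodeURIComponent(projectId)}/callers/${encodeURIComponent(symbol)}`;
}

function pathUrl(baseUrl: string, projectId: string, from: string, to: string): string {
  const query = `from=${encodeURIComponent(from)}&to=${encodeURIComponent(to)}`;
  return `${baseUrl}/api/project-graph/${encodeURIComponent(projectId)}/path?${query}`;
}
```

The resulting strings can be handed to any HTTP client. Whether the read endpoints also expect the X-Project-Id header used in the build request is not stated here, so treat that as something to verify.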
Storage
Each ProjectGraph row stores:
- contentHash (SHA-256 of the input file set). Incremental builds check this first.
- graphJson: serialized full graph (nodes, edges, communities).
- Denormalised counts (nodeCount, edgeCount, communityCount) for fast dashboard reads.
- godNodeSlugs: cached top-N for the planner prompt.
- languages: extractor coverage, for the audit UI (“TS + Python”).
- repoPath: filesystem path for auto-refresh.
- repoLabel: GitHub owner/name, for webhook matching.
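One plausible way to compute a hash over a file set, and to skip a rebuild when it is unchanged, is sketched below using Node's built-in crypto module. How Workforce0 actually serialises the file set before hashing is not documented here; sorting by path and separating fields with a NUL byte are assumptions chosen to make the hash order-independent and unambiguous.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape: path plus file content.
interface InputFile {
  path: string;
  content: string;
}

// SHA-256 over the file set, independent of the order files were listed in.
function computeContentHash(files: InputFile[]): string {
  const hash = createHash("sha256");
  const sorted = files
    .slice()
    .sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
  for (const file of sorted) {
    // NUL separators keep ("ab", "c") distinct from ("a", "bc").
    hash.update(file.path);
    hash.update("\0");
    hash.update(file.content);
    hash.update("\0");
  }
  return hash.digest("hex");
}

// An incremental build is a no-op when the stored hash still matches.
function needsRebuild(storedHash: string | null, files: InputFile[]): boolean {
  return storedHash !== computeContentHash(files);
}
```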
Graph staleness
ExecutionPlan rows snapshot the contentHash from the graph that
fed the prompt (graphContentHash). When the graph rebuilds, older
plans get flagged graphStale in the library UI. Stale plans may
reference renamed / removed symbols.
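The staleness check reduces to a hash comparison. In the sketch below, the PlanRow shape and function name are hypothetical; graphContentHash and graphStale are the field names used above.

```typescript
// Hypothetical minimal view of an ExecutionPlan row.
interface PlanRow {
  id: string;
  graphContentHash: string; // contentHash snapshotted when the plan was generated
}

// Flag every plan whose snapshot no longer matches the current graph.
function flagStalePlans(
  plans: PlanRow[],
  currentContentHash: string
): (PlanRow & { graphStale: boolean })[] {
  return plans.map((plan) => ({
    ...plan,
    graphStale: plan.graphContentHash !== currentContentHash,
  }));
}
```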
PRD ↔ code cross-links
Approved briefs (PRDs) are scanned for code-symbol mentions (“we’ll
change the TaskRepository”) and links are persisted to
PRDSymbolLink. The UI can then jump from brief text to the symbol’s
file + line.
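A simple version of that scan is a whole-word search for each known symbol name. The sketch below is an assumption about how such matching could work, not the documented algorithm; the SymbolLocation and SymbolLink shapes are invented for the example and do not describe the real PRDSymbolLink columns.

```typescript
// Hypothetical shapes for known symbols and the links produced by the scan.
interface SymbolLocation {
  name: string;
  file: string;
  line: number;
}

interface SymbolLink {
  symbol: string;
  file: string;
  line: number;
  offset: number; // character offset of the mention in the brief text
}

function findSymbolMentions(
  briefText: string,
  symbols: SymbolLocation[]
): SymbolLink[] {
  const links: SymbolLink[] = [];
  for (const sym of symbols) {
    if (!sym.name) continue; // an empty name would match everywhere
    // Escape regex metacharacters, then match the name as a whole word.
    const escaped = sym.name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    const pattern = new RegExp(`\\b${escaped}\\b`, "g");
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(briefText)) !== null) {
      links.push({
        symbol: sym.name,
        file: sym.file,
        line: sym.line,
        offset: match.index,
      });
    }
  }
  return links;
}
```

Whole-word matching keeps TaskRepository from also matching inside TaskRepositoryImpl, at the cost of missing mentions written in a different case or spacing.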
Performance
- Extraction rate: ~5k files / second for TS (TS compiler dominates); ~20k files / second for Python (regex-only).
- Louvain community detection: linear-ish in edge count.
- Memory: ~1 GB transient during extraction of a 100k-file repo.
Limitations (honest)
- Regex-based Python misses nested classes, dynamic class creation, metaclasses.
- No full symbol resolution: calls edges are by bare name. Ambiguity (two save() methods in different classes) resolves to all matches; the planner tolerates this.
- No cross-repo. Each graph is one repo. Multi-repo projects aggregate god-nodes at read time.
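The bare-name behaviour can be illustrated with a small resolver that emits one INFERRED calls edge per symbol sharing the callee's name. The shapes and function name are hypothetical; the point is only to show why two save() methods both end up as targets.

```typescript
// Hypothetical symbol and edge shapes for the illustration.
interface NamedSymbol {
  id: string; // e.g. "User.save"
  name: string; // bare name, e.g. "save"
}

interface InferredCallEdge {
  from: string;
  to: string;
  kind: "calls";
  confidence: "INFERRED";
}

// With no symbol resolution, a call to `calleeName` links to every symbol with that name.
function resolveBareNameCall(
  callerId: string,
  calleeName: string,
  symbols: NamedSymbol[]
): InferredCallEdge[] {
  return symbols
    .filter((sym) => sym.name === calleeName)
    .map((sym) => ({
      from: callerId,
      to: sym.id,
      kind: "calls" as const,
      confidence: "INFERRED" as const,
    }));
}
```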
See DEFERRED.md for the language extractor backlog.