The single execution surface for every workflow contract
The workflow runner is executeWorkflowContract in src/lib/workflow_runner.ts. It's the one function every workflow on the platform passes through: bid_price_update, vendor_cost_update, ar_aging_action_plan, all 22 of them. It reads a row from workflow_definitions, walks 7 stages, and writes audit rows to workflow_step_log, workflow_run_log, and (conditionally) reflexion_log, proposed_actions, and workflow_verify_results.
It shipped in R552 as the Tier 1 "every role becomes a workflow-driven agent" move. R560 was the hardening pass — closing five bug classes the Codex audit found, including the silent-swallow on stage_proposed_action INSERT failures (the Cal-Maine-class bug: workflow reports "ok" with zero NS writes). R564 documented the workflow.completed event intent (not yet wired). R571 added Play-animation runtime hooks. R576 tuned the active-state dwell timer for the UI.
Diagram: substrate-workflow-runner.html. The runner is 615 LOC; it's small because the heavy lifting is in the contracts themselves.
Four entrypoints
- HTTP API: POST /api/workflow/execute with ?preview=true|confirm=true. Direct invocation. Preview runs all stages but skips writes in executeFanOut. Used by ops console + manual triggers.
- Event-driven: drainEvents() reads new events from the ledger, matches against event_subscriptions, transforms payload via input_mapper JSONPath, invokes the runner with invoked_by: "event:<type>:<id>". KV concurrency lock prevents double-fire.
- Chat-tool invocation: execute_workflow chat tool (admin-gated, currently STUB). Most chat surfaces stage a workflow_* proposed_action first and run on approval (HITL invariant per ADR-031).
- Cron-driven: scheduled workflows (e.g. monthly_margin_review, annual_price_roll) fire from the cron handler. The verify-check scheduler at 45 5 * * * reads pending workflow_verify_results and runs verify SQL.
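As a rough illustration of the first two entrypoints, the sketch below builds the HTTP request and the event-driven invoked_by tag. Only the path, the preview/confirm query flags, and the "event:<type>:<id>" format come from the text above; the helper names, the example origin, and the request body shape (workflow_type plus input) are assumptions made for the example.

```typescript
// Sketch only: helper names and the body shape are hypothetical.
type ExecuteMode = "preview" | "confirm";

interface ExecuteRequest {
  method: "POST";
  url: string;
  body: string;
}

// Builds a request for POST /api/workflow/execute with ?preview=true or ?confirm=true.
function buildExecuteRequest(
  origin: string,
  workflowType: string,
  input: Record<string, unknown>,
  mode: ExecuteMode
): ExecuteRequest {
  return {
    method: "POST",
    url: origin + "/api/workflow/execute?" + mode + "=true",
    // Assumed body shape; the real endpoint may expect different field names.
    body: JSON.stringify({ workflow_type: workflowType, input: input }),
  };
}

// Formats the invoked_by tag the event drainer passes to the runner.
function eventInvokedBy(eventType: string, eventId: string): string {
  return "event:" + eventType + ":" + eventId;
}
```

A preview request built this way would exercise every stage while leaving the fan-out writes untouched, which makes it a reasonable first call when checking a new contract.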
What every workflow goes through
- 01 loadContract(workflowType)
  Single SELECT against workflow_definitions WHERE workflow_type=?1 AND enabled=1. Returns the contract row: precondition rules, fan-out targets, verify checks, risk level, reflexion flag. Null result → status='failed' with workflow_type_not_found.
- 02 loadContext(): N parallel SELECTs
  For each query in context_to_load_json: resolve binds (R560 fix: ordered positional bind to ? placeholders, not string-substitution), optionally skip if q.if = "X present" evaluates false. Errors are no longer silently swallowed; they push to contextErrors[], which surfaces in result.errors.
- 03 checkPreconditions()
  Walks preconditions_json and evaluates each rule. Grammar: "X present", "X is null", "X >= N" (also < <= = != >). severity:'block' + (fail OR unevaluable) → blocked=true, returns status='aborted'. severity:'warn' → push to warnings, continue.
- 04 stageHitlProposal(): if risk_level ≥ 3
  If risk_level ≥ 3 AND !opts.hitl_approved: INSERT proposed_actions with action_type='workflow_<type>', entity_type='workflow_run', entity_ref=run_id, status='pending', proposed_by='workflow_runner'. Return status='pending_hitl'. Stages 5-7 wait for approval. The decide endpoint re-invokes the runner with opts.hitl_approved=true.
- 05 executeFanOut(): per target
  For each target in fan_out_targets_json: evaluate t.if → skip / continue. Dispatch by kind. REAL kinds: kv_invalidate, stage_proposed_action (R560: throws on INSERT failure). STUB kinds (R560 marks status='stub', NOT counted as executed): d1_write, ns_push, http_call, chat_tool, hitl_email_draft, flag, workflow_class_invoke, loop_*, dispatch_workflow. Per-step rows INSERTed into workflow_step_log on entry & exit. on_failure='abort' → break.
- 06 scheduleVerifyChecks()
  Per check in verify_checks_json: INSERT workflow_verify_results with status='pending', expected_json=<window+expected+sql_check>, notes='scheduled by runner'. The verify scheduler at cron 45 5 * * * picks these up after the configured window and runs the SQL.
- 07 executePostActions()
  Always: INSERT workflow_run_log (run_id, status, duration_ms, errors_count, output_json). If reflexion_enabled=1: INSERT reflexion_log with entity_type='workflow_run', tags='workflow:<type>,status:<status>'. Return status: completed | partial | failed | aborted | pending_hitl. Note: workflow.completed event documented (R564) but NOT yet wired.
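The precondition grammar in stage 3 is compact enough to sketch. The following is a minimal illustration, not the runner's actual code: the rule object shape, the function names, and the "(unevaluable)" warning marker are assumptions, as is treating a missing key as null. It covers the three rule forms and the block/warn severity handling described above.

```typescript
// Sketch only: types and names are hypothetical stand-ins.
type Severity = "block" | "warn";
type Verdict = "pass" | "fail" | "unevaluable";

interface PreconditionRule {
  rule: string;
  severity: Severity;
}

// Evaluates one rule: "X present", "X is null", or "X <op> N".
function evaluateRule(rule: string, ctx: Record<string, unknown>): Verdict {
  const present = /^(\w+) present$/.exec(rule);
  if (present) {
    const v = ctx[present[1]];
    return v !== undefined && v !== null ? "pass" : "fail";
  }
  const isNull = /^(\w+) is null$/.exec(rule);
  if (isNull) {
    const v = ctx[isNull[1]];
    return v === undefined || v === null ? "pass" : "fail";
  }
  const cmp = /^(\w+) (>=|<=|!=|>|<|=) (-?\d+(?:\.\d+)?)$/.exec(rule);
  if (!cmp) return "unevaluable"; // rule text does not match the grammar
  const left = ctx[cmp[1]];
  if (typeof left !== "number" || isNaN(left)) return "unevaluable";
  const right = Number(cmp[3]);
  switch (cmp[2]) {
    case ">=": return left >= right ? "pass" : "fail";
    case "<=": return left <= right ? "pass" : "fail";
    case "!=": return left !== right ? "pass" : "fail";
    case ">": return left > right ? "pass" : "fail";
    case "<": return left < right ? "pass" : "fail";
    case "=": return left === right ? "pass" : "fail";
    default: return "unevaluable";
  }
}

// block + (fail OR unevaluable) blocks the run; warn only records a warning.
function checkPreconditions(
  rules: PreconditionRule[],
  ctx: Record<string, unknown>
): { blocked: boolean; warnings: string[] } {
  const warnings: string[] = [];
  let blocked = false;
  for (const r of rules) {
    const verdict = evaluateRule(r.rule, ctx);
    if (verdict === "pass") continue;
    if (r.severity === "block") {
      blocked = true;
    } else {
      warnings.push(verdict === "unevaluable" ? r.rule + " (unevaluable)" : r.rule);
    }
  }
  return { blocked: blocked, warnings: warnings };
}
```

Returning a three-valued verdict rather than a boolean is what lets the caller treat "could not evaluate" differently from "evaluated and failed", which is the distinction the R560 hardening depends on.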
Mike approves Driscoll's $0.06 bump on B5875
Mike taps Approve on a bid_price_update proposed_action for SKU 10472, $1.42 → $1.48 on bid B5875. The runner fires.
Stage 1 loads the bid_price_update contract from workflow_definitions (~2ms). Stage 2 loads context: the bid row, the customer row, the item row, the current price (~80ms, 4 parallel SELECTs). Stage 3 checks preconditions: "bid_id present" — pass; "item_id present" — pass; "new_price > 0" — pass. Stage 4 sees risk_level=3 but Mike's approval came in as opts.hitl_approved=true — skip. Stage 5 fans out 7 targets: kv_invalidate (real, deletes HUB_CACHE keys) + 6 STUB pushes (ns_push, d1_write, etc. — marked status='stub', NOT counted). 14 rows in workflow_step_log. Stage 6 schedules 2 verify checks ("NS price matches after 5 min", "hub page reflects new price after 60s"). Stage 7 writes workflow_run_log with status='completed', executed=1, stubbed=6 and a reflexion_log entry.
Mike sees: ok with executed=1. He knows the 6 stubs mean the actual NS push runs through the legacy NS_PUSH_QUEUE path, not the runner's stub dispatcher. That's the migration runway — gradually wire each stub to its real implementation without changing contract definitions.
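The counts in this walkthrough (executed=1, stubbed=6, 14 step-log rows) follow from two rules: only real kinds count as executed, and each target writes one workflow_step_log row on entry and one on exit. A small sketch of that arithmetic, with hypothetical type and function names and an illustrative choice of which six stub kinds the contract uses:

```typescript
// Sketch only: names are hypothetical; the real/stub split comes from stage 5.
const REAL_KINDS = ["kv_invalidate", "stage_proposed_action"];

interface FanOutTarget {
  kind: string;
}

interface FanOutTally {
  executed: number;
  stubbed: number;
  stepLogRows: number;
}

function tallyFanOut(targets: FanOutTarget[]): FanOutTally {
  let executed = 0;
  let stubbed = 0;
  let stepLogRows = 0;
  for (const t of targets) {
    stepLogRows += 2; // one row on entry, one on exit
    if (REAL_KINDS.indexOf(t.kind) !== -1) {
      executed += 1;
    } else {
      stubbed += 1; // marked status='stub', not counted as executed
    }
  }
  return { executed: executed, stubbed: stubbed, stepLogRows: stepLogRows };
}

// Illustrative target list: one real kind plus six stub kinds.
const bidPriceUpdateTargets: FanOutTarget[] = [
  { kind: "kv_invalidate" },
  { kind: "ns_push" },
  { kind: "d1_write" },
  { kind: "http_call" },
  { kind: "chat_tool" },
  { kind: "hitl_email_draft" },
  { kind: "flag" },
];
```

Under these assumptions, tallyFanOut(bidPriceUpdateTargets) gives one executed step, six stubs, and 14 step-log rows, matching the run above.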
What the substrate enables
- New workflow = new row in workflow_definitions, no new code path.
- Every run leaves a per-step audit trail in workflow_step_log.
- R560 makes stubs visible: can't ship "fake green" runs anymore.
- Verify checks decouple "did it happen" from "did it happen and the downstream system shows it".
The five bug classes closed
| Bug class (pre-R560) | Fix |
|---|---|
| loadContext silently swallowed query errors | errors push to contextErrors[] → result.errors |
| checkPreconditions treated unevaluable as 'pass' | block-severity unevaluable now blocks; warn downgrades with marker |
| stage_proposed_action INSERT used .catch(()=>null) (HITL bypass) | try/catch → explicit throw; outer catch marks step 'failed' |
| Stub kinds returned ok & incremented executed | marked status='stub', NOT counted; surfaces wiring gaps |
| Concurrent drainer firings could double-invoke runner | KV concurrency lock per drainer (TTL 300s) |
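The silent-swallow row is the easiest of these to show in code. The sketch below contrasts the two shapes using a synchronous stand-in for the proposed_actions INSERT; the real code is presumably promise-based (hence .catch(()=>null)), so this is a simplification, and every name in it is hypothetical.

```typescript
// Sketch only: synchronous simplification with hypothetical names.
interface StepResult {
  kind: string;
  status: "ok" | "failed";
  error?: string;
}

type InsertFn = () => void; // stand-in for the proposed_actions INSERT

// Pre-R560 shape: the failure is swallowed and the step still reports ok.
function runStepSwallowing(insert: InsertFn): StepResult {
  try {
    insert();
  } catch (_err) {
    // swallowed: nothing was staged, but nothing is reported either
  }
  return { kind: "stage_proposed_action", status: "ok" };
}

// Post-R560 shape: the stage helper rethrows with context...
function stageProposedAction(insert: InsertFn): void {
  try {
    insert();
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    throw new Error("stage_proposed_action INSERT failed: " + detail);
  }
}

// ...and the outer catch marks the step as failed.
function runStepHardened(insert: InsertFn): StepResult {
  try {
    stageProposedAction(insert);
    return { kind: "stage_proposed_action", status: "ok" };
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    return { kind: "stage_proposed_action", status: "failed", error: detail };
  }
}
```

With an insert that throws, the first variant reports ok with nothing staged, which is the HITL bypass; the second surfaces the failure in the step result.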
What can go wrong
- Contract not found: loadContract returns null → status='failed' with workflow_type_not_found. Detection: ops console "failed" filter. Recovery: migration to add the contract row; re-trigger.
- Run finishes with zero executed steps: every fan-out target for the contract is a stub kind. This is by design (R560). It tells Mike a contract is wired in shape but not in implementation. Detection: workflow_run_log.steps_executed=0 AND steps_total>0. Recovery: wire the stub kinds for that contract's fan-out targets.
- workflow.completed never fires: R564 documented the intent; not yet wired in stage 7. Downstream consumers waiting for workflow.completed currently poll workflow_run_log instead. Tracked in punch list.
- Run stuck awaiting approval: Stage 4 returns pending_hitl; stages 5-7 never fire until decide. Detection: workflow_run_log rows in pending_hitl > N hours. Recovery: ping Mike via recap.
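Two of the detection rules above are simple predicates over workflow_run_log rows. A minimal sketch, assuming a row shape with steps_executed, steps_total, and status as named in the text; the started_at_ms field and the function names are hypothetical, and the N-hour threshold is left as a parameter because the text does not fix it.

```typescript
// Sketch only: row shape is partly assumed (started_at_ms is hypothetical).
interface RunLogRow {
  run_id: string;
  status: string;
  steps_executed: number;
  steps_total: number;
  started_at_ms: number;
}

// Stub-only run: steps_executed=0 AND steps_total>0.
function isStubOnlyRun(row: RunLogRow): boolean {
  return row.steps_executed === 0 && row.steps_total > 0;
}

// Stale approval: still pending_hitl after more than maxHours.
function isStalePendingHitl(row: RunLogRow, nowMs: number, maxHours: number): boolean {
  if (row.status !== "pending_hitl") return false;
  return nowMs - row.started_at_ms > maxHours * 60 * 60 * 1000;
}
```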
Code paths + invariants
| Concern | Where |
|---|---|
| Runner | src/lib/workflow_runner.ts (615 LOC) |
| Schema | migrations/schema/117_workflow_definitions_v2.sql, 118_workflow_contracts_v2.sql |
| run_log | migrations/schema/121_workflow_run_log.sql |
| reflexion_log | migrations/schema/122_reflexion_log.sql |
| HTTP entry | src/index.ts:14304 — POST /api/workflow/execute |
| Cron entry | src/index.ts:33255 — 45 5 * * * verify scheduler |
| Event drainer | src/lib/events.ts drainEvents() |
| R552 commit | substrate ship: workflow runner + 7-stage execution |
| R560 hardening | 5 bug classes closed, stub-vs-real surface |
| R564 event intent | workflow.completed (documented, not wired) |
| R571 Play animation | runtime hooks for diagram play-through |
| R576 dwell timer | active-state dwell tuning in UI |