HITL approval flow · wiki guide

What this is

Mike is always the human in the loop

HITL means human-in-the-loop. Every AI-proposed action that would write to NetSuite, send a customer email, place an order hold, or change business state must first land in proposed_actions and wait for Mike's approval. This is the load-bearing invariant of the platform — encoded in ADR-031 and surfaced via /proposed-actions.html.

The reason is not paranoia about AI. The reason is that GFS runs on Mike's judgment. A 22% gross margin floor is a Mike rule. Which customers get softer collections treatment is a Mike rule. When to push back on a vendor versus eat a 5-cent cost increase is a Mike rule. The platform exists to amplify Mike's throughput — not to replace his judgment with a model's confidence interval.

Practically: every chat tool that mutates state writes a row to proposed_actions with a structured preview of what would change, a calculated risk_level, and a list of cascade targets. The work doesn't happen until Mike approves.

When it engages

What requires approval

Any push to NetSuite — price updates, customer record changes, vendor updates, item changes.
Any outbound customer email — quotes, dunning letters, statements, notifications.
Order holds and credit changes — anything that affects a customer's ability to transact.
Vendor cost commits — they cascade too far to auto-apply.
Spec deviation acceptances — these affect bid eligibility.

What does not require approval: read-only chat queries, internal D1 syncs from NS, sync reconciliation, log writes, event emission, draft generation (the draft itself is the proposed action — sending it is the gated step).
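
The gate described above can be sketched as a simple lookup. This is a minimal sketch, not the platform's code: the ActionKind names and the requiresApproval helper are hypothetical labels for the categories listed in this section.

```typescript
// Hypothetical action kinds, one per category named in this section.
type ActionKind =
  | "netsuite_push"
  | "customer_email_send"
  | "order_hold"
  | "credit_change"
  | "vendor_cost_commit"
  | "spec_deviation_acceptance"
  | "read_only_query"
  | "d1_sync_from_ns"
  | "sync_reconciliation"
  | "log_write"
  | "event_emission"
  | "draft_generation";

// Kinds that must land in proposed_actions and wait for approval.
const GATED_KINDS: ReadonlySet<ActionKind> = new Set<ActionKind>([
  "netsuite_push",
  "customer_email_send",
  "order_hold",
  "credit_change",
  "vendor_cost_commit",
  "spec_deviation_acceptance",
]);

function requiresApproval(kind: ActionKind): boolean {
  return GATED_KINDS.has(kind);
}
```

Under these assumptions, requiresApproval("draft_generation") is false while requiresApproval("customer_email_send") is true, which mirrors the rule that generating a draft is ungated and sending it is the gated step.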

Risk model

L1 through L5 + cumulative cap

Tier	What it means	UX treatment	Bulk-decide allowed?
L1	Trivial reversible (draft email created)	Green band, single-tap	Yes — unlimited
L2	Low-impact reversible (quote line edit)	Blue band, single-tap	Yes — up to 20
L3	Medium (single price change < $1K impact)	Amber band, confirm dialog	Yes — up to 10
L4	High (vendor cost cascade, large quote)	Orange band, typed-confirmation	No — single only
L5	Critical (credit hold lift, NS record delete)	Red band, typed-confirmation + X-Edit-Token	No — single only
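
The tier table above can be restated as data. This is a sketch under assumptions: the TierPolicy shape, its field names, and canDecideTogether are hypothetical, and only the bands, confirmation styles, and bulk limits come from the table.

```typescript
type RiskTier = 1 | 2 | 3 | 4 | 5;

interface TierPolicy {
  band: "green" | "blue" | "amber" | "orange" | "red";
  confirmation: "single_tap" | "confirm_dialog" | "typed_confirmation";
  requiresEditToken: boolean;
  // Most rows one decision may cover: Infinity = unlimited, 1 = single only.
  maxRowsPerDecision: number;
}

const TIER_POLICY: Record<RiskTier, TierPolicy> = {
  1: { band: "green", confirmation: "single_tap", requiresEditToken: false, maxRowsPerDecision: Infinity },
  2: { band: "blue", confirmation: "single_tap", requiresEditToken: false, maxRowsPerDecision: 20 },
  3: { band: "amber", confirmation: "confirm_dialog", requiresEditToken: false, maxRowsPerDecision: 10 },
  4: { band: "orange", confirmation: "typed_confirmation", requiresEditToken: false, maxRowsPerDecision: 1 },
  5: { band: "red", confirmation: "typed_confirmation", requiresEditToken: true, maxRowsPerDecision: 1 },
};

// True when `rowCount` rows of the same tier may be decided in one action.
function canDecideTogether(tier: RiskTier, rowCount: number): boolean {
  return rowCount >= 1 && rowCount <= TIER_POLICY[tier].maxRowsPerDecision;
}
```

Keeping the limits in one table means the UI and any server-side check can read the same values instead of repeating them.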

Cumulative cap

Even when bulk-decide is allowed, the sum of approval impact must stay under CUMULATIVE_CAP_USD (currently $50K). If a single bulk-decide tap would exceed it, the UI breaks the batch into chunks and requires a second tap for each chunk. This is the guardrail against catastrophic fat-finger events.
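
The chunking behaviour can be sketched as a greedy split. Several things here are assumptions rather than the platform's implementation: the impactUsd field and the chunkUnderCap helper are hypothetical, "exceed" is read as strictly greater than the cap, rows are taken in the order given, and a single row whose impact is above the cap gets a chunk of its own.

```typescript
const CUMULATIVE_CAP_USD = 50000;

interface PendingAction {
  id: number;
  impactUsd: number; // hypothetical field: dollar impact of approving this row
}

// Greedily pack actions, in order, into chunks whose total impact
// does not exceed the cap. Each chunk would need its own confirming tap.
function chunkUnderCap(
  actions: PendingAction[],
  capUsd: number = CUMULATIVE_CAP_USD,
): PendingAction[][] {
  const chunks: PendingAction[][] = [];
  let current: PendingAction[] = [];
  let total = 0;
  for (const action of actions) {
    // Close the current chunk if adding this action would exceed the cap.
    if (current.length > 0 && total + action.impactUsd > capUsd) {
      chunks.push(current);
      current = [];
      total = 0;
    }
    current.push(action);
    total += action.impactUsd;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

For impacts of $30,000, $15,000, $10,000 and $40,000 in that order, this sketch yields two chunks: the first two rows ($45,000) and the last two rows ($50,000).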

The /proposed-actions.html UX

How Mike approves things

The page polls GET /api/proposed-actions?status=pending every 10 seconds. Pending rows render as cards stacked by risk tier — L5 at top, L1 at bottom. Each card shows:

Header — workflow type, customer/vendor/entity name, time created, risk badge.
Diff preview — current value vs proposed value, side-by-side for prices; full email body for outbound.
Cascade targets — list of tables and external systems that would receive the change.
Edit link — for emails and quotes, Mike can edit the draft inline before approving.
Buttons — Approve, Reject, Defer (writes a note), Edit (opens inline editor).

A floating "select all in tier" control lets Mike bulk-approve same-tier rows when appropriate. Keyboard shortcuts: A approve, R reject, D defer, E edit, ↑/↓ navigate, Cmd+Enter confirm typed-confirmation modal.
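
The card ordering and the single-key shortcuts can be sketched as follows. The QueueCard shape, the command names, and both helpers are hypothetical; ordering rows oldest-first within a tier is also an assumption, since the page only specifies L5 at top and L1 at bottom. Cmd+Enter is left out because it belongs to the typed-confirmation modal rather than the card list.

```typescript
interface QueueCard {
  id: number;
  riskLevel: number; // 1..5
  createdAt: string; // ISO-8601 UTC timestamp
}

// Highest tier first; within a tier, oldest first (assumed tie-break).
function sortForQueue(cards: QueueCard[]): QueueCard[] {
  return [...cards].sort((a, b) => {
    if (a.riskLevel !== b.riskLevel) return b.riskLevel - a.riskLevel;
    if (a.createdAt < b.createdAt) return -1;
    if (a.createdAt > b.createdAt) return 1;
    return 0;
  });
}

type CardCommand = "approve" | "reject" | "defer" | "edit" | "previous" | "next";

// Keys follow the KeyboardEvent.key naming for the arrow keys.
const SHORTCUTS = new Map<string, CardCommand>([
  ["a", "approve"],
  ["r", "reject"],
  ["d", "defer"],
  ["e", "edit"],
  ["ArrowUp", "previous"],
  ["ArrowDown", "next"],
]);

function commandForKey(key: string): CardCommand | undefined {
  // Letter shortcuts are matched case-insensitively; named keys as-is.
  return SHORTCUTS.get(key.length === 1 ? key.toLowerCase() : key);
}
```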

Worked example

Single-decide vs bulk-decide

Scenario

Monday 8am. Mike opens /proposed-actions.html. There are 23 pending rows: 14 are L1 (overnight email drafts from the inbound triage), 6 are L2 (quote line adjustments from the weekend draft_quote runs), 2 are L3 (vendor cost changes from Bongards and Driscoll), 1 is L5 (a request to lift Cardinal's credit hold).

Mike taps "select all L1", reviews the 14 email subjects in a single condensed list, and taps Approve; they fire as a batch. Then "select all L2", same drill. He handles the two L3 rows individually because they are cost cascades: he scans the impact reports, approves Bongards, and defers Driscoll pending a phone call. The L5 row he treats deliberately: he types CONFIRM, supplies the X-Edit-Token, and approves. Total time: 6 minutes. Before the platform, the same morning review took 90 minutes.

Step-by-step what happens

From propose to commit

01

Chat tool writes the proposed action

Any chat tool that mutates state calls the shared writeProposedAction helper rather than the mutator directly. The helper inserts a row with action_type, payload_json (full preview), proposer (tool name), entity refs, and a calculated risk_level via the BEFORE INSERT trigger from 114_risk_level_trigger.sql.

Writes proposed_actions (status='pending')

Time ~100ms

02

UI polls + renders

/proposed-actions.html polls every 10s and renders pending rows. New rows fade in with a tufts-bordered ring for the first 30 seconds so Mike spots fresh items.

Reads GET /api/proposed-actions?status=pending

03

Mike reviews + decides

Single-decide or bulk-decide. For L4/L5, a typed-confirmation modal prevents accidental approval. X-Edit-Token (from Mike's session) must accompany L5 approvals.

UI /proposed-actions.html

Time seconds to minutes

04

Atomic claim (R560)

The approve handler does UPDATE proposed_actions SET status='approved', claimed_by=?, claimed_at=? WHERE id=? AND status='pending'. The AND status='pending' is the load-bearing clause — it prevents the double-approve race that hit us in R559 when Mike fat-fingered bulk-decide.
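
The effect of the guard can be illustrated with an in-memory stand-in for the table. This is only a model of the semantics: the row shape and claimForApproval are hypothetical, and in the real handler the atomicity comes from the database executing the UPDATE as a single statement, not from this code.

```typescript
type Status = "pending" | "approved" | "rejected";

interface ProposedActionRow {
  id: number;
  status: Status;
  claimedBy: string | null;
  claimedAt: string | null;
}

// In-memory stand-in for:
//   UPDATE proposed_actions SET status='approved', claimed_by=?, claimed_at=?
//   WHERE id=? AND status='pending'
// Returns the number of rows changed, as a SQL driver would report it.
function claimForApproval(
  table: Map<number, ProposedActionRow>,
  id: number,
  claimedBy: string,
  claimedAt: string,
): number {
  const row = table.get(id);
  if (row === undefined || row.status !== "pending") return 0; // guard did not match
  row.status = "approved";
  row.claimedBy = claimedBy;
  row.claimedAt = claimedAt;
  return 1;
}

const table = new Map<number, ProposedActionRow>([
  [42, { id: 42, status: "pending", claimedBy: null, claimedAt: null }],
]);
const firstTap = claimForApproval(table, 42, "mike", "2026-05-26T08:00:00Z");
const secondTap = claimForApproval(table, 42, "mike", "2026-05-26T08:00:01Z");
```

Here firstTap is 1 and secondTap is 0: the second tap finds no pending row, so a handler that only runs the cascade when one row changed would skip it.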

Writes proposed_actions.status='approved'

Invariant R560 atomic claim

05

Cascade executes

The originating workflow's on_approval handler fires: NS push enqueued, D1 writes, KV invalidations, R2 artifact regen, event emission. The handler is idempotent on proposed_action_id — if it runs twice, the second run is a no-op.
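
Idempotency keyed on proposed_action_id can be sketched like this. It is an illustration, not the platform's handler: runCascadeOnce and the in-memory Set are hypothetical, and a real system would need a durable record (for example a table keyed by proposed_action_id) rather than process memory.

```typescript
// Hypothetical record of which approvals have already been applied.
const appliedActionIds = new Set<number>();

function runCascadeOnce(
  proposedActionId: number,
  cascade: () => void,
): "applied" | "noop" {
  if (appliedActionIds.has(proposedActionId)) return "noop";
  // If the cascade throws, the id is not recorded, so a later retry can run it.
  cascade();
  appliedActionIds.add(proposedActionId);
  return "applied";
}

let nsPushesEnqueued = 0;
const enqueuePush = (): void => {
  nsPushesEnqueued += 1;
};

const firstRun = runCascadeOnce(42, enqueuePush);
const secondRun = runCascadeOnce(42, enqueuePush);
```

In this sketch firstRun is "applied", secondRun is "noop", and nsPushesEnqueued stays at 1.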

Writes per the workflow's cascade_tables_json

06

Audit + event

Approval is logged to approval_audit with full payload. An event hitl.approved (or hitl.rejected, hitl.deferred) is emitted with proposed_action_id. Reflexion picks it up to evaluate the quality of the AI's proposal.

Writes approval_audit, events

Emits hitl.approved

Outcomes

What the substrate guarantees

Audit coverage

100%

every write traces to an approval

Race safety

Atomic

R560 claim

Reversibility

Defer ≠ reject

three-way decision

Reflexion

Self-improving

rejections improve future proposals

Every state-changing action has a traceable approval row with timestamp, approver, and full pre-state.
No double-execution: the atomic claim lets each row be claimed only once, and the idempotent on_approval handler turns a repeated cascade run into a no-op.
Rejections inform reflexion which steers future AI proposals away from rejected patterns.
Compliance review can replay the entire decision history from approval_audit.

Failure modes

What can go wrong and how to recover

Approved but cascade fails

Approval succeeded but the NS push or D1 cascade errored. The action sits in status='approved' AND cascade_status='failed'. Reconciliation cron retries hourly; manual retry via POST /admin/proposed-actions//retry-cascade.

Backlog accumulates

If pending count exceeds 50, the daily digest highlights it. The longer Mike defers, the more workflows stall behind their HITL gate. Cleanup: triage in bulk first, individual decisions last.

Tool tries to bypass HITL

The "HITL TOOL TEMPLATE" near line 3947 of src/index.ts enforces the pattern at code-review time. New write tools that don't call writeProposedAction get rejected in PR.

Adjacent substrate + workflows

For developers

Code paths + invariants

Concern	Where
Doctrine	data/decisions.json ADR-031
Schema	migrations/schema/113_risk_tiers_proposed_actions.sql
Risk trigger	migrations/schema/114_risk_level_trigger.sql
HITL TOOL TEMPLATE	src/index.ts ~line 3947
Atomic claim	R560 — approve handler in src/index.ts
UI	/proposed-actions.html at repo root
Edit token	X-Edit-Token header — required for L5
Preview/confirm	CLAUDE.md invariant #2: ?preview=true|confirm=true

// The HITL TOOL TEMPLATE (paraphrased from src/index.ts ~3947)
async function myWriteTool(args) {
  // 1. Compute the proposed change (no writes yet)
  const preview = computePreview(args);

  // 2. Write to proposed_actions; risk_level is set by the trigger
  const action_id = await writeProposedAction({
    action_type: 'my_write',
    payload_json: preview,
    cascade_targets: ['pricing_master', 'netsuite'],
  });

  // 3. Return and wait for Mike. The cascade fires from the approve handler.
  return { status: 'pending_approval', action_id };
}

Changelog

Dated trail · spot stale claims

Dated trail of when this doc was last touched, what changed, and what to look at if it feels stale.

Date	Round	Change	Touched by
`2026-05-26`	`R586`	Added CHANGELOG · SCHEMA · RUNBOOK · BACKLOG sections — wiki became best-in-class operating documentation.	Mike + Claude
`2026-05-25`	`R584/R585`	Wiki originally shipped — 8-section structure (hero / what / when / steps / outcomes / failure-modes / related / for-developers).	Mike + Claude

If today is more than 60 days past the latest changelog row, treat live system behavior as the source of truth. The doc may have drifted — verify against the workflow contract in workflow_definitions WHERE workflow_type='hitl_approval_lifecycle' before acting on these claims.
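
The 60-day rule can be checked mechanically. A minimal sketch, assuming dates are passed as ISO YYYY-MM-DD strings; isDocStale and STALE_AFTER_DAYS are hypothetical names.

```typescript
const STALE_AFTER_DAYS = 60;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True when `today` is more than 60 days past the latest changelog row.
// Date-only ISO strings are parsed as UTC midnight, so the difference is whole days.
function isDocStale(latestChangelogDate: string, today: string): boolean {
  const elapsedDays =
    (Date.parse(today) - Date.parse(latestChangelogDate)) / MS_PER_DAY;
  return elapsedDays > STALE_AFTER_DAYS;
}
```

With the latest row dated 2026-05-26, isDocStale("2026-05-26", "2026-07-25") is false (exactly 60 days) and isDocStale("2026-05-26", "2026-07-26") is true.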

Schema · data contract

The machine-readable spec

Canonical fields, table names, and endpoint signatures: what code should match and what tests should assert.

workflow_type: hitl_approval_lifecycle · risk_level: N/A (substrate)

Inputs (required + optional)

Field	Type	Description
`action_id`	`integer`	proposed_actions.id — the row being decided. Required.
`decision`	`string`	'approve' | 'reject'. Required.
`decided_by`	`string`	Usually 'mike' (single-admin). Required.
`edit_token`	`string`	X-Edit-Token header — prevents accidental clicks. Required.
`note`	`string?`	Optional human reasoning attached to the decision.
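
The input contract above can be turned into a small validator. This is a sketch, not the endpoint's actual validation: the field names and allowed values come from the table, while validateDecideRequest and its error strings are hypothetical.

```typescript
interface DecideRequest {
  action_id: number;
  decision: "approve" | "reject";
  decided_by: string;
  edit_token: string;
  note?: string;
}

// Returns a list of problems; an empty list means the input matches the table.
function validateDecideRequest(input: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const actionId = input.action_id;
  if (typeof actionId !== "number" || !Number.isInteger(actionId)) {
    errors.push("action_id must be an integer");
  }
  if (input.decision !== "approve" && input.decision !== "reject") {
    errors.push("decision must be 'approve' or 'reject'");
  }
  if (typeof input.decided_by !== "string" || input.decided_by === "") {
    errors.push("decided_by is required");
  }
  if (typeof input.edit_token !== "string" || input.edit_token === "") {
    errors.push("edit_token is required");
  }
  if (input.note !== undefined && typeof input.note !== "string") {
    errors.push("note must be a string when present");
  }
  return errors;
}

const exampleRequest: DecideRequest = {
  action_id: 42,
  decision: "approve",
  decided_by: "mike",
  edit_token: "example-token",
};
const exampleErrors = validateDecideRequest({ ...exampleRequest });
```

Returning every problem at once, rather than failing on the first, is a design choice for this sketch so that a caller can report all missing fields in one response.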

D1 tables written

Table	Operation	Trigger
`proposed_actions`	UPDATE (status=approved|rejected, claimed_by, claimed_at)	Atomic-claim WHERE status='pending'
`hitl_audit_log`	INSERT	Every decision, regardless of outcome
`events`	INSERT (hitl.approved | hitl.rejected)	Triggers downstream workflow execution

Endpoints called

Method	Path	Purpose
`POST`	`/api/proposed-actions/:id/decide`	Atomic claim + decision
`GET`	`/proposed-actions.html`	Mike's queue (polls every 10s)
`POST`	`/api/proposed-actions/bulk-decide`	Batch approve with cumulative-cap check

Events fired

event_type	When	Subscribers
`hitl.approved`	After atomic claim succeeds	workflow_runner — triggers fan-out
`hitl.rejected`	After atomic claim succeeds with status=rejected	audit only
`hitl.cap_hit`	Bulk approve over CUMULATIVE_CAP_USD	Mike notification

Runbook · when it breaks

It broke at 2am — what now

Different from "how do I use this." This is the page Mike pulls up when something is wrong: logs to check, recovery steps, who to escalate to.

Scenario · Double-approve race — Mike tapped twice and got two cascades

This should be impossible since the R560 atomic claim landed. If it happened, the AND status='pending' guard failed.

Verify claim guard: SELECT id, status, claimed_at, claimed_by FROM proposed_actions WHERE id=<id>
Check NS push count: SELECT COUNT(*) FROM ns_pending_pushes WHERE related_action_id=<id> — should be 1.
If >1: Roll back the second NS push; file an ADR; investigate why the atomic claim let it through.

Scenario · Action stuck in pending for >24h

Mike missed it or the queue UI didn't poll.

Check the action: SELECT id, action_type, risk_level, created_at FROM proposed_actions WHERE status='pending' AND created_at < datetime('now', '-24 hours')
Notify Mike: If it's blocking, ping in chat or call.
Decision: Either approve, reject (with reason in note), or extend an SLA flag if it's blocked on something.

Scenario · Cumulative cap hit — bulk approve rejected

Sum of dollar-impact across selected actions exceeded CUMULATIVE_CAP_USD.

Inspect cap: Constant in src/index.ts, currently ~$50K default.
Split: Approve smaller batches under the cap, or use the multi-step confirmation prompt.
Adjust: If the cap is too low for the role, file an ADR and bump it.

Scenario · X-Edit-Token missing — 403 on decide

The decide endpoint requires the header. The page may have lost it.

Refresh: Hard refresh /proposed-actions.html. Token is set in localStorage at login.
Verify: Browser devtools → Application → Local Storage → check edit_token key.
Re-auth: Hit /admin/auth/refresh to rotate the token.

Logs to check

workflow_run_log · top-level run audit
workflow_step_log · per-step trace
workflow_verify_results · post-window verify outcomes
cron_locks · stuck cron lock detection
events · workflow.completed / workflow.failed event trail
reflexion_log · per-run narrative (if reflexion_enabled)
npx wrangler tail · live Worker logs

Kill switch · emergency stop

If this workflow is misbehaving in a high-impact way (creating bad proposed_actions in volume, pushing wrong things to NS), flip a kill switch:

kill:ns_writes · stops every NS push platform-wide
kill:proposed_apply · stops HITL approvals from executing fan-out
kill:high_risk_ops · stops risk_level >= 4 fan-out
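
How the three switches might gate work can be sketched as two checks. The switch names come from the list above; the helper functions are hypothetical, and the sketch assumes kill:ns_writes blocks only NS pushes while the other two block fan-out as described.

```typescript
type KillSwitch = "kill:ns_writes" | "kill:proposed_apply" | "kill:high_risk_ops";

// May an approved action's fan-out run at all?
function fanOutAllowed(
  active: ReadonlySet<KillSwitch>,
  riskLevel: number,
): boolean {
  if (active.has("kill:proposed_apply")) return false;
  if (active.has("kill:high_risk_ops") && riskLevel >= 4) return false;
  return true;
}

// May a NetSuite push be sent right now?
function nsPushAllowed(active: ReadonlySet<KillSwitch>): boolean {
  return !active.has("kill:ns_writes");
}

const activeSwitches: ReadonlySet<KillSwitch> = new Set<KillSwitch>([
  "kill:high_risk_ops",
]);
const lowRiskRuns = fanOutAllowed(activeSwitches, 2);
const highRiskRuns = fanOutAllowed(activeSwitches, 4);
```

With only kill:high_risk_ops active, lowRiskRuns is true and highRiskRuns is false, and NS pushes for the low-risk action are still allowed.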

See kill-switches-state-machine.html for the full state machine + recovery procedure.

Escalation

Primary: Mike Levine (single-admin) · mikelevine@globalfoodsolutions.co. For prolonged outage during business hours, notify warehouse lead + accounting lead so they can defer dependent work.

Backlog · open questions

What's not done · what's uncertain

What's not done, what's uncertain, what we punted. Captured so it survives context switches and doesn't die in someone's head.

OPEN
Should risk_level < 3 still stage in proposed_actions?
Today only risk >= 3 hits HITL. Risk 1-2 auto-execute. Doctrinal question: stage everything (with auto-approve for low risk) or skip the table?
DEFER
Multi-approver flow
Today single-admin (Mike). If GFS scales to multiple admins, need approver roles + escalation if approver A unavailable.
STUB
SLA tracking on pending decisions
Today no SLA. Could add 'expected_decision_within' per action_type and alert on breach.
DECISION
Auto-reject after N days?
Pending rows accumulate. Auto-reject at 30d? Mike's call — might lose important draft work.