HITL lifecycle · wiki guide

What this is

Every AI write goes through a human first

The HITL lifecycle is the proposed_actions state machine. It's the platform's load-bearing safety invariant: no AI-proposed write reaches NetSuite without Mike approving it. ADR-031 codifies this; every tool that mutates business state stages a row here first.

Pre-HITL, AI tools wrote directly. The Cal-Maine incident (silent fraud flag, wrong customer flagged on a phone call) made this non-negotiable. Now the state machine is: pending (AI proposed it) → approved or rejected (Mike decided) → applied (drainer pushed to NS). Every transition writes an audit row.
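The transitions named above can be written down as a small lookup table. This is only a sketch: the names Status, ALLOWED and canTransition are hypothetical and do not come from the codebase, and the 'stale' status described under failure modes is left out.

```typescript
// Hypothetical names for illustration; not taken from src/index.ts.
type Status = "pending" | "approved" | "rejected" | "applied";

// Allowed moves, as described in the paragraph above.
const ALLOWED: Record<Status, readonly Status[]> = {
  pending: ["approved", "rejected"], // Mike decided
  approved: ["applied"],             // drainer pushed to NS
  rejected: [],                      // terminal
  applied: [],                       // terminal
};

// True only when the lifecycle permits moving from `from` to `to`.
function canTransition(from: Status, to: Status): boolean {
  return ALLOWED[from].indexOf(to) !== -1;
}
```

With this table, canTransition("pending", "applied") is false: nothing reaches applied without passing through approved first.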

The diagram lives at substrate-hitl-lifecycle.html. The schema is in migration 113 (R537), which includes the risk tier columns. The R560 fix added the atomic UPDATE...RETURNING that closes the double-approve race.

State machine

The three-state lifecycle

01 · pending: AI proposed, awaiting human

A chat tool or workflow fan-out emits a row with status='pending'. Payload contains the full intended write (entity_type, entity_ref, action_type, payload_json, risk_level). It lands at /proposed-actions.html, which polls every 10s and surfaces it to Mike with full context.
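A minimal sketch of what a staged row might look like before it is inserted. The field names come from the text above; the TypeScript types, the stagePendingAction helper, and the choice to leave risk_level null until the trigger fills it are assumptions.

```typescript
// Field names follow the text; the shapes and the helper are hypothetical.
interface ProposedActionInsert {
  entity_type: string;
  entity_ref: string;
  action_type: string;
  payload_json: string;      // the full intended write, serialized
  status: "pending";
  risk_level: number | null; // assumed null on insert; the trigger sets it
}

// Builds the row a tool would stage instead of writing to NetSuite directly.
function stagePendingAction(
  entityType: string,
  entityRef: string,
  actionType: string,
  payload: Record<string, unknown>,
): ProposedActionInsert {
  return {
    entity_type: entityType,
    entity_ref: entityRef,
    action_type: actionType,
    payload_json: JSON.stringify(payload),
    status: "pending",
    risk_level: null,
  };
}

// Values borrowed from the worked example later in this guide.
const staged = stagePendingAction("item", "10472", "bid_price_update", { bid_id: "B5875" });
```

The helper never accepts a status argument, so a proposing tool cannot stage anything other than a pending row.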

Writes: proposed_actions
Trigger: 114_risk_level_trigger sets risk_level

02 · approved or rejected: the atomic claim

Mike taps Approve or Reject. The decide endpoint runs an atomic UPDATE proposed_actions SET status=?, decided_by=?, decided_at=? WHERE id=? AND status='pending' RETURNING *. The AND status='pending' predicate is the R560 fix. If two requests race (Mike double-taps the bulk-approve button, or a network retry fires twice), only the first gets a non-empty RETURNING result. The second matches no rows, becomes a no-op, and the UI shows "already decided".
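The claim semantics can be illustrated with an in-memory stand-in for the table. This sketch only mimics the outcome of the conditional UPDATE; in production the guarantee comes from the database evaluating the status check and the write in one statement. The ActionRow shape and the claim helper are hypothetical.

```typescript
// In-memory stand-in for proposed_actions (hypothetical, for illustration only).
type Decision = "approved" | "rejected";

interface ActionRow {
  id: number;
  status: "pending" | Decision;
  decided_by: string | null;
  decided_at: string | null;
}

// Mirrors UPDATE ... WHERE id=? AND status='pending' RETURNING *:
// returns the updated row, or null when the row was no longer pending.
function claim(
  table: Record<number, ActionRow>,
  id: number,
  decision: Decision,
  who: string,
  now: string,
): ActionRow | null {
  const current = table[id];
  if (!current || current.status !== "pending") return null; // lost the race
  const updated: ActionRow = { id, status: decision, decided_by: who, decided_at: now };
  table[id] = updated;
  return updated;
}

// Two identical requests for the same card; the timestamps are placeholders.
const actionsTable: Record<number, ActionRow> = {
  8421: { id: 8421, status: "pending", decided_by: null, decided_at: null },
};
const firstClaim = claim(actionsTable, 8421, "approved", "mike", "2026-01-01T00:00:00Z");
const secondClaim = claim(actionsTable, 8421, "approved", "mike", "2026-01-01T00:00:01Z");
```

Here firstClaim holds the approved row and secondClaim is null, which is the "already decided" branch.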

Writes: proposed_actions.status, decided_by, decided_at
R560: atomic UPDATE…RETURNING

03 · applied: drainer pushed to NetSuite

Approved rows drop into ns_pending_pushes. The push drainer (cron-triggered or queue-fed) reads pending rows, transforms payload to the NS RESTlet contract, calls the customscript_gfs_platform_query RESTlet via OAuth1, and on success flips proposed_actions.status='applied'. Rejected rows skip the drainer; status='rejected' is terminal.
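One drainer pass might look roughly like the sketch below. The NetSuite call is injected as a plain function so the example runs without NS; the RESTlet contract, OAuth1 signing, PushMutexDO and the queue are not modeled. QueuedPush, StagedAction and drainOnce are hypothetical names.

```typescript
// Hypothetical shapes; the real ns_pending_pushes columns are not shown in this guide.
interface QueuedPush {
  action_id: number;
  payload_json: string;
}

interface StagedAction {
  id: number;
  status: "pending" | "approved" | "rejected" | "applied";
}

// Pushes every queued row whose action is approved and flips it to 'applied' on success.
// Returns the ids that were applied in this pass.
function drainOnce(
  queue: QueuedPush[],
  actions: Record<number, StagedAction>,
  pushToNs: (push: QueuedPush) => boolean, // stand-in for the RESTlet call
): number[] {
  const applied: number[] = [];
  for (const push of queue) {
    const action = actions[push.action_id];
    if (!action || action.status !== "approved") continue; // rejected rows skip the drainer
    if (pushToNs(push)) {
      action.status = "applied";
      applied.push(action.id);
    }
    // On failure the row stays 'approved' so a later pass can retry it.
  }
  return applied;
}
```

Keeping the status at 'approved' on failure matches the failure-mode behavior described later in this guide, where a recon job re-enqueues unpushed approvals.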

Writes: NetSuite + proposed_actions.status='applied'
Drainer: PushMutexDO + NS_PUSH_QUEUE

Risk tiers

L1–L5 — not all writes are equal

Migration 113 (R537) added the L1–L5 risk tier system. The tier determines the UI band color and whether Mike's usual single-tap approve gesture suffices or a multi-step confirmation is required. The 114_risk_level_trigger.sql trigger sets the tier when the row is inserted; the proposing tool does not set it.

Tier	Meaning	Examples	Approval UX
L1	Trivial	email draft, KV cache bust, note attachment	auto-approve eligible
L2	Low risk	SO line price change < $50, AR note	single tap
L3	Medium	bid line price change, inventory adjustment < $1K, BOM variance	single tap with context preview
L4	High	large inventory adjustment, new customer credit limit, assembly build > $5K	tap + confirm modal
L5	Critical	customer credit revoke, bulk price roll, vendor cost roll	X-Edit-Token + multi-step
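On the UI side, the table above could be mirrored as a lookup. This is a sketch under assumptions: the names RiskTier, APPROVAL_UX and needsExtraConfirmation are hypothetical, and the strings are copied from the Approval UX column rather than from any real front-end code.

```typescript
type RiskTier = 1 | 2 | 3 | 4 | 5;

// Approval UX per tier, copied from the table above.
const APPROVAL_UX: Record<RiskTier, string> = {
  1: "auto-approve eligible",
  2: "single tap",
  3: "single tap with context preview",
  4: "tap + confirm modal",
  5: "X-Edit-Token + multi-step",
};

// L4 and L5 need more than a single tap, per the table.
function needsExtraConfirmation(tier: RiskTier): boolean {
  return tier >= 4;
}
```

Because the tier is assigned by the database trigger, a lookup like this would only read risk_level, never compute it.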

Kill switches

Three big red buttons

Mike can disable categories of writes globally via flags in the kill_switches table (or via env vars at startup). Each switch is a hard precondition the decide endpoint and the drainer both check.

ns_writes — master switch. When off, the drainer halts and no ns_pending_pushes rows are picked up. Approvals still land but pile up in pending-apply.
proposed_apply — affects the decide endpoint. When off, Mike can still see and triage cards but the Approve button is disabled. Useful for "read-only mode" during incident response.
high_risk_ops — auto-rejects any incoming proposed_action with risk_level ≥ 4. Lets L1–L3 ops continue while we pause L4–L5.
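The three switches can be read as three independent gates. In the sketch below, true means the category is enabled and false means the switch has been thrown ("off" in the descriptions above); that encoding, the KillSwitches shape, and the function names are assumptions, not the kill_switches table's actual schema.

```typescript
// Assumed encoding: true = enabled, false = switched off.
interface KillSwitches {
  ns_writes: boolean;
  proposed_apply: boolean;
  high_risk_ops: boolean;
}

// Drainer precondition: with ns_writes off, no ns_pending_pushes rows are picked up.
function drainerMayRun(s: KillSwitches): boolean {
  return s.ns_writes;
}

// Decide-endpoint precondition: with proposed_apply off, Approve is disabled.
function approveEnabled(s: KillSwitches): boolean {
  return s.proposed_apply;
}

// Intake rule: with high_risk_ops off, L4 and L5 proposals are auto-rejected.
function autoRejectOnIntake(s: KillSwitches, riskLevel: number): boolean {
  return !s.high_risk_ops && riskLevel >= 4;
}
```

Keeping the gates separate reflects the text: turning off high_risk_ops pauses L4–L5 while L1–L3 continue to flow.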

When to flip them

During incident response, mass-recovery, audit windows, or right after a code deploy that touched the HITL path. The R560 audit found the kill switches were never tested end-to-end; we now have a smoke check that toggles each one and confirms the expected halts.

Worked example

Mike approves Driscoll's $0.06 bump

Tuesday 14:00. propose_price_change chat tool fires for SKU 10472, $1.42 → $1.48 on B5875. INSERT into proposed_actions with action_type='bid_price_update', entity_type='item', entity_ref='10472', payload_json={..., bid_id:'B5875'}, status='pending'. The 114_risk_level_trigger evaluates: under 5% movement, dollar impact < $1K cumulative cap → risk_level=3.
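The "under 5% movement" check in this example is simple arithmetic. The real rule lives in the SQL trigger and is not shown in this guide, so the percentMove helper below is only an illustration of the calculation.

```typescript
// Percentage change from oldPrice to newPrice.
function percentMove(oldPrice: number, newPrice: number): number {
  return ((newPrice - oldPrice) / oldPrice) * 100;
}

// $1.42 -> $1.48 is a $0.06 bump, roughly 4.2% of the old price.
const driscollMove = percentMove(1.42, 1.48);
const underFivePercent = driscollMove < 5;
```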

Mike sees the card with yellow band at 14:04. Taps Approve. The decide endpoint fires UPDATE proposed_actions SET status='approved', decided_by='mike', decided_at=NOW() WHERE id=8421 AND status='pending' RETURNING *. Returns the row — first hit wins. (If Mike's network blipped and the front-end retried, the second UPDATE returns empty — UI shows "already approved", no double-cascade.) The endpoint then enqueues a row in ns_pending_pushes targeting the NS pricing record.

PushMutexDO picks up the row 200ms later. Calls the NS RESTlet via OAuth1. NS returns 200 with the updated record. The drainer flips proposed_actions.status='applied' and writes a row to reflexion_log with the approved_at ↔ applied_at delta. Event hitl.approved fires onto the event ledger. Customer health watcher consumes it (Driscoll is 36.4% of revenue; price moves matter). Hub KV cache busts. Total clock: ~12 seconds approve-to-applied.

Outcomes

What the invariant gives us

NS writes: 100% via proposed_actions
Audit trail: complete, per decision
Race fixed: R560, atomic UPDATE…RETURNING
Kill switches: ns_writes, proposed_apply, high_risk_ops

Every NS write is traceable to a Mike approval timestamp.
Bulk decide (R532) lets Mike clear 20 cards in a tap when the context is uniform.
The Cal-Maine-class bug (silent wrong-customer write) is structurally impossible.
Kill switches give us a recovery surface that doesn't require code deploy.

Failure modes

What can go wrong

NS push fails after approval

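proposed_actions.status='approved' but the drainer can't reach NS. Retry policy: 3 attempts with exponential backoff. After exhaustion, status stays 'approved' and the recon cron at 0 */15 * * * re-enqueues the push. Read as a standard five-field cron expression, 0 */15 * * * fires at 00:00 and 15:00; an every-15-minutes schedule would be written */15 * * * *, so confirm which one the deployed config uses. Manual recovery: POST /admin/ns-push/retry?action_id=<id>.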
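A minimal sketch of a three-attempt retry with exponential backoff. The text only says "3 attempts exponential", so the 1000 ms base delay and the doubling factor are assumptions, and the sketch records the delays instead of sleeping so that it stays synchronous. pushWithRetry and RetryResult are hypothetical names.

```typescript
interface RetryResult {
  ok: boolean;
  attempts: number;
  waitedMs: number[]; // delays a real drainer would sleep between attempts
}

// Calls `push` up to maxAttempts times, doubling the delay after each failure.
function pushWithRetry(
  push: () => boolean,
  maxAttempts: number = 3,
  baseDelayMs: number = 1000, // assumed; not stated in this guide
): RetryResult {
  const waitedMs: number[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (push()) return { ok: true, attempts: attempt, waitedMs };
    if (attempt < maxAttempts) {
      // A real drainer would sleep here; the sketch only records the delay.
      waitedMs.push(baseDelayMs * Math.pow(2, attempt - 1));
    }
  }
  // Exhausted: the caller leaves status at 'approved' for the recon job.
  return { ok: false, attempts: maxAttempts, waitedMs };
}
```

With the assumed defaults, a push that never succeeds is tried three times with recorded delays of 1000 ms and 2000 ms.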

Double-approve race (closed in R560)

Before R560, two concurrent approves could each see status='pending' and both fire the push. R560's atomic UPDATE…WHERE status='pending' RETURNING guarantees only one wins. Second sees empty RETURNING — no-op.

Stale proposed action

A proposed_action references an entity that's since been deleted/modified in NS. Drainer detects and marks status='stale'. Mike sees a "stale" filter on /proposed-actions.html.

Pending pile-up

Mike out for the day; cards accumulate. Detection: pending count on admin-dashboard. Recovery: bulk-decide pattern (R532) when context permits; defer kill switches for batches that can wait.

Adjacent substrate

For developers

Code paths + invariants

Concern	Where
Schema	migrations/schema/113_proposed_actions_risk_tier.sql (R537)
Risk tier trigger	migrations/schema/114_risk_level_trigger.sql
Atomic claim	src/index.ts decide handler — UPDATE…WHERE status='pending' RETURNING
Bulk decide	R532 — POST /api/proposed-actions/bulk-decide
Drainer	PushMutexDO + ns_pending_pushes → NS_PUSH_QUEUE
HITL invariant	ADR-031 (data/decisions.json)
Tool template	src/index.ts ~line 3947 — "HITL TOOL TEMPLATE"
Kill switches	kill_switches table: ns_writes, proposed_apply, high_risk_ops
Event emit	events.event_type='hitl.approved' / 'hitl.rejected' / 'hitl.applied'
Audit	reflexion_log entity_type='proposed_action'

```typescript
// R560 atomic claim: the heart of the lifecycle
const row = await db
  .prepare(`
    UPDATE proposed_actions
    SET status = ?, decided_by = ?, decided_at = ?
    WHERE id = ? AND status = 'pending'
    RETURNING *
  `)
  .bind(decision, who, now, action_id)
  .first();

if (!row) {
  // Lost the race: another request decided first.
  return { ok: true, status: 'already_decided' };
}

// Drainer chain: approved → ns_pending_pushes → PushMutexDO → NS RESTlet → applied
if (decision === 'approved') {
  await enqueueNsPush(row);
  await emitEvent({ type: 'hitl.approved', entity_id: action_id });
}
```

Changelog

Dated trail · spot stale claims

Dated trail of when this doc was last touched, what changed, and what to look at if it feels stale.

Date	Round	Change	Touched by
`2026-05-26`	`R586`	Added CHANGELOG · SCHEMA · RUNBOOK · BACKLOG sections — wiki became best-in-class operating documentation.	Mike + Claude
`2026-05-25`	`R584/R585`	Wiki originally shipped — 8-section structure (hero / what / when / steps / outcomes / failure-modes / related / for-developers).	Mike + Claude

If today is more than 60 days past the latest changelog row, treat live system behavior as the source of truth. The doc may have drifted — verify against the workflow contract in workflow_definitions WHERE workflow_type='hitl_lifecycle_substrate' before acting on these claims.
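The 60-day rule can be checked mechanically. A small sketch, assuming dates are ISO 'YYYY-MM-DD' strings (which Date.parse reads as UTC midnight); isDocStale is a hypothetical helper, not part of the platform.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True when `today` is more than maxAgeDays past the latest changelog row.
function isDocStale(
  latestChangelogDate: string,
  today: string,
  maxAgeDays: number = 60,
): boolean {
  const ageDays = (Date.parse(today) - Date.parse(latestChangelogDate)) / MS_PER_DAY;
  return ageDays > maxAgeDays;
}
```

With the latest row dated 2026-05-26, day 60 is 2026-07-25 (not yet stale) and 2026-07-26 is the first day the doc should be treated as possibly drifted.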

Schema · data contract

The machine-readable spec

Canonical fields, table names, endpoint signatures: what code should match and what tests should assert. workflow_type: hitl_lifecycle_substrate · risk_level: N/A (substrate).

Inputs (required + optional)

Field	Type	Description
`action_id`	`integer`	proposed_actions.id
`decision`	`string`	'approve' | 'reject'
`decided_by`	`string`	'mike' (single-admin)
`edit_token`	`string`	X-Edit-Token header

D1 tables written

Table	Operation	Trigger
`proposed_actions`	UPDATE (atomic claim)	WHERE status='pending'
`hitl_audit_log`	INSERT	Every decision recorded
`events`	INSERT (hitl.approved | hitl.rejected)	Triggers downstream

Endpoints called

Method	Path	Purpose
`POST`	`/api/proposed-actions/:id/decide`	Single decide
`POST`	`/api/proposed-actions/bulk-decide`	Batch decide w/ cumulative cap
`GET`	`/proposed-actions.html`	Queue UI
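For orientation, a single-decide call might be assembled as below. The path and the X-Edit-Token header name come from the tables above; the JSON body shape, sending the token on every decide, and the buildDecideRequest helper are assumptions, so check the handler in src/index.ts before relying on them.

```typescript
interface DecideRequest {
  method: "POST";
  path: string;
  headers: Record<string, string>;
  body: string;
}

// Builds a request description for POST /api/proposed-actions/:id/decide.
function buildDecideRequest(
  actionId: number,
  decision: "approve" | "reject",
  decidedBy: string,
  editToken: string,
): DecideRequest {
  return {
    method: "POST",
    path: `/api/proposed-actions/${actionId}/decide`,
    headers: {
      "Content-Type": "application/json",
      "X-Edit-Token": editToken,
    },
    // Assumed body shape; only the field names appear in the Inputs table.
    body: JSON.stringify({ decision, decided_by: decidedBy }),
  };
}
```

The function only describes the request; passing path, method, headers and body to fetch is left to the caller.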

Events fired

event_type	When	Subscribers
`hitl.approved`	Post-claim	workflow_runner triggers fan-out
`hitl.rejected`	Post-claim	audit only
`hitl.cap_hit`	Bulk approve over cap	multi-step confirmation

Runbook · when it breaks

It broke at 2am — what now

Different from "how do I use this." This is the page Mike pulls up when something is wrong: logs to check, recovery steps, who to escalate to.

Scenario · Atomic claim failed — action stuck claiming

Race or DB error.

Check: SELECT id, status, claimed_at, claimed_by FROM proposed_actions WHERE id=?
If status is 'pending' but claimed_at is set: the row is in an inconsistent state. UPDATE it to clear claimed_at, or set status to 'approved'.

Scenario · Mike approved but fan-out didn't trigger

hitl.approved event drained but no subscriber fired.

Check event: SELECT * FROM events WHERE event_type='hitl.approved' AND entity_ref LIKE 'action:<id>%' ORDER BY created_at DESC LIMIT 1
Check subscription: SELECT * FROM event_subscriptions WHERE event_type_pattern IN ('hitl.approved','hitl.*','*')
Manual trigger: POST /api/workflow/execute with the workflow_type.
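When checking subscriptions, it helps to know which patterns would match the event. The sketch below assumes the simplest semantics consistent with the three patterns in the query above: '*' matches everything, 'prefix.*' matches any event type under that prefix, and anything else must match exactly. The platform's real matcher may differ, and patternMatches is a hypothetical helper.

```typescript
// Assumed matching rules; verify against the actual event dispatcher.
function patternMatches(pattern: string, eventType: string): boolean {
  if (pattern === "*") return true;
  if (pattern.length > 2 && pattern.slice(-2) === ".*") {
    const prefix = pattern.slice(0, -1); // keeps the trailing dot, e.g. "hitl."
    return eventType.indexOf(prefix) === 0;
  }
  return pattern === eventType;
}
```

Under these rules, 'hitl.approved', 'hitl.*' and '*' all match a hitl.approved event, which is why the query lists exactly those three.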

Scenario · Cumulative cap blocked a needed batch

Cap is $50K default. Batch was $60K.

Split: Approve in two batches under the cap.
Multi-step confirm: If platform supports it, confirm-via-OTP for over-cap batches.
Adjust: Bump cap via ADR if business need justifies.
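The "split" option can be sketched as a greedy pass that starts a new batch whenever the next item would push the running total over the cap. This assumes the cap applies to the sum of per-action dollar amounts; splitUnderCap and CapSplit are hypothetical names, and the greedy order is not guaranteed to produce the fewest batches.

```typescript
interface CapSplit {
  batches: number[][]; // each batch sums to at most the cap
  overCap: number[];   // single items that exceed the cap on their own
}

// Splits amounts, in order, into batches whose totals stay at or under `cap`.
function splitUnderCap(amounts: number[], cap: number): CapSplit {
  const batches: number[][] = [];
  const overCap: number[] = [];
  let current: number[] = [];
  let total = 0;
  for (const amount of amounts) {
    if (amount > cap) {
      overCap.push(amount); // cannot fit in any batch; needs the other options
      continue;
    }
    if (total + amount > cap) {
      batches.push(current);
      current = [];
      total = 0;
    }
    current.push(amount);
    total += amount;
  }
  if (current.length > 0) batches.push(current);
  return { batches, overCap };
}
```

For a $60K batch made of $30K, $20K and $10K items under the $50K default, this yields two batches: $30K + $20K, then $10K.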

Logs to check

workflow_run_log · top-level run audit
workflow_step_log · per-step trace
workflow_verify_results · post-window verify outcomes
cron_locks · stuck cron lock detection
events · workflow.completed / workflow.failed event trail
reflexion_log · per-run narrative (if reflexion_enabled)
npx wrangler tail · live Worker logs

Kill switch · emergency stop

If this workflow is misbehaving in a high-impact way (creating bad proposed_actions in volume, pushing wrong things to NS), flip a kill switch:

kill:ns_writes · stops every NS push platform-wide
kill:proposed_apply · stops HITL approvals from executing fan-out
kill:high_risk_ops · stops risk_level >= 4 fan-out

See kill-switches-state-machine.html for the full state machine + recovery procedure.

Escalation

Primary: Mike Levine (single-admin) · mikelevine@globalfoodsolutions.co. For prolonged outage during business hours, notify warehouse lead + accounting lead so they can defer dependent work.

Backlog · open questions

What's not done · what's uncertain

What's not done, what's uncertain, what we punted. Captured so it survives context switches and doesn't die in someone's head.

OPEN · Per-action_type cap (not just global)
Today there is a single cumulative cap. Some action_types are high-volume / low-risk and could have higher caps.

DEFER · Multi-approver
Single-admin today. Multi-admin would need approver routing.

STUB · SLA tracking on pending
No SLA enforcement.

DECISION · Soft vs hard reject
Reject today is permanent. Soft reject (with re-propose) might be useful.