Kill switches — state machine

5 KV-backed kill switches · admin-only X-Edit-Token · 24h auto-expire · audited in chat_decision_audit · cite R547

The platform exposes 5 admin-only kill switches as KV keys. Every write path (NS push, drainer, email pipeline, public portal, high-risk action) checks its switch before proceeding. Each switch has 3 states: off (default), armed (dry-run, log-only), and on (hard block). A switch that is on auto-expires after 24h as a safety net so we never permanently brick ourselves. Every flip lands in chat_decision_audit. Cite R547.

5 switches · live · 3 states each · 24h auto-expire

0 · State machine · 3 states per switch · 5 switches · admin only

Kill switch state machine (per switch)
5 switches: ns_writes · proposed_apply · email_intake · external_portals · high_risk_ops. Each has 3 states off/armed/on. Admin flips with X-Edit-Token. 24h auto-expire.
Panels: 01 / State machine (per switch) · 02 / The 5 switches · 03 / Audit + recovery

01 / State machine (per switch)

STATE: off · DEFAULT
Default resting state. KV key absent or empty. All gated paths proceed normally. No alarms; no admin attention required.
kv_value: null / absent
effect: gated paths allowed
default_at_boot: yes
flip_to: armed (preview/dry-run) or on (immediate block)

STATE: armed · PREVIEW MODE
Admin has flipped the switch but in dry-run mode. Code logs "would block" events to chat_decision_audit without actually denying. Used to preview impact before going hot.
kv_value: 'dryrun' (extension)
effect: log-only · DOES NOT block
use_case: preview impact before flipping on
next: flip to on, or back to off

STATE: on · DANGER · 24h TTL
Switch is ON. KV value = 'true'. Every gated path checks env.CACHE.get(key) === 'true' and refuses to proceed. 24-hour TTL auto-expiry as safety net; admin re-confirms by re-flipping.
kv_value: 'true'
effect: HARD BLOCK on all gated paths
ttl: 86400 sec (24h auto-expire)
flip_to: off (manual) or auto-expires
cite: R547 line 14235

Transitions
arm (preview): off → armed
flip on: off or armed → on
recover (off): armed or on → off
24h auto-expire: on → off

02 / The 5 switches

SWITCH: kill:ns_writes
blocks: NS_PUSH_QUEUE producer + drain
on_breach: writes buffer in proposed_actions
use: NS upgrade window · TBA token rotation · NS outage
recovery: flip off + drain queue via /api/ns-push/drain

SWITCH: kill:proposed_apply
blocks: proposed_actions drain regardless of approval
on_breach: approved rows stay in pending state
use: bad action handler shipped · before fix lands
recovery: deploy fix + flip off + drain
note: stronger than ns_writes; also prevents D1-side commits, not just NS push

SWITCH: kill:email_intake
blocks: src/email.ts inbound processing · all 5 mailboxes
on_breach: mail bounces with 5xx; CF Email Routing logs the rejection
use: spam loop · rogue extractor · email infra change
recovery: fix extractor + flip off · replay from CF Email logs

SWITCH: kill:external_portals
blocks: customer portal · vendor portal · public spec lookup
on_breach: HTTP 503 + maintenance message
use: scraper attack · unauthorized exposure investigation
recovery: investigate + close attack vector + flip off

SWITCH: kill:high_risk_ops
blocks: proposed_actions with risk_level >= 4 (writes to NS that move money or affect inventory)
on_breach: high-risk action skipped; logged with reason
use: pricing pipeline anomaly · payment apply suspect
recovery: investigate + fix + flip off
note: most surgical of the 5 switches; lower-risk reads + previews still allowed
cite: R538

03 / Audit + recovery

ENDPOINT: kill-switch admin API
GET /api/kill-switches returns current state of all 5. POST /api/kill-switches/flip { key, on } toggles one. X-Edit-Token header required for both. Bound to admin role only.
list: GET /api/kill-switches
flip: POST /api/kill-switches/flip { key, on }
auth: X-Edit-Token (admin role)
audit: every flip → chat_decision_audit event_type='kill_switch_flip'
source: src/index.ts:14208-14245

AUDIT: chat_decision_audit
Every flip lands as a row in chat_decision_audit with event_type='kill_switch_flip'. Append-only; never edited.
event_type: kill_switch_flip
fields: key · on · by · expires_in_hours
table: chat_decision_audit
retention: forever (append-only)

SAFETY: 24h auto-expire
KV key TTL = 86400 sec (KV expirationTtl=86400). After 24h, the KV layer evicts; switch reverts to OFF automatically. Safety net so we never permanently brick the platform.
ttl_seconds: 86400
effect: KV evicts → switch reverts to OFF
rationale: prevents permanent self-lockout
re-confirm: admin must re-flip if needed beyond 24h

RECOVERY: admin dashboard
Admin dashboard surface shows current state of all 5 switches with a one-click flip. Admin only.
surface: /admin-dashboard.html
polls: GET /api/kill-switches every 30s
one-click flip: POST /api/kill-switches/flip
visible: state pill per switch (off/on/expiring-in)

RECOVERY: alert & runbook
When a switch flips ON, the daily digest email + admin Slack ping include the event. Recovery procedure linked inline.
channels: daily digest email · admin Slack
fires_on: flip to ON
body: switch name + reason + recovery URL
runbook: docs/KILL_SWITCH_RUNBOOK.md
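The three states can be read as a small mapping from the raw KV value to a gate decision. The following is a minimal sketch under stated assumptions, not the code in src/index.ts: the names resolveState, gateDecision, checkSwitch, and KvReader are hypothetical, and treating any unrecognized value as off is an assumption that follows from the documented `=== 'true'` check.

```typescript
// Hypothetical sketch: names are illustrative, not taken from src/index.ts.
type SwitchState = "off" | "armed" | "on";

// Minimal read-only view of a KV namespace (assumed shape: get() resolves to string or null).
interface KvReader {
  get(key: string): Promise<string | null>;
}

// Map the raw KV value onto the three documented states:
// absent/empty -> off, 'dryrun' -> armed, 'true' -> on.
function resolveState(kvValue: string | null): SwitchState {
  if (kvValue === "true") return "on";
  if (kvValue === "dryrun") return "armed";
  return "off"; // assumption: null, empty, or unrecognized values do not block
}

interface GateDecision {
  allow: boolean;   // may the gated path proceed?
  logOnly: boolean; // armed: record a "would block" event without denying
}

function gateDecision(state: SwitchState): GateDecision {
  switch (state) {
    case "on":
      return { allow: false, logOnly: false };
    case "armed":
      return { allow: true, logOnly: true };
    default:
      return { allow: true, logOnly: false };
  }
}

// Async wrapper a gated path could call before doing any work.
async function checkSwitch(kv: KvReader, key: string): Promise<GateDecision> {
  return gateDecision(resolveState(await kv.get(key)));
}
```

Keeping the mapping in pure functions means the off/armed/on behavior can be unit-tested without a KV binding; only checkSwitch touches storage.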

Switch detail · 5 KV-backed kill switches

K1 kill:ns_writes DANGER

Blocks the NS_PUSH_QUEUE producer AND drain. When ON, no writes leave the Worker toward NetSuite; every NS-bound mutation buffers in proposed_actions instead.
KV key
kill:ns_writes
Blocks
NS_PUSH_QUEUE producer + drain · all NS RESTlet writes
Use case
NS upgrade window · TBA token rotation · NS outage
Recovery
flip off + drain queue via POST /api/ns-push/drain

K2 kill:proposed_apply DANGER

Blocks the proposed_actions drain handler EVEN if Mike has approved a row. Stronger than ns_writes — also prevents D1-side commits, not just NS push.
KV key
kill:proposed_apply
Blocks
proposed_actions drain regardless of approval state
Use case
bad action handler shipped · race / data corruption hunt
Recovery
deploy fix · flip off · resume drain

K3 kill:email_intake CAUTION

Blocks inbound email processing across all 5 mailboxes (src/email.ts). Mail bounces with 5xx so senders see it.
KV key
kill:email_intake
Blocks
src/email.ts inbound across 5 mailboxes
Use case
spam loop · rogue extractor · email-routing change
Recovery
fix extractor · flip off · replay from CF Email logs
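
One way the 5xx bounce could be produced is with the setReject() method that Cloudflare Email Workers expose on inbound messages, which returns a permanent SMTP error to the sender. This is a hedged sketch only: the RejectableMessage interface, the function name rejectIfIntakeKilled, and the rejection text are assumptions, and the real handler in src/email.ts may differ.

```typescript
// Hypothetical sketch; only setReject() mirrors the Email Workers message API.
interface RejectableMessage {
  from: string;
  to: string;
  setReject(reason: string): void;
}

// Returns true when the message was rejected and processing must stop.
// kvValue is the raw value read from the kill:email_intake KV key.
function rejectIfIntakeKilled(
  message: RejectableMessage,
  kvValue: string | null
): boolean {
  if (kvValue !== "true") return false; // off or armed: keep processing
  // Assumed wording; the sender sees this in the bounce.
  message.setReject("Inbound email processing is temporarily disabled");
  return true;
}
```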

K4 kill:external_portals CAUTION

Blocks public portal endpoints — customer portal, vendor self-serve, public spec sheet lookup. Returns 503 + maintenance message.
KV key
kill:external_portals
Blocks
customer portal · vendor portal · public spec lookup
Use case
scraper attack · unauthorized exposure investigation
Recovery
close attack vector · flip off · review CF WAF rules

K5 kill:high_risk_ops SURGICAL

Blocks all action_types with risk_level >= 4 (writes that move money or affect inventory). Lower-risk reads + previews still allowed.
KV key
kill:high_risk_ops
Blocks
proposed_actions with risk_level >= 4
Use case
pricing pipeline anomaly · payment apply suspect
Recovery
investigate · fix · flip off
Cite
R538
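
The risk_level >= 4 rule amounts to a partition of pending actions. The sketch below illustrates that filter under assumptions: the ProposedAction shape (beyond action_type and risk_level), the function name partitionByRisk, and the sample action types in the usage note are hypothetical.

```typescript
// Hypothetical row shape; only action_type and risk_level come from the doc.
interface ProposedAction {
  id: number;
  action_type: string;
  risk_level: number;
}

const HIGH_RISK_THRESHOLD = 4; // documented cutoff: risk_level >= 4

// Split actions into those that may run and those skipped by kill:high_risk_ops.
function partitionByRisk(
  actions: ProposedAction[],
  highRiskKilled: boolean
): { run: ProposedAction[]; skipped: ProposedAction[] } {
  const run: ProposedAction[] = [];
  const skipped: ProposedAction[] = [];
  for (const action of actions) {
    if (highRiskKilled && action.risk_level >= HIGH_RISK_THRESHOLD) {
      skipped.push(action); // caller would log the skip with a reason
    } else {
      run.push(action);
    }
  }
  return { run, skipped };
}
```

For example, with the switch on, a hypothetical payment_apply action at risk_level 5 lands in skipped while a price_preview at risk_level 1 still runs; with the switch off, both run.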

Admin API contract

method | endpoint | body | headers | response
GET | /api/kill-switches | (none) | X-Edit-Token | { ok, kill_switches: { key: boolean, ... }, hint }
POST | /api/kill-switches/flip | { key, on } | X-Edit-Token | { ok, key, on, expires_in_hours }

# Example: flip kill:ns_writes on (hard block; auto-expires after 24h)
curl -X POST https://api.ai-globalfoodsolutions.co/api/kill-switches/flip \
  -H "X-Edit-Token: $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"key":"kill:ns_writes","on":true}'
# → { "ok":true, "key":"kill:ns_writes", "on":true, "expires_in_hours":24 }

# Recover: flip off
curl -X POST https://api.ai-globalfoodsolutions.co/api/kill-switches/flip \
  -H "X-Edit-Token: $TOKEN" -H "Content-Type: application/json" \
  -d '{"key":"kill:ns_writes","on":false}'

Source: src/index.ts:14208-14245 · R547
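
The flip contract above (KV write with a 24h TTL plus an audit row) can be sketched as a pure planning step. This is not the handler at src/index.ts:14208-14245; planFlip, KvOp, and AuditRow are hypothetical names, and two behaviors are assumptions: flipping off is modeled as deleting the key, and expires_in_hours is null when off.

```typescript
const TTL_SECONDS = 86400; // 24h auto-expire, per the doc

const KNOWN_KEYS = [
  "kill:ns_writes",
  "kill:proposed_apply",
  "kill:email_intake",
  "kill:external_portals",
  "kill:high_risk_ops",
] as const;
type KillKey = (typeof KNOWN_KEYS)[number];

// Describes the KV write to perform (hypothetical shape).
type KvOp =
  | { op: "put"; key: KillKey; value: "true"; expirationTtl: number }
  | { op: "delete"; key: KillKey }; // assumption: off = key removed

interface AuditRow {
  event_type: "kill_switch_flip";
  key: KillKey;
  on: boolean;
  by: string;
  expires_in_hours: number | null; // assumption: null when flipping off
}

function isKillKey(key: string): key is KillKey {
  return (KNOWN_KEYS as readonly string[]).indexOf(key) !== -1;
}

// Validate the request and describe the KV write and audit row it implies.
function planFlip(key: string, on: boolean, by: string): { kv: KvOp; audit: AuditRow } {
  if (!isKillKey(key)) throw new Error(`unknown kill switch: ${key}`);
  const kv: KvOp = on
    ? { op: "put", key, value: "true", expirationTtl: TTL_SECONDS }
    : { op: "delete", key };
  const audit: AuditRow = {
    event_type: "kill_switch_flip",
    key,
    on,
    by,
    expires_in_hours: on ? TTL_SECONDS / 3600 : null,
  };
  return { kv, audit };
}
```

In a Worker, the "put" plan would presumably map to env.CACHE.put(key, "true", { expirationTtl: 86400 }) and the audit row to an insert into chat_decision_audit; separating the plan from the side effects keeps the TTL and audit fields checkable in a unit test.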

Open gaps