01 User input · SSE
A user posts a message from chat.html to POST /api/chat. The X-Role-Id header determines the tool palette.
Endpoint
POST /api/chat
Headers
X-Role-Id (admin · pricing · ar · bid · nutrition · production · ops · relationship · order_mgmt · all)
Body
{ messages, session_id, attachment_refs }
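A minimal sketch of how a client might assemble this request. The field types, the `Content-Type` header, and the helper name `buildChatRequest` are assumptions for illustration; the source only names the endpoint, the `X-Role-Id` header, and the three body keys.

```typescript
// Role ids as listed for the X-Role-Id header.
type RoleId =
  | "admin" | "pricing" | "ar" | "bid" | "nutrition"
  | "production" | "ops" | "relationship" | "order_mgmt" | "all";

// Assumed shapes: the source gives only the key names.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface ChatRequestBody {
  messages: ChatMessage[];
  session_id: string;
  attachment_refs: string[];
}

interface HttpRequestSpec {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds the request description without sending it.
function buildChatRequest(roleId: RoleId, body: ChatRequestBody): HttpRequestSpec {
  return {
    url: "/api/chat",
    method: "POST",
    headers: {
      "Content-Type": "application/json", // assumption, not stated in the source
      "X-Role-Id": roleId,
    },
    body: JSON.stringify(body),
  };
}
```

The returned object can be handed to whatever HTTP client chat.html uses.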
02 Canonical regex pre-classifier · R556
~25 regex patterns. Match → invoke forced tool, skip LLM entirely. Cheapest path; saves ~$0.007 per matched query.
Table
canonical_intents
Effect
short-circuit LLM, jump straight to executeChatTool
Cite
R556
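The short-circuit can be sketched roughly as follows. The two patterns and the tool names are hypothetical stand-ins; the real patterns live in `canonical_intents` and are not shown in this document.

```typescript
interface CanonicalIntent {
  pattern: RegExp;
  tool: string;
}

// Hypothetical rows; the real table holds ~25 patterns.
const CANONICAL_INTENTS: CanonicalIntent[] = [
  { pattern: /^\s*what(?:'s| is) the price of\s+(.+?)\??\s*$/i, tool: "lookup_price" },
  { pattern: /^\s*show (?:me )?open invoices for\s+(.+?)\s*$/i, tool: "list_open_invoices" },
];

interface ForcedToolCall {
  tool: string;
  args: string[];
}

// Returns the forced tool for the first matching pattern,
// or null when the query should go on to the LLM path.
function classifyCanonical(text: string): ForcedToolCall | null {
  for (const intent of CANONICAL_INTENTS) {
    const match = intent.pattern.exec(text);
    if (match) {
      return { tool: intent.tool, args: match.slice(1) };
    }
  }
  return null;
}
```

A non-null result would be passed straight to executeChatTool; a null result falls through to the later stages.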
03 SYSTEM_FORCED_TOOLS allowlist
Tools that the LLM is always allowed to call, regardless of scope restrictions. This prevents the LLM from suppressing critical lookups.
Source
src/chat_tools/prompt.ts
Effect
bypasses scope guard for listed tools
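One way to read the allowlist, as a sketch only: the tool names below are hypothetical, and `scopeGuardAllows` stands in for whatever scope check the pipeline actually runs.

```typescript
// Hypothetical entries; the real list is in src/chat_tools/prompt.ts.
const SYSTEM_FORCED_TOOLS: ReadonlySet<string> = new Set([
  "lookup_entity",
  "get_order_status",
]);

// A tool is callable if it is on the allowlist (scope guard bypassed)
// or if the scope guard itself permits it.
function isToolCallable(
  tool: string,
  scopeGuardAllows: (tool: string) => boolean,
): boolean {
  if (SYSTEM_FORCED_TOOLS.has(tool)) {
    return true;
  }
  return scopeGuardAllows(tool);
}
```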
04 Role gate + auto-context · 2 parallel checks
The role gate filters the tool palette by X-Role-Id. Auto-context regex-extracts entity names and pulls a cached entity summary (5 min KV TTL) into the system prompt.
Role table
tool_role_palettes · 10 roles
Auto-context
src/lib/auto_context.ts · KV 5min TTL
Function
getCachedAutoContext()
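Both checks can be sketched in a few lines. Everything here is illustrative: the palette contents are invented, treating `all` as "no filter" is an assumption, and an in-memory `Map` stands in for the KV store. Entity extraction is omitted because the real patterns are not shown.

```typescript
type Palette = Record<string, string[]>;

// Hypothetical palettes; the real mapping is tool_role_palettes.
const TOOL_ROLE_PALETTES: Palette = {
  pricing: ["lookup_price", "compare_quotes"],
  ar: ["list_open_invoices"],
};

function filterToolsForRole(allTools: string[], roleId: string): string[] {
  if (roleId === "all") return allTools; // assumption: "all" disables filtering
  const allowed = new Set(TOOL_ROLE_PALETTES[roleId] ?? []);
  return allTools.filter((tool) => allowed.has(tool));
}

const TTL_MS = 5 * 60 * 1000; // 5 min TTL, as stated above

interface CacheEntry {
  summary: string;
  expiresAt: number;
}

// In-memory stand-in for the KV-backed cache.
class AutoContextCache {
  private entries = new Map<string, CacheEntry>();
  private now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now; // injectable clock so expiry can be tested
  }

  get(entity: string, load: (entity: string) => string): string {
    const hit = this.entries.get(entity);
    if (hit && hit.expiresAt > this.now()) {
      return hit.summary; // still fresh
    }
    const summary = load(entity);
    this.entries.set(entity, { summary, expiresAt: this.now() + TTL_MS });
    return summary;
  }
}
```

Within the TTL window a repeated entity costs one map lookup; after expiry the loader runs again.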
05 3-model dispatch · Council v2
Claude Sonnet 4.6 (lead) + Kimi K2.5 (peer, free) + Workers AI Llama 3.3 70B (peer). All run in parallel with the same input.
Lead
claude-sonnet-4.6 · Anthropic · ~$0.003/q
Peer · Kimi
kimi-k2.5 · Moonshot via Cursor · $0 (free tier)
Peer · Llama
@cf/meta/llama-3.3-70b-instruct · ~$0.0001/q
Cite
R39 commit
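A rough sketch of the fan-out, assuming each model is wrapped in a function that takes the shared prompt and returns text. The `CouncilMember` shape and the choice to capture a failed call as an error instead of rejecting the whole batch are assumptions; the source does not describe failure handling.

```typescript
type ModelCall = (prompt: string) => Promise<string>;

interface CouncilMember {
  id: string;
  role: "lead" | "peer";
  call: ModelCall;
}

interface CouncilAnswer {
  id: string;
  role: "lead" | "peer";
  text: string | null;
  error: string | null;
}

async function dispatchCouncil(
  prompt: string,
  members: CouncilMember[],
): Promise<CouncilAnswer[]> {
  // Every call is started before any is awaited, so they run concurrently.
  const pending = members.map(async (m): Promise<CouncilAnswer> => {
    try {
      return { id: m.id, role: m.role, text: await m.call(prompt), error: null };
    } catch (e) {
      // Assumption: one failing model should not sink the other answers.
      return { id: m.id, role: m.role, text: null, error: String(e) };
    }
  });
  return Promise.all(pending);
}
```

Total latency is then bounded by the slowest of the three models, not their sum.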
06 Anonymized peer review · R39
After the parallel answers come back, each model receives the other two answers WITHOUT attribution, rates which is best, and revises its own answer. Prevents groupthink.
Anonymization
model names stripped from peer outputs
Goal
independent reasoning + groupthink prevention
Cite
R39 commit (council v2 + peer review)
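The anonymization step might look roughly like this. The neutral labels and the `[model]` placeholder are assumptions; the source says only that model names are stripped from peer outputs.

```typescript
interface Answer {
  model: string;
  text: string;
}

interface AnonymousAnswer {
  label: string;
  text: string;
}

// Escapes regex metacharacters so model ids like "kimi-k2.5" match literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns the answers a reviewer should see: everyone else's,
// relabeled neutrally and with any model name scrubbed from the text.
function peersFor(reviewer: string, answers: Answer[]): AnonymousAnswer[] {
  const names = answers.map((a) => a.model);
  return answers
    .filter((a) => a.model !== reviewer)
    .map((a, i) => {
      let text = a.text;
      for (const name of names) {
        text = text.replace(new RegExp(escapeRegExp(name), "gi"), "[model]");
      }
      return { label: `Answer ${String.fromCharCode(65 + i)}`, text };
    });
}
```

Because this sketch keeps input order, a fixed order could still hint at authorship; shuffling before labeling would be one way to avoid that.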
07 Chairman synthesis · 2nd Claude call
A 4th call (Claude Sonnet) sees the 3 peer-reviewed answers + ratings, picks the best or synthesizes a hybrid. Emits final answer + tool_calls[].
Model
claude-sonnet-4.6 (2nd call)
Input
3 peer-reviewed answers + ratings
Output
final answer text + tool_calls[]
Cost
~$0.003/query (the 2nd Claude call · biggest single cost driver)
08 executeChatTool + withSources · 175+ tools
Each tool_call is dispatched via executeChatTool. Return values are wrapped with withSources for citation chips. Human-in-the-loop (HITL) writes land in proposed_actions and are not executed inline.
Source
src/chat_tools/impls.ts
Catalog
175+ tools, role-gated
withSources fields
_meta.sources[] · _meta.as_of · _meta.retrieval_path
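A sketch of what such a wrapper could look like. The three `_meta` field names come from the list above; their types, the `SourceRef` shape, and this function signature are assumptions, and the real `withSources` in `src/chat_tools/impls.ts` may differ.

```typescript
// Hypothetical shape of one citation source.
interface SourceRef {
  table: string;
  id: string;
}

interface Meta {
  sources: SourceRef[];
  as_of: string; // assumed to be an ISO-8601 timestamp
  retrieval_path: string;
}

type WithSources<T> = T & { _meta: Meta };

// Attaches citation metadata to a tool result without mutating it.
function withSources<T extends object>(
  result: T,
  sources: SourceRef[],
  retrievalPath: string,
  now: Date = new Date(),
): WithSources<T> {
  return {
    ...result,
    _meta: {
      sources,
      as_of: now.toISOString(),
      retrieval_path: retrievalPath,
    },
  };
}
```

The UI can then render one citation chip per entry in `_meta.sources`.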
09 Response + telemetry · SSE
JSON is streamed back via SSE. Each chat call writes one row to chat_decision_audit, 8-12 rows to routing_layer_telemetry, and 2 rows to chat_messages.
Stream
SSE · { answer, tool_results, sources, _meta }
Telemetry tables
chat_decision_audit · routing_layer_telemetry · chat_messages
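For reference, a payload of this shape could be framed as a single SSE event as sketched below. Whether the real endpoint emits one event or many, and under what event name, is not stated here; the field types and `toSseEvent` are assumptions.

```typescript
// Assumed field types; the source lists only the four key names.
interface ChatResponse {
  answer: string;
  tool_results: unknown[];
  sources: unknown[];
  _meta: Record<string, unknown>;
}

// Serializes the payload as one Server-Sent Events frame.
// JSON.stringify escapes newlines inside strings, so the payload
// fits on a single "data:" line. A blank line terminates the event.
function toSseEvent(payload: ChatResponse, event: string = "message"): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}
```

On the client, an `EventSource` or a streaming reader would parse the `data:` line back with `JSON.parse`.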
Used by