Chat pipeline — Council v2 refresh

REPLACES stale R91 chat-pipeline.html · POST /api/chat · ~$0.007/query target · cite R39 commit (council v2 + peer review)

The full Council v2 pipeline from POST /api/chat to streamed response: user input → canonical regex pre-classifier → SYSTEM_FORCED_TOOLS allowlist → role gate → auto-context (entity name regex + cached summaries) → 3-model dispatch (Claude Sonnet 4.6 + Kimi K2.5 + Workers AI Llama 3.3 70B) → anonymized peer review → chairman synthesis → tool dispatch via executeChatTool with the withSources wrap → response + telemetry. Cost target: $0.007/query. The original chat-pipeline.html stays as the historical R91 reference.

9 lanes · 14 nodes · cite R39 commit · ~$0.007/query target (current 7d council avg: $0.0165)

0 · Council v2 pipeline · 9 lanes · 14 nodes · vertical layered

Chat — Council v2 routing
Flow: user input → regex pre-classifier → SYSTEM_FORCED_TOOLS → role gate → auto-context → 3-model dispatch → peer review → chairman → executeChatTool → withSources → response · cite R39

Lanes: 01 / User input · 02 / Canonical regex pre-classifier (R556) · 03 / SYSTEM_FORCED_TOOLS allowlist · 04 / Role gate + auto-context · 05 / 3-model dispatch (Council v2) · 06 / Anonymized peer review · 07 / Chairman synthesis · 08 / executeChatTool dispatch + withSources wrap · 09 / Response + telemetry

Edge label: short-circuit · regex match skips LLM

User input · FRONTEND · chat.html
User submits a chat message via chat.html: POST /api/chat with an X-Role-Id header.
surface: chat.html
endpoint: POST /api/chat
headers: X-Role-Id (role-gated tool palette)
body: { messages, session_id, attachment_refs }

Canonical regex pre-classifier · SECURITY · skips LLM entirely · R556
Regex match against ~25 canonical intents (R556). If a query matches a pattern such as "show me AR aging for X", the pre-classifier short-circuits the LLM entirely and invokes the forced tool directly. Cheapest path.
table: canonical_intents
count: ~25 regex patterns
effect: skip LLM · invoke forced tool directly
cost_saving: ~$0.007 per matched query
cite: R556

SYSTEM_FORCED_TOOLS allowlist · SECURITY · src/chat_tools/prompt.ts
Allowlist of tool calls that must always succeed if the LLM emits them. Prevents the LLM from accidentally suppressing critical lookups.
table: system_forced_tools
effect: bypasses scope guard for listed tools
source: src/chat_tools/prompt.ts

Role gate · filterToolsForRole · SECURITY · src/chat_tools/prompt.ts
Maps the X-Role-Id header to a tool palette via tool_role_palettes. Each of the 10 roles is gated. filterToolsForRole removes tools not in the palette before the LLM sees the catalog.
header: X-Role-Id
table: tool_role_palettes
roles: 10 (admin · pricing · ar · bid · nutrition · production · ops · relationship · order_mgmt · all)
effect: filters tool catalog before LLM dispatch
source: src/chat_tools/prompt.ts

getCachedAutoContext() · BACKEND · src/lib/auto_context.ts
Regex-extracts entity names (customer, vendor, item code) from the user message via name_synonyms. Pulls the cached entity summary block from D1 + Vectorize. Cached in KV with a 5min TTL.
extractor: regex against name_synonyms · items
cache: KV (5m TTL)
output: 1-2 paragraph entity summary appended to system prompt
source: src/lib/auto_context.ts

Claude Sonnet 4.6 · LEAD · CLOUD · primary answerer
Lead model (Anthropic). Sees the full system prompt + auto-context + memory layers + user message. Runs in parallel with the peers on the same input. Cost ~$0.003/query average.
model: claude-sonnet-4.6
provider: Anthropic
role: primary answerer
cost: ~$0.003/query
cite: R39

Kimi K2.5 · PEER · CLOUD · peer reviewer
Moonshot Kimi K2.5 via Cursor (free via API key). Same input, runs in parallel. Cost $0 (free tier).
model: kimi-k2.5
provider: Moonshot (via Cursor)
role: peer reviewer
cost: $0 (free tier)
cite: R39

Workers AI Llama 3.3 70B · PEER · CLOUD · peer reviewer
Cloudflare Workers AI Llama 3.3 70B Instruct. Same input, runs in parallel. Cost ~$0.0001/query.
model: @cf/meta/llama-3.3-70b-instruct
provider: Cloudflare Workers AI
role: peer reviewer
cost: ~$0.0001/query
cite: R39

Anonymized peer review · BACKEND · R39 commit
After the parallel answers come back, each model receives the other two answers without attribution (model names stripped). Each model rates which answer is best and revises if needed. Prevents groupthink and forces independent reasoning.
stage: post-initial-answer
anonymization: model names stripped from peer outputs
goal: prevent groupthink + force independent reasoning
cite: R39

Chairman synthesis · BACKEND · final answer + tool_calls[]
A 4th call (Claude Sonnet) sees the 3 peer-reviewed answers + ratings, then picks the best or synthesizes a hybrid. Emits the final answer text + tool_calls[].
model: claude-sonnet-4.6
input: 3 peer-reviewed answers + ratings
output: final answer text + tool_calls[]
cost: ~$0.003/query (2nd Claude call)
cite: R39

executeChatTool dispatch · BACKEND · role-gated dispatch
Each tool_call in the final answer is dispatched via executeChatTool. 175+ tools are registered, each gated by the role palette.
count: 175+ tools
source: src/chat_tools/impls.ts
gating: role palette applied at filterToolsForRole
hitl: write tools land in proposed_actions (not executed inline)

withSources wrap · BACKEND · audit trail
Every tool return is wrapped with withSources(): { data, _meta: { sources: [{table, ref, ...}], as_of, retrieval_path } }. Drives citation rendering in the chat UI.
fields: _meta.sources[] · _meta.as_of · _meta.retrieval_path
drives: citation chips in chat UI · audit trail
source: src/chat_tools/impls.ts

Response to user · FRONTEND · chat.html
Final response JSON: { answer, tool_results, sources, _meta: { cost_usd, latency_ms, routing_mode } }. Streamed back via SSE.
surface: chat.html
format: { answer, tool_results, sources, _meta }
streamed: yes (Server-Sent Events)
fields: cost_usd · latency_ms · routing_mode

chat_decision_audit · DATABASE · cost SOT
One row per chat call: cost_usd · latency_ms · routing_mode · role · intent. Source of truth for /api/ai/metrics + the live cost surface diagram.
table: chat_decision_audit
per_chat_rows: 1
also: routing_layer_telemetry (8-12 rows / call) + chat_messages
used_by: cost analysis · grade tracking · /api/ai/metrics

Cost target $0.007/query · 7-day live council_v2 avg $0.0165

Target: $0.007/query · modeled breakdown: Lead $0.003 + Kimi $0 + Llama $0.0001 + Chairman $0.003 + Tools/Embed ~$0.0001 = $0.0062
7d live: $0.0165/query (council_v2) · 564 queries in 7 days · 2.4× target (chairman re-context inflation)
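
The arithmetic behind these two figures can be checked with a short sketch. The component values are copied from the breakdown above; the variable and function names are illustrative only and are not taken from the codebase.

```typescript
// Per-query cost components in USD, copied from the target breakdown.
const components: { name: string; usd: number }[] = [
  { name: "lead", usd: 0.003 },
  { name: "kimi", usd: 0 },
  { name: "llama", usd: 0.0001 },
  { name: "chairman", usd: 0.003 },
  { name: "tools_embed", usd: 0.0001 },
];

// Sum the components and round to 4 decimals to suppress float noise.
function sumCost(parts: { name: string; usd: number }[]): number {
  const total = parts.reduce((acc, p) => acc + p.usd, 0);
  return Math.round(total * 1e4) / 1e4;
}

const modeledUsd = sumCost(components); // 0.0062, under the 0.007 target
const targetUsd = 0.007;
const live7dUsd = 0.0165;

// Ratio of the live 7-day average to the target: about 2.36, quoted as 2.4×.
const overTarget = live7dUsd / targetUsd;
```

The modeled sum sits below the target, so the 7-day overshoot comes from real calls costing more than the per-component estimates rather than from the budget itself being infeasible.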

1 · Lane detail · 9 stages

01 · User input · SSE

User posts a message to POST /api/chat from chat.html. X-Role-Id header determines tool palette.
Endpoint
POST /api/chat
Headers
X-Role-Id (admin · pricing · ar · bid · nutrition · production · ops · relationship · order_mgmt · all)
Body
{ messages, session_id, attachment_refs }
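
As an illustration of the request shape, the sketch below builds (but does not send) the POST /api/chat request. The message shape ({ role, content }) and the string[] type for attachment_refs are assumptions; only the endpoint, the header name, the role list, and the three body keys come from this page.

```typescript
// The 10 role ids listed for the X-Role-Id header.
type RoleId =
  | "admin" | "pricing" | "ar" | "bid" | "nutrition"
  | "production" | "ops" | "relationship" | "order_mgmt" | "all";

// Assumed message shape; the page only names the `messages` key.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface ChatRequestBody {
  messages: ChatMessage[];
  session_id: string;
  attachment_refs: string[]; // assumed to be a list of opaque reference ids
}

// Build the URL and init object a client could pass to fetch(url, init).
function buildChatRequest(roleId: RoleId, body: ChatRequestBody) {
  return {
    url: "/api/chat",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "X-Role-Id": roleId },
      body: JSON.stringify(body),
    },
  };
}
```

A caller would pass the result to fetch as fetch(req.url, req.init) and then read the SSE stream from the response body.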

02 · Canonical regex pre-classifier · R556

~25 regex patterns. Match → invoke forced tool, skip LLM entirely. Cheapest path; saves ~$0.007 per matched query.
Table
canonical_intents
Effect
short-circuit LLM, jump straight to executeChatTool
Cite
R556
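
A minimal sketch of the short-circuit, assuming each canonical intent pairs one regex with one forced tool. The single pattern, the tool name get_ar_aging, and the argument shape are hypothetical; the real ~25 patterns live in canonical_intents.

```typescript
interface CanonicalIntent {
  pattern: RegExp; // no "g" flag, so exec() carries no state between calls
  tool: string;    // hypothetical tool name
}

// One illustrative pattern modeled on the "show me AR aging for X" example.
const CANONICAL_INTENTS: CanonicalIntent[] = [
  { pattern: /^show me ar aging for (.+)$/i, tool: "get_ar_aging" },
];

interface ForcedToolCall {
  tool: string;
  args: { entity: string };
}

// Return a forced tool call on the first match, or null to fall through
// to the LLM path.
function preClassify(message: string): ForcedToolCall | null {
  const text = message.trim();
  for (const intent of CANONICAL_INTENTS) {
    const m = intent.pattern.exec(text);
    if (m) {
      return { tool: intent.tool, args: { entity: m[1] ?? "" } };
    }
  }
  return null;
}
```

A null result means no canonical intent matched and the request continues to the allowlist, role gate, and council stages.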

03 · SYSTEM_FORCED_TOOLS allowlist

Tools that the LLM is guaranteed to be allowed to call. Prevents the LLM from suppressing critical lookups.
Source
src/chat_tools/prompt.ts
Effect
bypasses scope guard for listed tools
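
One way to picture the bypass, as a sketch only: the scope guard is modeled here as an arbitrary predicate, and the allowlist is consulted first. The tool name lookup_customer and the isCallAllowed helper are hypothetical; the actual guard logic in src/chat_tools/prompt.ts is not shown on this page.

```typescript
// Hypothetical allowlist contents; the real list is SYSTEM_FORCED_TOOLS.
const FORCED_TOOLS_SKETCH: ReadonlySet<string> = new Set(["lookup_customer"]);

// A scope guard is modeled as a predicate over the tool name (assumption).
type ScopeGuard = (tool: string) => boolean;

// Allowlisted tools skip the guard; everything else must pass it.
function isCallAllowed(tool: string, inScope: ScopeGuard): boolean {
  if (FORCED_TOOLS_SKETCH.has(tool)) return true;
  return inScope(tool);
}
```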

04 · Role gate + auto-context · 2 parallel checks

The role gate filters the tool palette via X-Role-Id. Auto-context regex-extracts entity names and pulls a cached entity summary (5min KV TTL) into the system prompt.
Role table
tool_role_palettes · 10 roles
Auto-context
src/lib/auto_context.ts · KV 5min TTL
Function
getCachedAutoContext()
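
The two checks can be sketched as follows. The palette contents, the tool names, and the in-memory TtlCache (a stand-in for KV) are assumptions for illustration; this filterToolsForRole is a simplified re-sketch, not the function in src/chat_tools/prompt.ts, and treating an unknown role as an empty palette is also an assumption.

```typescript
interface ToolDef {
  name: string;
}

// Hypothetical palettes; the real mapping lives in tool_role_palettes.
const PALETTES: Record<string, string[]> = {
  ar: ["get_ar_aging"],
  pricing: ["get_price_list"],
};

// Keep only the tools in the role's palette (unknown role: no tools).
function filterToolsForRole(tools: ToolDef[], roleId: string): ToolDef[] {
  const allowed = new Set(PALETTES[roleId] ?? []);
  return tools.filter((t) => allowed.has(t.name));
}

const TTL_MS = 5 * 60 * 1000; // the 5min TTL stated above

// In-memory stand-in for the KV cache, with an injectable clock for testing.
class TtlCache {
  private store = new Map<string, { value: string; expiresAt: number }>();
  private now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  get(key: string): string | undefined {
    const hit = this.store.get(key);
    if (!hit || hit.expiresAt <= this.now()) return undefined;
    return hit.value;
  }

  set(key: string, value: string): void {
    this.store.set(key, { value, expiresAt: this.now() + TTL_MS });
  }
}
```

Filtering before dispatch means the models never see tools outside the role's palette, which is a stronger guarantee than rejecting a disallowed call after the fact.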

05 · 3-model dispatch · Council v2

Claude Sonnet 4.6 (lead) + Kimi K2.5 (peer, free) + Workers AI Llama 3.3 70B (peer). All run in parallel with the same input.
Lead
claude-sonnet-4.6 · Anthropic · ~$0.003/q
Peer · Kimi
kimi-k2.5 · Moonshot via Cursor · $0 (free tier)
Peer · Llama
@cf/meta/llama-3.3-70b-instruct · ~$0.0001/q
Cite
R39 commit
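
A sketch of the parallel fan-out, assuming each model is reachable through an async function that takes the shared prompt. The CouncilModel interface and the choice to drop a model whose call fails are assumptions; this page does not say how failures are handled.

```typescript
interface CouncilModel {
  name: string;
  call: (prompt: string) => Promise<string>; // hypothetical client wrapper
}

interface CouncilAnswer {
  model: string;
  text: string;
}

// Send the same prompt to every model concurrently. A model that throws is
// dropped from the result instead of failing the whole council (assumption).
async function dispatchCouncil(
  prompt: string,
  models: CouncilModel[],
): Promise<CouncilAnswer[]> {
  const results = await Promise.all(
    models.map(async (m): Promise<CouncilAnswer | null> => {
      try {
        return { model: m.name, text: await m.call(prompt) };
      } catch {
        return null;
      }
    }),
  );
  return results.filter((r): r is CouncilAnswer => r !== null);
}
```

Because the calls run concurrently, wall-clock latency for this stage is bounded by the slowest model rather than the sum of all three.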

06 · Anonymized peer review · R39

After the parallel answers come back, each model receives the other two answers without attribution. Each model rates which answer is best and revises its own if needed. This prevents groupthink.
Anonymization
model names stripped from peer outputs
Goal
independent reasoning + groupthink prevention
Cite
R39 commit (council v2 + peer review)
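
The anonymization step can be sketched as building one review packet per model, where the other answers appear under neutral labels. The packet shape and the "Answer A" / "Answer B" labels are assumptions; the page only states that model names are stripped from peer outputs.

```typescript
interface CouncilAnswer {
  model: string;
  text: string;
}

interface ReviewPacket {
  reviewer: string; // who receives the packet (kept server-side)
  candidates: { label: string; text: string }[]; // no model names inside
}

// For each model, collect the other models' answers under neutral labels.
function buildReviewPackets(answers: CouncilAnswer[]): ReviewPacket[] {
  return answers.map((self) => ({
    reviewer: self.model,
    candidates: answers
      .filter((a) => a.model !== self.model)
      .map((a, i) => ({
        label: "Answer " + String.fromCharCode(65 + i), // A, B, ...
        text: a.text,
      })),
  }));
}
```

Note that this strips only the attribution field. If an answer's text names its own model, a real implementation would need extra scrubbing, which is not covered here.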

07 · Chairman synthesis · 2nd Claude call

A 4th call (Claude Sonnet) sees the 3 peer-reviewed answers + ratings, picks the best or synthesizes a hybrid. Emits final answer + tool_calls[].
Model
claude-sonnet-4.6 (2nd call)
Input
3 peer-reviewed answers + ratings
Output
final answer text + tool_calls[]
Cost
~$0.003/query (the 2nd Claude call · biggest single cost driver)

08 · executeChatTool + withSources · 175+ tools

Each tool_call is dispatched via executeChatTool. Returns are wrapped with withSources to drive citation chips. Write tools are human-in-the-loop (HITL): they land in proposed_actions and are not executed inline.
Source
src/chat_tools/impls.ts
Catalog
175+ tools, role-gated
withSources fields
_meta.sources[] · _meta.as_of · _meta.retrieval_path
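
A sketch of the wrapper shape, using the three _meta fields listed above. The function signature, the ISO-8601 string for as_of, and the sample table and ref values are assumptions; the real withSources lives in src/chat_tools/impls.ts and may differ.

```typescript
interface SourceRef {
  table: string;
  ref: string;
}

interface Sourced<T> {
  data: T;
  _meta: {
    sources: SourceRef[];
    as_of: string;          // assumed ISO-8601 timestamp
    retrieval_path: string; // e.g. which store or index served the data
  };
}

// Wrap a tool result with provenance metadata for citation rendering.
function withSourcesSketch<T>(
  data: T,
  sources: SourceRef[],
  retrievalPath: string,
  asOf: Date = new Date(),
): Sourced<T> {
  return {
    data,
    _meta: {
      sources,
      as_of: asOf.toISOString(),
      retrieval_path: retrievalPath,
    },
  };
}
```

Keeping provenance in a separate _meta key leaves data untouched, so tool consumers that ignore citations need no changes.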

09 · Response + telemetry · SSE

JSON streamed back via SSE. One row per chat call in chat_decision_audit + 8-12 rows in routing_layer_telemetry + 2 rows in chat_messages.
Stream
SSE · { answer, tool_results, sources, _meta }
Telemetry tables
chat_decision_audit · routing_layer_telemetry · chat_messages
Used by
GET /api/ai/metrics · admin dashboard · ai-cost-surface-live
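
For readers unfamiliar with the wire format, the sketch below frames a payload as a single Server-Sent Events message. The event names ("delta", "done") and the sseFrame helper are hypothetical; the page states only that the response JSON is streamed via SSE.

```typescript
// Frame one SSE message: an "event:" line, a "data:" line, then a blank
// line to terminate the message. JSON.stringify emits no raw newlines, so
// the payload always fits on a single data line.
function sseFrame(event: string, payload: unknown): string {
  return "event: " + event + "\n" + "data: " + JSON.stringify(payload) + "\n\n";
}

// Example: a final frame carrying the response envelope described above.
const finalFrame = sseFrame("done", {
  answer: "ok",
  tool_results: [],
  sources: [],
  _meta: { cost_usd: 0.0062, latency_ms: 1200, routing_mode: "council_v2" },
});
```

The values in finalFrame are placeholders chosen for the example, not measured figures.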

2 · 3 council models · cost reference

Model | Provider | Role | Cost / query
claude-sonnet-4.6 | Anthropic | Lead | ~$0.003
kimi-k2.5 | Moonshot (via Cursor) | Peer reviewer | $0.000 (free tier)
@cf/meta/llama-3.3-70b-instruct | Cloudflare Workers AI | Peer reviewer | ~$0.0001
claude-sonnet-4.6 (chairman) | Anthropic | Synthesis (2nd Claude call) | ~$0.003

3 · What changed from R91

R91 (stale) | R39 Council v2 (current)
Single LLM call | 3 parallel models + chairman
No peer review | Anonymized peer review pass
No regex pre-classifier | Canonical regex short-circuits LLM
No forced-tool allowlist | SYSTEM_FORCED_TOOLS guarantees critical lookups
Static role check | filterToolsForRole + 10-role palette
No auto-context | getCachedAutoContext() injects entity summary
Plain return | withSources wrap → citation chips
Single audit row | chat_decision_audit + routing_layer_telemetry (8-12 rows)

4 · Open gaps