Chat: Council v2 routing pipeline (Council v2 · replaces R91)

POST /api/chat · 3 LLMs in parallel · anonymized peer review · ~$0.007/query

Council v2 is the default chat mode (CLAUDE.md invariant #1). This shows the full pipeline: user message → pre-LLM short-circuits (canonical intent regex, forced-tool allowlist, out-of-scope guard) → role gate (X-Role-Id palette) + auto-context (entity summaries) → 3 LLMs in parallel (Claude Sonnet lead + Kimi peer + Llama peer) → anonymized peer review → chairman synthesis → tool dispatch + withSources wrapping → response, with telemetry rows written at every layer.
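The pre-LLM short-circuits can be pictured as a small decision function that runs before any model is called. The following is a minimal sketch, not the actual implementation: the `ShortCircuit` type, the `preLlmShortCircuit` function, the single regex, and the refusal text are all hypothetical, and the ordering (canonical intent first, scope guard second) is an assumption. The similarity score is passed in as a plain number because the embedding lookup itself is outside the scope of this sketch.

```typescript
// Hypothetical sketch of the pre-LLM short-circuit stage.
// All names here are illustrative; only the tool name ar_aging_summary and the
// 0.4 threshold come from the pipeline description.

type ShortCircuit =
  | { kind: "forced_tool"; tool: string } // skip the LLM, invoke the tool directly
  | { kind: "refuse"; message: string }   // out of scope, no LLM call
  | { kind: "continue" };                 // fall through to role gate + council

// Assumed in-memory shape of the canonical_intents table (one example pattern).
const CANONICAL_INTENTS: { pattern: RegExp; tool: string }[] = [
  { pattern: /^show me ar aging for\s+(.+)$/i, tool: "ar_aging_summary" },
];

// Messages scoring below this similarity to the in-scope topic set are refused.
const SCOPE_THRESHOLD = 0.4;

function preLlmShortCircuit(message: string, scopeSimilarity: number): ShortCircuit {
  const text = message.trim();

  // 1. Canonical intent: a regex hit maps straight to a forced tool.
  //    Running this before the scope check is an assumption of this sketch.
  for (const intent of CANONICAL_INTENTS) {
    if (intent.pattern.test(text)) {
      return { kind: "forced_tool", tool: intent.tool };
    }
  }

  // 2. Out-of-scope guard: refuse politely when similarity is under the threshold.
  if (scopeSimilarity < SCOPE_THRESHOLD) {
    return {
      kind: "refuse",
      message: "That topic is outside what this assistant covers.",
    };
  }

  // 3. Nothing matched: continue to the role gate and Council v2 dispatch.
  return { kind: "continue" };
}
```

Under these assumptions, a message such as "Show me AR aging for Acme" resolves to the forced tool regardless of its similarity score, while "help me with my taxes" with a low score is refused without an LLM call.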

0 · Visual flow (7 lanes · 17 nodes)

System flow
Lanes: 01 / User input · 02 / Pre-LLM short-circuits (canonical intent · forced tool · scope guard) · 03 / Role gate + auto-context (X-Role-Id palette · entity summaries) · 04 / Council v2 dispatch (3 LLMs in parallel) · 05 / Peer review + chairman synthesis · 06 / Tools + withSources + response · 07 / Telemetry (always-on)

01 / User input

User input (POST /api/chat)
User submits a chat message via chat.html. POST /api/chat with X-Role-Id header. Body: { messages: [...], session_id }.
SOURCE: user input
  surface: chat.html
  endpoint: POST /api/chat
  headers: X-Role-Id (role-gated tool palette)
  body: { messages, session_id, attachment_refs }

02 / Pre-LLM short-circuits (canonical intent · forced tool · scope guard)

Canonical intent regex (matches → forced tool)
Pre-LLM regex match for canonical intents (e.g., 'show me AR aging for X' → tool=ar_aging_summary). Short-circuits the LLM call entirely for high-confidence patterns. R556.
SHORT-CIRCUIT: canonical_intent
  matches: 25+ regex patterns
  effect: skip LLM, invoke forced tool directly
  table: canonical_intents

SYSTEM_FORCED_TOOLS (allowlist for guaranteed-call)
Allowlist of tool calls that must always succeed if the LLM emits them. Prevents the LLM from accidentally suppressing critical lookups.
SHORT-CIRCUIT: forced tool allowlist
  table: system_forced_tools
  effect: bypasses scope guard for listed tools

Out-of-scope guard (topic classifier → polite refusal)
Pre-LLM classifier (cheap embedding match). If the user asks about a topic outside the platform scope (e.g., 'help me with my taxes'), responds with a polite refusal + suggests in-scope topics.
SHORT-CIRCUIT: out-of-scope guard
  classifier: embedding similarity to in-scope topic set
  threshold: <0.4 similarity
  effect: refuse, no LLM call

03 / Role gate + auto-context (X-Role-Id palette · entity summaries)

Role gate · filterToolsForRole (X-Role-Id → tool palette)
Maps the X-Role-Id header to a tool palette via tool_role_palettes (R556). 50+ tools each gated to admin/pricing/ar/bid/nutrition/production/ops/relationship/order_mgmt/all.
GATE: role
  header: X-Role-Id
  table: tool_role_palettes
  roles: 10
  effect: removes tools not in palette before LLM sees catalog

getCachedAutoContext() (customer/vendor/item entity extract)
Regex-extracts entity names from the user message (customer, vendor, item code). Pulls the cached entity summary block from D1 + Vectorize. Cached in KV with 5min TTL.
CONTEXT: auto_context
  extractor: regex against name_synonyms + items
  cache: KV (5m TTL)
  output: 1-2 paragraph entity summary appended to system prompt

04 / Council v2 dispatch (3 LLMs in parallel)

Claude Sonnet 4.6 (lead model · parallel · same input)
Claude Sonnet 4.6 (Anthropic). The lead model in Council v2. Sees system prompt + auto-context + memory layers + user message. Cost: ~$0.003/query average.
COUNCIL: lead
  model: claude-sonnet-4.6
  provider: Anthropic
  role: primary answerer
  cost: ~$0.003/query

Kimi K2.5 (peer reviewer, free · parallel · same input)
Moonshot Kimi K2.5 via Cursor (free via API key). Same input. Runs in parallel. Cost: $0 (free tier).
COUNCIL: peer (Kimi)
  model: kimi-k2.5
  provider: Moonshot (via Cursor)
  role: peer reviewer
  cost: $0 (free tier)

Workers AI Llama 3.3 70B (peer reviewer, CF · parallel · same input)
Cloudflare Workers AI Llama 3.3 70B Instruct. Same input. Runs in parallel. Cost: $0.0001/query.
COUNCIL: peer (Llama)
  model: @cf/meta/llama-3.3-70b-instruct
  provider: Cloudflare Workers AI
  role: peer reviewer
  cost: ~$0.0001/query

05 / Peer review + chairman synthesis

Anonymized peer review (each model sees the other 2 anonymously)
After the parallel answers come back, each model receives the other two answers WITHOUT attribution (model names stripped). Models rate which is best + revise if needed.
COUNCIL: peer review
  stage: post-initial-answer
  anonymization: model names stripped from peer outputs
  goal: prevent groupthink + force independent reasoning

Chairman synthesis (final answer assembled)
A 4th call (Claude Sonnet) sees the 3 (revised) answers + ratings, picks the best or synthesizes a hybrid. Emits the final answer text + tool_calls.
COUNCIL: chairman
  model: claude-sonnet-4.6
  input: 3 peer-reviewed answers + ratings
  output: final answer text + tool_calls[]

06 / Tools + withSources + response

executeChatTool (dispatches 175+ tools · src/chat_tools/impls.ts)
Each tool_call in the final answer is dispatched via executeChatTool. 175+ tools registered. Each is gated by the role palette.
TOOLS: dispatch
  count: 175+ tools
  source: src/chat_tools/impls.ts
  gating: role palette applied at filterToolsForRole

withSources wrapping (_meta.sources array · every tool result wrapped)
Every tool return is wrapped with withSources(): { data, _meta: { sources: [{table, ref, ...}], as_of, retrieval_path } }. Drives citation rendering in the chat UI.
WRAP: withSources
  fields: _meta.sources[], _meta.as_of, _meta.retrieval_path
  drives: citation chips in chat UI, audit trail

Response to user (answer + tool results + sources · streamed back to chat.html)
Final response JSON: { answer, tool_results: [...], sources: [...] }. Streamed back to chat.html for display.
RESPONSE: chat
  surface: chat.html
  format: { answer, tool_results, sources }
  streamed: yes (SSE)

07 / Telemetry (always-on)

routing_layer_telemetry (every layer logs)
Every layer (short-circuit, role gate, auto-context, council, chairman, tool) writes a row to routing_layer_telemetry with layer_name, decision, latency_ms, model.
TELEMETRY: routing
  table: routing_layer_telemetry
  per_chat_rows: 8-12 rows / chat call
  used_by: chat decision audit + perf analysis

chat_decision_audit (per-chat cost + grade)
One row per chat call: question, final answer, model costs, total tokens, retrieval set, peer review scores.
TELEMETRY: decision audit
  table: chat_decision_audit
  per_chat_rows: 1
  used_by: cost analysis + grade tracking

chat_messages (thread history)
Plain chat thread history. session_id, role, content, timestamp.
TELEMETRY: chat_messages
  table: chat_messages
  used_by: chat UI history pane, recap
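
The role gate and the withSources envelope from lanes 03 and 06 can be illustrated together. This is a sketch under stated assumptions rather than the code in src/chat_tools/impls.ts: the function names `filterToolsForRole` and `withSources` and the `_meta` field names come from the description above, but their signatures, the `ToolDef` shape, and the representation of a palette as a `roles` array on each tool are hypothetical.

```typescript
// Hypothetical sketch: role-gated tool catalog plus the withSources envelope.
// Signatures and the ToolDef shape are assumptions, not the real implementation.

interface ToolDef {
  name: string;
  roles: string[]; // palette membership; "all" means every role may call the tool
}

interface SourceRef {
  table: string;
  ref: string;
}

interface Sourced<T> {
  data: T;
  _meta: {
    sources: SourceRef[];
    as_of: string;          // ISO-8601 timestamp of retrieval
    retrieval_path: string; // how the data was fetched
  };
}

// Drop tools outside the caller's palette before the LLM sees the catalog.
function filterToolsForRole(tools: ToolDef[], roleId: string): ToolDef[] {
  return tools.filter((tool) =>
    tool.roles.some((role) => role === "all" || role === roleId)
  );
}

// Wrap a tool result so the chat UI can render citations from _meta.sources.
function withSources<T>(
  data: T,
  sources: SourceRef[],
  retrievalPath: string,
  asOf: Date = new Date()
): Sourced<T> {
  return {
    data,
    _meta: {
      sources,
      as_of: asOf.toISOString(),
      retrieval_path: retrievalPath,
    },
  };
}
```

In this sketch a caller with X-Role-Id `ar` would see only tools whose `roles` include `ar` or `all`, and every tool result would carry its own `_meta.sources` array for citation chips.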

1 · What this is

goal: Document the Council v2 chat routing pipeline that replaced the R91 single-model flow.
layout: vertical layered (user → short-circuits → role gate → council → chairman → tools → telemetry)
lanes: 7 lanes · 17 nodes
cost: ~$0.007/query average (Claude Sonnet $0.003 + Kimi $0 + Llama $0.0001 + chairman $0.003 + tools)
replaces: flows-diagrams/chat-pipeline.html (R91, stale)

2 · The 3 council models

Model | Provider | Role | Cost / query
claude-sonnet-4.6 | Anthropic | Lead | ~$0.003
kimi-k2.5 | Moonshot (via Cursor) | Peer reviewer | $0.000 (free tier)
@cf/meta/llama-3.3-70b-instruct | Cloudflare Workers AI | Peer reviewer | ~$0.0001
claude-sonnet-4.6 (2nd call) | Anthropic | Chairman synthesis | ~$0.003
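
To make the anonymized peer review step concrete, the sketch below builds the packet one reviewer would receive: the other two answers, relabeled "Answer A" and "Answer B", with attribution removed. The types and the functions `buildReviewPacket` and `redactModelNames` are hypothetical. Redacting model names that appear inside the answer text itself is an extra assumption of this sketch; the description above only says that model names are stripped from peer outputs.

```typescript
// Hypothetical sketch of anonymizing council answers before peer review.
// None of these names are confirmed to exist in the real codebase.

interface CouncilAnswer {
  model: string; // e.g. "kimi-k2.5"
  text: string;
}

interface AnonymousAnswer {
  label: string; // neutral label shown to the reviewer instead of the model name
  text: string;
}

// Replace any literal occurrence of a council model name inside an answer.
function redactModelNames(text: string, names: string[]): string {
  let out = text;
  for (const name of names) {
    out = out.split(name).join("[model]");
  }
  return out;
}

// Build what one reviewer sees: every answer except its own, without attribution.
function buildReviewPacket(
  answers: CouncilAnswer[],
  reviewer: string
): AnonymousAnswer[] {
  const modelNames = answers.map((a) => a.model);
  return answers
    .filter((a) => a.model !== reviewer)
    .map((a, index) => ({
      label: "Answer " + String.fromCharCode(65 + index), // A, B, ...
      text: redactModelNames(a.text, modelNames),
    }));
}
```

With three council members, each reviewer receives exactly two labeled answers, and the chairman call could consume the same anonymized shape alongside the ratings.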

3 · Cost breakdown (~$0.007/query target)

Component | Avg cost / query | Notes
Lead model (Claude Sonnet) | $0.003 | ~5k input + ~500 output tokens
Kimi K2.5 peer | $0.000 | Cursor free tier
Llama 3.3 70B peer | $0.0001 | Workers AI billing
Chairman synthesis (Claude) | $0.003 | Same model, 2nd call
Tool calls (D1 reads) | ~$0.0002 | Mostly within free tier
Embedding for retrieval | ~$0.00005 | ~3 embedding calls
Total avg | ~$0.007 | Per chat turn
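
As a quick arithmetic check, the per-component averages can be summed directly. The sketch below uses only the figures from the breakdown; the constant and function names are hypothetical. The itemized averages add up to roughly $0.0064, slightly below the rounded ~$0.007 headline figure, and the source does not explain the difference.

```typescript
// Sum of the per-query cost components listed in the breakdown (USD).
// Names are illustrative; the numbers are the averages quoted in the table.

const COST_COMPONENTS_USD: { component: string; usd: number }[] = [
  { component: "Lead model (Claude Sonnet)", usd: 0.003 },
  { component: "Kimi K2.5 peer", usd: 0 },
  { component: "Llama 3.3 70B peer", usd: 0.0001 },
  { component: "Chairman synthesis (Claude)", usd: 0.003 },
  { component: "Tool calls (D1 reads)", usd: 0.0002 },
  { component: "Embedding for retrieval", usd: 0.00005 },
];

function totalCostUsd(items: { component: string; usd: number }[]): number {
  return items.reduce((sum, item) => sum + item.usd, 0);
}

// Share of the total attributable to the two Claude Sonnet calls.
function claudeShare(items: { component: string; usd: number }[]): number {
  const claude = items
    .filter((item) => item.component.indexOf("Claude") !== -1)
    .reduce((sum, item) => sum + item.usd, 0);
  return claude / totalCostUsd(items);
}

console.log(totalCostUsd(COST_COMPONENTS_USD)); // about 0.00635
console.log(claudeShare(COST_COMPONENTS_USD));  // about 0.94
```

Under these figures the two Claude Sonnet calls account for about 94% of the per-query cost, so the lead and chairman calls are where any cost change would show up first.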

4 · How to read it

Color | Meaning
frontend | User-facing surface (chat UI, admin HTML pages)
backend | Worker logic / agent code / business rules
database | D1 table / R2 object / KV key / Vectorize index
cloud | External system (NetSuite, Anthropic, etc.)
security | Gate / policy / HITL approval / kill switch
messagebus | Event ledger, Queues, async fan-out
external | Inbound source (email, webhook, cron tick, user input)
→ solid | Synchronous call (request → response)
→ green | Approved / happy-path
→ red dashed | Policy or security check
→ grey dashed | Optional / conditional / async