AI cost surface — live

7-day live D1 query · CostCapDO state · per-pillar spend · free-tier ceilings · refresh of cost-surface.html (build-time pull 2026-05-25)

This refresh pulls the actual 7-day chat_decision_audit numbers (council_v2 + single routing modes) at build time, lays them out as a 4-lane funnel (AI tokens / NetSuite API / Vectorize / D1+KV+R2 → CostCapDO daily cap → total $/day), and exposes per-pillar spend, free-tier ceilings, and the kill-switch braking surface. Replaces the static rounded estimates in cost-surface.html.

live · 7-day pull · CostCapDO active · council_v2 = $0.0165/query (above $0.007 target)

0 · 7-day live spend · SELECT routing_mode, SUM(cost_usd) FROM chat_decision_audit WHERE created_at >= date('now','-7 days') GROUP BY routing_mode

council_v2 spend
$9.32
564 queries · avg $0.0165
single spend
$1.87
551 queries · avg $0.0034
7-day total
$11.20
1,115 chat turns
avg / day
$1.60
below CostCapDO $5/day cap

1 · Visual funnel · 4 source lanes → CostCapDO brake → $/day total

Cost surface — 4-lane funnel
Flow: 4 source lanes (AI tokens · NS API · Vectorize · D1+KV+R2) → CostCapDO daily cap → total $/day
Columns: 01 / Source lanes (4) · 02 / Metering & estimate · 03 / CostCapDO brake · 04 / Output ($/day · per pillar)

LANE 1: AI tokens [CLOUD · live 7d]
Anthropic + Cursor + Workers AI token calls. Council v2 fires 3 models in parallel + chairman synthesis. Avg cost $0.0165/query (7-day live), above the $0.007 target. Single mode avg $0.0034.
models: claude-sonnet-4.6 · kimi-k2.5 · llama-3.3-70b
council_v2: $0.0165/q (564 q in 7d = $9.32)
single: $0.0034/q (551 q in 7d = $1.87)
ceiling: Anthropic 60 req/min per workspace
data: chat_decision_audit.cost_usd

LANE 2: NetSuite API [EXTERNAL · TBA OAuth1]
NetSuite RESTlet OAuth1 calls via src/lib/ns.ts. Each ns_suiteql call takes ~250ms and costs ~0.1¢. Pull rate is dominated by sync tiers (hot 2min · warm 5min · cold 60min).
endpoint: customscript_gfs_platform_query
cost: ~$0.001/call (TBA OAuth)
ceiling: 1000 governance units/sec; NS_PUSH_QUEUE drain
fires_per_day: ~120 syncs · 21 tables

LANE 3: Vectorize [DATABASE · index gfs-knowledge]
Cloudflare Vectorize index for knowledge_chunks + embeddings. Top-k retrieval per chat call. Free tier covers most queries; embedding write batches are charged separately.
index: gfs-knowledge (3,371 chunks live)
per_query: ~$0.00005 (top-k retrieval)
ceiling: 5M queries/month free; $0.04 per million queried
data: vectorize bindings + R555 nightly write

LANE 4: D1 + KV + R2 [DATABASE · Workers Paid]
D1 reads/writes + KV reads + R2 reads/writes. Almost entirely within Workers Paid free tier (25M D1 reads/day, 10M KV reads/day, 10M R2 Class A ops/month). Charged costs nominal.
d1_rows: 311K mirrored · 162 tables
d1_cost: ~$0.0002/chat (tool reads)
kv_cost: ~$0/chat (well within free tier)
r2_cost: ~$0/chat (mostly reads)
ceiling: Workers Paid plan free allowances

METER: per-call estimate [BACKEND · src/index.ts]
Each tool execution and each LLM call gets metered. Token counts × per-1k-token rate gives estimated_cost_usd, written into chat_decision_audit per request.
source: src/index.ts cost estimator
fields: estimated_cost_usd · prompt_tokens · completion_tokens
written_to: chat_decision_audit.cost_usd
refreshed_every: chat turn

TABLE: chat_decision_audit [DATABASE · cost SOT]
Append-only audit table. One row per chat turn. Cost + latency + routing_mode + role + intent. Source of truth for this diagram's numbers.
rows_last_7d: 1,115 turns
cost_last_7d: $11.20
schema: cost_usd · latency_ms · routing_mode · role · intent
used_by: this diagram + admin grade dashboard

TABLE: spend by role (7d) [DATABASE · 7d window]
Rollup of chat_decision_audit grouped by role over a 7-day window. Drives the per-pillar lane to the right and shows which job functions are most expensive.
admin: $11.20 · 1,115 q
others: $0 (admin is sole role in window)
query: GROUP BY role · ORDER BY spend DESC

GATE: CostCapDO [SECURITY · DO · /api/ai/caps]
Durable Object that maintains a 24h sliding window of cost_usd writes. Reads chat_decision_audit on each /api/chat. Returns 429 + kill switch hint when the cap is exceeded. Default cap $5/day; reset endpoint exists.
binding: env.COST_CAP_DO
source: src/durable_objects.ts
daily_cap_usd: 5.00 (configurable via /api/ai/caps)
state: 24h sliding window
current: ~$1.60
on_breach: HTTP 429 + force route to single-model
emergency_bump: /api/ai/caps/emergency-bump

GATE: kill switches [SECURITY · R547]
5 KV-backed kill switches. On breach, force-fallback to single model OR refuse new chat. Flippable only by admin with X-Edit-Token.
surface: /api/kill-switches
keys: kill:ns_writes · kill:proposed_apply · kill:email_intake · kill:external_portals · kill:high_risk_ops
cite: R547
see: kill-switches-state-machine.html

OUTPUT: admin dashboard [FRONTEND · 30s poll]
Admin dashboard renders today's cost + CostCapDO headroom. Updates every 30s on poll. Lives at /admin-dashboard.html.
surface: /admin-dashboard.html
refresh: 30s poll
shows: today $ · CostCapDO % used · top 5 expensive intents

OUTPUT: this diagram [FRONTEND · refreshable]
This live diagram. Build-time pull on every regeneration via npx wrangler d1 execute. Pinned to the date in the meta line.
refresh: build-time D1 pull
pinned: 2026-05-25 7d window
file: ai-cost-surface-live.html

OUTPUT: /api/ai/metrics [BACKEND · GET endpoint]
GET /api/ai/metrics returns JSON: total spend today, headroom, per-model breakdown, latency p50/p95. Consumed by external monitors.
endpoint: GET /api/ai/metrics
format: JSON
consumers: admin dash · external monitors · cron health-check

OUTPUT: daily digest email [MESSAGEBUS · digest]
Daily digest email at 08:00 Mon-Fri. Spend, top 5 expensive intents, regression alerts. Reads from chat_decision_audit.
trigger: 0 8 * * 1-5
reads: chat_decision_audit · training_loop_runs
body: spend + top intents + regressions
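The per-call estimate in the funnel (token counts × per-1k-token rate) can be sketched roughly as follows. This is a minimal illustration, not the estimator in src/index.ts: the type names, function names, and the idea of separate prompt and completion rates are assumptions, and no real rates are implied.

```typescript
// Sketch of a per-call cost estimator: tokens × per-1k-token rate → USD.
// All names here are hypothetical; this document does not list actual rates.

interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
}

interface ModelRate {
  promptUsdPer1k: number;     // assumed: USD per 1,000 prompt tokens
  completionUsdPer1k: number; // assumed: USD per 1,000 completion tokens
}

// Cost of a single LLM call.
function estimateCallCostUsd(usage: TokenUsage, rate: ModelRate): number {
  return (
    (usage.promptTokens / 1000) * rate.promptUsdPer1k +
    (usage.completionTokens / 1000) * rate.completionUsdPer1k
  );
}

// Cost of one chat turn: the sum over every call made during the turn
// (in council mode, the parallel model calls plus the chairman call).
function estimateTurnCostUsd(
  calls: Array<{ usage: TokenUsage; rate: ModelRate }>
): number {
  return calls.reduce((sum, c) => sum + estimateCallCostUsd(c.usage, c.rate), 0);
}
```

Summing per call rather than per turn keeps the chairman call visible as its own line item, which is useful when one call is suspected of driving the average up.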

2 · 7-day live numbers by routing mode

routing_mode  queries  spend   avg / query  avg latency
council_v2    564      $9.32   $0.0165      33,838 ms
single        551      $1.87   $0.0034      5,118 ms
total         1,115    $11.20  $0.0100

As of 2026-05-25 · SELECT routing_mode, COUNT(*) AS n, SUM(cost_usd) AS spend FROM chat_decision_audit WHERE created_at >= date('now','-7 days') GROUP BY routing_mode
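The GROUP BY rollup above, plus the avg / query column, can be reproduced in application code once the audit rows are fetched. The sketch below assumes a row shape with only routing_mode and cost_usd; the interface and function names are illustrative and not taken from the codebase.

```typescript
// Sketch: group audit rows by routing_mode and derive count, spend, and average.
// AuditRow mirrors two columns of chat_decision_audit; everything else is assumed.

interface AuditRow {
  routing_mode: string;
  cost_usd: number;
}

interface ModeSummary {
  queries: number;
  spendUsd: number;
  avgUsd: number;
}

function summarizeByMode(rows: AuditRow[]): Record<string, ModeSummary> {
  const out: Record<string, ModeSummary> = {};
  for (const row of rows) {
    const existing = out[row.routing_mode];
    const s: ModeSummary =
      existing !== undefined ? existing : { queries: 0, spendUsd: 0, avgUsd: 0 };
    s.queries += 1;
    s.spendUsd += row.cost_usd;
    s.avgUsd = s.spendUsd / s.queries; // running average, final after the loop
    out[row.routing_mode] = s;
  }
  return out;
}
```

Because the table's averages are rounded to four decimals, multiplying them back by the query counts only approximates the spend column.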

3 · Free-tier ceilings & headroom

Service                   Ceiling                       Current usage           Headroom
Vectorize queries         5M / month free               ~30K queries/month      99% headroom
D1 row reads              25M / day on Workers Paid     ~50K / day              99% headroom
KV reads                  10M / day free                ~10K / day              99% headroom
R2 Class A ops            10M / month on Workers Paid   ~5K / month             99% headroom
Workers AI Llama 3.3 70B  10K neurons/day free          ~3K / day               70% headroom
Anthropic API             60 req/min workspace limit    ~0.3 RPS peak           97% headroom
CostCapDO daily cap       $5 / day (configurable)       ~$1.60 / day average    68% headroom
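Headroom in this table is the unused share of each ceiling, i.e. 1 − usage / ceiling, with both values in the same unit and time window. A small sketch of that arithmetic follows; the function name is hypothetical and the clamping behaviour is an assumption.

```typescript
// Sketch: headroom as a fraction of the ceiling still unused.
// Usage and ceiling must share a unit and window (e.g. both USD per day).

function headroomFraction(usage: number, ceiling: number): number {
  if (ceiling <= 0) {
    throw new RangeError("ceiling must be positive");
  }
  const usedShare = usage / ceiling;
  // Clamp to [0, 1] so an overshoot reads as zero headroom (assumed behaviour).
  return Math.max(0, 1 - Math.min(usedShare, 1));
}

// CostCapDO row: ~$1.60/day against a $5/day cap leaves about 0.68.
const costCapHeadroom = headroomFraction(1.6, 5);
// Workers AI row: ~3K of 10K neurons/day leaves about 0.70.
const workersAiHeadroom = headroomFraction(3000, 10000);
```

Rows that quote usage and ceiling in different units (requests per second against requests per minute, for instance) need converting to a common unit before this calculation applies.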

4 · Per-source stage detail

L1 · AI tokens · live 7d

Council v2 fires 3 models in parallel, then a chairman synthesis call. The 7-day average is $0.0165/query, currently above the $0.007 target. Driver: the chairman call adds ~$0.003.
Models
claude-sonnet-4.6 (Anthropic) · kimi-k2.5 (free via Cursor) · @cf/meta/llama-3.3-70b-instruct
Live 7d (council_v2)
564 council queries × $0.0165 = $9.32
Live 7d (single)
551 single queries × $0.0034 = $1.87
Action
Investigate why the council average is ~2.4× the target ($0.0165 vs $0.007); the chairman call is likely reusing too much context.

L2 · NetSuite API · stable

Custom RESTlet via TBA OAuth1. ~120 sync touches/day across 21 tables. Per-call cost ~$0.001 in NS governance unit terms; CF egress free.
RESTlet
customscript_gfs_platform_query
Push queue
NS_PUSH_QUEUE · drain via /api/ns-push/drain
Tiers
hot 2min · warm 5min · cold 60min

L3 · Vectorize · free tier

Top-k retrieval per chat call. 3,371 chunks live. Mostly within free tier; embedding-write batches charged separately during nightly corpus build.
Index
gfs-knowledge
Per query
~$0.00005 (top-k retrieval)
Cite
R555 corpus build

L4 · D1 + KV + R2 · free tier

Almost entirely within Workers Paid plan free allowances. Per-chat ~$0.0002 for D1 reads; KV/R2 negligible.
D1 rows
311K mirrored across 162 tables
KV
5min entity-summary cache · kill-switch flags
R2
spec PDFs · pricing exports · backup snapshots

G1 · CostCapDO · DO active

Durable Object that maintains a 24h sliding window of spend. When the daily cap is exceeded, it returns 429 and forces routing to single-model.
Binding
env.COST_CAP_DO
Source
src/durable_objects.ts
Cap
$5/day default · configurable via /api/ai/caps
Reset
POST /api/ai/caps/reset
Bump
POST /api/ai/caps/emergency-bump
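The 24h sliding-window cap can be sketched as a small in-memory class. This is an illustration of the technique only: the real CostCapDO lives in src/durable_objects.ts and its storage, method names, and breach response shape are not shown in this document, so every name below is an assumption. Timestamps are passed in explicitly to keep the sketch deterministic.

```typescript
// Sketch of a sliding-window spend cap (in-memory, not a real Durable Object).

interface CostEvent {
  atMs: number;    // when the cost was recorded (epoch milliseconds)
  costUsd: number; // estimated cost of that chat turn
}

interface CapDecision {
  breached: boolean;
  status: number;       // 429 on breach, 200 otherwise (assumed mapping)
  forceSingle: boolean; // route to single-model on breach, per the gate description
  spentUsd: number;
}

class SlidingCostWindow {
  private events: CostEvent[] = [];
  private capUsd: number;
  private windowMs: number;

  constructor(capUsd: number = 5.0, windowMs: number = 24 * 60 * 60 * 1000) {
    this.capUsd = capUsd;
    this.windowMs = windowMs;
  }

  record(costUsd: number, nowMs: number): void {
    this.events.push({ atMs: nowMs, costUsd: costUsd });
  }

  // Drop events older than the window, then sum what remains.
  spent(nowMs: number): number {
    const cutoff = nowMs - this.windowMs;
    this.events = this.events.filter((e) => e.atMs > cutoff);
    return this.events.reduce((sum, e) => sum + e.costUsd, 0);
  }

  check(nowMs: number): CapDecision {
    const spentUsd = this.spent(nowMs);
    const breached = spentUsd >= this.capUsd;
    return {
      breached: breached,
      status: breached ? 429 : 200,
      forceSingle: breached,
      spentUsd: spentUsd,
    };
  }
}
```

A sliding window, unlike a midnight reset, means spend ages out gradually: a burst at 09:00 still counts against the cap until 09:00 the next day.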

G2 · Kill switches · 5 toggles

KV-backed global switches that the braking paths check before acting. See kill-switches-state-machine.html for the state machines.
Keys
kill:ns_writes · kill:proposed_apply · kill:email_intake · kill:external_portals · kill:high_risk_ops
Endpoint
GET /api/kill-switches · POST /api/kill-switches/flip
Cite
R547
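A braking path's check against these flags might look like the sketch below. It works on a snapshot of flag values already read from KV, so it stays synchronous. The stored encoding ("1" or "true" meaning the switch is engaged) and all function and type names are assumptions; only the five key names come from this page.

```typescript
// Sketch: decide whether an operation may proceed given kill-switch flags.
// Flag encoding is assumed; the document does not say how values are stored.

type KillKey =
  | "kill:ns_writes"
  | "kill:proposed_apply"
  | "kill:email_intake"
  | "kill:external_portals"
  | "kill:high_risk_ops";

// Raw values as they might come back from KV (null or missing = never set).
type FlagSnapshot = Partial<Record<KillKey, string | null>>;

function isKilled(flags: FlagSnapshot, key: KillKey): boolean {
  const raw = flags[key];
  return raw === "1" || raw === "true";
}

interface GuardResult {
  allowed: boolean;
  blockedBy?: KillKey;
}

// An operation may depend on several switches; the first engaged one blocks it.
function guard(flags: FlagSnapshot, required: KillKey[]): GuardResult {
  for (const key of required) {
    if (isKilled(flags, key)) {
      return { allowed: false, blockedBy: key };
    }
  }
  return { allowed: true };
}
```

Treating a missing key as "not engaged" is a fail-open choice made here for brevity; a stricter design could fail closed when a flag cannot be read.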