AI cost surface — live

7-day live D1 query · CostCapDO state · per-pillar spend · free-tier ceilings · refresh of cost-surface.html (build-time pull 2026-05-25)

This refresh pulls the actual 7-day chat_decision_audit numbers (council_v2 + single routing modes) at build time, lays them out as a 4-lane funnel (AI tokens / NetSuite API / Vectorize / D1+KV+R2 → CostCapDO daily cap → total $/day), and exposes per-pillar spend, free-tier ceilings, and the kill-switch braking surface. Replaces the static rounded estimates in cost-surface.html.

live · 7-day pull · CostCapDO active · council_v2 = $0.0165/query (above $0.007 target)

1 · Visual funnel: 4 source lanes → CostCapDO brake → $/day total

[Figure: Cost surface, 4-lane funnel]

2 · 7-day live numbers by routing mode

routing_mode	queries	spend	avg / query	avg latency
`council_v2`	564	$9.32	$0.0165	33,838 ms
`single`	551	$1.87	$0.0034	5,118 ms
total	1,115	$11.20	$0.0100	—

As of 2026-05-25 · SELECT routing_mode, COUNT(*) AS n, SUM(cost_usd) AS spend FROM chat_decision_audit WHERE created_at >= date('now','-7 days') GROUP BY routing_mode
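The query above returns only a count and a spend sum per routing mode; the avg / query column follows by division. A minimal TypeScript sketch of that derivation, assuming the rows arrive as plain objects. The `ModeRow` type and `summarise` helper are hypothetical names for illustration, not part of the codebase:

```typescript
// Hypothetical row shape matching the SELECT above: n = COUNT(*), spend = SUM(cost_usd).
interface ModeRow {
  routing_mode: string;
  n: number;
  spend: number;
}

interface ModeSummary extends ModeRow {
  avgPerQuery: number;
}

// Derive avg / query for each routing mode and append a total row.
function summarise(rows: ModeRow[]): ModeSummary[] {
  const withAvg = rows.map((r) => ({ ...r, avgPerQuery: r.n > 0 ? r.spend / r.n : 0 }));
  const n = rows.reduce((sum, r) => sum + r.n, 0);
  const spend = rows.reduce((sum, r) => sum + r.spend, 0);
  return [...withAvg, { routing_mode: "total", n, spend, avgPerQuery: n > 0 ? spend / n : 0 }];
}

// The 7-day figures from the table above (already rounded to cents).
const summary = summarise([
  { routing_mode: "council_v2", n: 564, spend: 9.32 },
  { routing_mode: "single", n: 551, spend: 1.87 },
]);

for (const row of summary) {
  console.log(`${row.routing_mode}\t${row.n}\t$${row.spend.toFixed(2)}\t$${row.avgPerQuery.toFixed(4)}`);
}
```

Because the inputs here are the rounded per-mode figures, the summed spend can differ from the table's total by a cent; the table total is presumably computed from unrounded values.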

3 · Free-tier ceilings & headroom

Service	Ceiling	Current usage	Headroom
Vectorize queries	5M / month free	~30K queries/month	99% headroom
D1 row reads	25M / day on Workers Paid	~50K / day	99% headroom
KV reads	10M / day free	~10K / day	99% headroom
R2 Class A ops	10M / month on Workers Paid	~5K / month	99% headroom
Workers AI Llama 3.3 70B	10K neurons/day free	~3K / day	70% headroom
Anthropic API	60 req/min workspace limit	~0.3 RPS peak (~18 req/min)	70% headroom
CostCapDO daily cap	$5 / day (configurable)	~$1.60 / day average	68% headroom
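Headroom in the table is the unused fraction of each ceiling. A small TypeScript sketch of that arithmetic, using three rows from the table as inputs; `headroomPct` is a hypothetical helper name, and the usage figures are the table's own approximations:

```typescript
// Headroom as the unused fraction of a ceiling, in percent.
function headroomPct(usage: number, ceiling: number): number {
  if (ceiling <= 0) throw new RangeError("ceiling must be positive");
  return (1 - usage / ceiling) * 100;
}

// Usage and ceiling must be in the same unit and period before dividing.
const capHeadroom = headroomPct(1.6, 5);            // CostCapDO: ~$1.60 of $5 per day
const neuronHeadroom = headroomPct(3000, 10000);    // Workers AI: ~3K of 10K neurons per day
const d1Headroom = headroomPct(50000, 25000000);    // D1: ~50K of 25M row reads per day

console.log(Math.round(capHeadroom), Math.round(neuronHeadroom), d1Headroom.toFixed(1));
```

The same helper applies to any row once usage and ceiling are expressed in matching units, for example converting a per-second rate to per-minute before comparing it with a per-minute limit.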

4 · Per-source stage detail

L1 AI tokens · live 7d

Council v2 fires 3 models in parallel, followed by a chairman synthesis call. The 7-day average is $0.0165/query, currently above the $0.007 target. Suspected driver: the chairman call adds ~$0.003.

Models

claude-sonnet-4.6 (Anthropic) · kimi-k2.5 (free via Cursor) · @cf/meta/llama-3.3-70b-instruct

Live 7d

564 council queries × $0.0165 ≈ $9.32

Live 7d

551 single queries × $0.0034 = $1.87

Action

Investigate why the council average is 2.4× target; the chairman call is likely reusing too much context.
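To put the L1 figures side by side, the sketch below computes how far the council average sits above target, with and without the approximate chairman share. It uses only the numbers quoted above; the constant and function names are hypothetical:

```typescript
const COUNCIL_AVG_USD = 0.0165; // 7-day average per council_v2 query
const TARGET_USD = 0.007;       // per-query target
const CHAIRMAN_USD = 0.003;     // approximate chairman share of each query

// How many times over target a per-query cost is.
function timesTarget(costUsd: number, targetUsd: number): number {
  return costUsd / targetUsd;
}

const overall = timesTarget(COUNCIL_AVG_USD, TARGET_USD);
const withoutChairman = timesTarget(COUNCIL_AVG_USD - CHAIRMAN_USD, TARGET_USD);

console.log(overall.toFixed(1), withoutChairman.toFixed(1));
```

On these figures the three member calls alone would still land at roughly 1.9× target, so the investigation may need to look at the member calls as well as the chairman context.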

L2 NetSuite API · stable

Custom RESTlet authenticated with TBA (OAuth 1.0). ~120 sync touches/day across 21 tables. Per-call cost is ~$0.001 when expressed in NetSuite governance units; Cloudflare egress is free.

RESTlet

customscript_gfs_platform_query

Push queue

NS_PUSH_QUEUE · drain via /api/ns-push/drain

Tiers

hot 2min · warm 5min · cold 60min
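A minimal sketch of how the three tiers could gate a sync, assuming each duration is the maximum staleness allowed before a table in that tier is touched again. That reading is an assumption, and `Tier`, `TIER_INTERVAL_MIN` and `isDue` are hypothetical names:

```typescript
type Tier = "hot" | "warm" | "cold";

// Intervals from the Tiers line above, in minutes.
const TIER_INTERVAL_MIN: Record<Tier, number> = { hot: 2, warm: 5, cold: 60 };

// True when a table in the given tier has not synced within its interval.
function isDue(tier: Tier, lastSyncedMs: number, nowMs: number): boolean {
  return nowMs - lastSyncedMs >= TIER_INTERVAL_MIN[tier] * 60 * 1000;
}

const now = Date.now();
const threeMinutesAgo = now - 3 * 60 * 1000;

console.log(isDue("hot", threeMinutesAgo, now));  // 3 min elapsed vs 2 min interval
console.log(isDue("warm", threeMinutesAgo, now)); // 3 min elapsed vs 5 min interval
```

Passing `nowMs` in rather than reading the clock inside the function keeps the check deterministic and easy to test.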

L3 Vectorize · free tier

Top-k retrieval per chat call. 3,371 chunks live. Mostly within the free tier; embedding-write batches are charged separately during the nightly corpus build.

Index

gfs-knowledge

Per query

~$0.00005 (top-k retrieval)

Cite

R555 corpus build

L4 D1 + KV + R2 · free tier

Almost entirely within the allowances included in the Workers Paid plan. Per-chat cost is ~$0.0002 for D1 reads; KV and R2 are negligible.

D1 rows

311K mirrored across 162 tables

KV

5min entity-summary cache · kill-switch flags

R2

spec PDFs · pricing exports · backup snapshots

G1 CostCapDO · DO active

The Durable Object maintains a 24-hour sliding window of spend. When the daily cap is exceeded, it forces routing to single-model and returns 429.

Binding

env.COST_CAP_DO

Source

src/durable_objects.ts

Cap

$5/day default · configurable via /api/ai/caps

Reset

POST /api/ai/caps/reset

Bump

POST /api/ai/caps/emergency-bump
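The sliding-window behaviour described for CostCapDO can be sketched in plain TypeScript. This is an in-memory illustration only, not the code in src/durable_objects.ts: the class name, method names and event shape are hypothetical, and a real Durable Object would persist its state rather than hold it in an array.

```typescript
interface SpendEvent {
  atMs: number;
  costUsd: number;
}

const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

class SlidingCostCap {
  private events: SpendEvent[] = [];
  private readonly capUsd: number;
  private readonly windowMs: number;

  // Defaults mirror the $5/day cap and 24h window described above.
  constructor(capUsd: number = 5, windowMs: number = DAY_MS) {
    this.capUsd = capUsd;
    this.windowMs = windowMs;
  }

  // Record the cost of one completed query.
  record(costUsd: number, nowMs: number): void {
    this.events.push({ atMs: nowMs, costUsd });
  }

  // Drop events older than the window, then sum what remains.
  spent(nowMs: number): number {
    const cutoff = nowMs - this.windowMs;
    this.events = this.events.filter((e) => e.atMs > cutoff);
    return this.events.reduce((sum, e) => sum + e.costUsd, 0);
  }

  // True once spend inside the window is above the cap.
  exceeded(nowMs: number): boolean {
    return this.spent(nowMs) > this.capUsd;
  }
}

const cap = new SlidingCostCap();
cap.record(3.0, 0);
cap.record(2.5, HOUR_MS);
console.log(cap.exceeded(2 * HOUR_MS));  // both events are inside the window
console.log(cap.exceeded(24 * HOUR_MS)); // the first event has aged out
```

A sliding window lets spend recover gradually as old events age out, instead of releasing the whole budget at a fixed reset time. What the caller does when `exceeded` is true (reroute to single-model, return 429, or both) is left out here because it depends on the routing code.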

G2 Kill switches · 5 toggles

KV-backed global switches that each braking path checks before proceeding. See kill-switches-state-machine.html for the state machines.

Keys

kill:ns_writes · kill:proposed_apply · kill:email_intake · kill:external_portals · kill:high_risk_ops

Endpoint

GET /api/kill-switches · POST /api/kill-switches/flip
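A sketch of how a braking path might consult the switches before doing work, assuming the switch state has already been loaded into a key-to-boolean snapshot. The snapshot shape and the `blockingSwitch` helper are hypothetical; the response format of GET /api/kill-switches is not documented here.

```typescript
// The five keys listed above.
type KillKey =
  | "kill:ns_writes"
  | "kill:proposed_apply"
  | "kill:email_intake"
  | "kill:external_portals"
  | "kill:high_risk_ops";

// Hypothetical snapshot shape: true means the switch is flipped and the path is blocked.
type KillSnapshot = Partial<Record<KillKey, boolean>>;

// Return the first flipped switch among the given keys, or null if none is flipped.
// A missing key is treated as "not flipped", which is an assumption of this sketch.
function blockingSwitch(snapshot: KillSnapshot, ...keys: KillKey[]): KillKey | null {
  for (const key of keys) {
    if (snapshot[key] === true) return key;
  }
  return null;
}

const snapshot: KillSnapshot = { "kill:ns_writes": true, "kill:email_intake": false };

// A NetSuite write path that also counts as a high-risk operation checks both keys.
const blocker = blockingSwitch(snapshot, "kill:high_risk_ops", "kill:ns_writes");
console.log(blocker === null ? "proceed" : `blocked by ${blocker}`);
```

Typing the keys as a union means a misspelled switch name fails at compile time rather than silently reading as "not flipped".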