GFS Data Tagger — master

Intake → Tag + Train → Apply + Push · 9 extraction strategies · per-customer templates · HITL on every NS write

The 5th platform pillar: a platform-level layer that sits between inbound documents and downstream actions. Mike uploads a sample PDF (e.g. a Driscoll customer PO), the system parses it to markdown, and Mike visually tags each region or span to a target NetSuite field (e.g. SalesOrd.bodyFields.otherrefnum) and saves a per-customer template. Future inbound docs from that customer are then auto-extracted using the saved template. The output is clean, NS-ready records, staged via HITL for the write to NetSuite through NS_PUSH_QUEUE. Three worked use cases: Path 1 customer PO → SO (the first deployed use case, Driscoll Foods); Path 2 vendor COA → compliance; Path 3 bid RFP → pipeline (bridges to the Bid Center pillar). Operator modes: the visual tagger at /data-tagger.html (Agent BB-2 owns) and chat-driven (Agent BB-3 adds the chat tools).

migration 142 - 9 extraction strategies (Agent BB-1) | HITL on every NS write (ADR-031) | Lane 1 · Intake | Lane 2 · Tag + Train | Lane 3 · Apply + Push | Live tool: /data-tagger.html (in flight, Agent BB-2)

Pipeline — 3 lanes (intake / tag+train / apply+push) · (customer + doc_type + ns_record_type) threading

Data Tagger lifecycle: 3 intake channels (UI upload / inbound_email_log / chat) -> parse to markdown via document_converter -> identify customer + doc_type + ns_record_type -> look up or train the per-customer template -> apply the 9 extraction strategies (regex_after_label, table_with_headers, etc.) -> compute confidence -> stage proposed_action -> HITL approve -> NS_PUSH_QUEUE writes the NS record -> reflexion updates template metrics.

LANE 1 / INTAKE · PDF arrives (UI upload OR inbound_email_log OR chat)
LANE 2 / TAG + TRAIN · parse, identify, lookup-or-train, save template
LANE 3 / APPLY + PUSH · run 9 strategies, confidence, HITL, NS_PUSH_QUEUE, reflexion

Node 1 · Channel 1 · UI upload [FRONTEND · VISUAL TAGGER UI]
WHAT THIS DOES: The operator opens /data-tagger.html and drag-drops a PDF (or a sample of one). This is the primary visual-tagger surface, owned by Agent BB-2. It is used both for training (the first sample of a new doc type) and for ad-hoc, one-off extractions.
D1 TABLE: data_tagger_uploads (raw upload audit)
R2 BUCKET: gfs-data-tagger-samples/<customer>/<doc_type>/<timestamp>.pdf
TECHNICAL DETAILS (CHANNEL 1 - UI UPLOAD):
  SURFACE  /data-tagger.html (Agent BB-2 owns)
  WIDGETS  drop-zone, preview pane, span-overlay canvas
  ACTION   upload PDF -> R2; POST /api/data-tagger/upload; insert data_tagger_uploads row
  USE      primary path for the FIRST-EVER sample of a new customer or doc type
STATUS: in flight (Agent BB-2)

Node 2 · Channel 2 · inbound email (auto) [EXTERNAL · EMAIL ROUTING · PRIMARY PROD PATH]
WHAT THIS DOES: A PDF lands in one of the 5 inbound mailboxes (orders@, bids@, vendors@, etc.). src/email.ts logs the message to inbound_email_log and saves the attachment to R2. If the sender domain matches a known customer with a saved template, the document is auto-routed to the Data Tagger apply path.
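The auto-route rule above reduces to two lookups and a branch. A minimal sketch, assuming in-memory Maps stand in for the customers.email_domain and data_tagger_templates queries; `routeInbound`, `senderDomain` and the `driscoll_foods` customer_id are hypothetical names, not the actual src/email.ts code:

```typescript
// Hypothetical sketch of the Channel 2 auto-route rule. The Maps stand in for
// the customers.email_domain and data_tagger_templates lookups.
type RouteDecision =
  | { lane: "apply"; templateId: string; customerId: string }
  | { lane: "review"; reason: string };

// Assumption: the sender domain is everything after the last "@", lower-cased.
function senderDomain(address: string): string {
  return address.slice(address.lastIndexOf("@") + 1).toLowerCase();
}

function routeInbound(
  fromAddress: string,
  docType: string,
  nsRecordType: string,
  customersByDomain: Map<string, string>, // email_domain -> customer_id
  activeTemplates: Map<string, string>, // "customer_id|doc_type|ns_record_type" -> template_id
): RouteDecision {
  const customerId = customersByDomain.get(senderDomain(fromAddress));
  if (customerId === undefined) {
    return { lane: "review", reason: "sender domain does not match a known customer" };
  }
  const templateId = activeTemplates.get(`${customerId}|${docType}|${nsRecordType}`);
  if (templateId === undefined) {
    // Known customer but no saved template: stage a proposed_action for Mike instead.
    return { lane: "review", reason: "no active template for this thread" };
  }
  // Customer found AND template exists: hand off to the LANE 3 apply path.
  return { lane: "apply", templateId, customerId };
}

// Usage with the Driscoll worked example (the customer_id value is hypothetical).
const customers = new Map([["driscoll-foods.com", "driscoll_foods"]]);
const templates = new Map([["driscoll_foods|po_inbound|SalesOrd", "tpl_driscoll_po_so_v3"]]);
console.log(routeInbound("purchasing@driscoll-foods.com", "po_inbound", "SalesOrd", customers, templates));
```

Keeping the decision pure (no D1 access inside) makes the YES/NO routing easy to unit-test; a real handler would run the two queries and pass their results in.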
D1 TABLE: inbound_email_log (existing, R558+)
R2 BUCKET: gfs-inbound-attachments/
TECHNICAL DETAILS (CHANNEL 2 - INBOUND EMAIL, AUTO):
  MAILBOXES  orders@, bids@, vendors@, claims@, contact@
  ACTION     src/email.ts parses headers + attachments; document_converter -> markdown; sender domain -> customers.email_domain
             if (customer found) AND (template exists for customer + doc_type + ns_record_type): auto-route to the LANE 3 apply path
             else: stage proposed_action for Mike to review
  WORKED EXAMPLE  orders@ai-globalfoodsolutions.co receives a PO from purchasing@driscoll-foods.com -> matches the Driscoll Foods customer -> finds template tpl_driscoll_po_so_v3 -> auto-extract
STATUS: REAL (intake) + REAL (template lookup) + STUB (auto-extract handoff)
SEE: ns-data-tagger-path-1-customer-po-to-so.html

Node 3 · Channel 3 · chat upload [EXTERNAL · CHAT UI · AGENT BB-3]
WHAT THIS DOES: The operator drops a PDF into the chat surface and talks to the Data Tagger agent. Chat-driven mode lets the operator instruct the system in plain English ("tag this region as the PO number", "this column is item quantity") instead of clicking in the visual tagger.
CHAT TOOLS: data_tagger_train, data_tagger_apply, data_tagger_save_template (added by Agent BB-3)
TECHNICAL DETAILS (CHANNEL 3 - CHAT UPLOAD):
  SURFACE  chat.ai-globalfoodsolutions.co or the admin-dashboard chat
  TRIGGER  Mike: "tag this PDF for ACC Distributors customer PO" + drag-drop the PDF into the chat input
  TOOLS (Agent BB-3 owns)
           data_tagger_train(customer_id, doc_type, ns_record_type, span_to_field[])
           data_tagger_apply(customer_id, doc_type, pdf_url)
           data_tagger_save_template(template_id, strategies[])
  USE      power-user mode + remote training without opening /data-tagger.html
STATUS: in flight (Agent BB-3)

Node 4 · parse to markdown [BACKEND · SHARED PARSER]
WHAT THIS DOES: src/document_converter.ts converts PDF / DOCX / XLSX to markdown, preserving span coordinates so the visual tagger can overlay tag boxes on the rendered preview.
CODE: src/document_converter.ts (existing, R524+)
TECHNICAL DETAILS (PARSE TO MARKDOWN):
  INPUT   PDF / DOCX / XLSX from R2
  OUTPUT  markdown text + span_coordinates[{x, y, w, h, text}]
  USE     feeds both the visual tagger overlay AND the extraction strategies
STATUS: REAL (dual-converter validated R46)

Node 5 · identify (customer + doc_type + ns_record_type) [DATABASE · THE 3-KEY THREAD]
WHAT THIS DOES: The trace thread of the Data Tagger pillar is (customer_id + doc_type + ns_record_type). For UI uploads, the operator picks each key from dropdowns; for inbound emails, the sender domain resolves the customer, a subject + body classifier picks doc_type, and the target ns_record_type comes from the doc_type mapping.
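The identify step can be sketched as follows. Assumptions: a hypothetical `normalizeName` rule, a Map standing in for name_synonyms, and an assumed signature for `resolveCustomer` (the name appears in the LOGIC line, the shape does not). The doc_type classifier is not modeled; doc_type is passed in as if the operator picked it.

```typescript
// doc_type -> ns_record_type, using the examples given for data_tagger_doc_types.
const DOC_TYPE_TO_RECORD = new Map<string, string>([
  ["po_inbound", "SalesOrd"],
  ["coa", "vendor_coas"],
  ["bid_rfp", "bid_external_pipeline"],
]);

interface Thread {
  customerId: string;
  docType: string;
  nsRecordType: string;
}

// Hypothetical normalization so "DRISCOLL FOODS, Inc." and "Driscoll Foods Inc" compare equal.
function normalizeName(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Assumed signature: synonyms maps a normalized alias to customer_id (stand-in for name_synonyms).
function resolveCustomer(input: string, synonyms: Map<string, string>): string | undefined {
  return synonyms.get(normalizeName(input));
}

function identify(customerInput: string, docType: string, synonyms: Map<string, string>): Thread {
  const customerId = resolveCustomer(customerInput, synonyms);
  if (customerId === undefined) throw new Error(`unresolved customer: ${customerInput}`);
  const nsRecordType = DOC_TYPE_TO_RECORD.get(docType);
  if (nsRecordType === undefined) throw new Error(`unmapped doc_type: ${docType}`);
  // Invariant: every downstream row carries all three keys together.
  return { customerId, docType, nsRecordType };
}

// Usage (customer_id value is hypothetical).
const synonyms = new Map([
  ["driscoll foods", "driscoll_foods"],
  ["driscoll foods inc", "driscoll_foods"],
]);
console.log(identify("DRISCOLL FOODS, Inc.", "po_inbound", synonyms));
```

Failing loudly on an unmapped doc_type, rather than defaulting, keeps a document from entering the pipeline without a full 3-key thread.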
D1 TABLES: customers, name_synonyms, data_tagger_doc_types
TECHNICAL DETAILS (IDENTIFY):
  THREAD KEY  customer_id + doc_type + ns_record_type, e.g. "Driscoll Foods / po_inbound / SalesOrd"
  LOGIC       customer_id    = resolveCustomer(sender_domain | operator_input, name_synonyms)
              doc_type       = classify(markdown.firstPage | operator_pick), e.g. po_inbound, coa, bid_rfp
              ns_record_type = map(doc_type), e.g. po_inbound -> SalesOrd, coa -> vendor_coas, bid_rfp -> bid_external_pipeline
  INVARIANT   every downstream row carries the 3-key thread
STATUS: REAL

Node 6 · lookup template [DATABASE · PER-CUSTOMER TEMPLATE]
WHAT THIS DOES: Using the 3-key thread, look up an existing saved template. Templates are versioned per customer + doc_type + ns_record_type; hit_count and success_count track how often each template fires and how often it succeeds.
D1 TABLE: data_tagger_templates (migration 142)
TECHNICAL DETAILS (LOOKUP TEMPLATE):
  QUERY    SELECT * FROM data_tagger_templates WHERE customer_id = ? AND doc_type = ? AND ns_record_type = ? AND status = 'active' ORDER BY version DESC LIMIT 1
  RETURNS  the template with field_tags[] + one strategy per field, OR null (meaning: train a new template)
STATUS: REAL (table) - migration 142 (Agent BB-1)

Node 7 · template found? [MESSAGEBUS · BRANCH POINT]
WHAT THIS DOES: The branch point. If a template exists for the 3-key thread, jump straight to LANE 3 apply. If not, train a new one: the operator visually tags the regions, picks a strategy per field, and saves.
TECHNICAL DETAILS (DECISION - TEMPLATE FOUND?):
  YES path  go to LANE 3 (apply + push)
  NO path   go to NODE 8 visual tag UI -> NODE 9 strategy picker -> NODE 10 save template
  NOTE      a first-time customer always lands on the NO path; a repeat customer sending the same doc type (e.g. for the 10th time) always lands on the YES path
STATUS: REAL
Node 8 · visual tag UI (operator draws boxes) [FRONTEND · HITL · AGENT BB-2]
WHAT THIS DOES: At /data-tagger.html the operator draws boxes on the rendered PDF preview and assigns each box to a target NS field (e.g. SalesOrd.bodyFields.otherrefnum, lineFields.item). A side panel shows the parsed markdown and highlights the matching span as the cursor hovers over a box.
TECHNICAL DETAILS (VISUAL TAG UI):
  SURFACE       /data-tagger.html (Agent BB-2 owns)
  GESTURES      drag a rectangle on the preview to create a tag
                click a tag → open the field picker (NS schema autocomplete)
                click a tag → preview the extracted value live
  FIELD PICKER  reads the NS field catalog (records / bodyFields / lineFields), e.g.
                SalesOrd.bodyFields.otherrefnum / .shipdate / .memo
                SalesOrd.lineFields.item / .quantity / .rate
  HITL          saving a template is a Mike-only action (ADR-031 read-side gate)
STATUS: in flight (Agent BB-2)

Node 9 · strategy picker (1 of 9 per field) [BACKEND · 9 EXTRACTION STRATEGIES]
WHAT THIS DOES: For each tagged field the operator picks ONE of the 9 extraction strategies. Each strategy is a different mechanism for getting the value back out of future documents, and the choice is what makes a template robust rather than brittle.
THE 9 STRATEGIES (migration 142, Agent BB-1):
  1. regex_after_label - find the label, capture the text after it (e.g. "P.O. #")
  2. regex_before_label - find the label, capture the text before it
  3. fixed_region - the value is always at the same x, y, w, h coordinates
  4. table_with_headers - locate the table by its header row, grab a column
  5. multi_line_span - the span starts at an anchor and runs N lines (address block)
  6. whole_section - everything between two anchors (memo body)
  7. formula - compute from other fields (e.g. quantity * rate)
  8. llm_with_schema - last resort: expensive, schema-constrained
  9. literal_constant - hardcode the value (entity = "Driscoll" always)
TECHNICAL DETAILS (STRATEGY PICKER):
  SURFACE    /data-tagger.html field side panel
  PER FIELD  pick strategy + capture pattern + confidence threshold
  EXAMPLE (Driscoll PO)
             otherrefnum      -> regex_after_label("P.O. #")
             entity           -> literal_constant("Driscoll Foods")
             shipaddress      -> multi_line_span(start="SHIP TO", lines=4)
             shipdate         -> regex_after_label("Delivery Date")
             lineFields.item  -> table_with_headers(header="Item #")
STATUS: schemas REAL (migration 142, Agent BB-1) · UI in flight (BB-2)

Node 10 · save template (versioned) [DATABASE · MIGRATION 142]
WHAT THIS DOES: Save the trained template. Templates are versioned (v1, v2, ...) so that improvements don't break extractions that already work. Saving a new version with status='active' flips the previous version to status='superseded'.
D1 TABLE: data_tagger_templates (migration 142)
TECHNICAL DETAILS (SAVE TEMPLATE):
  INSERT data_tagger_templates
    template_id     = ulid()
    customer_id     = Driscoll Foods
    doc_type        = po_inbound
    ns_record_type  = SalesOrd
    version         = 1 (or prior_max + 1)
    status          = 'active'
    field_tags      = JSON array of { ns_field, strategy, pattern, confidence_threshold }
    created_by      = mikelevine@globalfoodsolutions.co
    created_at      = now
    hit_count       = 0 (feeds reflexion)
    success_count   = 0 (feeds reflexion)
  PRIOR VERSION  UPDATE prev SET status = 'superseded'
STATUS: REAL (table contract migration 142, Agent BB-1)

Node 11 · apply 9 strategies (per field) [BACKEND · PURE EXTRACTION FUNCTIONS]
WHAT THIS DOES: Run each tagged field through its chosen strategy against the inbound markdown, field by field. Strategies are pure functions of (markdown, pattern), so they are testable and deterministic (except llm_with_schema).
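Because strategies are pure functions of (markdown, pattern), most of them can be sketched without any I/O. The following covers six of the nine; fixed_region needs span coordinates, formula needs prior field values, and llm_with_schema needs a model call, so those three are left out. It assumes document_converter emits pipe tables and one text line per source line, and the reading of each pattern (e.g. that multi_line_span excludes the anchor line) is a guess, not the migration 142 definition.

```typescript
// Hypothetical sketches of six extraction strategies as pure functions of (markdown, pattern).

// Escape a literal label so characters such as "." match literally.
function escapeRe(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// regex_after_label: the rest of the label's line, after an optional ":".
function regexAfterLabel(md: string, label: string): string | null {
  const m = new RegExp(`${escapeRe(label)}[ \\t]*:?[ \\t]*(.+)`).exec(md);
  const v = m?.[1]?.trim();
  return v ? v : null;
}

// regex_before_label: the text preceding the label on the same line.
function regexBeforeLabel(md: string, label: string): string | null {
  const m = new RegExp(`^(.*?)[ \\t]*${escapeRe(label)}`, "m").exec(md);
  const v = m?.[1]?.trim();
  return v ? v : null;
}

// multi_line_span: the N lines after the anchor line (assumed to exclude the anchor).
function multiLineSpan(md: string, start: string, lines: number): string | null {
  const all = md.split("\n");
  const i = all.findIndex((l) => l.includes(start));
  if (i < 0) return null;
  return all.slice(i + 1, i + 1 + lines).join("\n").trim() || null;
}

// whole_section: everything between two anchors.
function wholeSection(md: string, from: string, to: string): string | null {
  const a = md.indexOf(from);
  if (a < 0) return null;
  const b = md.indexOf(to, a + from.length);
  if (b < 0) return null;
  return md.slice(a + from.length, b).trim() || null;
}

// table_with_headers: locate a pipe table by a header cell, then walk that column.
function tableWithHeaders(md: string, header: string): string[] | null {
  const lines = md.split("\n").map((l) => l.trim());
  const cells = (row: string) =>
    row.replace(/^\|/, "").replace(/\|$/, "").split("|").map((c) => c.trim());
  const h = lines.findIndex((l) => l.startsWith("|") && cells(l).includes(header));
  if (h < 0) return null;
  const col = cells(lines[h]).indexOf(header);
  const values: string[] = [];
  for (let i = h + 1; i < lines.length && lines[i].startsWith("|"); i++) {
    if (/^\|[\s:|-]+\|?$/.test(lines[i])) continue; // skip the |---|---| separator row
    values.push(cells(lines[i])[col] ?? "");
  }
  return values;
}

// literal_constant: the value is the pattern itself.
function literalConstant(value: string): string {
  return value;
}

// Made-up sample text standing in for document_converter output.
const po = [
  "P.O. #: 4500123",
  "Net 30 Terms",
  "Delivery Date: 2024-05-01",
  "SHIP TO",
  "Driscoll Foods",
  "123 Example St",
  "Anytown, ST 00000",
  "Attn: Receiving",
  "MEMO",
  "Deliver to dock 3",
  "END MEMO",
  "| Item # | Qty |",
  "|---|---|",
  "| A100 | 5 |",
  "| B200 | 12 |",
].join("\n");

console.log(regexAfterLabel(po, "P.O. #")); // returns "4500123"
console.log(tableWithHeaders(po, "Item #")); // returns ["A100", "B200"]
```

Returning `null` for "not found" rather than an empty string gives the confidence step a clean yes/no signal per field.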
D1 TABLE: data_tagger_extractions (write the result)
TECHNICAL DETAILS (APPLY 9 STRATEGIES):
  FOR each field_tag in template.field_tags, switch (strategy):
    regex_after_label   -> match label, capture next group
    regex_before_label  -> match label, capture prior group
    fixed_region        -> slice md by span coords
    table_with_headers  -> locate header row, walk column
    multi_line_span     -> anchor line + N
    whole_section       -> between two anchors
    formula             -> eval against prior fields
    llm_with_schema     -> Workers AI JSON-schema call
    literal_constant    -> just return the constant
  store { ns_field, value, confidence } in the extraction
  WRITE  INSERT data_tagger_extractions row
STATUS: REAL (strategies migration 142, Agent BB-1)

Node 12 · compute confidence [DATABASE · BRANCH CONDITION]
WHAT THIS DOES: Compute per-field and overall confidence, which drives the auto-stage vs HITL-first branch. Each strategy's confidence comes from a concrete signal: the regex matched or not, fixed_region returned a non-empty value, the table had the expected columns, the LLM returned a schema-valid result.
TECHNICAL DETAILS (COMPUTE CONFIDENCE):
  PER FIELD
    regex_after_label   -> match.found ? 0.95 : 0.0
    fixed_region        -> non_empty ? 0.9 : 0.3
    table_with_headers  -> column_aligned ? 0.92 : 0.5
    multi_line_span     -> anchor_found ? 0.88 : 0.4
    llm_with_schema     -> model self-report (clamped to 0.6 .. 0.9)
    literal_constant    -> 1.0 (always)
  OVERALL    weighted average over required fields
  BRANCH     overall > 0.85  -> auto-stage SO draft (still HITL approve)
             overall <= 0.85 -> HITL reviews the extraction FIRST
  STORED IN  data_tagger_extractions.confidence column
STATUS: REAL (formulas migration 142, Agent BB-1)

Node 13 · ★ stage proposed_action (HITL) [SECURITY · HITL GATE]
WHAT THIS DOES: Every extraction surfaces as a proposed_action: either "Approve extracted PO data and draft SO?" (high confidence) or "Review extracted PO before SO draft?" (low confidence). Mike approves, corrects, or rejects it in admin-dashboard. This is the ADR-031 invariant.
D1 TABLE: proposed_actions (kind='data_tagger_extraction')
TECHNICAL DETAILS (STAGE PROPOSED ACTION):
  INSERT proposed_actions
    kind        = 'data_tagger_extraction'
    payload     = JSON { extraction_id, template_id, customer_id, doc_type, ns_record_type, extracted_fields[], confidence, target_record_payload }
    status      = 'pending'
    created_at  = now
  TWO MODES
    high conf (auto-draft)   - title: "Auto-extracted ... approve to draft SO?"
    low conf (review-first)  - title: "Review extracted ... before SO draft?"
  HITL INVARIANT  ADR-031: every NS write needs a row here; no NS write without it
STATUS: REAL (table existed pre-R598)

Node 14 · operator review (Mike approves/edits) [FRONTEND · HITL APPROVAL]
WHAT THIS DOES: Mike opens admin-dashboard / proposed-actions and sees the extraction side by side with the source PDF. He can approve it as-is, edit specific fields, or reject it. Corrections feed back to reflexion (Node 17).
SURFACE: admin-dashboard.html proposed_actions panel + side-by-side extraction-review widget
TECHNICAL DETAILS (OPERATOR REVIEW):
  SURFACE  admin-dashboard.html
  WIDGETS  left: PDF preview with tag-box overlay; right: editable form of extracted fields
  ACTIONS  approve        -> status=approved (fires the NS_PUSH_QUEUE write)
           edit + approve -> corrections stored, then approve
           reject         -> status=rejected (no NS write)
           reassign       -> wrong customer/doc; re-thread
STATUS: existing (proposed_actions UI) - extraction widget in flight

Node 15 · NS_PUSH_QUEUE drain [MESSAGEBUS · DURABLE QUEUE + MUTEX]
WHAT THIS DOES: On approval, NS_PUSH_QUEUE drains the write. The template's ns_record_type decides the endpoint (Sales Order, vendor COA insert, bid_external_pipeline insert, etc.). PushMutexDO prevents collisions per customer.
D1 TABLE: ns_pending_pushes (drained)
TECHNICAL DETAILS (NS_PUSH_QUEUE DRAIN):
  ROUTING     ns_record_type='SalesOrd'              -> POST /api/ns/push/sales-order
              ns_record_type='vendor_coas'           -> INSERT D1 vendor_coas (not in NS)
              ns_record_type='bid_external_pipeline' -> INSERT D1 (no NS write)
  GUARDS      PushMutexDO per customer_id; X-Edit-Token; preview=false; confirm=true
  ON SUCCESS  proposed_actions.status=done; fire events.data_tagger.extracted_to_ns
  ON FAIL     retry with backoff; surface partial failures
STATUS: REAL (NS_PUSH_QUEUE) - SO push endpoint REAL

Node 16 · ★ NS record created · loop closed [FINANCE-KEY · NS RECORD WRITTEN]
WHAT THIS DOES: The destination now holds the extracted record. For Path 1 (customer PO -> SO), the SO is created in NetSuite with otherrefnum = customer PO#, establishing the PO# trace thread that carries into invoices, payments, etc. For Path 2, the vendor COA is logged (D1). For Path 3, the bid pipeline is updated (D1).
NS RECORD: per ns_record_type (SalesOrd / etc.)
TECHNICAL DETAILS (NS RECORD CREATED · LOOP CLOSED FOR THIS DOC):
  WRITE     ns_record_type drives the destination
  EXAMPLES  Path 1 - SO created with otherrefnum = PO# (Driscoll's PO#) -> otherrefnum threads to Invoice, Payment, CashSale downstream
            Path 2 - D1 vendor_coas row + events.coa.received fires
            Path 3 - bid_external_pipeline row + bid_pipeline_review workflow starts
  EVENTS    events.data_tagger.extracted_to_ns (always)
            events.so.created_from_po (Path 1)
            events.coa.received (Path 2)
            events.bid.rfp_logged (Path 3)
STATUS: REAL (per ns_record_type endpoints)

Node 17 · reflexion (template gets smarter) [BACKEND · SELF-IMPROVING LOOP]
WHAT THIS DOES: The template gets smarter over time. Every approved extraction increments hit_count and, if it needed no correction, success_count. Operator corrections feed back into strategy weights: if regex_after_label keeps producing values Mike corrects, the system suggests upgrading to llm_with_schema or fixed_region.
D1 TABLES: data_tagger_templates (UPDATE) + data_tagger_template_corrections (INSERT)
TECHNICAL DETAILS (REFLEXION LOOP):
  ON APPROVE, NO EDIT    UPDATE data_tagger_templates SET hit_count = hit_count + 1, success_count = success_count + 1, last_success_at = now
  ON APPROVE WITH EDITS  INSERT data_tagger_template_corrections (field, prior_value, corrected_value, strategy)
                         UPDATE data_tagger_templates SET hit_count = hit_count + 1, last_correction_at = now
  ON REJECT              UPDATE data_tagger_templates SET miss_count = miss_count + 1
  SUGGESTION             if miss_count / hit_count > 0.2, surface "train new version?"
STATUS: REAL (schemas migration 142)

Node 18 · events fired (event ledger) [MESSAGEBUS · EVENT LEDGER (R549)]
WHAT THIS DOES: Every successful extraction fires events into the event ledger (R549 substrate) so downstream consumers (customer health, bid pipeline review, COA compliance) can react.
D1 TABLE: events (R549 event ledger)
TECHNICAL DETAILS (EVENTS FIRED):
  events.data_tagger.extracted_to_ns (always)
  events.so.created_from_po (Path 1)
  events.coa.received (Path 2)
  events.bid.rfp_logged (Path 3)
  events.data_tagger.template_used (every apply)
  events.data_tagger.template_corrected (on edit)
  SUBSCRIBERS  customer_health watcher; bid_pipeline_review workflow; compliance/COA monitor
STATUS: REAL (event ledger R549)

Node 19 · three worked use cases (path docs) [CLOUD · SEE PATH DETAIL DOCS]
WHAT THIS DOES: The Data Tagger pillar serves three worked use cases today. Path 1 is the deployed reference (Driscoll PO -> SO). Path 2 brings vendor COAs into the compliance workflow. Path 3 bridges into Bid Center.
SEE: the 3 path docs
TECHNICAL DETAILS (THREE USE CASES):
  PATH 1 - CUSTOMER PO -> SO (deployed reference)
    Driscoll Foods emails a PO -> auto-extract via tpl_driscoll_po_so_v3 -> high conf 0.92 -> auto-draft SO -> Mike approves -> NS SO created with otherrefnum = customer PO# (establishes the SO PO# trace thread)
  PATH 2 - VENDOR COA -> COMPLIANCE
    Vendor sends a COA -> extract lot, item_code, test_date, parameters[], pass/fail -> INSERT vendor_coas (TBD table) -> events.coa.received
  PATH 3 - BID RFP -> PIPELINE (BRIDGE TO BID CENTER PILLAR)
    Bid arrives at bids@ -> extract bid_id, customer/district, items[], decision_date, bid_value -> INSERT bid_external_pipeline -> bid_pipeline_review workflow (migration 136)
STATUS: Path 1 REAL, Path 2 + 3 schemas REAL, UI in flight

Node 20 · live operator surface · /data-tagger.html [CLOUD · CLOUDFLARE PAGES · THE VISUAL TAGGER ITSELF]
WHAT THIS DOES: /data-tagger.html is the live operator surface for the Data Tagger pillar, used both for training new templates and for reviewing extractions. Owned by Agent BB-2.
URL: https://gfs-netsuite.pages.dev/data-tagger.html
OWNER: Agent BB-2
STATUS: in flight

Diagram legend: Lane 1 intake · Lane 2 tag+train · Lane 3 apply+push · HITL gate · ★ NS write
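The event fan-out listed for the events node (Node 18) can be sketched as a lookup keyed by ns_record_type. This is an illustration under assumptions: `eventsForApprovedExtraction` is a hypothetical name, and the emission order is not specified in the doc.

```typescript
type NsRecordType = "SalesOrd" | "vendor_coas" | "bid_external_pipeline";

// Path-specific event, keyed by the template's ns_record_type.
const PATH_EVENT: Record<NsRecordType, string> = {
  SalesOrd: "events.so.created_from_po", // Path 1
  vendor_coas: "events.coa.received", // Path 2
  bid_external_pipeline: "events.bid.rfp_logged", // Path 3
};

// Events for one approved extraction once its record is written.
// Hypothetical helper; the ordering below is an assumption.
function eventsForApprovedExtraction(record: NsRecordType, operatorEdited: boolean): string[] {
  const events = [
    "events.data_tagger.template_used", // every apply
    "events.data_tagger.extracted_to_ns", // always
    PATH_EVENT[record],
  ];
  if (operatorEdited) events.push("events.data_tagger.template_corrected"); // on edit
  return events;
}

console.log(eventsForApprovedExtraction("SalesOrd", false));
```

Typing the key as a closed union means adding a fourth path fails to compile until its event is named.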
Glossary · cluster colors + thread terms
Database (templates, extractions, identity)
Backend (parser, strategies, reflexion)
Cloud (tagger UI, live surface)
Messagebus (events, NS_PUSH_QUEUE, branch)
HITL gate (proposed_actions)
★ NS write
customer + doc_type + ns_record_type: the trace thread for the Data Tagger pillar
9 strategies: regex_after_label, regex_before_label, fixed_region, table_with_headers, multi_line_span, whole_section, formula, llm_with_schema, literal_constant
data_tagger_templates: per-(customer, doc, record) template; versioned
data_tagger_extractions: one row per inbound document processed

Phase detail — 3 lanes

L1 Intake · 3 channels · email intake REAL (auto-extract handoff STUB); UI upload + chat in flight

UI upload via /data-tagger.html, inbound email auto-route (5 mailboxes), or chat upload (Agent BB-3 tools).
UI surface: /data-tagger.html (Agent BB-2)
Email pipeline: src/email.ts + 5 mailboxes
Chat tools: data_tagger_train, data_tagger_apply, data_tagger_save_template (Agent BB-3)
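The three chat tools' arguments can be written down as TypeScript types taken from the signatures listed under Channel 3. A sketch only: the `Span` and `span_to_field` element shapes and the `validateTrainArgs` pre-flight check are assumptions, and the field-path check only fits NS-style targets such as SalesOrd (D1-only targets like vendor_coas would need their own rule).

```typescript
// Parameter names follow the data_tagger_* signatures; element shapes are assumed.
interface Span { x: number; y: number; w: number; h: number; text: string }

interface DataTaggerTrainArgs {
  customer_id: string;
  doc_type: string;
  ns_record_type: string;
  span_to_field: { span: Span; ns_field: string }[]; // assumed element shape
}

interface DataTaggerApplyArgs { customer_id: string; doc_type: string; pdf_url: string }

interface DataTaggerSaveTemplateArgs { template_id: string; strategies: string[] }

// Hypothetical pre-flight check a data_tagger_train handler might run before touching D1.
// Assumes NS-style field paths: Record.bodyFields.field or Record.lineFields.field.
function validateTrainArgs(a: DataTaggerTrainArgs): string[] {
  const errors: string[] = [];
  for (const key of ["customer_id", "doc_type", "ns_record_type"] as const) {
    if (a[key].trim() === "") errors.push(`${key} is required`);
  }
  if (a.span_to_field.length === 0) errors.push("span_to_field must tag at least one span");
  a.span_to_field.forEach((t, i) => {
    if (!/^\w+\.(bodyFields|lineFields)\.\w+$/.test(t.ns_field)) {
      errors.push(`span_to_field[${i}].ns_field is not Record.bodyFields|lineFields.field: ${t.ns_field}`);
    }
    if (t.span.w <= 0 || t.span.h <= 0) errors.push(`span_to_field[${i}] has an empty box`);
  });
  return errors;
}

// Usage (values are illustrative).
const trainArgs: DataTaggerTrainArgs = {
  customer_id: "driscoll_foods",
  doc_type: "po_inbound",
  ns_record_type: "SalesOrd",
  span_to_field: [{ span: { x: 10, y: 20, w: 80, h: 12, text: "4500123" }, ns_field: "SalesOrd.bodyFields.otherrefnum" }],
};
console.log(validateTrainArgs(trainArgs)); // returns []
```

Returning a list of messages instead of throwing lets the chat agent relay every problem to the operator in one turn.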

L2 Tag + Train · migration 142 + Agent BB-2 in flight

Parse to markdown, identify the 3-key thread, and look up the template; if none exists, the operator visually tags each field and picks one of the 9 strategies, and the template is saved as a new version.
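The lookup-or-train branch and the versioned save can be sketched over an in-memory array that mirrors data_tagger_templates. Assumptions: `lookupTemplate` and `saveTemplate` are hypothetical names, the template_id is passed in rather than generated with ulid(), and the pattern shape is a guess. In D1 the supersede UPDATE and the INSERT would likely be sent together via `batch()` so no reader sees two active versions.

```typescript
// In-memory stand-in for data_tagger_templates rows.
interface FieldTag {
  ns_field: string;
  strategy: string;
  pattern: Record<string, string | number>; // assumed structure
  confidence_threshold: number;
}

interface TemplateRow {
  template_id: string;
  customer_id: string;
  doc_type: string;
  ns_record_type: string;
  version: number;
  status: "active" | "superseded";
  field_tags: FieldTag[];
  hit_count: number;
  success_count: number;
}

const sameThread = (t: TemplateRow, c: string, d: string, r: string): boolean =>
  t.customer_id === c && t.doc_type === d && t.ns_record_type === r;

// Same semantics as: WHERE <3-key thread> AND status='active' ORDER BY version DESC LIMIT 1
function lookupTemplate(rows: TemplateRow[], c: string, d: string, r: string): TemplateRow | null {
  const active = rows
    .filter((t) => sameThread(t, c, d, r) && t.status === "active")
    .sort((a, b) => b.version - a.version);
  return active.length > 0 ? active[0] : null; // null means: train a new template
}

// version = prior_max + 1; any earlier active version flips to 'superseded'.
function saveTemplate(
  rows: TemplateRow[], c: string, d: string, r: string, fieldTags: FieldTag[], templateId: string,
): TemplateRow {
  const thread = rows.filter((t) => sameThread(t, c, d, r));
  const priorMax = thread.reduce((max, t) => Math.max(max, t.version), 0);
  for (const t of thread) {
    if (t.status === "active") t.status = "superseded";
  }
  const row: TemplateRow = {
    template_id: templateId, customer_id: c, doc_type: d, ns_record_type: r,
    version: priorMax + 1, status: "active", field_tags: fieldTags, hit_count: 0, success_count: 0,
  };
  rows.push(row);
  return row;
}

// Usage: two saves for the same thread; the lookup returns version 2.
const rows: TemplateRow[] = [];
saveTemplate(rows, "driscoll_foods", "po_inbound", "SalesOrd", [], "tpl_v1");
saveTemplate(rows, "driscoll_foods", "po_inbound", "SalesOrd", [], "tpl_v2");
console.log(lookupTemplate(rows, "driscoll_foods", "po_inbound", "SalesOrd")?.version); // 2
```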
Thread key: customer_id + doc_type + ns_record_type (e.g. Driscoll Foods / po_inbound / SalesOrd)
Parser: src/document_converter.ts
Strategies: regex_after_label · regex_before_label · fixed_region · table_with_headers · multi_line_span · whole_section · formula · llm_with_schema · literal_constant
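The 9 strategy names and the Driscoll PO example from the strategy picker can be expressed as typed field_tags, so an unknown strategy fails at compile time and JSON from the UI can be checked at runtime. A sketch under assumptions: the pattern objects, the entity/shipaddress field paths, and the uniform 0.85 thresholds are placeholders, and `checkFieldTags` is hypothetical.

```typescript
// The 9 strategy names as a closed union, so a typo fails to compile.
const STRATEGIES = [
  "regex_after_label", "regex_before_label", "fixed_region",
  "table_with_headers", "multi_line_span", "whole_section",
  "formula", "llm_with_schema", "literal_constant",
] as const;
type Strategy = (typeof STRATEGIES)[number];

interface FieldTag {
  ns_field: string;
  strategy: Strategy;
  pattern: Record<string, string | number>; // assumed structure
  confidence_threshold: number;
}

// The Driscoll PO example. Entity/shipaddress paths and thresholds are placeholders.
const driscollPoFieldTags: FieldTag[] = [
  { ns_field: "SalesOrd.bodyFields.otherrefnum", strategy: "regex_after_label", pattern: { label: "P.O. #" }, confidence_threshold: 0.85 },
  { ns_field: "SalesOrd.bodyFields.entity", strategy: "literal_constant", pattern: { value: "Driscoll Foods" }, confidence_threshold: 0.85 },
  { ns_field: "SalesOrd.bodyFields.shipaddress", strategy: "multi_line_span", pattern: { start: "SHIP TO", lines: 4 }, confidence_threshold: 0.85 },
  { ns_field: "SalesOrd.bodyFields.shipdate", strategy: "regex_after_label", pattern: { label: "Delivery Date" }, confidence_threshold: 0.85 },
  { ns_field: "SalesOrd.lineFields.item", strategy: "table_with_headers", pattern: { header: "Item #" }, confidence_threshold: 0.85 },
];

// Hypothetical runtime check for tags arriving as JSON: unknown strategies and duplicate fields.
function checkFieldTags(tags: { ns_field: string; strategy: string }[]): string[] {
  const known: readonly string[] = STRATEGIES;
  const errors: string[] = [];
  const seen = new Set<string>();
  for (const t of tags) {
    if (!known.includes(t.strategy)) errors.push(`unknown strategy "${t.strategy}" for ${t.ns_field}`);
    if (seen.has(t.ns_field)) errors.push(`duplicate tag for ${t.ns_field}`);
    seen.add(t.ns_field);
  }
  return errors;
}

console.log(checkFieldTags(driscollPoFieldTags)); // returns []
```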

L3 Apply + Push · REAL for SO; STUB for COA / bid

Per-field strategies run; per-field and overall confidence are computed; the extraction is staged for HITL; the operator approves; NS_PUSH_QUEUE writes; reflexion updates template metrics; events fire.
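The confidence step can be sketched directly from the per-strategy scores and the 0.85 branch given in the compute-confidence node (Node 12). Assumptions: field weights default to 1 (the doc says "weighted" but gives no weights), strategies without a published score are not modeled, and all function names are hypothetical.

```typescript
// Per-strategy signals. regex_before_label, whole_section and formula have no
// published score in the doc, so they are not modeled here.
type Signal =
  | { strategy: "regex_after_label"; found: boolean }
  | { strategy: "fixed_region"; nonEmpty: boolean }
  | { strategy: "table_with_headers"; columnAligned: boolean }
  | { strategy: "multi_line_span"; anchorFound: boolean }
  | { strategy: "llm_with_schema"; selfReport: number }
  | { strategy: "literal_constant" };

function fieldConfidence(s: Signal): number {
  switch (s.strategy) {
    case "regex_after_label": return s.found ? 0.95 : 0.0;
    case "fixed_region": return s.nonEmpty ? 0.9 : 0.3;
    case "table_with_headers": return s.columnAligned ? 0.92 : 0.5;
    case "multi_line_span": return s.anchorFound ? 0.88 : 0.4;
    case "llm_with_schema": return Math.min(0.9, Math.max(0.6, s.selfReport)); // clamp 0.6 .. 0.9
    case "literal_constant": return 1.0;
  }
}

interface FieldResult {
  ns_field: string;
  signal: Signal;
  required: boolean;
  weight?: number; // assumption: defaults to 1
}

// Overall = weighted average over required fields only.
function overallConfidence(fields: FieldResult[]): number {
  let weighted = 0;
  let total = 0;
  for (const f of fields) {
    if (!f.required) continue;
    const w = f.weight ?? 1;
    weighted += w * fieldConfidence(f.signal);
    total += w;
  }
  return total === 0 ? 0 : weighted / total;
}

// > 0.85 -> auto-draft; <= 0.85 -> review-first. Both still need HITL approval.
type StageMode = "auto_draft" | "review_first";
function stageMode(overall: number): StageMode {
  return overall > 0.85 ? "auto_draft" : "review_first";
}

// Path 1 proposed_action titles, as quoted for the stage-proposed-action node.
const PATH1_TITLES: Record<StageMode, string> = {
  auto_draft: "Approve extracted PO data and draft SO?",
  review_first: "Review extracted PO before SO draft?",
};

const driscollRun: FieldResult[] = [
  { ns_field: "otherrefnum", signal: { strategy: "regex_after_label", found: true }, required: true },
  { ns_field: "entity", signal: { strategy: "literal_constant" }, required: true },
  { ns_field: "shipaddress", signal: { strategy: "multi_line_span", anchorFound: true }, required: true },
  { ns_field: "item", signal: { strategy: "table_with_headers", columnAligned: true }, required: true },
];
const overall = overallConfidence(driscollRun); // (0.95 + 1.0 + 0.88 + 0.92) / 4 = 0.9375
console.log(stageMode(overall), PATH1_TITLES[stageMode(overall)]);
```

With these scores, one missed regex_after_label among four required fields caps the overall at (0 + 3 × 1.0) / 4 = 0.75, so that extraction goes review-first no matter how well the other fields did.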
HITL: ADR-031 invariant; every NS write needs a proposed_actions row
Confidence: overall > 0.85 -> auto-draft path; 0.85 or below -> review-first
Reflexion: hit_count / success_count / miss_count per template
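The reflexion counters and the retrain suggestion can be sketched as pure updates over the three metrics. Assumptions: `applyOutcome` and `suggestRetrain` are hypothetical names, timestamps and the corrections INSERT are omitted, and the zero-hit case follows a rule the doc does not state.

```typescript
interface TemplateMetrics {
  hit_count: number;
  success_count: number;
  miss_count: number;
}

type Outcome = "approved" | "approved_with_edits" | "rejected";

// Mirrors the three reflexion UPDATE branches (timestamps omitted).
function applyOutcome(m: TemplateMetrics, outcome: Outcome): TemplateMetrics {
  switch (outcome) {
    case "approved":
      return { ...m, hit_count: m.hit_count + 1, success_count: m.success_count + 1 };
    case "approved_with_edits":
      // The edited fields would also go to data_tagger_template_corrections.
      return { ...m, hit_count: m.hit_count + 1 };
    case "rejected":
      return { ...m, miss_count: m.miss_count + 1 };
  }
}

// Surface "train new version?" when miss_count / hit_count > 0.2.
function suggestRetrain(m: TemplateMetrics): boolean {
  // Assumption: with no hits yet, any miss is enough to flag the template.
  if (m.hit_count === 0) return m.miss_count > 0;
  return m.miss_count / m.hit_count > 0.2;
}

// Usage: four clean approvals, then one rejection -> 1 / 4 = 0.25 > 0.2.
let metrics: TemplateMetrics = { hit_count: 0, success_count: 0, miss_count: 0 };
for (const o of ["approved", "approved", "approved", "approved", "rejected"] as const) {
  metrics = applyOutcome(metrics, o);
}
console.log(metrics, suggestRetrain(metrics));
```

An edited approval counts as a hit but not a success, so a template whose output Mike keeps correcting shows a falling success_count / hit_count even while miss_count stays at zero.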

Tables, files, endpoints, code paths

kind | name | purpose
Live tool | /data-tagger.html | visual tagger UI (Agent BB-2)
D1 table | data_tagger_templates | per-(customer, doc, record) template; versioned (migration 142)
D1 table | data_tagger_extractions | one row per inbound document processed
D1 table | data_tagger_template_corrections | operator edits for reflexion
D1 table | data_tagger_doc_types | doc_type -> ns_record_type mapping
D1 table | data_tagger_uploads | raw upload audit
D1 table | inbound_email_log | existing email audit
D1 table | proposed_actions | HITL queue (kind=data_tagger_extraction)
D1 table | events | event ledger (R549)
R2 bucket | gfs-data-tagger-samples | uploaded sample PDFs
R2 bucket | gfs-inbound-attachments | email attachments
Endpoint | POST /api/data-tagger/upload | UI upload
Endpoint | POST /api/data-tagger/train | save tagged template
Endpoint | POST /api/data-tagger/apply | run template against inbound doc
Endpoint | POST /api/proposed-actions/decide | approve / reject extraction
Endpoint | POST /api/ns/push/sales-order | NS SO write-back (Path 1)
Code path | src/document_converter.ts | PDF/DOCX/XLSX -> markdown
Code path | src/email.ts | 5-mailbox inbound pipeline
Code path | src/chat_tools/impls.ts | data_tagger_* tools (Agent BB-3)
Migration | 142_data_tagger.sql | Agent BB-1 - 9 strategies + templates + extractions schema
Durable Object | PushMutexDO | per-customer NS write mutex
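The NS_PUSH_QUEUE routing described for the drain step (Node 15) can be sketched as a switch on ns_record_type. The backoff schedule is hypothetical (the doc only says "retry with backoff"), and the PushMutexDO lock is not modeled; in a Worker, a per-customer Durable Object like this would typically be addressed with `idFromName(customer_id)` on its namespace binding, so every push for one customer reaches the same instance.

```typescript
type NsRecordType = "SalesOrd" | "vendor_coas" | "bid_external_pipeline";

type PushTarget =
  | { kind: "netsuite"; method: "POST"; path: string }
  | { kind: "d1"; table: string };

// Drain routing by ns_record_type.
function routePush(recordType: NsRecordType): PushTarget {
  switch (recordType) {
    case "SalesOrd":
      return { kind: "netsuite", method: "POST", path: "/api/ns/push/sales-order" };
    case "vendor_coas":
      return { kind: "d1", table: "vendor_coas" }; // not in NS
    case "bid_external_pipeline":
      return { kind: "d1", table: "bid_external_pipeline" }; // no NS write
  }
}

// Hypothetical retry schedule: the delay doubles per attempt, capped at capMs.
function backoffMs(attempt: number, baseMs = 1_000, capMs = 60_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

console.log(routePush("SalesOrd"), [0, 1, 2, 3].map((a) => backoffMs(a)));
```

Because two of the three record types never reach NetSuite, routing on a tagged union makes the "no NS write" cases explicit instead of relying on a missing endpoint.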

Open gaps — honest punch list

Path detail docs