Data Tagger — Path 1 · customer PO to SO

Driscoll Foods PO -> orders@ -> tpl_driscoll_po_so_v3 -> 8 field tags -> conf 0.92 -> auto-draft -> HITL -> SO with otherrefnum = customer PO#

This is the first deployed Data Tagger use case. Driscoll Foods emails a purchase order to orders@ai-globalfoodsolutions.co. The system parses the PDF, resolves the sender domain to Driscoll, looks up the saved template tpl_driscoll_po_so_v3, and applies its 8 field tags: regex_after_label for P.O. #, literal_constant for entity, multi_line_span for shipaddress, regex_after_label for Delivery Date, whole_section for memo, and three table_with_headers tags for the line items (item, qty, price). The overall confidence is 0.92, above the 0.85 auto-draft threshold, so an SO draft is staged as a proposed_action. Mike approves it in admin-dashboard, and NS_PUSH_QUEUE writes the SO with otherrefnum set to the customer PO#. That value is the PO# trace thread that carries through to the Invoice and CashSale downstream.
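
As a rough illustration of what the template described above holds, the sketch below models the 8 field tags and the auto-draft decision in TypeScript. The type names, property names, and the routeExtraction helper are hypothetical and are not taken from the real schema; only the strategies, NS fields, anchors, and the 0.85 threshold come from this page.

```typescript
// Illustrative only: type and property names are assumptions, not the real schema.
type Strategy =
  | "regex_after_label"
  | "literal_constant"
  | "multi_line_span"
  | "whole_section"
  | "table_with_headers";

interface FieldTag {
  nsField: string;    // target NetSuite field on the Sales Order
  strategy: Strategy; // how the value is extracted
  anchor: string;     // label, header, or source the strategy keys on
}

// Threshold stated on this page; where it is actually stored is not specified.
const AUTO_DRAFT_THRESHOLD = 0.85;

// The 8 tags of tpl_driscoll_po_so_v3 as listed in the description.
const fieldTags: FieldTag[] = [
  { nsField: "otherrefnum", strategy: "regex_after_label", anchor: "P.O. #" },
  { nsField: "entity", strategy: "literal_constant", anchor: "sender domain" },
  { nsField: "shipaddress", strategy: "multi_line_span", anchor: "SHIP TO:" },
  { nsField: "shipdate", strategy: "regex_after_label", anchor: "Delivery Date" },
  { nsField: "memo", strategy: "whole_section", anchor: "SPECIAL INSTRUCTIONS:" },
  { nsField: "item", strategy: "table_with_headers", anchor: "Item #" },
  { nsField: "quantity", strategy: "table_with_headers", anchor: "Qty" },
  { nsField: "rate", strategy: "table_with_headers", anchor: "Price" },
];

type Route = "auto_draft" | "review_first";

// Both routes still end at a human approval; the route only decides
// whether a draft SO is pre-staged for the reviewer.
function routeExtraction(overallConfidence: number): Route {
  return overallConfidence > AUTO_DRAFT_THRESHOLD ? "auto_draft" : "review_first";
}

console.log(fieldTags.length, routeExtraction(0.92));
```

Keeping the strategy as a closed union means an unknown strategy name in a saved template would be caught at compile time rather than at extraction time, which is one plausible reason to model it this way.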

deployed reference · HITL on SO draft (ADR-031) · template versioned (v3 current)

Pipeline — email arrives → auto-extract → HITL approve → SO created

Flow: Driscoll PO email arrives -> PDF parsed -> sender domain resolves customer -> template tpl_driscoll_po_so_v3 found -> 8 field strategies run (regex_after_label PO#, literal_constant entity, multi_line_span shipaddress, regex_after_label shipdate, whole_section memo, table_with_headers item/qty/price) -> confidence 0.92 -> auto-stage SO draft -> Mike approves -> NS_PUSH_QUEUE writes SO with otherrefnum=PO# -> reflexion increments hit_count -> events fire

LANE 1 / INTAKE · Driscoll PO arrives at orders@

Step 1 · Driscoll emails PO (EXTERNAL · CF EMAIL ROUTING)
WHAT THIS DOES: Driscoll Foods purchasing sends a PO PDF as an email attachment to orders@ai-globalfoodsolutions.co. Cloudflare Email Routing delivers it to src/email.ts.
TECHNICAL DETAILS: EMAIL ARRIVES
  FROM: purchasing@driscoll-foods.com
  TO: orders@ai-globalfoodsolutions.co
  SUBJECT: PO 8801772 - Global Food Solutions
  ATTACHMENT: PO_8801772.pdf
  DELIVERY: Cloudflare Email Routing -> src/email.ts
STATUS: REAL

Step 2 · log + save PDF to R2 (DATABASE · AUDIT TRAIL)
WHAT THIS DOES: src/email.ts logs the email to inbound_email_log (existing R558+ table) and saves the PDF attachment to the gfs-inbound-attachments R2 bucket. Email metadata + R2 key linked.
D1 TABLE: inbound_email_log
R2 BUCKET: gfs-inbound-attachments
TECHNICAL DETAILS: LOG + R2
  INSERT inbound_email_log
    email_id = ulid()
    mailbox = orders@
    from_addr = purchasing@driscoll-foods.com
    subject = PO 8801772 - Global Food Solutions
    received_at = 2026-05-27T13:42:18Z
    r2_key = inbound/2026-05-27/PO_8801772.pdf
  PUT R2 gfs-inbound-attachments/inbound/2026-05-27/PO_8801772.pdf
  r2_key linked to email_id
STATUS: REAL

Step 3 · parse to markdown (BACKEND · SHARED PARSER)
WHAT THIS DOES: document_converter.ts parses the PDF to markdown plus span coordinates. The span coordinates let later strategies reference precise positions (for fixed_region) or anchor to label text (for regex strategies).
CODE: src/document_converter.ts
TECHNICAL DETAILS: PARSE TO MARKDOWN
  INPUT: R2 key inbound/2026-05-27/PO_8801772.pdf
  OUTPUT: { markdown: "...", spans: [{x,y,w,h,text}, ...] }
  USE: feeds both regex strategies AND fixed_region strategies
STATUS: REAL

LANE 2 / IDENTIFY · sender domain -> customer -> template

Step 4 · resolve customer (DATABASE · CUSTOMER MATCH)
WHAT THIS DOES: Match the sender email domain to a customer. customers.email_domain column holds the matched substring. If multiple matches, pick the most-specific.
D1 TABLE: customers
TECHNICAL DETAILS: RESOLVE CUSTOMER FROM SENDER DOMAIN
  SQL:
    SELECT customer_id, customer_name FROM customers
    WHERE 'purchasing@driscoll-foods.com' LIKE '%' || email_domain
    ORDER BY length(email_domain) DESC LIMIT 1
  RESULT: customer_id = 478, customer_name = Driscoll Foods
  FALLBACK: if no match -> name_synonyms fuzzy on body text
STATUS: REAL

Step 5 · classify doc_type (BACKEND · DOC CLASSIFIER)
WHAT THIS DOES: Classify the doc_type from the parsed markdown - first-page features (keywords: PURCHASE ORDER, P.O. #, INVOICE, COA, RFP). Maps to ns_record_type.
D1 TABLE: data_tagger_doc_types (migration 142)
TECHNICAL DETAILS: CLASSIFY DOC_TYPE
  INPUT: markdown first-page text
  LOGIC: keyword scan + scoring
    "PURCHASE ORDER" / "P.O. #" -> po_inbound
    "CERTIFICATE OF ANALYSIS" / "COA" -> coa
    "BID" / "RFP" -> bid_rfp
  RESULT: doc_type = po_inbound, ns_record_type = SalesOrd (via data_tagger_doc_types map)
STATUS: REAL

Step 6 · lookup template (DATABASE · TEMPLATE V3)
WHAT THIS DOES: Find the saved template for the 3-key thread. Returns tpl_driscoll_po_so_v3 with 8 field tags + strategy per field.
D1 TABLE: data_tagger_templates (migration 142)
TECHNICAL DETAILS: LOOKUP TEMPLATE
  SQL:
    SELECT * FROM data_tagger_templates
    WHERE customer_id=478 AND doc_type='po_inbound' AND ns_record_type='SalesOrd' AND status='active'
    ORDER BY version DESC LIMIT 1
  RESULT: template_id = tpl_driscoll_po_so_v3, version = 3, field_tags = [8 tagged fields], hit_count = 47, success_count = 45 (95.7% so far)
STATUS: REAL (template) - migration 142 (Agent BB-1)

LANE 3 / APPLY 8 STRATEGIES · field by field extraction

Step 7 · [1] regex_after_label P.O. # (BACKEND · STRATEGY 1 / 9)
WHAT THIS DOES: The pivotal extraction. Capture the customer PO number which becomes SO.bodyFields.otherrefnum - the trace thread for the PO -> SO -> Invoice -> CashSale chain.
TECHNICAL DETAILS:
  STRATEGY: regex_after_label
  NS FIELD: SalesOrd.bodyFields.otherrefnum
  PATTERN: /P\.?O\.?\s*#\s*([0-9A-Z\-]{4,20})/i
  INPUT: markdown line "P.O. # 8801772"
  OUTPUT: value = "8801772", confidence = 0.97 (match found, length 7 within range)
  ★ THIS IS THE TRACE-THREAD FIELD: PO.po_number -> SO.otherrefnum -> Invoice.otherrefnum -> CashSale.otherrefnum. The example value "8801772" carries through Driscoll's entire downstream order chain.
STATUS: REAL

Step 8 · [2] literal_constant entity (BACKEND · STRATEGY 9 / 9)
WHAT THIS DOES: We already know who sent this email - the entity is locked to Driscoll Foods. literal_constant is the right strategy when you trust the identity from outside the document.
TECHNICAL DETAILS:
  STRATEGY: literal_constant
  NS FIELD: SalesOrd.bodyFields.entity
  VALUE: "Driscoll Foods" (resolved from customer_id=478 in step 4)
  WHY: the sender domain ALREADY identified the customer; no need to extract entity from PDF body text (and risk a parsing error)
  CONFIDENCE: 1.0 (literal)
STATUS: REAL

Step 9 · [3] multi_line_span shipaddress (BACKEND · STRATEGY 5 / 9)
WHAT THIS DOES: The ship address is a block of 4 lines starting after the SHIP TO anchor. multi_line_span grabs the lines and joins.
TECHNICAL DETAILS:
  STRATEGY: multi_line_span
  NS FIELD: SalesOrd.bodyFields.shipaddress
  ANCHOR: "SHIP TO:"
  LINES: 4 (street1, street2/optional, city/state/zip, country)
  INPUT:
    SHIP TO:
    Driscoll Foods Distribution Center
    450 Industrial Blvd
    Trenton, NJ 08611
    USA
  OUTPUT: joined string "Driscoll Foods Distribution Center\n450 Industrial Blvd\nTrenton, NJ 08611\nUSA", confidence = 0.88
STATUS: REAL

Step 10 · [4] regex_after_label Delivery Date (BACKEND · STRATEGY 1 / 9)
WHAT THIS DOES: Capture the requested delivery date. Sets SO.shipdate.
TECHNICAL DETAILS:
  STRATEGY: regex_after_label
  NS FIELD: SalesOrd.bodyFields.shipdate
  PATTERN: /Delivery Date\s*:?\s*([0-9\/\-\.]{8,10})/i
  INPUT: "Delivery Date: 06/03/2026"
  OUTPUT: value = "2026-06-03" (normalized to ISO), confidence = 0.95
STATUS: REAL

Step 11 · [5] whole_section memo (BACKEND · STRATEGY 6 / 9)
WHAT THIS DOES: The memo holds special instructions for the warehouse and CS team. whole_section grabs everything between two anchors.
TECHNICAL DETAILS:
  STRATEGY: whole_section
  NS FIELD: SalesOrd.bodyFields.memo
  ANCHORS: start = "SPECIAL INSTRUCTIONS:", end = "END NOTES" OR end-of-page
  INPUT (between anchors): "Please deliver before 10 AM. Inside delivery to receiving dock B. Driver must call ahead 30 minutes."
  OUTPUT: value = the section text, confidence = 0.83 (slightly below threshold for memo - operator will spot-check)
STATUS: REAL

Step 12 · [6] table_with_headers Item # (BACKEND · STRATEGY 4 / 9)
WHAT THIS DOES: Locate the line-item table by header row, walk the Item # column.
TECHNICAL DETAILS:
  STRATEGY: table_with_headers
  NS FIELD: SalesOrd.lineFields.item
  HEADER: "Item #"
  WALK: for each row below header: capture text in column, map to NS item_id via items.customer_item_alias[Driscoll]
  INPUT: rows SKU-419, SKU-2014, SKU-77, ...
  OUTPUT: array of NS item IDs, confidence = 0.94 (column aligned, all rows mapped)
STATUS: REAL

Step 13 · [7] table_with_headers Qty (BACKEND · STRATEGY 4 / 9)
WHAT THIS DOES: Qty column.
TECHNICAL DETAILS:
  STRATEGY: table_with_headers
  NS FIELD: SalesOrd.lineFields.quantity
  HEADER: "Qty"
  OUTPUT: array of integers, confidence = 0.96
STATUS: REAL

Step 14 · [8] table_with_headers Price (BACKEND · STRATEGY 4 / 9)
WHAT THIS DOES: Price column. Mapped to NS lineFields.rate.
TECHNICAL DETAILS:
  STRATEGY: table_with_headers
  NS FIELD: SalesOrd.lineFields.rate
  HEADER: "Price" OR "Unit Price"
  OUTPUT: array of decimals, confidence = 0.92
STATUS: REAL

LANE 4 / CONFIDENCE + HITL · auto-draft vs review-first

Step 15 · compute confidence (DATABASE · ABOVE THRESHOLD)
WHAT THIS DOES: Compute the weighted overall confidence across all 8 fields. 0.92 is above the 0.85 auto-draft threshold so an SO draft is auto-staged (still HITL-approved).
D1 TABLE: data_tagger_extractions
TECHNICAL DETAILS: COMPUTE CONFIDENCE
  PER FIELD:
    otherrefnum 0.97 (weight 2.0 - critical)
    entity 1.00 (weight 1.0)
    shipaddress 0.88 (weight 1.0)
    shipdate 0.95 (weight 1.0)
    memo 0.83 (weight 0.5)
    item 0.94 (weight 2.0 - line)
    quantity 0.96 (weight 2.0 - line)
    rate 0.92 (weight 2.0 - line)
  OVERALL: weighted_avg = 0.92
  BRANCH: 0.92 > 0.85 -> AUTO-STAGE SO DRAFT
  INSERT data_tagger_extractions: extraction_id, template_id, customer_id, doc_type, ns_record_type, extracted_fields_json, confidence=0.92, status='pending'
STATUS: REAL

Step 16 · ★ stage SO draft (HITL) (SECURITY · HITL INVARIANT)
WHAT THIS DOES: INSERT proposed_actions with the draft SO payload. Mike sees an "Auto-extracted Driscoll PO 8801772 - approve to create SO?" card.
D1 TABLE: proposed_actions (kind='data_tagger_extraction')
TECHNICAL DETAILS: STAGE SO DRAFT (HITL)
  INSERT proposed_actions
    kind = 'data_tagger_extraction'
    title = "Driscoll PO 8801772 - auto-extracted - approve to draft SO?"
    payload_json = { extraction_id, template_id: 'tpl_driscoll_po_so_v3', customer_id: 478, doc_type: 'po_inbound', ns_record_type: 'SalesOrd', confidence: 0.92, so_payload: { otherrefnum: '8801772', entity: 478, ... } }
    status = 'pending'
  ADR-031 INVARIANT: no NS write without this row
STATUS: REAL

Step 17 · Mike approves (CLOUD · HITL APPROVAL)
WHAT THIS DOES: Mike opens admin-dashboard.html / proposed-actions. Side-by-side: left pane shows the original PDF with tag-box overlays, right pane shows the 8 extracted fields as editable form. Mike spot-checks (memo confidence 0.83 was below 0.85) - text looks fine - taps Approve.
SURFACE: admin-dashboard.html
TECHNICAL DETAILS: MIKE REVIEWS + APPROVES
  LEFT PANE: PDF preview + overlay boxes from template
  RIGHT PANE: 8 extracted fields, editable; red highlight on memo (conf 0.83)
  ACTION: Mike approves -> POST /api/proposed-actions/decide, body: { id, decision: 'approve' }
  RESULT: proposed_actions.status = approved; triggers NS_PUSH_QUEUE drain
STATUS: REAL (proposed_actions UI exists) - side-by-side widget in flight

LANE 5 / NS SO WRITE + REFLEXION + EVENTS

Step 18 · NS_PUSH_QUEUE drain (MESSAGEBUS · DURABLE QUEUE + MUTEX)
WHAT THIS DOES: NS_PUSH_QUEUE drains. PushMutexDO per customer_id prevents double-writes if another SO for Driscoll is being pushed concurrently. NS RESTlet creates the SO with otherrefnum = customer PO#.
D1 TABLE: ns_pending_pushes
NS ENDPOINT: /api/ns/push/sales-order
TECHNICAL DETAILS: NS_PUSH_QUEUE DRAIN
  ENDPOINT: POST /api/ns/push/sales-order?preview=false&confirm=true
  GUARDS: PushMutexDO per customer_id 478; X-Edit-Token header
  PAYLOAD: { entity: 478, otherrefnum: "8801772", shipaddress: "...", shipdate: "2026-06-03", memo: "...", items: [{ item, quantity, rate } x N] }
  NS RESTlet: customscript_gfs_platform_push_so returns NS SO internal_id (e.g. 1842738)
  ON 2XX: proposed_actions.status = done; data_tagger_extractions.ns_record_id = 1842738
STATUS: REAL (NS_PUSH_QUEUE) + REAL (SO push)

Step 19 · ★ NS SO created (otherrefnum=8801772) (FINANCE-KEY · SEE ns-sales-order-master.html)
WHAT THIS DOES: Sales Order created in NetSuite with otherrefnum set to the customer PO#. This establishes the trace thread - the same value will appear on the resulting Invoice and CashSale records as those flow through the SO lifecycle.
NS RECORD: SalesOrd (transaction)
TECHNICAL DETAILS: NS SO CREATED · PO# TRACE THREAD ESTABLISHED
  SO ID: NS internal_id = 1842738
  KEY FIELDS:
    entity = 478 (Driscoll Foods)
    otherrefnum = "8801772" <-- THE THREAD
    trandate = today
    shipdate = 2026-06-03
    shipaddress = (extracted)
    memo = (extracted)
    line items = 12 rows
  THREAD CARRIES THROUGH:
    -> Invoice.otherrefnum = "8801772"
    -> CashSale.otherrefnum = "8801772"
    -> downstream chat search by PO# returns the whole chain
  RELATED FLOWS: ns-sales-order-master.html, ns-sales-order-path-1-inventory.html
STATUS: REAL

Step 20 · reflexion + events (BACKEND · SELF-IMPROVING LOOP)
WHAT THIS DOES: Increment template metrics. Fire events into the event ledger (R549). customer_health watcher reads events.so.created_from_po to recompute Driscoll's health score.
D1 TABLES: data_tagger_templates (UPDATE), events (INSERT)
TECHNICAL DETAILS: REFLEXION + EVENTS
  UPDATE data_tagger_templates SET
    hit_count = 48 (was 47)
    success_count = 46 (was 45), i.e. 95.8%
    last_success_at = now
  EVENTS FIRED:
    events.data_tagger.extracted_to_ns
    events.so.created_from_po
    events.data_tagger.template_used
  PAYLOAD: { template_id: 'tpl_driscoll_po_so_v3', extraction_id, ns_record_id: 1842738, customer_id: 478, po_number: '8801772', confidence: 0.92 }
  SUBSCRIBERS: customer_health watcher; chat session context (Driscoll's latest SO surface)
STATUS: REAL

Lane hand-offs: email parsed, PDF to R2, md + spans -> identify customer, doc_type, template -> apply 8 strategies in parallel, 8 extracted values -> weighted confidence, stage HITL surface, approve -> drain queue, NS write success -> UPDATE hit_count + success_count

CUSTOMER PO# THREADING · the trace thread (sample: PO# 8801772 from Driscoll Foods)
  PO.po_number = "8801772"
  SO.otherrefnum = "8801772"
  Invoice.otherrefnum = "8801772"
  CashSale.otherrefnum = "8801772"
  chat search by "%8801772" -> whole chain
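
The confidence step in the pipeline above combines per-field confidences with per-field weights. The sketch below shows one way such a weighted score could be computed, assuming a plain weighted mean; the FieldScore type and weightedConfidence function are hypothetical names. A plain weighted mean of the listed values comes out near 0.94 rather than the 0.92 shown, so the production scorer presumably differs in some way not described here (rounding, penalties, or other weights). Either figure clears the 0.85 auto-draft threshold.

```typescript
// Hypothetical types and names; confidences and weights are the ones listed above.
interface FieldScore {
  field: string;
  confidence: number;
  weight: number;
}

const scores: FieldScore[] = [
  { field: "otherrefnum", confidence: 0.97, weight: 2.0 }, // critical
  { field: "entity", confidence: 1.0, weight: 1.0 },
  { field: "shipaddress", confidence: 0.88, weight: 1.0 },
  { field: "shipdate", confidence: 0.95, weight: 1.0 },
  { field: "memo", confidence: 0.83, weight: 0.5 },
  { field: "item", confidence: 0.94, weight: 2.0 }, // line
  { field: "quantity", confidence: 0.96, weight: 2.0 }, // line
  { field: "rate", confidence: 0.92, weight: 2.0 }, // line
];

// Assumption: a plain weighted mean, sum(confidence * weight) / sum(weight).
function weightedConfidence(fields: FieldScore[]): number {
  const totalWeight = fields.reduce((sum, f) => sum + f.weight, 0);
  if (totalWeight === 0) return 0; // avoid dividing by zero on an empty template
  const weightedSum = fields.reduce((sum, f) => sum + f.confidence * f.weight, 0);
  return weightedSum / totalWeight;
}

const overall = weightedConfidence(scores);
const aboveThreshold = overall > 0.85;
console.log(overall.toFixed(3), aboveThreshold);
```

Weighting otherrefnum and the line fields at 2.0 means a weak PO# or line-item match pulls the overall score down faster than a weak memo, which matches the page's description of the PO# as the critical field.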
Glossary · cluster colors + thread terms
Database (templates, extractions, audit)
Backend (strategies, classifier, reflexion)
Cloud (admin-dashboard, surfaces)
Messagebus (NS_PUSH_QUEUE)
HITL gate (proposed_actions)
★ NS SO write
tpl_driscoll_po_so_v3: the saved template for Driscoll's customer PO -> SO mapping (8 field tags, version 3)
otherrefnum: the NS SO body field that holds the customer PO# (the trace thread)

Field tag list — the 8 strategies for Driscoll PO -> SO

| # | strategy | ns field | pattern / anchor | example value | conf |
|---|----------|----------|------------------|---------------|------|
| 1 | regex_after_label | SalesOrd.bodyFields.otherrefnum | /P\.?O\.?\s*#\s*([0-9A-Z\-]{4,20})/i | 8801772 | 0.97 |
| 2 | literal_constant | SalesOrd.bodyFields.entity | customer_id from sender domain | 478 (Driscoll Foods) | 1.00 |
| 3 | multi_line_span | SalesOrd.bodyFields.shipaddress | anchor SHIP TO: + 4 lines | 450 Industrial Blvd, Trenton NJ ... | 0.88 |
| 4 | regex_after_label | SalesOrd.bodyFields.shipdate | /Delivery Date\s*:?\s*([0-9\/\-\.]{8,10})/i | 2026-06-03 | 0.95 |
| 5 | whole_section | SalesOrd.bodyFields.memo | between SPECIAL INSTRUCTIONS: and END NOTES | delivery instructions text | 0.83 |
| 6 | table_with_headers | SalesOrd.lineFields.item | header Item # | [SKU-419, SKU-2014, ...] | 0.94 |
| 7 | table_with_headers | SalesOrd.lineFields.quantity | header Qty | [24, 12, 36, ...] | 0.96 |
| 8 | table_with_headers | SalesOrd.lineFields.rate | header Price or Unit Price | [18.50, 22.75, ...] | 0.92 |
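
To make the two regex_after_label rows concrete, here is a minimal sketch that runs the patterns from the table against the sample inputs. The function names regexAfterLabel and toIsoDate are hypothetical, and the date normalization assumes US-style month-first input (06/03/2026 read as June 3), which is consistent with the example value but not stated outright in the source.

```typescript
// Patterns copied from the field tag list; helper names are illustrative.
const PO_PATTERN = /P\.?O\.?\s*#\s*([0-9A-Z\-]{4,20})/i;
const DATE_PATTERN = /Delivery Date\s*:?\s*([0-9\/\-\.]{8,10})/i;

// Return the first capture group that follows the label, or null if no match.
function regexAfterLabel(text: string, pattern: RegExp): string | null {
  const match = pattern.exec(text);
  if (match === null) return null;
  const captured = match[1];
  return captured !== undefined ? captured : null;
}

// Assumption: input is MM/DD/YYYY (also accepts '-' or '.' separators).
function toIsoDate(raw: string): string | null {
  const match = /^(\d{1,2})[\/\-.](\d{1,2})[\/\-.](\d{4})$/.exec(raw);
  if (match === null) return null;
  const month = ("0" + (match[1] || "")).slice(-2); // left-pad to two digits
  const day = ("0" + (match[2] || "")).slice(-2);
  const year = match[3] || "";
  return year + "-" + month + "-" + day;
}

const poNumber = regexAfterLabel("P.O. # 8801772", PO_PATTERN);
const rawShipDate = regexAfterLabel("Delivery Date: 06/03/2026", DATE_PATTERN);
const shipDate = rawShipDate === null ? null : toIsoDate(rawShipDate);
console.log(poNumber, shipDate);
```

Returning null on a miss, instead of an empty string, lets a caller distinguish "label not found" from "label found with an empty value" when assigning a field confidence.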

Tables, endpoints, code paths

| kind | name | purpose |
|------|------|---------|
| Mailbox | orders@ai-globalfoodsolutions.co | customer PO intake |
| R2 bucket | gfs-inbound-attachments | PDF storage |
| D1 table | inbound_email_log | email audit |
| D1 table | data_tagger_templates | tpl_driscoll_po_so_v3 lives here |
| D1 table | data_tagger_extractions | one row per inbound PO |
| D1 table | proposed_actions | HITL queue (kind=data_tagger_extraction) |
| D1 table | ns_pending_pushes | NS_PUSH_QUEUE state |
| D1 table | customers | email_domain match column |
| D1 table | items | customer_item_alias mapping |
| D1 table | events | event ledger - so.created_from_po |
| Endpoint | POST /api/data-tagger/apply | run template against inbound doc |
| Endpoint | POST /api/proposed-actions/decide | approve SO draft |
| Endpoint | POST /api/ns/push/sales-order | NS SO write |
| NS RESTlet | customscript_gfs_platform_push_so | SO creator |
| Code path | src/email.ts | orders@ inbound handling |
| Code path | src/document_converter.ts | PDF parse |
| Durable Object | PushMutexDO | per-customer NS write mutex |
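
The customers.email_domain match listed above is done in SQL in the pipeline (suffix match on the sender address, longest domain wins). As an in-memory illustration of that same rule, the sketch below uses hypothetical names (Customer, resolveCustomer) and sample rows; the driscoll-foods.com domain value is inferred from the sender address, and the second row is invented purely to show the most-specific tie-break.

```typescript
// Hypothetical row shape; column meanings follow the customers table above.
interface Customer {
  customerId: number;
  customerName: string;
  emailDomain: string;
}

const customers: Customer[] = [
  { customerId: 478, customerName: "Driscoll Foods", emailDomain: "driscoll-foods.com" },
  // Invented row: a shorter domain that is also a suffix of the sender address.
  { customerId: 9001, customerName: "Example Foods", emailDomain: "foods.com" },
];

// Suffix match on the sender address; the longest matching domain wins,
// mirroring ORDER BY length(email_domain) DESC LIMIT 1.
function resolveCustomer(sender: string, rows: Customer[]): Customer | null {
  const address = sender.toLowerCase();
  let best: Customer | null = null;
  for (const row of rows) {
    const domain = row.emailDomain.toLowerCase();
    if (domain.length === 0) continue; // an empty domain would match everything
    if (address.slice(-domain.length) !== domain) continue;
    if (best === null || domain.length > best.emailDomain.length) {
      best = row;
    }
  }
  return best;
}

const resolved = resolveCustomer("purchasing@driscoll-foods.com", customers);
console.log(resolved === null ? "no match" : resolved.customerName);
```

When this returns null, the pipeline's stated fallback is a name_synonyms fuzzy match on the body text, which is not sketched here.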

Related