The first deployed Data Tagger use case
Driscoll Foods sends a customer purchase order via email. The Data Tagger applies the saved tpl_driscoll_po_so_v3 template, which maps 8 PDF regions to NetSuite Sales Order fields. Above 0.85 confidence, the system auto-stages a draft SO; Mike approves it; NS_PUSH_QUEUE writes the SO with otherrefnum set to the customer's PO#. That PO# threads through to the Invoice, the CashSale, and the rest of the downstream NS chain. It is the same value Mike can search for in chat to retrieve the full record set.
This is the pillar's reference implementation. The other two paths (vendor COA, bid RFP) follow the same template-driven pattern with different destination records.
Trigger conditions
- Customer emails a PO to orders@ai-globalfoodsolutions.co.
- Sender domain matches a known customer (e.g. driscoll-foods.com -> Driscoll Foods).
- An active template exists for (customer x po_inbound x SalesOrd).
- (First-time customer) Operator trains a new template at /data-tagger.html.
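The trigger conditions reduce to two lookups: sender domain to customer, then an active template keyed by (customer x doc_type x ns_record_type). Below is a minimal in-memory sketch. All names in it (CUSTOMERS_BY_DOMAIN, ACTIVE_TEMPLATES, routeInbound) are hypothetical. The real pipeline in src/email.ts reads the customers and data_tagger_templates tables and may order these checks differently. The sketch also uses an exact domain match, which is stricter than the LIKE query shown for PO 8801772.

```typescript
// Hypothetical in-memory stand-ins for the customers and data_tagger_templates tables.
const CUSTOMERS_BY_DOMAIN: Record<string, number> = {
  "driscoll-foods.com": 478,
};

// Key: `${customerId}|${docType}|${nsRecordType}` -> active template id.
const ACTIVE_TEMPLATES: Record<string, string> = {
  "478|po_inbound|SalesOrd": "tpl_driscoll_po_so_v3",
};

type Route =
  | { kind: "apply_template"; customerId: number; templateId: string }
  | { kind: "train_template"; customerId: number } // operator trains at /data-tagger.html
  | { kind: "ignore"; reason: string };

function routeInbound(to: string, from: string, docType: string, nsRecordType: string): Route {
  if (to.toLowerCase() !== "orders@ai-globalfoodsolutions.co") {
    return { kind: "ignore", reason: "not the orders mailbox" };
  }
  // Exact domain match on the sender address (hypothetical simplification).
  const domain = from.toLowerCase().split("@")[1] ?? "";
  const customerId: number | undefined = CUSTOMERS_BY_DOMAIN[domain];
  if (customerId === undefined) {
    return { kind: "ignore", reason: `unknown sender domain ${domain}` };
  }
  const templateId: string | undefined = ACTIVE_TEMPLATES[`${customerId}|${docType}|${nsRecordType}`];
  if (templateId === undefined) {
    return { kind: "train_template", customerId }; // first-time customer or doc type
  }
  return { kind: "apply_template", customerId, templateId };
}
```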
The customer PO# becomes SO.otherrefnum, Invoice.otherrefnum, and CashSale.otherrefnum. In chat, searching for %8801772 retrieves the whole order chain (see the percent-search rule).
PO 8801772 from Driscoll
13:42 UTC. purchasing@driscoll-foods.com emails orders@ai-globalfoodsolutions.co with attachment PO_8801772.pdf. Subject: "PO 8801772 - Global Food Solutions".
Pipeline runs: inbound_email_log INSERT (mailbox=orders@); R2 PUT gfs-inbound-attachments/inbound/2026-05-27/PO_8801772.pdf; document_converter.ts renders to markdown + span coords.
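The R2 key in this example follows the pattern inbound/<date>/<attachment filename>. Below is a sketch of a key builder, assuming the date segment is the UTC receive date. inboundAttachmentKey is a hypothetical name. The filename sanitizing is an added precaution, not documented behavior.

```typescript
// Hypothetical helper reproducing the key shape seen for PO_8801772.pdf,
// assuming the date segment is the UTC receive date (YYYY-MM-DD).
function inboundAttachmentKey(receivedAt: Date, filename: string): string {
  const day = receivedAt.toISOString().slice(0, 10); // e.g. "2026-05-27"
  // Keep only the last path component so a crafted filename cannot escape the prefix.
  const safeName = filename.split(/[\\/]/).pop() ?? filename;
  return `inbound/${day}/${safeName}`;
}

// Stored in bucket gfs-inbound-attachments under this key.
const exampleKey = inboundAttachmentKey(new Date("2026-05-27T13:42:00Z"), "PO_8801772.pdf");
```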
customers WHERE email_domain LIKE '%driscoll-foods.com' returns customer_id=478. Classifier returns doc_type=po_inbound. ns_record_type=SalesOrd. Template lookup: tpl_driscoll_po_so_v3 (hit_count=47, success_count=45).
8 strategies run in parallel:

| Strategy | Label / header | Field | Extracted value | Confidence |
|---|---|---|---|---|
| regex_after_label | P.O. # | otherrefnum | "8801772" | 0.97 |
| literal_constant | - | entity | "Driscoll Foods" | 1.00 |
| multi_line_span | - | shipaddress | "450 Industrial Blvd, Trenton NJ ..." | 0.88 |
| regex_after_label | Delivery Date | shipdate | "2026-06-03" | 0.95 |
| whole_section | - | memo | delivery instructions | 0.83 |
| table_with_headers | Item # | lineFields.item | - | 0.94 |
| table_with_headers | Qty | lineFields.quantity | - | 0.96 |
| table_with_headers | Price | lineFields.rate | - | 0.92 |
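The regex_after_label strategy can be read as: find the label in the converted markdown, then capture the value that follows it on the same line. Below is a minimal sketch, assuming label and value share a line in document_converter.ts output. The function name, the value patterns, and the sample text are illustrative; they are not the template's stored regex.

```typescript
// Escape regex metacharacters so labels like "P.O. #" match literally.
function escapeRegex(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical regex_after_label: label, optional ':' and spaces/tabs,
// then a capture of valuePattern on the same line. Returns the FIRST match only,
// so a PDF with several PO-like labels can mislead it.
function regexAfterLabel(markdown: string, label: string, valuePattern = "\\S+"): string | null {
  const re = new RegExp(`${escapeRegex(label)}[ \\t]*:?[ \\t]*(${valuePattern})`);
  const m = re.exec(markdown);
  return m ? m[1] : null;
}

const sampleMarkdown = "P.O. # 8801772\nDelivery Date: 2026-06-03";
const poNumber = regexAfterLabel(sampleMarkdown, "P.O. #", "\\d+");
const shipDate = regexAfterLabel(sampleMarkdown, "Delivery Date", "\\d{4}-\\d{2}-\\d{2}");
```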
Weighted overall = 0.92. Auto-staged. Mike approves in admin-dashboard. NS_PUSH_QUEUE drains. NS SO created with internal_id 1842738, otherrefnum "8801772". hit_count: 47 -> 48. events.so.created_from_po fires. customer_health for Driscoll recomputes.
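This page gives the per-field confidences and the 0.85 gate, but not the weights behind "weighted overall = 0.92". Below is a sketch of a weighted mean plus the gate, using placeholder equal weights. The FieldScore shape and the stagingPath name are hypothetical. With equal weights the eight values above average 0.93125, so the production weighting presumably differs.

```typescript
interface FieldScore {
  field: string;
  confidence: number; // 0..1, reported by the extraction strategy
  weight: number;     // hypothetical per-field weight
}

const AUTO_STAGE_THRESHOLD = 0.85;

// Weighted mean of per-field confidences; 0 if there is no weight at all.
function overallConfidence(scores: FieldScore[]): number {
  const totalWeight = scores.reduce((sum, f) => sum + f.weight, 0);
  if (totalWeight === 0) return 0;
  return scores.reduce((sum, f) => sum + f.confidence * f.weight, 0) / totalWeight;
}

// "Above 0.85" -> auto-draft path; otherwise the review-first card.
function stagingPath(scores: FieldScore[]): "auto_stage" | "review_first" {
  return overallConfidence(scores) > AUTO_STAGE_THRESHOLD ? "auto_stage" : "review_first";
}

// Confidences from PO 8801772, with placeholder equal weights.
const po8801772Scores: FieldScore[] = [
  { field: "otherrefnum", confidence: 0.97, weight: 1 },
  { field: "entity", confidence: 1.0, weight: 1 },
  { field: "shipaddress", confidence: 0.88, weight: 1 },
  { field: "shipdate", confidence: 0.95, weight: 1 },
  { field: "memo", confidence: 0.83, weight: 1 },
  { field: "item", confidence: 0.94, weight: 1 },
  { field: "quantity", confidence: 0.96, weight: 1 },
  { field: "rate", confidence: 0.92, weight: 1 },
];
```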
20 steps from email to SO
- 01 Driscoll emails PO: purchasing@driscoll-foods.com -> orders@
- 02 Log + R2: inbound_email_log + gfs-inbound-attachments
- 03 Parse to markdown: document_converter.ts
- 04 Resolve customer: sender domain -> customer_id=478
- 05 Classify doc_type: keyword scan -> po_inbound
- 06 Look up template: tpl_driscoll_po_so_v3
- 07-14 Apply 8 strategies: otherrefnum / entity / shipaddress / shipdate / memo / item / quantity / rate
- 15 Confidence 0.92: above 0.85 -> auto-draft path
- 16 Stage SO draft (HITL): proposed_actions, kind=data_tagger_extraction
- 17 Mike approves: admin-dashboard side-by-side review
- 18 NS_PUSH_QUEUE writes SO: otherrefnum = customer PO#
- 19 Reflexion: hit_count + success_count incremented
- 20 Events fire: events.so.created_from_po, events.data_tagger.extracted_to_ns
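Step 19 (Reflexion) increments the template's counters. Below is a sketch of that bookkeeping. It assumes that an approval with no field edits counts as a success and an approval with edits counts only as a hit. The page shows both counters rising for PO 8801772 but does not state the rule. The TemplateStats shape is hypothetical.

```typescript
interface TemplateStats {
  templateId: string;
  hitCount: number;     // times the template was applied and reviewed
  successCount: number; // times it was approved with no corrections (assumed rule)
}

// Returns updated counters; correctedFields lists fields Mike edited in review.
function recordReview(stats: TemplateStats, correctedFields: string[]): TemplateStats {
  return {
    ...stats,
    hitCount: stats.hitCount + 1,
    successCount: stats.successCount + (correctedFields.length === 0 ? 1 : 0),
  };
}

const before: TemplateStats = { templateId: "tpl_driscoll_po_so_v3", hitCount: 47, successCount: 45 };
const after = recordReview(before, []); // PO 8801772 approved unedited
```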
What's different after the cycle
What can go wrong
If the PDF has multiple numeric strings near "PO" labels (e.g. an internal Driscoll PO ref AND our SO ref), the regex may grab the wrong one. Mike corrects it in side-by-side review, and the correction logs to data_tagger_template_corrections. If it recurs, retrain the template with a tighter regex.
If Driscoll sends a scanned image (a PDF with no text layer) rather than a text-based PDF, document_converter returns near-empty markdown. Confidence drops well below 0.85, the review-first card surfaces, and Mike asks Driscoll to resend a text-based PDF.
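One way to catch the scanned-PDF case early is to check how much text document_converter.ts actually recovered before running strategies. Below is a sketch. The 200-character floor and the function name are hypothetical, chosen only to illustrate routing near-empty markdown straight to the review-first card.

```typescript
// Hypothetical floor: below this many non-whitespace characters, treat the
// document as an image-only scan with no usable text layer.
const MIN_TEXT_CHARS = 200;

function looksLikeImageOnlyScan(markdown: string): boolean {
  const visible = markdown.replace(/\s+/g, ""); // drop all whitespace before counting
  return visible.length < MIN_TEXT_CHARS;
}
```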
Driscoll's SKU codes differ from our NS item IDs. items.customer_item_alias[Driscoll] handles the mapping; new SKUs need an alias entry first.
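The alias lookup can be pictured as a per-customer map from the customer's SKU to an NS item ID. Unmapped SKUs hold up their line until someone adds an alias. Below is a sketch using a hypothetical in-memory shape for items.customer_item_alias. The SKUs and item IDs are made up for illustration.

```typescript
// Hypothetical shape: customerId -> (customer SKU -> NS item internal id).
type AliasTable = Map<number, Map<string, string>>;

interface ResolvedLines {
  items: Array<string | null>; // aligned with the PO lines; null where no alias exists
  unmappedSkus: string[];      // need an alias entry before the SO can be pushed
}

function resolveSkus(aliases: AliasTable, customerId: number, skus: string[]): ResolvedLines {
  const table = aliases.get(customerId) ?? new Map<string, string>();
  const items: Array<string | null> = [];
  const unmappedSkus: string[] = [];
  for (const sku of skus) {
    const nsItem = table.get(sku.trim());
    if (nsItem === undefined) {
      items.push(null);
      unmappedSkus.push(sku);
    } else {
      items.push(nsItem);
    }
  }
  return { items, unmappedSkus };
}

// Illustrative aliases for customer_id=478 (values are invented).
const aliases: AliasTable = new Map([[478, new Map([["DF-1001", "5521"], ["DF-1002", "5530"]])]]);
```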
Adjacent flows + diagrams
Code paths + invariants
| Concern | Where |
|---|---|
| Mailbox | orders@ai-globalfoodsolutions.co |
| Email pipeline | src/email.ts |
| Parser | src/document_converter.ts |
| Template | tpl_driscoll_po_so_v3 in data_tagger_templates |
| Apply endpoint | POST /api/data-tagger/apply |
| Approve endpoint | POST /api/proposed-actions/decide |
| NS write | POST /api/ns/push/sales-order |
| NS RESTlet | customscript_gfs_platform_push_so |
| Mutex | PushMutexDO per customer_id |
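PushMutexDO serializes NS writes per customer_id. The DO suffix suggests a Durable Object, which is not reproduced here. As an in-process analogue, a per-key promise chain gives the same one-at-a-time guarantee inside a single runtime. KeyedMutex is a hypothetical name.

```typescript
// In-process per-key mutex: tasks sharing a key run strictly one after another;
// tasks with different keys can run concurrently.
class KeyedMutex {
  private tails = new Map<string, Promise<void>>();

  async run<T>(key: string, task: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(key) ?? Promise.resolve();
    let release!: () => void;
    const done = new Promise<void>((resolve) => { release = resolve; });
    const tail = prev.then(() => done); // later callers wait on this
    this.tails.set(key, tail);
    await prev; // wait for earlier tasks on the same key
    try {
      return await task();
    } finally {
      release();
      // Drop the entry if nobody queued behind this task.
      if (this.tails.get(key) === tail) this.tails.delete(key);
    }
  }
}
```

A caller would wrap each push, e.g. mutex.run(String(customerId), () => pushSalesOrder(draft)). Here pushSalesOrder is a hypothetical stand-in for whatever issues POST /api/ns/push/sales-order.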
Dated trail
| Date | Round | Change | Touched by |
|---|---|---|---|
| 2026-05-27 | R598 | Path 1 wiki + diagram shipped. Driscoll Foods documented as reference customer. 8 field tags spelled out. | Mike + Claude |
The 8 field tags
| # | strategy | NS field | example value |
|---|---|---|---|
| 1 | regex_after_label | SalesOrd.bodyFields.otherrefnum | 8801772 |
| 2 | literal_constant | SalesOrd.bodyFields.entity | 478 |
| 3 | multi_line_span | SalesOrd.bodyFields.shipaddress | 4-line address block |
| 4 | regex_after_label | SalesOrd.bodyFields.shipdate | 2026-06-03 |
| 5 | whole_section | SalesOrd.bodyFields.memo | delivery instructions |
| 6 | table_with_headers | SalesOrd.lineFields.item | SKU array |
| 7 | table_with_headers | SalesOrd.lineFields.quantity | int array |
| 8 | table_with_headers | SalesOrd.lineFields.rate | decimal array |
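Taken together, the eight tags describe a template as a list of (strategy, NS field, strategy parameters) entries. Below is a hypothetical TypeScript shape for tpl_driscoll_po_so_v3. Only the strategy names, field paths, labels, and table headers come from this page. The type names, the parameter fields, and maxLines: 4 (inferred from the "4-line address block" example) are assumptions about how data_tagger_templates stores a template.

```typescript
type Strategy =
  | { kind: "regex_after_label"; label: string }
  | { kind: "literal_constant"; value: string | number }
  | { kind: "multi_line_span"; maxLines?: number }
  | { kind: "whole_section"; heading?: string }
  | { kind: "table_with_headers"; header: string };

interface FieldTag {
  nsField: string; // e.g. "SalesOrd.bodyFields.otherrefnum"
  strategy: Strategy;
}

interface DataTaggerTemplate {
  id: string;
  docType: "po_inbound";
  nsRecordType: "SalesOrd";
  status: "active" | "superseded";
  fields: FieldTag[];
}

const tplDriscollPoSoV3: DataTaggerTemplate = {
  id: "tpl_driscoll_po_so_v3",
  docType: "po_inbound",
  nsRecordType: "SalesOrd",
  status: "active",
  fields: [
    { nsField: "SalesOrd.bodyFields.otherrefnum", strategy: { kind: "regex_after_label", label: "P.O. #" } },
    { nsField: "SalesOrd.bodyFields.entity", strategy: { kind: "literal_constant", value: 478 } },
    { nsField: "SalesOrd.bodyFields.shipaddress", strategy: { kind: "multi_line_span", maxLines: 4 } },
    { nsField: "SalesOrd.bodyFields.shipdate", strategy: { kind: "regex_after_label", label: "Delivery Date" } },
    { nsField: "SalesOrd.bodyFields.memo", strategy: { kind: "whole_section" } },
    { nsField: "SalesOrd.lineFields.item", strategy: { kind: "table_with_headers", header: "Item #" } },
    { nsField: "SalesOrd.lineFields.quantity", strategy: { kind: "table_with_headers", header: "Qty" } },
    { nsField: "SalesOrd.lineFields.rate", strategy: { kind: "table_with_headers", header: "Price" } },
  ],
};
```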
PO# trace thread (the audit trail)
| Record | Field | Sample value |
|---|---|---|
| Customer PO PDF | P.O. # text | 8801772 |
| SalesOrd | otherrefnum | 8801772 |
| Invoice | otherrefnum | 8801772 |
| CashSale | otherrefnum | 8801772 |
| Chat search | %8801772 | retrieves the whole chain |
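The PO# thread works because every record in the chain carries the same otherrefnum. Below is a sketch of how a %8801772 chat query could be resolved against that thread. The percent-search rule itself is defined elsewhere. The parsing here (a % prefix followed by the PO# as typed), the exact-match semantics, and the in-memory record list are assumptions. Only SO 1842738 comes from this page; the other internal IDs are invented.

```typescript
interface NsRecord {
  type: "SalesOrd" | "Invoice" | "CashSale";
  internalId: number;
  otherrefnum: string;
}

// Hypothetical: "%8801772" -> every record whose otherrefnum equals "8801772".
// Returns null when the query is not a percent-search at all.
function percentSearch(query: string, records: NsRecord[]): NsRecord[] | null {
  const m = /^%(\S+)$/.exec(query.trim());
  if (!m) return null;
  const poNumber = m[1];
  return records.filter((r) => r.otherrefnum === poNumber);
}

const chain: NsRecord[] = [
  { type: "SalesOrd", internalId: 1842738, otherrefnum: "8801772" },
  { type: "Invoice", internalId: 1842990, otherrefnum: "8801772" },  // invented id
  { type: "CashSale", internalId: 1843105, otherrefnum: "8801772" }, // invented id
  { type: "SalesOrd", internalId: 1840001, otherrefnum: "7700112" }, // unrelated order
];
```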
It broke - what now
Scenario · PO# wrong in draft
Mike sees the regex grabbed the wrong number.
- Edit the field in side-by-side review, then approve. The correction is logged.
- If it recurs (miss/hit > 0.2): train v4 with a tighter regex, or switch to fixed_region.
- Bump versions: prior v3 -> status=superseded; new v4 -> status=active.
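The retrain rule and the version bump can be sketched together. The sketch assumes misses = hit_count - success_count (reviews that needed a correction). The page implies this but does not state it. shouldRetrain and promoteTemplate are hypothetical names.

```typescript
interface TemplateVersion {
  id: string; // e.g. "tpl_driscoll_po_so_v3"
  status: "active" | "superseded";
  hitCount: number;
  successCount: number;
}

const RETRAIN_MISS_RATIO = 0.2;

function shouldRetrain(t: TemplateVersion): boolean {
  if (t.hitCount === 0) return false;
  const misses = t.hitCount - t.successCount; // assumed definition of a miss
  return misses / t.hitCount > RETRAIN_MISS_RATIO;
}

// Returns both updated rows so the caller can write them in one transaction,
// leaving exactly one active version for (customer x po_inbound x SalesOrd).
function promoteTemplate(prior: TemplateVersion, next: TemplateVersion): [TemplateVersion, TemplateVersion] {
  return [
    { ...prior, status: "superseded" },
    { ...next, status: "active" },
  ];
}
```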
Scenario · NS write 500 / partial fail
The approval succeeded, but the NS RESTlet returned a 500.
- Check queue: SELECT * FROM ns_pending_pushes WHERE status='failed'
- Check NS RESTlet logs via NS SuiteScript debugger
- Force re-drain: POST /api/ns/push-queue/drain
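This page does not describe what a forced re-drain does beyond naming the endpoint. Below is a plausible shape. It assumes ns_pending_pushes rows carry a status and an attempt counter, and that failed rows are retried up to a cap. The row shape, the "dead" status, MAX_ATTEMPTS, and the injected push function are all hypothetical.

```typescript
type PushStatus = "pending" | "failed" | "done" | "dead";

interface PendingPush {
  id: number;
  customerId: number;
  status: PushStatus;
  attempts: number;
  lastError?: string;
}

const MAX_ATTEMPTS = 5;

// Retries each pending or failed row once, in order. Rows that reach
// MAX_ATTEMPTS are marked "dead" for manual follow-up instead of looping forever.
async function drain(
  rows: PendingPush[],
  push: (row: PendingPush) => Promise<void>,
): Promise<PendingPush[]> {
  const out: PendingPush[] = [];
  for (const row of rows) {
    if (row.status !== "pending" && row.status !== "failed") {
      out.push(row); // already done or dead: leave untouched
      continue;
    }
    const attempts = row.attempts + 1;
    try {
      await push(row);
      out.push({ ...row, status: "done", attempts, lastError: undefined });
    } catch (err) {
      const status: PushStatus = attempts >= MAX_ATTEMPTS ? "dead" : "failed";
      out.push({ ...row, status, attempts, lastError: String(err) });
    }
  }
  return out;
}
```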
Logs to check
- data_tagger_extractions
- data_tagger_template_corrections
- proposed_actions
- ns_pending_pushes
- events (so.created_from_po)
Open items for Path 1
- STUB · Migration 142 landing: Agent BB-1 owns; the templates and extractions tables need to land before this path is fully operational at runtime.
- STUB · Side-by-side review widget: Agent BB-2 owns. PDF overlay + editable form widget in flight.
- OPEN · Threshold telemetry: Track the per-template confidence distribution to justify the 0.85 threshold (or raise it per customer).
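For the threshold-telemetry item, a per-template summary of recent overall confidences shows whether 0.85 sits in a sensible place. It reports how many extractions clear the threshold and what the low tail looks like. Below is a sketch using nearest-rank percentiles. The function and field names are hypothetical.

```typescript
interface ConfidenceSummary {
  n: number;
  autoStageRate: number; // fraction strictly above the threshold
  p10: number;
  p50: number;
}

// Nearest-rank percentile on an ascending-sorted array (p in (0, 100]).
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

function summarize(confidences: number[], threshold = 0.85): ConfidenceSummary | null {
  if (confidences.length === 0) return null;
  const sorted = confidences.slice().sort((a, b) => a - b);
  const above = sorted.filter((c) => c > threshold).length;
  return {
    n: sorted.length,
    autoStageRate: above / sorted.length,
    p10: percentile(sorted, 10),
    p50: percentile(sorted, 50),
  };
}
```

Comparing autoStageRate and p10 across templates (or per customer) is one way to decide whether a single global 0.85 is too loose for some senders.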