
Data Tagger Path 1 — customer PO to SO

The first deployed Data Tagger use case. Driscoll Foods' customer PO becomes a NetSuite Sales Order with the customer PO# threaded through otherrefnum — establishing the audit trail that carries to Invoice and CashSale downstream.

Path 1 · deployed reference · 8 strategies · conf 0.92
What this is

The first deployed Data Tagger use case

Driscoll Foods sends a customer purchase order via email. The Data Tagger applies the saved tpl_driscoll_po_so_v3 template, which maps 8 PDF regions to NetSuite Sales Order fields. When overall confidence is above 0.85, the system auto-stages a draft SO. Mike approves it, and NS_PUSH_QUEUE writes the SO with otherrefnum set to the customer's PO#. That PO# threads through Invoice, CashSale, and the rest of the downstream NS chain, and it is the same value Mike can search for in chat to retrieve the full record set.

This is the pillar's reference implementation. The other two paths (vendor COA, bid RFP) follow the same template-driven pattern with different destination records.

Diagram: ns-data-tagger-path-1-customer-po-to-so.html.

When to use it

Trigger conditions

PO# is the trace thread

The customer PO# becomes SO.otherrefnum, Invoice.otherrefnum, and CashSale.otherrefnum. In chat, searching %8801772 retrieves the whole order chain (see the percent-search rule).
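A minimal sketch of the trace thread, assuming records are plain objects that carry otherrefnum and that a leading % marks a PO# search term. The NsRecord type and the parsePercentSearch and findOrderChain helpers are hypothetical, and the Invoice and CashSale ids are illustrative; this is not the platform's actual search implementation.

```typescript
// Hypothetical record shape: only the fields needed to follow the PO# thread.
type NsRecord = {
  recordType: "SalesOrd" | "Invoice" | "CashSale";
  internalId: number;
  otherrefnum: string;
};

// Assumption: a chat term such as "%8801772" means "match this PO#".
function parsePercentSearch(term: string): string | null {
  const trimmed = term.trim();
  if (trimmed.charAt(0) !== "%" || trimmed.length < 2) return null;
  return trimmed.slice(1);
}

// Return every record whose otherrefnum equals the searched PO#.
function findOrderChain(term: string, records: NsRecord[]): NsRecord[] {
  const po = parsePercentSearch(term);
  if (po === null) return [];
  return records.filter((r) => r.otherrefnum === po);
}

const sampleRecords: NsRecord[] = [
  { recordType: "SalesOrd", internalId: 1842738, otherrefnum: "8801772" },
  { recordType: "Invoice", internalId: 1, otherrefnum: "8801772" }, // illustrative id
  { recordType: "CashSale", internalId: 2, otherrefnum: "0000000" }, // unrelated order
];
const chain = findOrderChain("%8801772", sampleRecords);
```

Because every record in the chain stores the same otherrefnum, one equality filter is enough to recover the set; a term without the % prefix returns nothing in this sketch.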

Worked example

PO 8801772 from Driscoll

Scenario

13:42 UTC. purchasing@driscoll-foods.com emails orders@ai-globalfoodsolutions.co with attachment PO_8801772.pdf. Subject: "PO 8801772 - Global Food Solutions".

Pipeline runs: inbound_email_log INSERT (mailbox=orders@); R2 PUT gfs-inbound-attachments/inbound/2026-05-27/PO_8801772.pdf; document_converter.ts renders to markdown + span coords.

customers WHERE email_domain LIKE '%driscoll-foods.com' returns customer_id=478. Classifier returns doc_type=po_inbound. ns_record_type=SalesOrd. Template lookup: tpl_driscoll_po_so_v3 (hit_count=47, success_count=45).
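The customer lookup above can be sketched as follows, assuming an in-memory stand-in for the customers table. CustomerRow, senderDomain, and resolveCustomer are hypothetical names; the suffix comparison only mirrors the LIKE '%...' query shown in the text.

```typescript
// Hypothetical in-memory stand-in for the customers table.
type CustomerRow = { customer_id: number; email_domain: string };

const customers: CustomerRow[] = [
  { customer_id: 478, email_domain: "driscoll-foods.com" },
];

// Extract the lower-cased domain after the last "@"; null if malformed.
function senderDomain(address: string): string | null {
  const at = address.lastIndexOf("@");
  if (at < 1) return null;
  const domain = address.slice(at + 1).trim().toLowerCase();
  return domain.length === 0 ? null : domain;
}

// Mirrors email_domain LIKE '%<domain>': the stored value must end with the sender's domain.
function resolveCustomer(address: string, rows: CustomerRow[]): number | null {
  const domain = senderDomain(address);
  if (domain === null) return null;
  for (const row of rows) {
    const stored = row.email_domain.toLowerCase();
    const idx = stored.lastIndexOf(domain);
    if (idx !== -1 && idx === stored.length - domain.length) return row.customer_id;
  }
  return null;
}
```

A suffix match is looser than an exact match on the domain; an exact comparison would be the stricter choice if lookalike domains are a concern.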

8 strategies run in parallel:

  • regex_after_label P.O. # -> otherrefnum = "8801772" (0.97)
  • literal_constant entity = "Driscoll Foods" (1.00)
  • multi_line_span shipaddress = "450 Industrial Blvd, Trenton NJ ..." (0.88)
  • regex_after_label Delivery Date -> shipdate = "2026-06-03" (0.95)
  • whole_section memo = delivery instructions (0.83)
  • table_with_headers Item # -> lineFields.item (0.94)
  • table_with_headers Qty -> lineFields.quantity (0.96)
  • table_with_headers Price -> lineFields.rate (0.92)

Weighted overall = 0.92. Auto-staged. Mike approves in admin-dashboard. NS_PUSH_QUEUE drains. NS SO created with internal_id 1842738, otherrefnum "8801772". hit_count: 47 -> 48. events.so.created_from_po fires. customer_health for Driscoll recomputes.
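A hedged sketch of the scoring and routing step. It assumes the weighted overall is a weighted arithmetic mean of per-tag confidences, which the text does not confirm. Only three tag weights are documented in the sample template, so the example scores those three alone and is not expected to reproduce the 0.92 figure. TagResult, weightedConfidence, and routeExtraction are hypothetical names.

```typescript
type TagResult = { nsField: string; confidence: number; weight: number };

// Assumption: "weighted overall" is a weighted arithmetic mean.
function weightedConfidence(results: TagResult[]): number {
  let weighted = 0;
  let total = 0;
  for (const r of results) {
    weighted += r.confidence * r.weight;
    total += r.weight;
  }
  return total === 0 ? 0 : weighted / total;
}

// From the text: above 0.85 the system takes the auto-draft path.
const AUTO_STAGE_THRESHOLD = 0.85;

type Route = "auto_stage" | "review_first";

// Assumption: exactly 0.85 is treated as review-first ("above" read as strict).
function routeExtraction(overall: number): Route {
  return overall > AUTO_STAGE_THRESHOLD ? "auto_stage" : "review_first";
}

// The three tags whose weights appear in the sample template.
const documentedTags: TagResult[] = [
  { nsField: "otherrefnum", confidence: 0.97, weight: 2.0 },
  { nsField: "entity", confidence: 1.0, weight: 1.0 },
  { nsField: "shipaddress", confidence: 0.88, weight: 1.0 },
];
const partialScore = weightedConfidence(documentedTags);
```

Giving otherrefnum twice the weight of the other tags means a weak PO# match pulls the overall score down faster than a weak memo or address match would.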

Step-by-step what happens

20 steps from email to SO

  01. Driscoll emails PO: purchasing@driscoll-foods.com -> orders@
  02. Log + R2: inbound_email_log + gfs-inbound-attachments
  03. Parse to markdown: document_converter.ts
  04. Resolve customer: sender domain -> customer_id=478
  05. Classify doc_type: keyword scan -> po_inbound
  06. Lookup template: tpl_driscoll_po_so_v3
  07-14. Apply 8 strategies: otherrefnum / entity / shipaddress / shipdate / memo / item / quantity / rate
  15. Confidence 0.92: above 0.85 -> auto-draft path
  16. Stage SO draft (HITL): proposed_actions kind=data_tagger_extraction
  17. Mike approves: admin-dashboard side-by-side review
  18. NS_PUSH_QUEUE writes SO: otherrefnum = customer PO#
  19. Reflexion: hit_count + success_count incremented
  20. Events fire: events.so.created_from_po, events.data_tagger.extracted_to_ns

Outcomes

What's different after the cycle

NS SO: created (otherrefnum=PO#)
Template hits: +1 (47 -> 48)
Confidence: 0.92 (above threshold)
Operator time: ~30s (read + approve)

Failure modes

What can go wrong

PO# regex catches the wrong number

If the PDF has multiple numeric strings near "PO" labels (e.g. internal Driscoll PO ref AND our SO ref), the regex may grab the wrong one. Mike corrects in side-by-side review; correction logs to data_tagger_template_corrections; if frequent, retrain a tighter regex.
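One way to surface this failure before review is to collect every candidate instead of only the first match. The sketch below reuses the pattern from the tpl_driscoll_po_so_v3 sample with the g flag added; poCandidates and isAmbiguous are hypothetical helpers, and flagging ambiguity this way is an illustration rather than documented platform behaviour.

```typescript
// Collect every distinct value captured after a "P.O. #"-style label.
function poCandidates(markdown: string): string[] {
  // Pattern from the sample template, plus the g flag so exec() can iterate.
  const re = /P\.?O\.?\s*#\s*([0-9A-Z\-]{4,20})/gi;
  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(markdown)) !== null) {
    if (found.indexOf(m[1]) === -1) found.push(m[1]);
  }
  return found;
}

// More than one distinct candidate means the first match may be the wrong number.
function isAmbiguous(markdown: string): boolean {
  return poCandidates(markdown).length > 1;
}

// Illustrative text with two PO-like labels.
const twoLabels = "P.O. # 8801772\nRef PO# 5521-A";
const candidates = poCandidates(twoLabels);
```

A non-global regex returns only the first hit, which is how the wrong number ends up in the draft when a second PO-like label appears earlier in the PDF.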

Scanned PDF not text-readable

If Driscoll sends a scanned image rather than a text-based PDF, document_converter returns near-empty markdown. Confidence drops well below 0.85, a review-first card surfaces, and Mike asks Driscoll to resend a text-based PDF.
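A small pre-check for this case could look like the sketch below. The cutoff of 40 visible characters is a hypothetical value, since the text does not say what counts as near-empty, and looksLikeScan is not an existing function in document_converter.ts.

```typescript
// Hypothetical cutoff; the source does not define "near-empty".
const MIN_TEXT_CHARS = 40;

// Treat markdown with very little non-whitespace content as a probable scan.
function looksLikeScan(markdown: string): boolean {
  const visible = markdown.replace(/\s+/g, "");
  return visible.length < MIN_TEXT_CHARS;
}
```

Checking the converter output directly gives an earlier and cheaper signal than waiting for all 8 strategies to return low confidence.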

Item alias mismatch

Driscoll's SKU codes differ from our NS item IDs. items.customer_item_alias[Driscoll] handles the mapping; new SKUs need an alias entry first.
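The alias lookup can be sketched as a plain map from customer SKU to NS item ID. The AliasTable shape, the mapCustomerSkus helper, and the SKU and item values are all illustrative assumptions; the real items.customer_item_alias structure is not shown in this page.

```typescript
// Hypothetical alias table: customer SKU -> NS item ID.
type AliasTable = Record<string, string>;

type MappedLines = { mapped: string[]; unmapped: string[] };

// Split extracted SKUs into those with an alias and those still missing one.
function mapCustomerSkus(skus: string[], aliases: AliasTable): MappedLines {
  const out: MappedLines = { mapped: [], unmapped: [] };
  for (const sku of skus) {
    if (Object.prototype.hasOwnProperty.call(aliases, sku)) {
      out.mapped.push(aliases[sku]);
    } else {
      out.unmapped.push(sku); // needs an alias entry before it can map to an NS item
    }
  }
  return out;
}

// Illustrative values only.
const driscollAliases: AliasTable = { "DF-1001": "NS-20417" };
const mappingResult = mapCustomerSkus(["DF-1001", "DF-9999"], driscollAliases);
```

Returning the unmapped SKUs separately, rather than failing on the first one, lets a reviewer add all missing alias entries in one pass.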

Related

Adjacent flows + diagrams

For developers

Code paths + invariants

Concern | Where
Mailbox | orders@ai-globalfoodsolutions.co
Email pipeline | src/email.ts
Parser | src/document_converter.ts
Template | tpl_driscoll_po_so_v3 in data_tagger_templates
Apply endpoint | POST /api/data-tagger/apply
Approve endpoint | POST /api/proposed-actions/decide
NS write | POST /api/ns/push/sales-order
NS RESTlet | customscript_gfs_platform_push_so
Mutex | PushMutexDO per customer_id

// Sample template field_tags JSON for tpl_driscoll_po_so_v3
[
  {
    ns_field: "SalesOrd.bodyFields.otherrefnum",
    strategy: "regex_after_label",
    pattern: "/P\\.?O\\.?\\s*#\\s*([0-9A-Z\\-]{4,20})/i",
    weight: 2.0
  },
  {
    ns_field: "SalesOrd.bodyFields.entity",
    strategy: "literal_constant",
    pattern: "Driscoll Foods",
    weight: 1.0
  },
  {
    ns_field: "SalesOrd.bodyFields.shipaddress",
    strategy: "multi_line_span",
    pattern: {anchor:"SHIP TO:", lines: 4},
    weight: 1.0
  },
  // ... 5 more field tags ...
]

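The sample stores the regex as a "/source/flags" string, so something has to turn it back into a RegExp before it can run. The sketch below shows one way to do that for the two simplest strategies. FieldTag, parseStoredRegex, and applyTag are hypothetical names, not the code behind POST /api/data-tagger/apply.

```typescript
type FieldTag = {
  ns_field: string;
  strategy: "regex_after_label" | "literal_constant";
  pattern: string;
  weight: number;
};

// Parse a pattern stored as "/source/flags" into a RegExp.
function parseStoredRegex(stored: string): RegExp {
  const lastSlash = stored.lastIndexOf("/");
  if (stored.charAt(0) !== "/" || lastSlash <= 0) {
    throw new Error("expected a pattern of the form /source/flags");
  }
  return new RegExp(stored.slice(1, lastSlash), stored.slice(lastSlash + 1));
}

// literal_constant returns its pattern as-is; regex_after_label returns capture group 1.
function applyTag(tag: FieldTag, markdown: string): string | null {
  if (tag.strategy === "literal_constant") return tag.pattern;
  const m = parseStoredRegex(tag.pattern).exec(markdown);
  return m !== null && m[1] !== undefined ? m[1] : null;
}

const poTag: FieldTag = {
  ns_field: "SalesOrd.bodyFields.otherrefnum",
  strategy: "regex_after_label",
  pattern: "/P\\.?O\\.?\\s*#\\s*([0-9A-Z\\-]{4,20})/i",
  weight: 2.0,
};
const entityTag: FieldTag = {
  ns_field: "SalesOrd.bodyFields.entity",
  strategy: "literal_constant",
  pattern: "Driscoll Foods",
  weight: 1.0,
};
```

Applied to the text "P.O. # 8801772", poTag yields the PO# that becomes otherrefnum, while entityTag ignores the document text entirely.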
Changelog

Dated trail

Date | Round | Change | Touched by
2026-05-27 | R598 | Path 1 wiki + diagram shipped. Driscoll Foods documented as reference customer. 8 field tags spelled out. | Mike + Claude

Schema · data contract

The 8 field tags

# | strategy | NS field | example value
1 | regex_after_label | SalesOrd.bodyFields.otherrefnum | 8801772
2 | literal_constant | SalesOrd.bodyFields.entity | 478
3 | multi_line_span | SalesOrd.bodyFields.shipaddress | 4-line address block
4 | regex_after_label | SalesOrd.bodyFields.shipdate | 2026-06-03
5 | whole_section | SalesOrd.bodyFields.memo | delivery instructions
6 | table_with_headers | SalesOrd.lineFields.item | SKU array
7 | table_with_headers | SalesOrd.lineFields.quantity | int array
8 | table_with_headers | SalesOrd.lineFields.rate | decimal array
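Tags 6 to 8 each produce one column of the line-item table, so the columns have to be zipped into per-line objects before a Sales Order can be built. A minimal sketch, assuming the three arrays are index-aligned; SoLine and zipLines are hypothetical names and the sample values are illustrative.

```typescript
type SoLine = { item: string; quantity: number; rate: number };

// Combine the three extracted columns into one object per order line.
function zipLines(items: string[], quantities: number[], rates: number[]): SoLine[] {
  if (items.length !== quantities.length || items.length !== rates.length) {
    throw new Error("column length mismatch: item/quantity/rate must align");
  }
  const lines: SoLine[] = [];
  for (let i = 0; i < items.length; i++) {
    lines.push({ item: items[i], quantity: quantities[i], rate: rates[i] });
  }
  return lines;
}

// Illustrative values only.
const soLines = zipLines(["DF-1001", "DF-1002"], [12, 4], [18.5, 42.0]);
```

Failing on a length mismatch is deliberate: a column that is one row short usually means a table row was missed, and silently truncating would misprice the order.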

PO# trace thread (the audit trail)

Record | Field | Sample value
Customer PO PDF | P.O. # text | 8801772
SalesOrd | otherrefnum | 8801772
Invoice | otherrefnum | 8801772
CashSale | otherrefnum | 8801772
Chat search | %8801772 | retrieves the whole chain

Runbook · when it breaks

It broke - what now

Scenario · PO# wrong in draft

Mike sees the regex grabbed the wrong number.

  1. Edit the field in side-by-side; approve. Correction logs.
  2. If recurring (miss/hit > 0.2): train v4 with tighter regex or switch to fixed_region.
  3. Bump: prior v3 -> status=superseded; new v4 -> status=active.
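The recurrence check and the version bump in steps 2 and 3 can be sketched as below. It assumes a miss is a hit that did not succeed (hit_count minus success_count), which the page does not define. TemplateStats, TemplateRow, missRatio, needsNewVersion, and supersede are hypothetical names, and the v4 template id is illustrative.

```typescript
type TemplateStats = { hit_count: number; success_count: number };

// Assumption: miss = hit_count - success_count.
function missRatio(s: TemplateStats): number {
  if (s.hit_count === 0) return 0;
  return (s.hit_count - s.success_count) / s.hit_count;
}

// Runbook rule: recurring when miss/hit > 0.2.
function needsNewVersion(s: TemplateStats): boolean {
  return missRatio(s) > 0.2;
}

type TemplateRow = { id: string; status: "active" | "superseded" };

// Step 3: the prior version becomes superseded, the new one becomes active.
function supersede(prev: TemplateRow, nextId: string): TemplateRow[] {
  return [
    { id: prev.id, status: "superseded" },
    { id: nextId, status: "active" },
  ];
}

const bumped = supersede(
  { id: "tpl_driscoll_po_so_v3", status: "active" },
  "tpl_driscoll_po_so_v4" // illustrative id for the retrained template
);
```

With the figures from the worked example (hit_count=47, success_count=45) the ratio under this assumption is about 0.04, well under the 0.2 trigger.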

Scenario · NS write 500 / partial fail

Approve succeeded, NS RESTlet returned 500.

  1. Check queue: SELECT * FROM ns_pending_pushes WHERE status='failed'
  2. Check NS RESTlet logs via NS SuiteScript debugger
  3. Force re-drain: POST /api/ns/push-queue/drain

Logs to check

Backlog

Open items for Path 1