Data Tagger — Path 2 · vendor COA to compliance

vendors@ -> COA template -> lot_number / item_code / test_date / parameters[] / pass-fail -> vendor_coas -> events.coa.received

Food-safety compliance use case. A vendor (e.g. Bongards Creameries, Driscoll Foods on USDA cheese, or a co-pack) emails a Certificate of Analysis attached to vendors@ai-globalfoodsolutions.co. The COA template extracts lot_number, item_code, test_date, an array of parameters_tested with results (moisture %, fat %, salt %, microbial), and the overall pass/fail. Output writes to the D1 vendor_coas table (TBD — not yet created; needs its own migration), updates item lot tracking, and fires events.coa.received for compliance subscribers. Failed lots route to a separate HITL action: hold-the-lot + notify warehouse. COAs are NOT on the SO/PO/WO golden path — they sit alongside as food-safety compliance.

vendor_coas table TBD (no migration yet) HITL on COA insert (compliance gate) strategy schemas REAL (migration 142)

Pipeline — vendor COA arrives → extract → vendor_coas → compliance events

idle
Vendor sends COA PDF/DOCX to vendors@ - parsed - vendor resolved - COA template applied - lot_number/item_code/test_date/parameters[]/pass-fail extracted - HITL compliance review - INSERT vendor_coas - item lot tracking updated - events.coa.received fires - if FAIL fork to hold-lot workflow LANE 1 / INTAKE · vendor COA arrives LANE 2 / EXTRACT · COA template applies 5 field tags LANE 3 / HITL + COMPLIANCE WRITE · vendor_coas INSERT + events LANE 4 / FAIL FORK · hold lot + notify warehouse (if pass_fail=fail) WHAT THIS DOES: A vendor (Bongards, USDA cheese supplier, or a co-pack) emails the Certificate of Analysis for a recent lot. PDF or DOCX, typically 1-2 pages, with batch lot, test date, and laboratory measurements. TECHNICAL DETAILS: VENDOR EMAILS COA TYPICAL SENDERS Bongards Creameries (cheddar barrel COAs) co-packers (Right Start, Public Foods, etc.) USDA commodity COAs ATTACHMENT COA-LOT-2026053-BNG.pdf / .docx TO vendors@ai-globalfoodsolutions.co STATUS: REAL (intake) vendor emails COA vendors@ai-globalfoodsolutions.co PDF or DOCX attachment Bongards / co-pack / USDA EXTERNAL · CF EMAIL ROUTING i WHAT THIS DOES: Log email + save attachment to R2 (same audit pattern as Path 1). TECHNICAL DETAILS: LOG + R2 INSERT inbound_email_log (mailbox=vendors@) PUT R2 gfs-inbound-attachments/inbound/<date>/COA_*.pdf STATUS: REAL log + R2 inbound_email_log + R2 DATABASE · AUDIT TRAIL i WHAT THIS DOES: Parse PDF or DOCX to markdown. DOCX uses native parser (table structure preserved cleanly which matters for parameter tables). CODE: src/document_converter.ts TECHNICAL DETAILS: PARSE TO MARKDOWN INPUT: R2 key OUTPUT: md + spans DOCX: native parser preserves table headers cleanly STATUS: REAL parse to markdown document_converter.ts DOCX preserves table headers BACKEND · SHARED PARSER i WHAT THIS DOES: Resolve sender domain to vendor_id. Falls back to name_synonyms fuzzy on body text if no domain match (many co-packers don't have stable email domains). TECHNICAL DETAILS: RESOLVE VENDOR SQL: vendors WHERE email_domain MATCH OR name_synonyms fuzzy on body RESULT: vendor_id = VEN-Bongards STATUS: REAL resolve vendor sender domain or fuzzy vendor_id = VEN-Bongards DATABASE i WHAT THIS DOES: Classify as coa via keyword scan. TECHNICAL DETAILS: CLASSIFY DOC_TYPE = coa KEYWORDS CERTIFICATE OF ANALYSIS, C.O.A., COA ns_record_type = vendor_coas STATUS: REAL classify doc_type = coa keyword scan ns_record_type = vendor_coas BACKEND i WHAT THIS DOES: Look up Bongards COA template. Versioned per vendor. TECHNICAL DETAILS: LOOKUP COA TEMPLATE SQL data_tagger_templates WHERE vendor_id=VEN-Bongards AND doc_type=coa AND ns_record_type=vendor_coas AND status=active RESULT: tpl_bongards_coa_v1 (5 field tags) STATUS: REAL (table contract) lookup COA template tpl_bongards_coa_v1 5 field tags DATABASE i WHAT THIS DOES: Apply the 5 COA strategies in parallel. TECHNICAL DETAILS: APPLY 5 COA STRATEGIES lot_number regex_after_label /Lot[\s#:]*([A-Z0-9\-]+)/ item_code regex_after_label /Item[\s#:]*([A-Z0-9\-]+)/ test_date regex_after_label /Test Date[:\s]*([0-9\/\-\.]+)/ parameters_tested[] table_with_headers (Parameter, Result, Spec) pass_fail formula (all parameters within spec) STATUS: REAL (strategies) apply 5 COA strategies lot / item / date / params / pass-fail parallel extraction BACKEND i WHAT THIS DOES: COA confidence is computed similarly to Path 1; threshold is higher (0.90) because compliance writes get more scrutiny. TECHNICAL DETAILS: COMPUTE CONFIDENCE threshold = 0.90 (vs 0.85 for SO) INSERT data_tagger_extractions (compliance=true) STATUS: REAL compute confidence (threshold 0.90) higher bar than Path 1 (compliance) data_tagger_extractions (compliance=true) DATABASE i WHAT THIS DOES: Compliance HITL gate. Mike (or a delegated compliance reviewer) checks the extraction. Higher bar than ordinary HITL because COAs are food-safety records. TECHNICAL DETAILS: COMPLIANCE HITL INSERT proposed_actions (kind=data_tagger_coa_extraction) SURFACE: admin-dashboard compliance panel REVIEWERS: Mike OR delegated compliance role STATUS: REAL (queue) - compliance UI in flight stage proposed_action (compliance HITL) kind = data_tagger_coa_extraction food-safety scrutiny higher than SO delegable to compliance reviewer SECURITY · ADR-031 + COMPLIANCE i WHAT THIS DOES: Reviewer approves/rejects with side-by-side PDF + form. On approve, ready to INSERT vendor_coas. TECHNICAL DETAILS: REVIEWER APPROVES admin-dashboard compliance panel STATUS: in flight reviewer approves side-by-side PDF + form POST /api/proposed-actions/decide CLOUD i WHAT THIS DOES: INSERT vendor_coas - the destination table. **TBD - this table is NOT YET CREATED.** A migration must land (proposed mig 144 or similar) before this wiki path is meaningfully operational. Schema sketch only. D1 TABLE: vendor_coas (TBD - not yet a migration) TECHNICAL DETAILS: INSERT vendor_coas (TBD table) SCHEMA SKETCH coa_id TEXT PRIMARY KEY vendor_id INTEGER NOT NULL item_code TEXT lot_number TEXT NOT NULL test_date TEXT parameters_json TEXT pass_fail TEXT CHECK (pass_fail IN ('pass','fail')) extraction_id TEXT REFERENCES data_tagger_extractions(id) created_at INTEGER INDEXES (vendor_id, item_code, lot_number) UNIQUE (pass_fail) for fail-routing STATUS: STUB (table does not exist yet) ★ INSERT vendor_coas (TBD table) D1 mig 144 (TBD - NOT YET CREATED) coa_id / vendor_id / lot / params / pass_fail UNIQUE (vendor_id, item_code, lot_number) FINANCE-KEY · COMPLIANCE RECORD · STUB i WHAT THIS DOES: Update item lot tracking. items table or a TBD item_lots table gets the lot_number entered. Question is whether lots live alongside items (lot column) or in a separate table. TECHNICAL DETAILS: UPDATE ITEM LOT TRACKING OPTION A: items.lots_json append OPTION B: item_lots table (TBD) STATUS: STUB (design open) update item lot tracking items.lots_json OR item_lots TBD design open DATABASE · STUB i WHAT THIS DOES: Emit events.coa.received. Compliance subscribers react: monthly food-safety review, FDA-trail report, customer-facing COA-on-request. TECHNICAL DETAILS: EVENTS FIRED events.coa.received { coa_id, vendor_id, item_code, lot_number, pass_fail } events.data_tagger.extracted_to_ns SUBSCRIBERS compliance_admin role monthly food-safety review STATUS: REAL (event ledger R549) events.coa.received fires event ledger (R549) compliance subscribers react MESSAGEBUS i WHAT THIS DOES: If pass_fail = fail, fork a separate proposed_action: hold the lot in inventory, notify warehouse + CS. Customer-facing comms templated. TECHNICAL DETAILS: FAIL FORK (only when pass_fail = fail) INSERT proposed_actions (kind=coa_fail_hold_lot) payload: { coa_id, lot_number, item_code, reason_json } ACTIONS inventory_admin holds lot (set status=quarantine) warehouse_admin notified CS prepared with customer-facing template STATUS: STUB - workflow contract undefined fail fork · hold lot + notify warehouse (only if pass_fail = fail) INSERT proposed_actions kind=coa_fail_hold_lot inventory.quarantine + warehouse notify + CS template STUB · workflow contract undefined SECURITY · HITL · FOOD SAFETY i WHAT THIS DOES: Increment template metrics whether pass or fail. Both outcomes are successful extractions from the COA template's perspective. TECHNICAL DETAILS: REFLEXION UPDATE data_tagger_templates SET hit_count=hit_count+1, success_count=success_count+1 EVENTS events.data_tagger.template_used STATUS: REAL reflexion (template metrics) hit_count + success_count pass OR fail are both successful extractions BACKEND · SELF-IMPROVING i log parse md -> identify vendor 5 values -> confidence stage compliance HITL approve coa_id -> lot tracking if FAIL fork always: reflexion
Glossary
Database (templates, vendor_coas)
Backend (strategies, classifier)
HITL (compliance gate, fail fork)
Event ledger
vendor_coas: TBD D1 table (NOT YET CREATED; needs migration before path is operational)
tpl_bongards_coa_v1: example saved template for the path

COA field tags — the 5 strategies for vendor COA

#strategyfieldpatternexample
1regex_after_labellot_number/Lot[\s#:]*([A-Z0-9\-]+)/iLOT-2026053-BNG
2regex_after_labelitem_code/Item[\s#:]*([A-Z0-9\-]+)/iSKU-419
3regex_after_labeltest_date/Test Date[:\s]*([0-9\/\-\.]+)/i2026-05-22
4table_with_headersparameters_tested[]headers: Parameter / Result / Specmoisture 36.4% / fat 32.1% / salt 1.7%
5formulapass_failall params within specpass

Open TBDs