Anomaly remediation
Anomaly remediation is the substrate workflow that closes the loop between "the AI noticed something weird" and "Mike has a one-tap remediation in his queue." It runs every 15 minutes on the existing R174 cron, so the latency between an anomaly being detectable and Mike seeing the prompt is bounded.
The scan reuses the find_anomalies_today chat tool internals, then classifies findings by category and severity. For each anomaly, a remediation row stages in proposed_actions with action_type=anomaly_response.
Risk is 2 (low) — nothing pushes to NS from this workflow. Worst case is a noisy queue, which is recoverable.
Trigger conditions
- Triggered by the 15-minute R174 cron — no manual invocation needed in steady state.
- Manual re-run when a vendor cost spike is suspected outside the cron window.
- After a deploy that changed the anomaly heuristics — useful to validate new findings.
Anomalies are de-duplicated against rows already open in anomaly_investigations within the last 2 hours. Repeated detections of the same anomaly do not generate duplicate proposed_actions.
The 2 beats
-
01
Propose one remediation per anomaly
For each fresh anomaly the runner stages a
proposed_actionsrow withaction_type=anomaly_response, the structured payload, and a suggested next step. Mike can approve, ignore, or escalate. -
02
Surface to admin dashboard
New rows insert into
anomaly_investigations(the persistent record) and the weekly aggregateweekly_anomaliesis updated. The admin-dashboard pulls live from both.
What's different after the workflow runs
- New anomalies appear on /admin-dashboard.html within the next cron tick.
- Each open anomaly has at least one proposed_action waiting in Mike's queue.
- weekly_anomalies aggregates the trailing week for trend review.
- reflexion_log holds a tagged
anomaly_remediationrow per cron run.
What can go wrong and how to recover
The cron tick logs a failure in workflow_runs but doesn't retry — next tick (15 min) picks it up. Acceptable for risk-2 work.
If the 2h dedupe window is too tight for a slow-moving anomaly, manual cleanup is DELETE FROM anomaly_investigations WHERE .... Rare.
A successful run that produces zero rows is normal. The reflexion entry still records the no-op tick.
Adjacent workflows + diagrams
Code paths + invariants
| Concern | Where |
|---|---|
| Workflow contract | workflow_definitions WHERE workflow_type='anomaly_remediation' |
| Cron schedule | 22,37,52 * * * * (R174) |
| Detector | find_anomalies_today chat tool internals |
| Output tables | anomaly_investigations, weekly_anomalies |
| Reflexion tag | anomaly_remediation |
| Risk level | 2 |
| Expected duration | ~15 min |
| Trigger | cron · 22,37,52 * * * * (R174 cron) |