Last issue, you gave your best agents autonomy. Trust scores, tier assignments, targeted guardrails — the system is running itself. Most of the time, everything works.
Then at 2 AM, your research agent hallucinates a company that does not exist. Your email agent sends a draft with yesterday's data. Your dashboard agent crashes silently and the numbers stop updating.
Nobody catches it until morning. The hallucinated research is in your dashboard. The stale email went to a client. The dashboard has been showing yesterday's numbers for six hours.
This is the cost of autonomy without incident response. You gave your agents freedom — which was the right move — but you did not build the safety net for when they fall.
The fix is not pulling every agent back to Tier 1. That undoes everything you built. The fix is a three-layer safety net: detect the failure fast, contain the damage automatically, and turn every incident into a permanent fix so it never recurs.
Three Layers of Defense
Each layer handles a different phase of the failure lifecycle. Together, they ensure that no failure goes undetected or uncontained, and that none recurs.
| Layer | Purpose | Speed | Output |
|---|---|---|---|
| Detection | Know something is wrong | Minutes | Incident report |
| Response | Contain the damage | Seconds | Rollback + alert |
| Prevention | Ensure it never recurs | Daily | New guardrail or prompt patch |
Key insight: Most incident response systems stop at Layer 2. They detect and contain, but the same failure recurs weeks later because nobody built the fix. Layer 3 — the post-mortem generator — is what makes this system self-improving. Every failure makes the system permanently better.
Component 1: Incident Detector
The detector runs on a schedule — every 15 minutes during business hours, every 30 minutes overnight. It checks five health signals for every agent and flags anything abnormal. This is fast triage, not deep analysis.
You are an incident detector. Monitor agent outputs and flag
failures within minutes of occurrence.
Read the output manifest at ~/agents/output_manifest.json.
It lists every agent, its expected output, refresh frequency,
and sanity checks.
For each agent, check five signals:
1. OUTPUT EXISTS: Does the expected file exist?
- Missing + agent should have run = CRITICAL
- Missing + agent not scheduled = OK
2. OUTPUT IS FRESH: Last modified time vs expected frequency.
- Older than 2x expected interval = STALE
- STALE + Tier 3 agent = MAJOR
- STALE + Tier 1 agent = MINOR
3. FORMAT CHECK: Does output match expected schema?
- Missing required fields or malformed = MAJOR
- Extra/unexpected fields = MINOR
4. CONTENT SANITY: Run checks from the manifest.
- Examples: "date should be today", "total > 0",
"no field contains ERROR or FAILED"
- Severity: defined per check in manifest
5. SIZE ANOMALY: Compare to last 7 outputs.
- More than 3x larger or smaller = MAJOR
- More than 2x = MINOR
Output (save to ~/incidents/scan_[TIMESTAMP].json):
{
  "scan_time": "[ISO timestamp]",
  "agents_checked": [count],
  "incidents_found": [count],
  "incidents": [
    {
      "id": "INC-[TIMESTAMP]-[SEQ]",
      "agent": "[name]",
      "severity": "CRITICAL" or "MAJOR" or "MINOR",
      "type": "MISSING_OUTPUT" or "STALE_OUTPUT"
        or "FORMAT_ERROR" or "SANITY_FAILURE"
        or "SIZE_ANOMALY",
      "details": "[what specifically is wrong]",
      "output_file": "[path]",
      "last_good_output": "[timestamp]",
      "recommended_action": "ROLLBACK" or "ALERT" or "LOG"
    }
  ],
  "all_clear": true/false
}
Rules:
- Run fast. Under 30 seconds total.
- Normal output = move on. Only flag what is actually wrong.
- Keep rolling 7-day log in ~/incidents/scan_history.jsonl
The five health signals catch the most common autonomous agent failures: crashed processes (output missing), stale data (output not refreshed), corrupted output (format errors), wrong content (sanity failures), and silent degradation (size anomalies).
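As a sketch, the freshness and size-anomaly signals (2 and 5) reduce to a few lines of Python. The thresholds follow the prompt above; the function names and the way history is passed in are my own assumptions, not part of the prompt:

```python
import os
import time

def check_freshness(path, expected_interval_s):
    """Signal 2: flag output older than 2x its expected refresh interval."""
    if not os.path.exists(path):
        return "MISSING"
    age = time.time() - os.path.getmtime(path)
    return "STALE" if age > 2 * expected_interval_s else "OK"

def check_size_anomaly(current_size, recent_sizes):
    """Signal 5: compare current output size to the last 7 outputs."""
    if not recent_sizes:
        return "OK"  # no history yet, nothing to compare against
    baseline = sum(recent_sizes) / len(recent_sizes)
    ratio = current_size / baseline if baseline else float("inf")
    if ratio > 3 or ratio < 1 / 3:
        return "MAJOR"  # more than 3x larger or smaller
    if ratio > 2 or ratio < 1 / 2:
        return "MINOR"  # more than 2x larger or smaller
    return "OK"
```

Note the MISSING case is returned rather than scored: per signal 1, whether a missing file is CRITICAL depends on whether the agent was scheduled to run, which the caller knows from the manifest.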
Here is what a typical escalation matrix looks like after a few weeks of tuning:
| Severity | Response | Alert | Agent Status |
|---|---|---|---|
| Critical | Rollback + halt | Immediate (or morning queue) | Demoted to Tier 1 |
| Major | Rollback + re-enable guardrails | Morning summary | Running with full guardrails |
| Minor | Log only | Weekly review | Running normally |
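Encoded as a machine-readable file (the escalation router below reads ~/incidents/escalation_rules.json), this matrix might look like the following. The exact schema is an assumption for illustration; only the file path comes from the prompt:

```json
{
  "CRITICAL": {
    "actions": ["ROLLBACK", "HALT", "ALERT", "DEMOTE"],
    "alert": "immediate_or_morning_queue",
    "tier_after": 1
  },
  "MAJOR": {
    "actions": ["ROLLBACK", "LOG", "TEMP_FULL_GUARDRAILS"],
    "alert": "morning_summary",
    "guardrails_revert_after_clean_runs": 3
  },
  "MINOR": {
    "actions": ["LOG", "FLAG"],
    "alert": "weekly_review"
  }
}
```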
Component 2: Escalation Router
The router takes detected incidents and executes the correct response. This is the difference between "something broke and nobody noticed" and "something broke, the system handled it, and you read about it in the morning summary."
You are an escalation router. Take detected incidents and
execute the correct response based on severity, time of day,
and agent tier.
Read the latest incident scan at ~/incidents/scan_[LATEST].json.
Read escalation rules at ~/incidents/escalation_rules.json.
Read agent autonomy state at ~/quality/autonomy_state.json.
For each incident, apply the escalation matrix:
CRITICAL INCIDENTS:
1. ROLLBACK: Copy last known-good output over current
Source: ~/agents/backups/[agent]_last_good.[ext]
2. HALT: Write ~/agents/halt/[agent].halt
Agent runner checks this file before executing
3. ALERT: Write to ~/incidents/alerts_pending.json
Include: what broke, when, impact, rollback status
4. DEMOTE: Set agent to Tier 1 in autonomy_state.json
Automatic. No confirmation needed. Trust was violated.
MAJOR INCIDENTS:
1. ROLLBACK: Same as CRITICAL
2. LOG: Detailed entry to ~/incidents/incident_log.jsonl
3. RE-ENABLE GUARDRAILS: Set "temp_full_guardrails": true
Reverts after 3 consecutive clean runs
4. CONTINUE: Agent keeps running with tighter checks
MINOR INCIDENTS:
1. LOG: Write to ~/incidents/incident_log.jsonl
2. FLAG: Add to daily review summary
3. CONTINUE: No action beyond logging
TIME-OF-DAY RULES:
- 8 AM-10 PM: CRITICAL alerts notify immediately
- 10 PM-8 AM: CRITICAL alerts queue for morning.
Rollback still happens immediately.
Only external-facing failures trigger immediate alert.
Output (save to ~/incidents/response_[TIMESTAMP].json):
{
  "response_time": "[ISO timestamp]",
  "incidents_processed": [count],
  "responses": [
    {
      "incident_id": "INC-...",
      "action_taken": ["ROLLBACK", "HALT", "ALERT", ...],
      "rollback_status": "SUCCESS" or "FAILED" or "NOT_NEEDED",
      "alert_queued": true/false,
      "agent_status_after": "HALTED" or "GUARDRAILS_ON"
        or "RUNNING_NORMAL",
      "notes": "[context for morning review]"
    }
  ]
}
Rules:
- Rollback is always safe. When in doubt, rollback.
- Never suppress a CRITICAL alert.
- Downgrade MAJOR to MINOR if output was not consumed downstream.
- Generate morning summary at ~/incidents/morning_summary_[DATE].txt
The time-of-day rules are not about laziness — they are about signal-to-noise. A 2 AM notification for a rolled-back, contained incident does not need your attention until morning. But a 2 AM notification for bad data sent to a client does. The rules encode this distinction so you do not have to make it at 2 AM.
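A minimal sketch of that gating decision in Python. The 8 AM-10 PM window and the external-facing exception come from the prompt above; the function signature is my own:

```python
from datetime import datetime

def should_alert_now(severity, external_facing, now=None):
    """Decide immediate alert vs. morning queue.

    Rollback is handled separately and always runs immediately;
    this only gates the notification.
    """
    if severity != "CRITICAL":
        return False  # MAJOR/MINOR go to summaries, never page
    now = now or datetime.now()
    business_hours = 8 <= now.hour < 22  # 8 AM - 10 PM
    if business_hours:
        return True
    # Overnight: only external-facing failures page immediately
    return external_facing
```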
Component 3: Post-Mortem Generator
This is where the system gets permanently better. Every incident — even minor ones — gets analyzed. The output is not a report you read and forget. It is a concrete artifact: a new guardrail, a prompt patch, or a config change that prevents recurrence.
You are a post-mortem analyst. Analyze every incident and
produce a concrete prevention artifact — not a report, but
an actual fix.
Read incidents from last 7 days at ~/incidents/incident_log.jsonl.
Read current guardrails at ~/quality/guardrails_*.json.
Read agent prompts at ~/agents/prompts/.
For each unanalyzed incident (no "postmortem_id" field):
1. ROOT CAUSE ANALYSIS:
- What exactly failed and why? Be specific.
- Prompt issue, data issue, or infrastructure issue?
- Has this failure type occurred before?
- Blast radius: who/what consumed the bad output?
2. GENERATE PREVENTION ARTIFACT (choose one):
a. NEW GUARDRAIL: If a pre-flight check would have caught it.
Create guardrail entry per Issue #17 format.
b. PROMPT PATCH: If the prompt was ambiguous or missing a
constraint. Write exact lines to add and where.
c. CONFIG CHANGE: If a threshold, schedule, or parameter
was wrong. Specify the exact change.
d. MANIFEST UPDATE: If a sanity check was missing.
Update the output_manifest.json entry.
3. VALIDATE THE FIX:
- Would this fix have caught the original incident?
- Does it conflict with existing guardrails?
- Could it cause false positives? How often?
4. SAVE POST-MORTEM (~/incidents/postmortems/PM-[ID].json):
{
  "incident_id": "[id]",
  "postmortem_id": "PM-[id]",
  "agent": "[name]",
  "severity": "[severity]",
  "root_cause": "[1-2 sentence root cause]",
  "blast_radius": "[what was affected]",
  "fix_type": "GUARDRAIL" or "PROMPT_PATCH"
    or "CONFIG_CHANGE" or "MANIFEST_UPDATE",
  "fix_artifact": { [ready-to-apply fix content] },
  "validated": true/false,
  "recurrence_risk": "HIGH" or "MEDIUM" or "LOW",
  "related_incidents": ["INC-..."]
}
5. APPLY THE FIX:
- GUARDRAIL: Append to agent's guardrails file
- PROMPT_PATCH: Apply to agent's prompt file
- CONFIG_CHANGE: Update relevant config
- MANIFEST_UPDATE: Update output_manifest.json
Rules:
- Every incident gets a post-mortem. No exceptions.
- Fix must be specific and machine-applicable.
Not "improve the prompt" — exact text and location.
- Same root cause 3+ times = escalate for architectural review
- Link related incidents. Pattern detection is the meta-skill.
- Under 2 minutes per incident.
The four fix types cover the full spectrum. A guardrail catches it before the agent produces output. A prompt patch prevents the agent from making the mistake in the first place. A config change fixes environmental conditions. A manifest update improves future detection. Together, they close every possible gap.
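One rule from the prompt above, "same root cause 3+ times = escalate for architectural review," is worth automating. A sketch, assuming post-mortems are stored as individual JSON files with a root_cause field as in the template (the normalization step is my addition):

```python
import json
from collections import Counter
from pathlib import Path

def find_recurring_root_causes(postmortem_dir, threshold=3):
    """Group post-mortems by root cause; flag any seen threshold+ times."""
    causes = Counter()
    for pm_file in Path(postmortem_dir).glob("PM-*.json"):
        pm = json.loads(pm_file.read_text())
        # Normalize so trivial wording differences still group together
        causes[pm["root_cause"].strip().lower()] += 1
    return [cause for cause, n in causes.items() if n >= threshold]
```

Anything this returns is a candidate for architectural review rather than another guardrail: three identical failures means the fix layer is patching symptoms.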
What the System Looks Like Running
Research agent output fails sanity check — date field is yesterday. Detector flags MAJOR. Router rolls back to last good output. Guardrails re-enabled for research agent.
You open the morning summary. One incident overnight. Rollback successful, research agent running with full guardrails. You read the details in 30 seconds.
Post-mortem generator runs. Root cause: data source did not update before the agent ran. Fix: data-freshness pre-flight guardrail applied automatically.
The next time that data source is late, the new guardrail catches it before the agent produces output. Agent waits 15 minutes and retries. Output is correct. Detector scan: all clear.
That failure happened once. It will never happen again. That is the safety net.
Wiring the Safety Net
Step 1: Create the output manifest. List every agent, its expected output file, refresh schedule, and 2-3 sanity checks. This is the contract — what "working" looks like for each agent.
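A plausible shape for one manifest entry, using the sanity-check examples from the detector prompt. The field names here are illustrative, not a fixed schema:

```json
{
  "research_agent": {
    "output": "~/agents/outputs/research_brief.json",
    "refresh_every_minutes": 60,
    "sanity_checks": [
      {"check": "date field is today", "severity": "MAJOR"},
      {"check": "no field contains ERROR or FAILED", "severity": "MAJOR"},
      {"check": "output is non-empty", "severity": "CRITICAL"}
    ]
  }
}
```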
Step 2: Schedule the incident detector. Every 15 minutes during business hours. Every 30 minutes overnight. It runs in under 30 seconds and catches problems fast.
Step 3: Wire the escalation router. It runs immediately after any scan that finds incidents. Rollbacks happen automatically. Alerts queue for you. The system handles severity routing.
Step 4: Run the post-mortem generator daily. After your morning review. It processes all incidents from the last 24 hours, generates fixes, and applies them. Each incident becomes a permanent improvement.
Step 5: Close the loop. Post-mortems produce guardrails that feed into Issue #17's guardrail system. The escalation router updates autonomy tiers from Issue #17's handoff system. The detector validates that previous fixes worked by checking if the same incident recurs. Every layer reinforces every other layer.
The full chain: Agent fails → detector catches it (minutes) → router contains it (seconds) → post-mortem generates fix (daily) → fix prevents recurrence (permanent). Each incident makes the system strictly better than it was before the failure.
What Could Go Wrong
- Alert fatigue. If the detector is too sensitive, you stop reading the morning summary. Tune sanity checks to catch real problems, not edge cases. More than 2-3 incidents per week means your thresholds are too tight.
- Rollback data loss. If output is partially correct, a full rollback loses the good data. Define field-level sanity checks so partial failures can be surgically fixed instead of fully rolled back.
- Post-mortem backlog. If incidents accumulate faster than post-mortems process them, the system is not learning. If this happens consistently, your agents need architectural improvement, not more guardrails.
- False sense of security. The safety net catches agent failures. It does not catch failures in the safety net itself. Monthly: intentionally introduce a failure and verify the full chain fires. If it does not, fix the safety net first.
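That monthly drill can itself be scripted. A sketch of the idea: backdate one agent's output so it looks stale, confirm the detector flags it, then restore the timestamp. The detector is passed in as a callable; everything here is illustrative, not a fixed interface:

```python
import os
import time

def run_safety_net_drill(output_path, expected_interval_s, detector):
    """Backdate an agent's output, then verify the detector flags it.

    `detector` is any callable returning a list of incidents for the
    given path; the drill only checks that the chain fires at all.
    """
    original_mtime = os.path.getmtime(output_path)
    try:
        # Make the file look 3x older than its refresh interval
        stale = time.time() - 3 * expected_interval_s
        os.utime(output_path, (stale, stale))
        incidents = detector(output_path, expected_interval_s)
        return len(incidents) > 0  # True = safety net fired
    finally:
        # Restore the real timestamp so the drill leaves no damage
        os.utime(output_path, (original_mtime, original_mtime))
```

If this returns False, fix the detector before trusting the rest of the chain.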
The Bottom Line
Autonomy without incident response is gambling. You might go weeks without a problem. Then one failure at the wrong time — stale data sent to a client, a hallucinated number in a report, a silent crash — erases the trust you built.
The safety net is not about preventing failure. Agents will fail. The question is: when they do, does the system recover in minutes or hours? Does the failure happen once or recur? Does it damage trust or strengthen the system?
Try It This Week
Start with the output manifest. Pick your 3 most important agent outputs. For each, write down: the file path, how often it should refresh, and 2-3 sanity checks that would catch the most obvious failures. Is the date today? Is the output non-empty? Is the file size reasonable?
Then run the Incident Detector prompt against those 3 agents manually. See what it finds. If everything is clean — automate it. If it finds something, even better — you just caught a problem you did not know existed.
The safety net does not need to be comprehensive on day one. Start with your highest-risk agents and expand from there.
Reply to this email with your output manifest for 3 agents — I will tell you if your sanity checks are tight enough or if you are missing the failures that matter most.