Current runbook version
Execute repeatable incident response with governed AI support.
The AI runbook is now a route-native part of the production web shell, so live execution guidance can evolve with the app instead of staying trapped in legacy demo HTML.
AI-executable steps
Approval gates
Target to first status update
Trigger Conditions And Preconditions
Invoke the runbook only when signal, scope, and ownership are clear enough to support structured execution.
| Signal / Check | Threshold Or Rule | Source | Runbook Decision |
|---|---|---|---|
| Remote access latency sustained | P95 latency above 280 ms for 10 minutes | Gateway telemetry + synthetic check | Open a shared-service incident, declare severity, and attach the runbook |
| Cross-division impact | 25+ affected users or 3+ divisions impacted | Ticket correlation + service desk spike | Treat as a shared service issue, not a local workstation issue |
| No planned change in effect | No active maintenance, routing, or patch window | Change calendar | Prevent false-positive execution |
| Human ownership established | Incident commander and service owner named before step 3 | On-call roster | Allow AI execution under accountable oversight |
Execution Sequence
Each step defines what AI may do, what a human must confirm, and where the guardrail sits.
Detect And Open Incident
AI action: Correlate gateway latency, session failure rate, and service desk spike; draft the incident summary and severity proposal.
Human checkpoint: Incident commander confirms severity and accountable owner.
Guardrail: Stop if the issue is isolated or tied to planned maintenance.
Gather Diagnostics And Compare Baseline
AI action: Pull gateway, broker, and authentication telemetry and compare to last-known-good baseline.
Human checkpoint: Infrastructure lead validates that the evidence pack is complete enough to support next actions.
Guardrail: Escalate to manual triage if telemetry sources are missing, stale, or contradictory.
Draft Initial Communications
AI action: Prepare an internal support note, stakeholder holding statement, and short leadership summary using current facts only.
Human checkpoint: Service owner approves anything sent beyond the internal support note.
Guardrail: AI may draft messages, but never send leadership or stakeholder communication autonomously.
Recommend Workaround Or Containment
AI action: Recommend the least-risk workaround using recent incident patterns and known-good recovery paths.
Human checkpoint: Network or infrastructure lead approves any routing change, failover, or service-impacting workaround.
Guardrail: AI may recommend changes but may not execute network, firewall, or gateway actions.
Prepare Vendor Escalation Packet
AI action: Assemble timestamps, impacted scope, comparative metrics, and evidence into a vendor-ready case draft.
Human checkpoint: Incident commander confirms the facts and sends the case.
Guardrail: Keep language factual and evidence-based; do not assign fault or speculate beyond the data.