Runbook Governance

Execute repeatable incident response with governed AI support.

The AI runbook is now a route-native part of the production web shell, so live execution guidance can evolve with the app instead of staying trapped in legacy demo HTML.

Runbook ID: NT-22053-123Incident: INC-2041Mode: AI-assisted + human approvals

current build

v3.2

Current runbook version

automation ready

4 / 6

AI-executable steps

human controlled

Approval gates

service comms

< 10 min

Target to first status update

Trigger Conditions And Preconditions

Invoke the runbook only when signal, scope, and ownership are clear enough to support structured execution.

Severity 2+ only

Signal / Check	Threshold Or Rule	Source	Runbook Decision
Remote access latency sustained	P95 latency above 280 ms for 10 minutes	Gateway telemetry + synthetic check	Open a shared-service incident, declare severity, and attach the runbook
Cross-division impact	25+ affected users or 3+ divisions impacted	Ticket correlation + service desk spike	Treat as a shared service issue, not a local workstation issue
No planned change in effect	No active maintenance, routing, or patch window	Change calendar	Prevent false-positive execution
Human ownership established	Incident commander and service owner named before step 3	On-call roster	Allow AI execution under accountable oversight

Execution Sequence

Each step defines what AI may do, what a human must confirm, and where the guardrail sits.

Detect And Open Incident

AI action: Correlate gateway latency, session failure rate, and service desk spike; draft the incident summary and severity proposal.

Human checkpoint: Incident commander confirms severity and accountable owner.

Guardrail: Stop if the issue is isolated or tied to planned maintenance.

Gather Diagnostics And Compare Baseline

AI action: Pull gateway, broker, and authentication telemetry and compare to last-known-good baseline.

Human checkpoint: Infrastructure lead validates that the evidence pack is complete enough to support next actions.

Guardrail: Escalate to manual triage if telemetry sources are missing, stale, or contradictory.

Draft Initial Communications

AI action: Prepare an internal support note, stakeholder holding statement, and short leadership summary using current facts only.

Human checkpoint: Service owner approves anything sent beyond the internal support note.

Guardrail: AI may draft messages, but never send leadership or stakeholder communication autonomously.

Recommend Workaround Or Containment

AI action: Recommend the least-risk workaround using recent incident patterns and known-good recovery paths.

Human checkpoint: Network or infrastructure lead approves any routing change, failover, or service-impacting workaround.

Guardrail: AI may recommend changes but may not execute network, firewall, or gateway actions.

Prepare Vendor Escalation Packet

AI action: Assemble timestamps, impacted scope, comparative metrics, and evidence into a vendor-ready case draft.

Human checkpoint: Incident commander confirms the facts and sends the case.

Guardrail: Keep language factual and evidence-based; do not assign fault or speculate beyond the data.