SysOps AI Runbook

AI Runbook

Governed runbook showing how SysOps teams can combine human ownership, AI-executable steps, and auditable prompts during incident response.

Session required: Authenticated routes expect a real session token or cookie.
API base: https://dev.sysopsai.net/api
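A minimal sketch of preparing an authenticated call against the API base above. It assumes the session can be presented either as a bearer token or as a raw cookie header. The helper name `build_authenticated_request` and the `runbooks/...` path are hypothetical, because the real route layout and header scheme are not documented here.

```python
import urllib.request
from typing import Optional

# API base taken from the session banner; everything else is illustrative.
API_BASE = "https://dev.sysopsai.net/api"


def build_authenticated_request(path: str,
                                session_token: Optional[str] = None,
                                session_cookie: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GET request for an authenticated route.

    Hypothetical helper: the Bearer scheme and the Cookie fallback are
    assumptions, not a documented contract of the SysOps AI API.
    """
    if not session_token and not session_cookie:
        # Authenticated routes expect a real session, so fail before any I/O.
        raise ValueError("a session token or cookie is required")
    headers = {"Accept": "application/json"}
    if session_token:
        headers["Authorization"] = f"Bearer {session_token}"
    if session_cookie:
        headers["Cookie"] = session_cookie
    url = f"{API_BASE.rstrip('/')}/{path.lstrip('/')}"
    return urllib.request.Request(url, headers=headers, method="GET")


# Usage with a hypothetical route; pass the result to urllib.request.urlopen to send it.
req = build_authenticated_request("runbooks/NT-22053-123", session_token="example-token")
print(req.full_url)
```

Failing early without a session mirrors the banner's warning: a request that would be rejected is never sent.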
Runbook Governance

Execute repeatable incident response with governed AI support.

The AI runbook now runs as a native route in the production web shell, so live execution guidance can evolve with the app instead of staying locked in legacy demo HTML.

Runbook ID: NT-22053-123
Incident: INC-2041
Mode: AI-assisted + human approvals
Current runbook version: v3.2 (current build)
AI-executable steps: 4 / 6 (automation ready)
Approval gates: 2 (human controlled)
Target to first status update: < 10 min (service comms)

Trigger Conditions And Preconditions

Invoke the runbook only when signal, scope, and ownership are clear enough to support structured execution.

Severity 2+ only
Signal / Check | Threshold Or Rule | Source | Runbook Decision
Remote access latency sustained | P95 latency above 280 ms for 10 minutes | Gateway telemetry + synthetic check | Open a shared-service incident, declare severity, and attach the runbook
Cross-division impact | 25+ affected users or 3+ divisions impacted | Ticket correlation + service desk spike | Treat as a shared-service issue, not a local workstation issue
No planned change in effect | No active maintenance, routing, or patch window | Change calendar | Prevent false-positive execution
Human ownership established | Incident commander and service owner named before step 3 | On-call roster | Allow AI execution under accountable oversight
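The trigger table can be read as a set of checks evaluated before each step. This is a minimal sketch assuming latency arrives as one P95 sample per minute and that "sustained" means every sample in the last 10 minutes exceeds 280 ms. `TriggerSnapshot` and `unmet_conditions` are hypothetical names; the thresholds come from the table.

```python
from dataclasses import dataclass
from typing import List, Optional

# Thresholds copied from the trigger table above.
P95_THRESHOLD_MS = 280.0
SUSTAIN_MINUTES = 10
MIN_AFFECTED_USERS = 25
MIN_AFFECTED_DIVISIONS = 3


@dataclass
class TriggerSnapshot:
    p95_by_minute: List[float]      # one P95 sample per minute, oldest first (assumed shape)
    affected_users: int
    affected_divisions: int
    active_change_window: bool      # maintenance, routing, or patch window on the change calendar
    incident_commander: Optional[str]
    service_owner: Optional[str]


def unmet_conditions(s: TriggerSnapshot, next_step: int) -> List[str]:
    """Return unmet conditions; an empty list means the runbook may proceed."""
    unmet = []
    window = s.p95_by_minute[-SUSTAIN_MINUTES:]
    # Assumption: "sustained" = all of the last 10 per-minute samples strictly above 280 ms.
    if len(window) < SUSTAIN_MINUTES or any(v <= P95_THRESHOLD_MS for v in window):
        unmet.append("latency_not_sustained")
    # Cross-division impact: 25+ users OR 3+ divisions.
    if s.affected_users < MIN_AFFECTED_USERS and s.affected_divisions < MIN_AFFECTED_DIVISIONS:
        unmet.append("impact_not_cross_division")
    if s.active_change_window:
        unmet.append("planned_change_in_effect")
    # Ownership only has to be established before step 3.
    if next_step >= 3 and not (s.incident_commander and s.service_owner):
        unmet.append("ownership_not_established")
    return unmet
```

Returning the list of unmet conditions, rather than a single boolean, gives the audit trail a reason for every refusal.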

Execution Sequence

Each step defines what AI may do, what a human must confirm, and where the guardrail sits.
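One way to encode this structure, assuming each step is a record of AI action, human checkpoint, and guardrail, and that execution halts rather than skips when a checkpoint is not cleared. `RunbookStep` and `run_sequence` are hypothetical names; the two callbacks stand in for real guardrail evaluation and approval capture.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RunbookStep:
    number: int
    title: str
    ai_action: str
    human_checkpoint: str
    guardrail: str


def run_sequence(steps: List[RunbookStep],
                 guardrail_tripped: Callable[[RunbookStep], bool],
                 human_approves: Callable[[RunbookStep], bool]) -> List[str]:
    """Run steps in order and return an audit trail.

    Execution halts (it never skips ahead) when a guardrail trips or a
    human checkpoint is not cleared.
    """
    trail: List[str] = []
    for step in steps:
        if guardrail_tripped(step):
            trail.append(f"step {step.number}: halted by guardrail")
            break
        trail.append(f"step {step.number}: AI action drafted")
        if not human_approves(step):
            trail.append(f"step {step.number}: awaiting human checkpoint")
            break
        trail.append(f"step {step.number}: checkpoint cleared")
    return trail
```

Checking the guardrail before the AI acts keeps a stopped step from producing drafts that might later be mistaken for approved output.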

Step 1: Detect And Open Incident

AI action: Correlate gateway latency, session failure rate, and service desk spike; draft the incident summary and severity proposal.

Human checkpoint: Incident commander confirms severity and accountable owner.

Guardrail: Stop if the issue is isolated or tied to planned maintenance.
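A sketch of step 1 under two assumptions: "isolated" means below both cross-division thresholds from the trigger table, and the severity proposal is supplied by whatever scoring the AI uses. `IncidentDraft`, `draft_incident`, and `confirm_incident` are hypothetical. The AI only fills `proposed_severity`; the confirmed fields stay empty until the incident commander acts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IncidentDraft:
    summary: str
    proposed_severity: int                    # AI proposal only
    confirmed_severity: Optional[int] = None  # set by the incident commander
    accountable_owner: Optional[str] = None   # set by the incident commander


def draft_incident(gateway_p95_ms: float, session_failure_rate: float,
                   service_desk_tickets: int, affected_users: int,
                   affected_divisions: int, in_planned_maintenance: bool,
                   proposed_severity: int) -> Optional[IncidentDraft]:
    """Correlate the three signals into a draft, or return None if the guardrail stops the step."""
    # Assumption: "isolated" = below both cross-division thresholds (25 users, 3 divisions).
    isolated = affected_users < 25 and affected_divisions < 3
    if isolated or in_planned_maintenance:
        return None
    summary = (f"Remote access degradation: gateway P95 {gateway_p95_ms:.0f} ms, "
               f"session failure rate {session_failure_rate:.1%}, "
               f"{service_desk_tickets} service desk tickets, "
               f"{affected_users} users across {affected_divisions} divisions.")
    return IncidentDraft(summary=summary, proposed_severity=proposed_severity)


def confirm_incident(draft: IncidentDraft, severity: int, owner: str) -> IncidentDraft:
    """Human checkpoint: the incident commander records severity and accountable owner."""
    draft.confirmed_severity = severity
    draft.accountable_owner = owner
    return draft
```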

Step 2: Gather Diagnostics And Compare Baseline

AI action: Pull gateway, broker, and authentication telemetry and compare it against the last-known-good baseline.

Human checkpoint: Infrastructure lead validates that the evidence pack is complete enough to support next actions.

Guardrail: Escalate to manual triage if telemetry sources are missing, stale, or contradictory.
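A minimal sketch of the evidence-pack check, assuming each source reports one timestamped headline value and the baseline holds one last-known-good value per source. The 5-minute freshness limit is an assumption, and the sketch covers only the missing and stale cases; detecting contradictory telemetry is source-specific and is left to the infrastructure lead. `compare_to_baseline` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

REQUIRED_SOURCES = ("gateway", "broker", "authentication")
MAX_SAMPLE_AGE = timedelta(minutes=5)  # assumed freshness limit; tune per source


def compare_to_baseline(current: Dict[str, Tuple[datetime, float]],
                        baseline: Dict[str, float],
                        now: datetime) -> Tuple[Dict[str, float], List[str]]:
    """Return (deltas vs. last-known-good, evidence problems).

    Any entry in the problems list means: escalate to manual triage.
    """
    deltas: Dict[str, float] = {}
    problems: List[str] = []
    for source in REQUIRED_SOURCES:
        sample = current.get(source)
        if sample is None or source not in baseline:
            problems.append(f"{source}: missing")
            continue
        observed_at, value = sample
        if now - observed_at > MAX_SAMPLE_AGE:
            problems.append(f"{source}: stale")
            continue
        deltas[source] = value - baseline[source]
    return deltas, problems
```

Keeping problems separate from deltas lets the infrastructure lead see exactly why the pack is incomplete instead of receiving a partial comparison.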

Step 3: Draft Initial Communications

AI action: Prepare an internal support note, stakeholder holding statement, and short leadership summary using current facts only.

Human checkpoint: Service owner approves anything sent beyond the internal support note.

Guardrail: AI may draft messages, but never send leadership or stakeholder communication autonomously.
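A sketch of the release rule plus the "< 10 min" first-update target, under one reading of the checkpoint: the internal support note may go out without service owner approval, while the stakeholder and leadership messages may not. `Message`, `may_release`, and `first_update_overdue` are hypothetical names.

```python
from datetime import datetime, timedelta
from enum import Enum

FIRST_UPDATE_TARGET = timedelta(minutes=10)  # "< 10 min" service comms target


class Message(Enum):
    INTERNAL_SUPPORT_NOTE = "internal support note"
    STAKEHOLDER_HOLDING_STATEMENT = "stakeholder holding statement"
    LEADERSHIP_SUMMARY = "leadership summary"


def may_release(message: Message, service_owner_approved: bool) -> bool:
    """AI drafts every message; anything beyond the internal note needs service owner approval."""
    if message is Message.INTERNAL_SUPPORT_NOTE:
        return True
    return service_owner_approved


def first_update_overdue(incident_opened: datetime, now: datetime) -> bool:
    """True once the first status update has missed the under-10-minute target."""
    return now - incident_opened >= FIRST_UPDATE_TARGET
```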

Step 4: Recommend Workaround Or Containment

AI action: Recommend the least-risk workaround using recent incident patterns and known-good recovery paths.

Human checkpoint: Network or infrastructure lead approves any routing change, failover, or service-impacting workaround.

Guardrail: AI may recommend changes but may not execute network, firewall, or gateway actions.
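Recommending is always allowed, so only execution needs a gate. This is a minimal sketch assuming each proposed workaround carries a category and a change type; the category and change-type strings are hypothetical labels mapped from the checkpoint and guardrail wording.

```python
# Categories the AI may recommend but never execute (step 4 guardrail).
AI_NON_EXECUTABLE = frozenset({"network", "firewall", "gateway"})
# Change types that need network or infrastructure lead approval (step 4 checkpoint).
LEAD_APPROVAL_REQUIRED = frozenset({"routing_change", "failover", "service_impacting"})


def may_execute(actor: str, category: str, change_type: str, lead_approved: bool) -> bool:
    """Decide whether `actor` ("ai" or "human") may carry out a proposed workaround."""
    if actor == "ai" and category in AI_NON_EXECUTABLE:
        return False  # recommendation only, regardless of approval
    if change_type in LEAD_APPROVAL_REQUIRED and not lead_approved:
        return False
    return True
```

Anything outside the two sets is allowed here, which is an assumption; a stricter deployment could invert this into a default-deny allowlist for AI execution.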

Step 5: Prepare Vendor Escalation Packet

AI action: Assemble timestamps, impacted scope, comparative metrics, and evidence into a vendor-ready case draft.

Human checkpoint: Incident commander confirms the facts and sends the case.

Guardrail: Keep language factual and evidence-based; do not assign fault or speculate beyond the data.
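A sketch of the packet and its send gate. The phrase list is a hypothetical and deliberately crude stand-in for a real review of speculative or blaming language; it does not replace the incident commander reading the draft. `VendorPacket`, `flag_language`, and `mark_sent` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical, deliberately crude phrase list for blame or speculation.
FLAGGED_PHRASES = ("your fault", "caused by your", "probably", "we suspect", "must have")


@dataclass
class VendorPacket:
    incident_id: str
    timeline: List[str]          # timestamped observations
    impacted_scope: str
    metrics: Dict[str, float]    # comparative metrics vs. baseline
    narrative: str
    status: str = "draft"        # only the incident commander moves this to "sent"


def flag_language(text: str) -> List[str]:
    """Return flagged phrases found in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return [p for p in FLAGGED_PHRASES if p in lowered]


def mark_sent(packet: VendorPacket, facts_confirmed_by_ic: bool) -> VendorPacket:
    """Human checkpoint: refuse unconfirmed or non-factual drafts, else record as sent."""
    if not facts_confirmed_by_ic:
        raise PermissionError("incident commander must confirm the facts first")
    if flag_language(packet.narrative):
        raise ValueError("narrative contains speculative or blaming language")
    packet.status = "sent"
    return packet
```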