ODOE AI Runbook

Trigger Conditions And Preconditions

Invoke the runbook only when the signal, scope, and ownership conditions are clear enough to support structured execution under the severity framework.

Severity 2+ only

Signal / Check	Threshold Or Rule	Source	Runbook Decision
Remote access latency sustained	P95 latency above 280 ms for 10 minutes	Gateway telemetry + synthetic check	Open a shared-service incident, declare severity, and attach the runbook
Cross-division impact	25+ affected users or 3+ divisions impacted	Ticket correlation + service desk spike	Treat as a shared service issue, not a local workstation issue
No planned change in effect	No active maintenance, routing, or patch window	Change calendar	Prevent false-positive execution
Human ownership established	Incident commander and service owner named before step 3	On-call roster	Allow AI execution under accountable oversight
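The trigger table can be read as a single gate. The Python sketch below is illustrative only. The Signal fields and function names are hypothetical and the thresholds are copied from the table. The ownership check is kept separate because the table only requires it before step 3.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds copied from the trigger table.
LATENCY_P95_MS = 280
SUSTAINED_MINUTES = 10
MIN_AFFECTED_USERS = 25
MIN_AFFECTED_DIVISIONS = 3


@dataclass
class Signal:
    # Hypothetical field names; map them to gateway telemetry, ticket
    # correlation, and the change calendar in a real deployment.
    p95_ms: float
    breach_minutes: int
    affected_users: int
    affected_divisions: int
    change_window_active: bool


def should_invoke(signal: Signal) -> bool:
    """True when latency is sustained, impact is cross-division, and no planned change explains it."""
    latency_sustained = (
        signal.p95_ms > LATENCY_P95_MS
        and signal.breach_minutes >= SUSTAINED_MINUTES
    )
    cross_division = (
        signal.affected_users >= MIN_AFFECTED_USERS
        or signal.affected_divisions >= MIN_AFFECTED_DIVISIONS
    )
    return latency_sustained and cross_division and not signal.change_window_active


def ownership_established(incident_commander: Optional[str], service_owner: Optional[str]) -> bool:
    """Both owners must be named before step 3 begins."""
    return bool(incident_commander) and bool(service_owner)


# INC-2041 values from the execution prompt.
inc_2041 = Signal(p95_ms=312, breach_minutes=14, affected_users=41,
                  affected_divisions=3, change_window_active=False)
print(should_invoke(inc_2041))  # True
```

Keeping the thresholds as named constants makes it easier to keep the gate, the table, and the contract in step when a threshold changes.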

Execution Sequence

Each step defines what AI may do, what a human must confirm, and what evidence gets written back to the incident.

1 Trigger validation

Detect And Open Incident

Validate shared-service scope and attach the runbook.

AI-executable

AI Action

Correlate gateway latency, session failure rate, and service desk spike; draft the incident summary and severity proposal.

Human Checkpoint

Incident commander confirms severity and accountable owner.

Write-Back

Incident summary, blast-radius estimate, and runbook attachment logged to the ticket.

Guardrail

Stop if the issue is isolated to one user or one endpoint, or if it falls within a planned maintenance window.
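A minimal sketch of the step 1 blast-radius estimate and the isolation guardrail, assuming each correlated service desk ticket carries hypothetical user, division, and endpoint fields. The planned-maintenance part of the guardrail is a change-calendar lookup and is not shown here.

```python
def blast_radius(tickets: list[dict]) -> dict[str, int]:
    """Count distinct users, divisions, and endpoints across correlated tickets."""
    # Hypothetical ticket shape: {"user": ..., "division": ..., "endpoint": ...}
    return {
        "users": len({t["user"] for t in tickets}),
        "divisions": len({t["division"] for t in tickets}),
        "endpoints": len({t["endpoint"] for t in tickets}),
    }


def is_isolated(radius: dict[str, int]) -> bool:
    """Step 1 guardrail: stop if only one user or one endpoint is affected."""
    return radius["users"] <= 1 or radius["endpoints"] <= 1


tickets = [
    {"user": "u1", "division": "div-a", "endpoint": "ep-1"},
    {"user": "u2", "division": "div-a", "endpoint": "ep-2"},
    {"user": "u3", "division": "div-b", "endpoint": "ep-3"},
]
radius = blast_radius(tickets)
print(radius)               # {'users': 3, 'divisions': 2, 'endpoints': 3}
print(is_isolated(radius))  # False
```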

2 Evidence pack

Gather Diagnostics And Compare Baseline

Build a consistent evidence set before deeper action.

AI-executable

AI Action

Pull gateway, broker, and authentication telemetry; compare current state to the last-known-good baseline.

Human Checkpoint

Infrastructure lead validates that the evidence pack is complete enough to support next actions.

Write-Back

Diagnostics bundle, baseline diff, and suspected failure domain attached to the incident.

Guardrail

Escalate to manual triage if telemetry sources are missing, stale, or contradictory.
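The step 2 guardrail (missing, stale, or contradictory telemetry) can be checked mechanically before the infrastructure lead reviews the pack. This sketch assumes a five-minute staleness limit and treats a disagreement between gateway telemetry and the synthetic check as a contradiction; both rules, and the function names, are assumptions rather than part of the runbook.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

REQUIRED_SOURCES = ("gateway", "broker", "auth")
MAX_AGE = timedelta(minutes=5)  # assumed staleness limit; the runbook does not set one


def evidence_problems(
    last_seen: dict[str, Optional[datetime]],
    now: datetime,
    gateway_breached: bool,
    synthetic_breached: bool,
) -> list[str]:
    """List reasons the evidence pack cannot be trusted; empty means it can go to review."""
    problems = []
    for source in REQUIRED_SOURCES:
        ts = last_seen.get(source)
        if ts is None:
            problems.append(f"{source}: missing")
        elif now - ts > MAX_AGE:
            problems.append(f"{source}: stale")
    if gateway_breached != synthetic_breached:
        problems.append("gateway telemetry and synthetic check disagree")
    return problems


def baseline_diff(current: dict[str, float], baseline: dict[str, float]) -> dict[str, float]:
    """Current minus last-known-good value for every metric present in both."""
    return {name: current[name] - baseline[name] for name in current if name in baseline}


def next_step(problems: list[str]) -> str:
    """Any problem hands off to manual triage instead of guessing."""
    return "escalate_to_manual_triage" if problems else "infrastructure_lead_review"
```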

3 Communication drafts

Draft Initial Communications

Prepare clear updates without letting AI publish them autonomously.

Approval required

AI Action

Prepare an internal support note, stakeholder holding statement, and short leadership summary using current facts only.

Human Checkpoint

Service owner approves anything sent beyond the internal support note.

Write-Back

Communication drafts saved together with the requested send-or-hold decision.

Guardrail

AI may draft messages, but never send leadership or stakeholder communication autonomously.
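The step 3 checkpoint reduces to a small release rule: the internal support note needs no service owner approval, and every other draft does. In this sketch the audience labels and function names are hypothetical, and the result only says whether a human may send the draft, since the contract blocks communications.send for the AI.

```python
from dataclasses import dataclass
from typing import Optional

# Step 3: only the internal support note goes out without service owner approval.
NO_APPROVAL_NEEDED = {"internal_support"}


@dataclass
class Draft:
    audience: str  # hypothetical labels: "internal_support", "stakeholder", "leadership"
    body: str
    approved_by: Optional[str] = None


def cleared_to_send(draft: Draft, service_owner: str) -> bool:
    """True if a human may send this draft; the AI itself never sends."""
    if draft.audience in NO_APPROVAL_NEEDED:
        return True
    return draft.approved_by is not None and draft.approved_by == service_owner
```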

4 Containment planning

Recommend Workaround Or Containment

Use AI for structured options, not autonomous service-impacting change.

Human-approved

AI Action

Recommend the least-risk workaround using recent incident patterns and known-good recovery paths.

Human Checkpoint

Network or infrastructure lead approves any routing change, failover, or service-impacting workaround.

Write-Back

Recommended action, risk summary, and rollback note appended to the ticket.

Guardrail

AI may recommend changes but may not execute network, firewall, or gateway actions.
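A sketch of the step 4 write-back, assuming hypothetical action-type labels. The recommendation carries the three fields the ticket needs, and service-impacting types are routed to a network or infrastructure lead. Nothing in it executes a change.

```python
from dataclasses import dataclass
from typing import Optional

# Action types the step 4 checkpoint routes to a network or infrastructure lead.
NEEDS_LEAD_APPROVAL = {"routing_change", "failover", "service_impacting_workaround"}


@dataclass
class Recommendation:
    action_type: str  # hypothetical label, e.g. "failover"
    description: str
    risk_summary: str
    rollback_note: str


def approver_role(rec: Recommendation) -> Optional[str]:
    """Named approver for service-impacting actions; the runbook names none for other types."""
    return "network_or_infrastructure_lead" if rec.action_type in NEEDS_LEAD_APPROVAL else None


def ready_to_append(rec: Recommendation) -> bool:
    """Action, risk summary, and rollback note must all be filled before the ticket update."""
    return all(text.strip() for text in (rec.description, rec.risk_summary, rec.rollback_note))
```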

5 Vendor packet

Prepare Vendor Escalation Packet

Convert the evidence pack into a usable vendor case quickly and consistently.

AI-prepared

AI Action

Assemble timestamps, impacted scope, comparative metrics, and evidence into a vendor-ready case draft.

Human Checkpoint

Incident commander confirms the facts and sends the case.

Write-Back

Vendor packet, case ID placeholder, and next-response expectation logged to the incident.

Guardrail

Keep language factual and evidence-based; do not assign fault or speculate beyond the data.

6 Recovery validation

Verify Recovery And Prepare Closure

Close the loop only after service and governance conditions are both met.

Approval required

AI Action

Monitor latency recovery, confirm the session success trend, and draft a recovery note with follow-up tasks.

Human Checkpoint

Incident commander confirms service restoration, closure readiness, and post-incident owner.

Write-Back

Recovery confirmation, closure draft, and follow-up work items posted to the incident.

Guardrail

AI may not close the incident or mark service restored without human confirmation.
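One way to read "latency below 140 ms for 15 minutes" is as an unbroken run of healthy p95 samples at least 15 minutes long. The sketch below takes that interpretation and assumes samples arrive at a regular interval without gaps; a breach anywhere in the run resets it. It only evaluates the metric, so human confirmation is still required before closure. Function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

RECOVERY_P95_MS = 140
RECOVERY_WINDOW = timedelta(minutes=15)


def healthy_since(samples: list[tuple[datetime, float]]) -> Optional[datetime]:
    """Start of the current unbroken run of p95 samples below 140 ms, or None."""
    start = None
    for ts, p95 in sorted(samples):
        if p95 < RECOVERY_P95_MS:
            if start is None:
                start = ts
        else:
            start = None  # any breach resets the run
    return start


def latency_recovered(samples: list[tuple[datetime, float]], now: datetime) -> bool:
    """True once the healthy run has lasted at least the recovery window."""
    start = healthy_since(samples)
    return start is not None and now - start >= RECOVERY_WINDOW
```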

AI Prompt Pack

These are the governed instructions an AI operator would receive during execution.

Prompt-driven

System / Control Prompt

Sets the AI role, permissions, and required output format before execution begins.

Role: governed incident runbook executor for ODOE IT.
Runbook: NT-22053-123 Remote Access Latency Response.

Objectives:
- reduce time to reliable diagnosis
- keep communication factual and timely
- produce auditable outputs at each step

Rules:
- never change routing, gateway configuration, firewall policy, or incident status without human approval
- never send leadership or stakeholder communication without human approval
- when evidence is incomplete, say so and request the next human decision

Return format:
1. Situation
2. Evidence
3. Recommended Next Step
4. Ticket Updates
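Because the system prompt fixes a four-part return format, a runbook harness could reject responses that skip or reorder sections before anything is written to the ticket. A minimal sketch, assuming the AI emits each heading on its own line as "1. Situation" and so on; the function name is hypothetical.

```python
import re

# Headings required by the system prompt's return format, in order.
REQUIRED_SECTIONS = ["Situation", "Evidence", "Recommended Next Step", "Ticket Updates"]


def missing_sections(response: str) -> list[str]:
    """Return required headings that are absent or appear out of order."""
    missing = []
    last_pos = -1
    for number, name in enumerate(REQUIRED_SECTIONS, start=1):
        pattern = rf"^{number}\.\s+{re.escape(name)}[ \t]*$"
        match = re.search(pattern, response, flags=re.MULTILINE)
        if match is None or match.start() < last_pos:
            missing.append(name)
        else:
            last_pos = match.start()
    return missing
```

An empty list means the response can be parsed into ticket updates; anything else can be sent back to the AI with the missing headings named.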

Execution Prompt

Provides the live incident context and the exact tasks to complete inside the guardrails.

Incident: INC-2041
Service: Remote access

Current signal:
- latency p95 = 312 ms for 14 minutes
- 41 affected users across 3 divisions
- no approved change window is active

Tasks:
1. Validate runbook trigger conditions.
2. Assemble diagnostics bundle from gateway, broker, and auth telemetry.
3. Produce blast-radius summary.
4. Draft internal support note and stakeholder holding statement.
5. Draft vendor escalation packet.

Do not execute routing, failover, or closure actions.

Vendor Escalation Prompt

Used only after diagnostics are attached and the incident commander approves vendor escalation.

Draft a vendor escalation using the attached evidence pack.

Include:
- incident start time and timeline highlights
- impacted user scope and affected divisions
- current latency and session failure indicators
- comparison to the last-known-good baseline
- current workaround status
- requested vendor action in the next 30 minutes

Keep tone factual. Do not assign blame. Flag any missing evidence explicitly.
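The instruction to flag missing evidence can also be enforced outside the model. This sketch renders the requested items in a fixed order and appends an explicit gap line rather than filling anything in. The field keys are hypothetical shorthand for the six bullets above.

```python
# Items requested by the vendor escalation prompt (hypothetical key names).
REQUIRED_FIELDS = (
    "start_time_and_timeline",
    "impacted_scope",
    "latency_and_session_failures",
    "baseline_comparison",
    "workaround_status",
    "requested_vendor_action",
)


def _present(packet: dict, name: str) -> bool:
    return bool((packet.get(name) or "").strip())


def missing_evidence(packet: dict) -> list[str]:
    return [name for name in REQUIRED_FIELDS if not _present(packet, name)]


def render_packet(packet: dict) -> str:
    """Render present fields in a fixed order and flag gaps instead of filling them."""
    lines = [f"{name}: {packet[name]}" for name in REQUIRED_FIELDS if _present(packet, name)]
    gaps = missing_evidence(packet)
    if gaps:
        lines.append("MISSING EVIDENCE: " + ", ".join(gaps))
    return "\n".join(lines)
```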

AI-Executable Runbook Contract

The same runbook is represented in machine-readable form so an AI operator can follow the rules consistently and write outputs back to the incident.

Embedded YAML contract

runbook_id: NT-22053-123
title: Remote Access Latency Response
execution_mode: ai_assisted_with_human_approval
allowed_tools:
  - telemetry.read
  - baseline.compare
  - ticket.update
  - timeline.append
  - communications.draft
  - artifact.attach
  - vendor.case_draft
blocked_tools:
  - network.change
  - gateway.failover
  - firewall.change
  - communications.send
  - incident.close
invoke_when:
  - remote_access_latency_p95 > 280ms for 10m
  - affected_users >= 25
required_context:
  - incident_id
  - incident_commander
  - service_owner
  - last_known_good_baseline
  - current_gateway_metrics
write_back_to_ticket:
  - blast_radius_summary
  - diagnostics_bundle_reference
  - communication_drafts
  - vendor_packet
  - next_human_decision
success_criteria:
  - remote_access_latency_p95 < 140ms for 15m
  - incident_commander_confirms_service_restored = true
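With PyYAML installed, yaml.safe_load would read the contract text above into a dictionary. To stay dependency-free, the sketch below hand-copies two of its lists into a Python dict and shows how an operator harness might report missing required context and track which write-backs are still pending. The function names are hypothetical.

```python
# Hand-copied subset of the NT-22053-123 contract above.
CONTRACT = {
    "runbook_id": "NT-22053-123",
    "required_context": [
        "incident_id",
        "incident_commander",
        "service_owner",
        "last_known_good_baseline",
        "current_gateway_metrics",
    ],
    "write_back_to_ticket": [
        "blast_radius_summary",
        "diagnostics_bundle_reference",
        "communication_drafts",
        "vendor_packet",
        "next_human_decision",
    ],
}


def missing_context(context: dict) -> list[str]:
    """Required context keys that are absent or empty."""
    return [key for key in CONTRACT["required_context"] if not context.get(key)]


def pending_write_backs(ticket: dict) -> list[str]:
    """Contract outputs not yet written back to the ticket."""
    return [key for key in CONTRACT["write_back_to_ticket"] if key not in ticket]
```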

Human Control Boundaries

AI supports speed and structure; humans retain accountability for risk-bearing decisions.

Human ownership starts before automation

The runbook should not proceed past diagnosis unless an incident commander and service owner are named.

AI drafts, humans send

Internal notes can be prepared automatically, but stakeholder and leadership messages require explicit approval before sending.

Risk-bearing changes require approval

Routing changes, gateway failover, firewall updates, and incident closure remain human decisions.

Every action writes to the audit trail

The value of AI is not only speed but also repeatable evidence and a cleaner operational history.

Allowed And Blocked Actions

The runbook stays safe because the action set is explicit before execution begins.

AI may use

telemetry.read, baseline.compare, ticket.update, timeline.append, communications.draft, artifact.attach, vendor.case_draft

Never autonomous

network.change, gateway.failover, firewall.change, communications.send, incident.close, access.provision
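The two lists translate directly into a routing rule for tool calls. This sketch checks the never-autonomous list first, so a tool that somehow appears on both lists is still handed to a human, and it refuses anything on neither list. That default-deny behavior, like the function name, is an assumption rather than something the runbook states.

```python
ALLOWED = frozenset({
    "telemetry.read", "baseline.compare", "ticket.update", "timeline.append",
    "communications.draft", "artifact.attach", "vendor.case_draft",
})
NEVER_AUTONOMOUS = frozenset({
    "network.change", "gateway.failover", "firewall.change",
    "communications.send", "incident.close", "access.provision",
})


def route_tool_call(tool: str) -> str:
    """Decide what happens when the AI operator asks to use a tool."""
    if tool in NEVER_AUTONOMOUS:
        return "handoff_to_human"  # a named human decides and acts
    if tool in ALLOWED:
        return "allow"
    return "deny"  # assumption: anything not listed is refused by default
```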

Outputs Written Back To The Incident

Each execution step should leave evidence behind, not just recommendations.

Situation Summary: Incident narrative, severity proposal, and initial blast-radius estimate appended to the timeline.

Diagnostics Bundle: Gateway telemetry, baseline comparison, and evidence pack reference attached to the incident.

Communication Drafts: Internal support note, stakeholder holding statement, and short leadership update saved for approval.

Vendor Packet: Escalation draft with timestamps, scope, evidence, and requested next action prepared for human send.

Recovery Note: Recovery validation and follow-up actions drafted before closure is considered.
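Each output above can be recorded as a structured timeline entry so that AI and human actions share one audit format. The record shape below (field names, JSON-lines serialization) is a hypothetical sketch, not a ticketing-system API.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass
class WriteBack:
    incident_id: str
    step: int                 # 1-6, matching the execution sequence
    output: str               # hypothetical label, e.g. "diagnostics_bundle"
    summary: str
    actor: str = "ai"
    approved_by: Optional[str] = None
    recorded_at: str = field(default_factory=_utc_now)


def timeline_entry(record: WriteBack) -> str:
    """Serialize one write-back as a JSON line for the incident timeline."""
    return json.dumps(asdict(record), sort_keys=True)


print(timeline_entry(WriteBack("INC-2041", 2, "diagnostics_bundle",
                               "Gateway, broker, and auth telemetry attached")))
```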

Success And Stop Conditions

Execution should end on explicit service and governance criteria, not on optimism.

Continue while the service signal is still breached

Latency remains above threshold or cross-division impact is still visible.

Escalate manually when evidence is incomplete

If the AI cannot assemble a reliable evidence set, the runbook should hand off clearly instead of guessing.

Stop only after measurable recovery

Recovery requires latency below 140 ms for 15 minutes and incident commander confirmation that users are stable.

Do not close until governance steps are complete

Closure waits for communications, vendor status, and follow-up tasks to be written back to the ticket.
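The two closure conditions above can be combined into a single list of blockers. This sketch assumes hypothetical labels for the three governance items. An empty list means the incident can be put to the incident commander for closure, not that the AI may close it.

```python
GOVERNANCE_ITEMS = ("communications", "vendor_status", "follow_up_tasks")


def closure_blockers(latency_recovered: bool, commander_confirmed: bool,
                     ticket_items: set[str]) -> list[str]:
    """Everything that still stands between the incident and a human closure decision."""
    blockers = []
    if not latency_recovered:
        blockers.append("latency not below 140 ms for 15 minutes")
    if not commander_confirmed:
        blockers.append("incident commander has not confirmed users are stable")
    blockers.extend(f"not written back: {item}" for item in GOVERNANCE_ITEMS
                    if item not in ticket_items)
    return blockers
```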

AI-Enabled Incident Runbook

Execute repeatable incident response with governed AI support.