Safe AI-Driven Incident Response Automation
Problem
The appeal of AI-driven incident response automation is clear: security incidents happen at 3 AM, analysts are asleep, and a known-bad IP is scanning your network or a compromised credential is accessing production. An automated system that can block the IP, revoke the credential, and isolate the affected host in seconds — without waiting for an on-call engineer to wake up and connect their laptop — provides a response speed advantage that human-only IR cannot match.
SOAR platforms have offered rule-based automation for years. The difference AI brings is the ability to handle novel situations — making decisions about whether to isolate a host based on a combination of signals that don’t match any predefined rule, or determining whether a credential access pattern represents compromise or a legitimate unusual use case. This flexibility is also the source of the risk.
The blast radius problem: a wrong automated action costs far more to undo than a wrong human investigation. An analyst who investigates an alert and concludes it is benign has only spent time before moving on. An automated system that concludes an alert is a compromise and acts on that conclusion may have:
- Isolated a production host, taking down a service for thousands of users
- Revoked a service account credential, breaking a payment pipeline
- Blocked a CIDR block, cutting off a large customer’s access
- Initiated a forensic snapshot that locks a VM and prevents normal operations
- Sent a security notification to an executive about a “confirmed incident” that turned out to be a false positive
Each of these actions has a reversal procedure, but reversal takes time, causes secondary alerts, and damages trust in both the automated system and the security team.
The failure mode that matters most is not “AI takes no action when it should” — that is the same as the current state without automation. The failure mode that matters is “AI takes wrong action with high confidence, causing service disruption and burning trust in security tooling.”
The correct design is a tiered action model: some actions are low-blast-radius and can be taken automatically; others are high-blast-radius and require human approval even at 3 AM; and for every automated action, a rollback procedure must exist and be tested.
Target systems: any organisation building or deploying AI-assisted SOAR (Security Orchestration, Automation, and Response); security teams integrating LLMs into incident response workflows; platform teams responsible for automated remediation tooling.
Threat Model
The threats addressed by this article are operational risks rather than external adversaries:
Risk 1 — AI false positive causes production outage. AI identifies legitimate high-volume database queries as an exfiltration attempt. Automated response isolates the database server. 100,000 users see service errors for 45 minutes while the isolation is reversed.
Risk 2 — AI revokes credential based on incorrect context. AI determines that a service account is compromised based on unusual access time (3 AM). The access was from an approved overnight batch job that was recently rescheduled. Credential revocation breaks the batch job; data pipeline misses an SLA.
Risk 3 — Automated response creates a larger incident. AI isolates a host that was sending suspicious traffic. The host was the primary NAT gateway for an office. Isolation cuts off 50 engineers from all company resources. The original security incident (a scanning tool misconfiguration) was benign.
Risk 4 — Cascading automated actions. AI revokes a service account credential. The service account was used by a monitoring agent. The monitoring agent stops reporting. A second AI system detects the monitoring gap and escalates, triggering additional automated responses. An action cascade amplifies a minor misconfiguration into a multi-system outage.
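Risks 3 and 4 share a root cause: the target of the action had dependents the AI did not know about. One possible guard is a pre-execution dependency check against an asset inventory. The sketch below is illustrative only: `ASSET_DEPENDENTS`, its entries, and `requires_human_review` are hypothetical names that do not appear elsewhere in this article, and a real deployment would more likely query a CMDB or service catalogue.

```python
# Hypothetical in-memory inventory: target -> things that depend on it.
# The entries are invented for illustration.
ASSET_DEPENDENTS: dict[str, list[str]] = {
    "nat-gw-office-1": ["office-lan (50 engineers)"],
    "svc-monitoring": ["monitoring-agent"],
    "laptop-4711": [],
}


def requires_human_review(target: str, max_dependents: int = 0) -> tuple[bool, str]:
    """Return (needs_review, reason) for a proposed action target.

    Targets missing from the inventory fail closed: unknown dependents
    are treated as a reason to involve a human.
    """
    dependents = ASSET_DEPENDENTS.get(target)
    if dependents is None:
        return True, f"{target} is not in the inventory; dependents unknown"
    if len(dependents) > max_dependents:
        return True, f"{target} has dependents: {', '.join(dependents)}"
    return False, f"{target} has no recorded dependents"


for candidate in ("nat-gw-office-1", "laptop-4711", "unknown-host"):
    needs_review, reason = requires_human_review(candidate)
    print(f"{candidate}: review={needs_review} ({reason})")
```

Failing closed on unknown targets is a design choice in this sketch: an incomplete inventory then produces extra pages rather than an unreviewed outage.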
Configuration / Implementation
Step 1 — Define the action tier model
Classify every potential automated IR action by blast radius before building any automation:
# incident-response-action-tiers.yaml
# Classification of IR actions by blast radius and reversibility
tier_1_auto_allowed:
  # Low blast radius, easily reversible, time-sensitive
  # AI can execute without human approval
  actions:
    - name: add_ip_to_watchlist
      description: Flag an IP for enhanced monitoring only
      blast_radius: none
      reversible: immediate
    - name: increase_logging_verbosity
      description: Enable debug logging for a service for 1 hour
      blast_radius: minor_performance
      reversible: automatic_after_ttl
    - name: create_alert_ticket
      description: Create a ticket in the ticketing system
      blast_radius: none
      reversible: immediate
    - name: send_analyst_notification
      description: Page the on-call analyst with context
      blast_radius: analyst_interruption
      reversible: n/a
    - name: quarantine_email
      description: Move a suspicious email to quarantine folder
      blast_radius: single_user_email
      reversible: immediate
tier_2_human_approval_required:
  # Moderate blast radius or slow/costly reversal
  # AI prepares the action; human approves within 15 minutes
  actions:
    - name: block_ip_at_firewall
      description: Add a block rule for a specific IP
      blast_radius: may_affect_legitimate_users_from_that_ip
      reversible: minutes
      approval_timeout_minutes: 15
    - name: disable_user_account_temporarily
      description: Suspend a user account for 2 hours
      blast_radius: single_user
      reversible: minutes
      approval_timeout_minutes: 15
    - name: revoke_api_key
      description: Revoke a specific API key
      blast_radius: services_using_that_key
      reversible: requires_key_rotation
      approval_timeout_minutes: 10
tier_3_never_automated:
  # High blast radius or irreversible
  # Always requires human execution regardless of AI recommendation
  actions:
    - name: isolate_production_host
      description: Cut network access to a production server
      blast_radius: service_outage_for_users
      rationale: Risk of false positive outage exceeds benefit of speed
    - name: revoke_service_account_credential
      description: Revoke a credential used by production services
      blast_radius: dependent_services_fail
      rationale: Cascading failures require human impact assessment
    - name: block_cidr_range
      description: Block an entire CIDR block
      blast_radius: may_affect_thousands_of_users
      rationale: IP block at range level is imprecise; human must assess scope
    - name: initiate_forensic_capture
      description: Take a forensic snapshot of a running VM
      blast_radius: vm_performance_impact_and_potential_lock
      rationale: Legal and operational implications require human judgment
    - name: send_breach_notification
      description: Send a regulatory or customer breach notification
      blast_radius: irreversible_legal_notification
      rationale: Absolutely requires human sign-off
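Because the tier file and the code that enforces it can drift apart, it may help to derive the lookup table from the file rather than maintain it by hand. The following is a minimal sketch under stated assumptions: the YAML has already been parsed into a plain dict by whichever YAML library is in use (parsing is not shown), and `SECTION_TO_TIER` and `build_tier_map` are hypothetical names introduced here. The `config` literal is an abbreviated stand-in for the parsed file.

```python
from enum import Enum


class ActionTier(Enum):
    AUTO = "auto"
    APPROVAL_REQUIRED = "approval_required"
    NEVER_AUTOMATED = "never_automated"


# Maps the top-level keys of incident-response-action-tiers.yaml to tiers.
SECTION_TO_TIER = {
    "tier_1_auto_allowed": ActionTier.AUTO,
    "tier_2_human_approval_required": ActionTier.APPROVAL_REQUIRED,
    "tier_3_never_automated": ActionTier.NEVER_AUTOMATED,
}


def build_tier_map(config: dict) -> dict[str, ActionTier]:
    """Flatten the tier file into {action_name: tier}, rejecting ambiguity."""
    unknown_sections = set(config) - set(SECTION_TO_TIER)
    if unknown_sections:
        raise ValueError(f"Unrecognised tier sections: {sorted(unknown_sections)}")
    tier_map: dict[str, ActionTier] = {}
    for section, tier in SECTION_TO_TIER.items():
        for action in config.get(section, {}).get("actions", []):
            name = action["name"]
            if name in tier_map:
                raise ValueError(f"{name} is listed in more than one tier")
            tier_map[name] = tier
    return tier_map


# Abbreviated stand-in for the parsed YAML (only names are needed here).
config = {
    "tier_1_auto_allowed": {"actions": [{"name": "add_ip_to_watchlist"}]},
    "tier_2_human_approval_required": {"actions": [{"name": "block_ip_at_firewall"}]},
    "tier_3_never_automated": {"actions": [{"name": "isolate_production_host"}]},
}
tier_map = build_tier_map(config)
print(tier_map["block_ip_at_firewall"])
```

Rejecting duplicate names and unknown sections at load time means a typo in the tier file surfaces as a startup error instead of a silently misclassified action.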
Step 2 — Build the human-in-the-loop approval workflow
# ir_automation.py
import json
import re
from dataclasses import dataclass
from enum import Enum

import anthropic

# Async client so the API call does not block the event loop
client = anthropic.AsyncAnthropic()

# Matches a leading or trailing Markdown code fence (with optional language tag)
FENCE_RE = re.compile(r"^`{3}[a-zA-Z]*\s*|\s*`{3}$")

SYSTEM_PROMPT = """You are a security incident response system. Analyse security alerts
and recommend response actions. You must ONLY recommend actions from the
approved action list. You must NEVER recommend host isolation, service account
revocation, CIDR blocking, forensic capture, or breach notifications; these require human decision.
Approved automated actions: add_ip_to_watchlist, increase_logging_verbosity,
create_alert_ticket, send_analyst_notification, quarantine_email,
block_ip_at_firewall (requires approval),
disable_user_account_temporarily (requires approval), revoke_api_key (requires approval).
For each recommended action, provide confidence (0-100) and reasoning."""


class ActionTier(Enum):
    AUTO = "auto"
    APPROVAL_REQUIRED = "approval_required"
    NEVER_AUTOMATED = "never_automated"


@dataclass
class IRAction:
    name: str
    tier: ActionTier
    target: str
    parameters: dict
    ai_confidence: float
    ai_reasoning: str
    approval_timeout_minutes: int = 15


class IRAutomationEngine:
    # Authoritative tier assignment; keep in sync with incident-response-action-tiers.yaml.
    # Any action name missing from this map is treated as NEVER_AUTOMATED.
    TIER_MAP = {
        "add_ip_to_watchlist": ActionTier.AUTO,
        "increase_logging_verbosity": ActionTier.AUTO,
        "create_alert_ticket": ActionTier.AUTO,
        "send_analyst_notification": ActionTier.AUTO,
        "quarantine_email": ActionTier.AUTO,
        "block_ip_at_firewall": ActionTier.APPROVAL_REQUIRED,
        "disable_user_account_temporarily": ActionTier.APPROVAL_REQUIRED,
        "revoke_api_key": ActionTier.APPROVAL_REQUIRED,
        "isolate_production_host": ActionTier.NEVER_AUTOMATED,
        "revoke_service_account_credential": ActionTier.NEVER_AUTOMATED,
        "block_cidr_range": ActionTier.NEVER_AUTOMATED,
        "initiate_forensic_capture": ActionTier.NEVER_AUTOMATED,
        "send_breach_notification": ActionTier.NEVER_AUTOMATED,
    }

    async def analyse_and_respond(self, alert: dict) -> list[IRAction]:
        """Generate IR actions for an alert using AI analysis."""
        user_prompt = (
            f"Alert: {json.dumps(alert, indent=2)}\n"
            "Recommend response actions as a JSON list:\n"
            '[{"action": "action_name", "target": "target_id", "parameters": {}, '
            '"confidence": 0-100, "reasoning": "..."}]\n'
            "Only include actions where confidence >= 70."
        )
        response = await client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1500,
            system=SYSTEM_PROMPT,
            messages=[{"role": "user", "content": user_prompt}],
        )
        try:
            # Strip an optional code fence (including a language tag) before parsing
            reply = FENCE_RE.sub("", response.content[0].text.strip())
            actions_raw = json.loads(reply)
            if not isinstance(actions_raw, list):
                raise ValueError("expected a JSON list of actions")
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            # Fallback: at minimum, create a ticket
            actions_raw = [{
                "action": "create_alert_ticket",
                "target": alert.get("id"),
                "parameters": {"alert": alert},
                "confidence": 100,
                "reasoning": "AI response parsing failed; creating ticket for human review",
            }]

        ir_actions = []
        for a in actions_raw:
            if not isinstance(a, dict):
                continue  # Ignore malformed entries rather than crash mid-incident
            action_name = a.get("action")
            # Unknown action names default to the most restrictive tier
            tier = self.TIER_MAP.get(action_name, ActionTier.NEVER_AUTOMATED)
            # Additional safety: never automate tier-3 actions even if AI recommends them
            if tier == ActionTier.NEVER_AUTOMATED:
                # Downgrade to notification only
                ir_actions.append(IRAction(
                    name="send_analyst_notification",
                    tier=ActionTier.AUTO,
                    target=alert.get("id"),
                    parameters={
                        "message": f"AI recommended '{action_name}' (tier-3 action); requires human decision",
                        "original_action": a,
                    },
                    ai_confidence=a.get("confidence", 0),
                    ai_reasoning=f"Downgraded from {action_name}: tier-3 action requires human",
                ))
                continue
            ir_actions.append(IRAction(
                name=action_name,
                tier=tier,
                target=a.get("target"),
                parameters=a.get("parameters", {}),
                ai_confidence=a.get("confidence", 0),
                ai_reasoning=a.get("reasoning", ""),
            ))
        return ir_actions

    async def execute_with_approval(self, action: IRAction) -> dict:
        """Execute an action, getting approval for tier-2 actions."""
        if action.tier == ActionTier.AUTO:
            return await self._execute_action(action)
        elif action.tier == ActionTier.APPROVAL_REQUIRED:
            # Send approval request to on-call
            approval = await self._request_approval(action)
            if approval:
                return await self._execute_action(action)
            else:
                return {"status": "declined", "action": action.name}
        else:  # NEVER_AUTOMATED
            # Backstop only: analyse_and_respond already downgrades tier-3 actions
            return {"status": "blocked", "reason": "tier-3 action blocked"}

    async def _request_approval(self, action: IRAction) -> bool:
        """Send approval request and wait for human response."""
        # Integration point for PagerDuty, Slack, etc.
        # Returns True if approved within timeout
        print(f"APPROVAL REQUIRED: {action.name} on {action.target}")
        print(f"AI Confidence: {action.ai_confidence}%")
        print(f"Reasoning: {action.ai_reasoning}")
        print(f"Approve? (y/N, timeout {action.approval_timeout_minutes}m): ")
        # In production: async wait for webhook response
        return False  # Default to safe

    async def _execute_action(self, action: IRAction) -> dict:
        """Execute an approved action with rollback tracking."""
        print(f"EXECUTING: {action.name} on {action.target}")
        # Integration point for your IR platform
        # Must record rollback information
        return {"status": "executed", "action": action.name, "target": action.target}
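The `_request_approval` stub above always returns `False`. One way the "async wait for webhook response" could look is a future that a webhook handler resolves, wrapped in `asyncio.wait_for` so that silence is treated as a denial. This is a sketch, not the article's implementation: `ApprovalGate` and its methods are hypothetical names, and the webhook side is simulated with a timer.

```python
import asyncio


class ApprovalGate:
    """Hypothetical helper: one pending approval that a webhook handler resolves."""

    def __init__(self) -> None:
        self._decision = None  # asyncio.Future created when wait() starts

    async def wait(self, timeout_seconds: float) -> bool:
        """Wait for a decision; a timeout counts as 'not approved'."""
        self._decision = asyncio.get_running_loop().create_future()
        try:
            return await asyncio.wait_for(self._decision, timeout=timeout_seconds)
        except asyncio.TimeoutError:
            return False  # Default to safe

    def resolve(self, approved: bool) -> None:
        """Called by the (not shown) webhook handler with the analyst's answer."""
        if self._decision is not None and not self._decision.done():
            self._decision.set_result(approved)


async def demo() -> tuple[bool, bool]:
    gate = ApprovalGate()
    # Simulate an analyst approving 10 ms after the request is sent
    asyncio.get_running_loop().call_later(0.01, gate.resolve, True)
    approved = await gate.wait(timeout_seconds=1.0)
    # Nobody answers: the wait times out and the action is denied
    unanswered = await ApprovalGate().wait(timeout_seconds=0.05)
    return approved, unanswered


results = asyncio.run(demo())
print(results)
```

In the engine, the timeout would presumably come from `action.approval_timeout_minutes * 60`. A late call to `resolve` after the timeout is ignored, because the future is already cancelled.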
Step 3 — Require rollback procedures for every automated action
# rollback_registry.py
# Every automated action must have a registered rollback procedure
from datetime import datetime, timedelta

from ir_automation import IRAction

ROLLBACK_PROCEDURES = {
    "block_ip_at_firewall": {
        "command": "iptables -D INPUT -s {ip} -j DROP",
        # -n lists numeric addresses so grep matches the IP, not a resolved hostname
        "verify": "iptables -n -L INPUT | grep -F {ip}",
        "max_auto_duration_hours": 24,  # Auto-unblock after 24 hours
    },
    "disable_user_account_temporarily": {
        "command": "usermod -U {username}",  # Enable account
        "verify": "id {username}",
        "max_auto_duration_hours": 2,
    },
    "quarantine_email": {
        "command": "move email from quarantine to inbox",
        "verify": "check inbox",
        "max_auto_duration_hours": 48,
    },
    "add_ip_to_watchlist": {
        "command": "remove_from_watchlist({ip})",
        "verify": "check watchlist",
        "max_auto_duration_hours": 72,
    },
}


def register_action_for_rollback(action: IRAction, execution_timestamp: datetime) -> str:
    """Register an executed action for automatic rollback if not confirmed."""
    rollback_info = ROLLBACK_PROCEDURES.get(action.name)
    if not rollback_info:
        raise ValueError(f"No rollback procedure for {action.name}")
    rollback_at = execution_timestamp + timedelta(
        hours=rollback_info["max_auto_duration_hours"]
    )
    # action.parameters originates from AI output: validate values (IP, username)
    # before they are interpolated into a command that will be executed.
    try:
        rollback_command = rollback_info["command"].format(**action.parameters)
    except KeyError as missing:
        raise ValueError(
            f"Action {action.name} lacks parameter {missing} needed for rollback"
        ) from None
    # Store in database/queue for scheduled rollback
    rollback_record = {
        "action": action.name,
        "target": action.target,
        "parameters": action.parameters,
        "executed_at": execution_timestamp.isoformat(),
        "rollback_at": rollback_at.isoformat(),
        "rollback_command": rollback_command,
        "status": "pending_confirmation",  # Analyst must confirm to keep the action
    }
    print(f"Rollback scheduled: {action.name} will be reversed at {rollback_at}")
    return rollback_at.isoformat()
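The registry above schedules rollbacks but does not show what decides when one fires. A scheduled job could periodically select the records that are past their TTL and still unconfirmed. The sketch below assumes records shaped like `rollback_record` above; `due_rollbacks` is a hypothetical name, and the `"confirmed"` status value is an assumption about how an analyst's confirmation would be stored.

```python
from datetime import datetime, timedelta, timezone


def due_rollbacks(records: list[dict], now: datetime) -> list[dict]:
    """Return records whose rollback time has passed and that remain unconfirmed."""
    return [
        record for record in records
        if record["status"] == "pending_confirmation"
        and datetime.fromisoformat(record["rollback_at"]) <= now
    ]


# Illustrative records only; timestamps are relative to a fixed 'now'.
now = datetime(2025, 1, 2, 3, 0, tzinfo=timezone.utc)
records = [
    {   # Past TTL and unconfirmed: should be rolled back
        "action": "block_ip_at_firewall",
        "status": "pending_confirmation",
        "rollback_at": (now - timedelta(minutes=5)).isoformat(),
    },
    {   # TTL not reached yet
        "action": "quarantine_email",
        "status": "pending_confirmation",
        "rollback_at": (now + timedelta(hours=1)).isoformat(),
    },
    {   # Past TTL but an analyst confirmed it, so it stays in place
        "action": "add_ip_to_watchlist",
        "status": "confirmed",
        "rollback_at": (now - timedelta(hours=1)).isoformat(),
    },
]
due = [record["action"] for record in due_rollbacks(records, now)]
print(due)
```

Passing `now` in explicitly keeps the selection logic testable without patching the clock. Timestamps should be consistently timezone-aware or consistently naive, since Python refuses to compare the two.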
Step 4 — Test automated IR actions in a staging environment
#!/bin/bash
# test-ir-automation.sh: verify automated actions and rollbacks work correctly
# Run as root on a staging host only; this modifies live iptables rules.
# Test: block an IP in staging
TEST_IP="203.0.113.100"
failures=0
echo "Testing IP block action..."
# Execute the block
iptables -I INPUT -s "$TEST_IP" -j DROP
# Verify the block is in place (-n skips reverse DNS so the rule is listed by IP)
if iptables -n -L INPUT | grep -qF "$TEST_IP"; then
  echo "PASS: IP blocked"
else
  echo "FAIL: IP not blocked"
  failures=$((failures + 1))
fi
# Test rollback
echo "Testing rollback..."
iptables -D INPUT -s "$TEST_IP" -j DROP
# Verify rollback
if iptables -n -L INPUT | grep -qF "$TEST_IP"; then
  echo "FAIL: Rollback failed"
  failures=$((failures + 1))
else
  echo "PASS: Rollback successful"
fi
echo "IR automation test complete"
# Non-zero exit status lets CI detect a failed check
[ "$failures" -eq 0 ]
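Alongside testing individual rollbacks, a coverage check can confirm that every automatable action actually has a rollback registered. The sketch below copies the action names from Steps 1 and 3 into plain sets so it runs standalone. `missing_rollbacks` and the `NO_ROLLBACK_NEEDED` exemption set are hypothetical; which actions deserve an exemption is a judgment call for the team, not something this article specifies.

```python
# Tier-1 and tier-2 action names from the tier file in Step 1
AUTOMATABLE_ACTIONS = {
    "add_ip_to_watchlist",
    "increase_logging_verbosity",
    "create_alert_ticket",
    "send_analyst_notification",
    "quarantine_email",
    "block_ip_at_firewall",
    "disable_user_account_temporarily",
    "revoke_api_key",
}

# Keys of ROLLBACK_PROCEDURES in Step 3
ROLLBACK_REGISTERED = {
    "block_ip_at_firewall",
    "disable_user_account_temporarily",
    "quarantine_email",
    "add_ip_to_watchlist",
}

# Assumption: tickets and pages change no system state, so they are exempt.
NO_ROLLBACK_NEEDED = {"create_alert_ticket", "send_analyst_notification"}


def missing_rollbacks(automatable: set, registered: set, exempt: set) -> list:
    """Return automatable actions that have neither a rollback nor an exemption."""
    return sorted(automatable - registered - exempt)


gaps = missing_rollbacks(AUTOMATABLE_ACTIONS, ROLLBACK_REGISTERED, NO_ROLLBACK_NEEDED)
print(gaps)  # ['increase_logging_verbosity', 'revoke_api_key']
```

With the names used in this article, the check flags `increase_logging_verbosity` and `revoke_api_key`, neither of which has an entry in the Step 3 registry. Running a check like this in CI would make such a gap fail the build rather than surface during an incident.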
Expected Behaviour
| Action type | Without tier model | With tier model |
|---|---|---|
| Production host isolation | AI executes automatically | Blocked (tier-3); analyst notification only |
| CIDR block | AI executes automatically | Blocked (tier-3); analyst must make the call |
| IP watchlist addition | Requires human approval | AI executes immediately (tier-1) |
| Specific IP block at firewall | AI executes automatically | Human approval required within 15 minutes (tier-2) |
| Tier-2 action not approved before its timeout (10 to 15 minutes, per action) | Action waits indefinitely | Times out without executing; creates ticket; analyst paged again |
| Any automated action | No rollback | Rollback registered; auto-reverses after TTL if not confirmed |
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Tier-3 actions never automated | Eliminates production outage from false positive | Loses speed advantage for high-impact responses | Tier-2 approval workflow gets a human decision in <15 minutes for most cases; speed loss is acceptable |
| 15-minute approval window | Balances speed and human oversight | At 3 AM, 15 minutes to get a response may fail | Configure escalation: if primary on-call doesn’t respond in 5 minutes, page secondary; if no response in 15 minutes, escalate to manager |
| Automatic rollback after TTL | Prevents long-lived incorrect automations | Rolls back valid actions if analyst forgets to confirm | Configure confirmation reminder alerts at TTL - 2 hours |
| AI cannot recommend tier-3 actions | Prevents AI from creating false urgency for irreversible actions | AI may be overly conservative | Review AI recommendations and tier map quarterly; consider upgrading some tier-3 to tier-2 as confidence in the system grows |
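The escalation and reminder timings in the table can be written down as a small schedule. The sketch below uses the intervals from the table (secondary at 5 minutes, manager at 15 minutes, reminder at TTL minus 2 hours). The function names are hypothetical, and the fallback to half the TTL for actions whose TTL is 2 hours or less is an assumption added here, since "TTL - 2 hours" would otherwise fire at or before execution time.

```python
from datetime import datetime, timedelta


def escalation_schedule(requested_at: datetime) -> list[tuple[datetime, str]]:
    """Return (time, step) pairs for an unanswered tier-2 approval request."""
    return [
        (requested_at, "page primary on-call"),
        (requested_at + timedelta(minutes=5), "page secondary on-call"),
        (requested_at + timedelta(minutes=15), "escalate to manager"),
    ]


def confirmation_reminder_at(
    executed_at: datetime, ttl_hours: float, lead_hours: float = 2.0
) -> datetime:
    """Return when to remind the analyst to confirm an action before rollback."""
    if ttl_hours > lead_hours:
        offset_hours = ttl_hours - lead_hours
    else:
        # Assumption: for short TTLs, remind halfway through instead
        offset_hours = ttl_hours / 2
    return executed_at + timedelta(hours=offset_hours)


start = datetime(2025, 1, 1, 3, 0)
for when, step in escalation_schedule(start):
    print(when.strftime("%H:%M"), step)
print(confirmation_reminder_at(start, ttl_hours=24))  # 22 hours after execution
print(confirmation_reminder_at(start, ttl_hours=2))   # 1 hour after execution
```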
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Approval timeout causes action to not execute | Genuine attack continues; no response taken | Analyst checks IR log; no action executed despite tier-2 recommendation | Page a different on-call; manually execute; review approval timeout configuration |
| Rollback fires on a valid action | A correctly-blocked IP is unblocked by automatic rollback | Malicious IP resumes activity; alert re-fires | Extend TTL before automatic rollback fires; require confirmation to keep actions beyond TTL |
| Cascading actions from alert flood | Multiple tier-1 actions executed in response to 100 correlated alerts | IR log shows 100 actions in 5 minutes; watchlist full | Rate-limit automated action execution to N actions per minute; require human approval above threshold |
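| AI misclassifies tier (recommends a tier-3 action as if it were tier-2) | Model output asks for a tier-3 action to be queued for approval | TIER_MAP lookup in analyse_and_respond downgrades it to an analyst notification; execute_with_approval blocks any tier-3 action that still reaches it | The TIER_MAP is the authoritative tier assignment; AI recommendation never overrides it |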
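The rate limit suggested for the alert-flood case can be sketched as a sliding-window counter. This is one possible shape, not a prescribed one: `ActionRateLimiter` is a hypothetical name, the limit of 3 per 60 seconds is chosen only to keep the example short, and the caller passes the current time in (in production, likely `time.monotonic()`).

```python
from collections import deque


class ActionRateLimiter:
    """Allow at most max_actions automated actions per sliding time window."""

    def __init__(self, max_actions: int, window_seconds: float = 60.0) -> None:
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self._timestamps = deque()  # Times of recently allowed actions

    def allow(self, now: float) -> bool:
        """Return True and record the action if it fits within the limit."""
        # Forget actions that have aged out of the window
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_actions:
            return False  # Over the limit: hand off to a human instead
        self._timestamps.append(now)
        return True


limiter = ActionRateLimiter(max_actions=3, window_seconds=60)
decisions = [limiter.allow(t) for t in (0, 1, 2, 3, 61)]
print(decisions)  # [True, True, True, False, True]
```

A denied call is the point at which the engine would stop executing tier-1 actions on its own and require human approval, as the Recovery column describes.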
Related Articles
- Incident Response Hardening Playbook — the manual IR process that automated actions supplement, not replace
- Security Automation SOAR — the SOAR platform layer that AI-driven IR automation integrates with
- AI Alert Triage and Escalation — the triage step that precedes and feeds into automated response decisions
- Zero-Day Response Playbook — emergency response procedures where AI automation provides speed for the initial containment phase
- Cloud Audit Log Tampering Detection — protecting the logs that IR automation actions create and depend on