Safe AI-Driven Incident Response Automation

Problem

The appeal of AI-driven incident response automation is clear: security incidents happen at 3 AM, analysts are asleep, and a known-bad IP is scanning your network or a compromised credential is accessing production. An automated system that can block the IP, revoke the credential, and isolate the affected host in seconds, without waiting for an on-call engineer to wake up and connect their laptop, provides a response speed advantage that human-only incident response (IR) cannot match.

SOAR (Security Orchestration, Automation, and Response) platforms have offered rule-based automation for years. What AI adds is the ability to handle novel situations. It can decide whether to isolate a host based on a combination of signals that doesn't match any predefined rule. It can also judge whether a credential access pattern represents compromise or a legitimate but unusual use case. This flexibility is also the source of the risk.

The blast radius problem: automated IR actions change the environment, whereas a human analyst's investigation does not. An analyst who investigates an alert and concludes it is benign has spent time and moved on; nothing needs to be undone. An automated system that concludes an alert is a compromise and acts on that conclusion may have:

  • Isolated a production host, taking down a service for thousands of users
  • Revoked a service account credential, breaking a payment pipeline
  • Blocked a CIDR range, cutting off a large customer’s access
  • Initiated a forensic snapshot that locks a VM and prevents normal operations
  • Sent a security notification to an executive about a “confirmed incident” that turned out to be a false positive

Each of these actions has a reversal procedure, but reversal takes time, causes secondary alerts, and damages trust in both the automated system and the security team.

The failure mode that matters most is not “AI takes no action when it should” — that is the same as the current state without automation. The failure mode that matters is “AI takes wrong action with high confidence, causing service disruption and burning trust in security tooling.”

The correct design is a tiered action model: some actions are low-blast-radius and can be taken automatically; others are high-blast-radius and require human approval even at 3 AM; and for every automated action, a rollback procedure must exist and be tested.

Target systems: any organisation building or deploying AI-assisted SOAR (Security Orchestration, Automation, and Response); security teams integrating LLMs into incident response workflows; platform teams responsible for automated remediation tooling.


Threat Model

The threats addressed by this article are operational risks rather than external adversaries:

Risk 1 — AI false positive causes production outage. AI identifies legitimate high-volume database queries as an exfiltration attempt. Automated response isolates the database server. 100,000 users see service errors for 45 minutes while the isolation is reversed.

Risk 2 — AI revokes credential based on incorrect context. AI determines that a service account is compromised based on an unusual access time (3 AM). The access was from an approved overnight batch job that was recently rescheduled. Credential revocation breaks the batch job, and the data pipeline misses its SLA.

Risk 3 — Automated response creates a larger incident. AI isolates a host that was sending suspicious traffic. The host was the primary NAT gateway for an office. Isolation cuts off 50 engineers from all company resources. The original security incident (a scanning tool misconfiguration) was benign.

Risk 4 — Cascading automated actions. AI revokes a service account credential. The service account was used by a monitoring agent. The monitoring agent stops reporting. A second AI system detects the monitoring gap and escalates, triggering additional automated responses. An action cascade amplifies a minor misconfiguration into a multi-system outage.
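One way to break the Risk 4 chain is to check whether a new alert concerns a resource that automation acted on recently, either directly or through a known dependency. Such alerts go to an analyst instead of triggering more automated actions. This is a sketch under stated assumptions, not a prescribed design. The `AutomatedActionRecord` type, the `route_to_human` function, the `dependents` map (which you would build from a CMDB or service catalogue) and the 30-minute window are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AutomatedActionRecord:
    action: str
    target: str            # resource the automated action touched
    executed_at: datetime


def route_to_human(
    alert_resource: str,
    recent_actions: list[AutomatedActionRecord],
    dependents: dict[str, set[str]],
    now: datetime,
    window: timedelta = timedelta(minutes=30),
) -> bool:
    """True if the alert may be a side effect of a recent automated action.

    `dependents` maps a resource to the resources that rely on it, e.g. a
    service account to the agents that authenticate with it (hypothetical data).
    """
    for rec in recent_actions:
        if now - rec.executed_at > window:
            continue  # too old to be a likely cause
        affected = {rec.target} | dependents.get(rec.target, set())
        if alert_resource in affected:
            return True  # do not automate on top of our own automation
    return False
```

Only one level of dependency is checked here. Longer chains, such as a service account feeding an agent that feeds a dashboard, would need a graph walk over the dependency data.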


Configuration / Implementation

Step 1 — Define the action tier model

Classify every potential automated IR action by blast radius before building any automation:

# incident-response-action-tiers.yaml
# Classification of IR actions by blast radius and reversibility

tier_1_auto_allowed:
  # Low blast radius, easily reversible, time-sensitive
  # AI can execute without human approval
  actions:
  - name: add_ip_to_watchlist
    description: Flag an IP for enhanced monitoring only
    blast_radius: none
    reversible: immediate
    
  - name: increase_logging_verbosity
    description: Enable debug logging for a service for 1 hour
    blast_radius: minor_performance
    reversible: automatic_after_ttl
    
  - name: create_alert_ticket
    description: Create a ticket in the ticketing system
    blast_radius: none
    reversible: immediate
    
  - name: send_analyst_notification
    description: Page the on-call analyst with context
    blast_radius: analyst_interruption
    reversible: n/a
    
  - name: quarantine_email
    description: Move a suspicious email to quarantine folder
    blast_radius: single_user_email
    reversible: immediate

tier_2_human_approval_required:
  # Moderate blast radius or slow/costly reversal
  # AI prepares the action; a human must approve within approval_timeout_minutes
  actions:
  - name: block_ip_at_firewall
    description: Add a block rule for a specific IP
    blast_radius: may_affect_legitimate_users_from_that_ip
    reversible: minutes
    approval_timeout_minutes: 15
    
  - name: disable_user_account_temporarily
    description: Suspend a user account for 2 hours
    blast_radius: single_user
    reversible: minutes
    approval_timeout_minutes: 15
    
  - name: revoke_api_key
    description: Revoke a specific API key
    blast_radius: services_using_that_key
    reversible: requires_key_rotation
    approval_timeout_minutes: 10

tier_3_never_automated:
  # High blast radius or irreversible
  # Always requires human execution regardless of AI recommendation
  actions:
  - name: isolate_production_host
    description: Cut network access to a production server
    blast_radius: service_outage_for_users
    rationale: Risk of false positive outage exceeds benefit of speed
    
  - name: revoke_service_account_credential
    description: Revoke a credential used by production services
    blast_radius: dependent_services_fail
    rationale: Cascading failures require human impact assessment
    
  - name: block_cidr_range
    description: Block an entire CIDR range
    blast_radius: may_affect_thousands_of_users
    rationale: IP block at range level is imprecise; human must assess scope
    
  - name: initiate_forensic_capture
    description: Take a forensic snapshot of a running VM
    blast_radius: vm_performance_impact_and_potential_lock
    rationale: Legal and operational implications require human judgment
    
  - name: send_breach_notification
    description: Send a regulatory or customer breach notification
    blast_radius: irreversible_legal_notification
    rationale: Absolutely requires human sign-off
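Keeping tier assignments in one file only helps if the code reads from it. The following minimal loader sketch assumes the YAML has already been parsed into a dict, for example with PyYAML's `yaml.safe_load`. It flattens the three sections into a single action-to-tier map. It fails fast on an action listed in two tiers, a tier-2 action with no approval timeout, or a tier-3 action with no rationale. The function name and these validation rules are illustrative, not part of any library.

```python
# Keys as they appear in incident-response-action-tiers.yaml, mapped to labels
# that match the ActionTier enum values used by the automation engine.
TIER_LABELS = {
    "tier_1_auto_allowed": "auto",
    "tier_2_human_approval_required": "approval_required",
    "tier_3_never_automated": "never_automated",
}


def build_tier_map(config: dict) -> dict[str, str]:
    """Flatten a parsed tier file into {action_name: tier_label}, rejecting inconsistencies."""
    tier_map: dict[str, str] = {}
    for tier_key, label in TIER_LABELS.items():
        if tier_key not in config:
            raise ValueError(f"missing tier section: {tier_key}")
        for entry in config[tier_key].get("actions") or []:
            name = entry["name"]
            if name in tier_map:
                raise ValueError(f"{name} appears in more than one tier")
            if label == "approval_required" and "approval_timeout_minutes" not in entry:
                raise ValueError(f"{name} is tier 2 but has no approval_timeout_minutes")
            if label == "never_automated" and not entry.get("rationale"):
                raise ValueError(f"{name} is tier 3 but has no rationale")
            tier_map[name] = label
    return tier_map
```

The labels match the `ActionTier` values in the Step 2 engine. That means `{name: ActionTier(label) for name, label in build_tier_map(cfg).items()}` would give a `TIER_MAP` derived from the file rather than maintained by hand.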

Step 2 — Build the human-in-the-loop approval workflow

# ir_automation.py
import json
import logging
from dataclasses import dataclass
from enum import Enum

import anthropic

logger = logging.getLogger(__name__)

# Async client, so the API call does not block the event loop
client = anthropic.AsyncAnthropic()

# The prompt asks the model for confidence >= 70; enforce it in code as well
MIN_CONFIDENCE = 70

class ActionTier(Enum):
    AUTO = "auto"
    APPROVAL_REQUIRED = "approval_required"
    NEVER_AUTOMATED = "never_automated"

@dataclass
class IRAction:
    name: str
    tier: ActionTier
    target: str
    parameters: dict
    ai_confidence: float
    ai_reasoning: str
    approval_timeout_minutes: int = 15

def _parse_actions(text: str) -> list[dict]:
    """Extract the JSON array from the model reply, tolerating Markdown fences or extra prose."""
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array in model response")
    parsed = json.loads(text[start:end + 1])  # JSONDecodeError is a ValueError
    if not isinstance(parsed, list) or not all(isinstance(a, dict) for a in parsed):
        raise ValueError("model response is not a list of action objects")
    return parsed

class IRAutomationEngine:
    
    # Authoritative tier assignment: the model's output never overrides it,
    # and any action name not listed here is treated as NEVER_AUTOMATED.
    TIER_MAP = {
        "add_ip_to_watchlist": ActionTier.AUTO,
        "increase_logging_verbosity": ActionTier.AUTO,
        "create_alert_ticket": ActionTier.AUTO,
        "send_analyst_notification": ActionTier.AUTO,
        "quarantine_email": ActionTier.AUTO,
        "block_ip_at_firewall": ActionTier.APPROVAL_REQUIRED,
        "disable_user_account_temporarily": ActionTier.APPROVAL_REQUIRED,
        "revoke_api_key": ActionTier.APPROVAL_REQUIRED,
        "isolate_production_host": ActionTier.NEVER_AUTOMATED,
        "revoke_service_account_credential": ActionTier.NEVER_AUTOMATED,
        "block_cidr_range": ActionTier.NEVER_AUTOMATED,
        "initiate_forensic_capture": ActionTier.NEVER_AUTOMATED,
        "send_breach_notification": ActionTier.NEVER_AUTOMATED,
    }
    
    # Per-action approval timeouts from the tier file; others default to 15 minutes
    APPROVAL_TIMEOUT_MINUTES = {"revoke_api_key": 10}
    
    async def analyse_and_respond(self, alert: dict) -> list[IRAction]:
        """Generate IR actions for an alert using AI analysis."""
        
        response = await client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1500,
            system="""You are a security incident response system. Analyse security alerts 
            and recommend response actions. You must ONLY recommend actions from the 
            approved action list. You must NEVER recommend host isolation, service account 
            revocation, CIDR blocking, forensic capture, or breach notifications: these require a human decision.
            
            Approved actions: add_ip_to_watchlist, increase_logging_verbosity, 
            create_alert_ticket, send_analyst_notification, quarantine_email,
            block_ip_at_firewall (requires approval),
            disable_user_account_temporarily (requires approval), revoke_api_key (requires approval).
            
            For each recommended action, provide confidence (0-100) and reasoning.""",
            messages=[{
                "role": "user",
                "content": f"""Alert: {json.dumps(alert, indent=2)}
                
Recommend response actions as a JSON list, with no other text:
[{{"action": "action_name", "target": "target_id", "parameters": {{}}, "confidence": 0-100, "reasoning": "..."}}]

Only include actions where confidence >= 70."""
            }]
        )
        
        try:
            actions_raw = _parse_actions(response.content[0].text)
        except (ValueError, IndexError, AttributeError):
            # Fallback: at minimum, create a ticket for human review
            actions_raw = [{
                "action": "create_alert_ticket",
                "target": alert.get("id"),
                "parameters": {"alert": alert},
                "confidence": 100,
                "reasoning": "AI response parsing failed; creating ticket for human review"
            }]
        
        ir_actions = []
        for a in actions_raw:
            action_name = a.get("action")
            tier = self.TIER_MAP.get(action_name, ActionTier.NEVER_AUTOMATED)
            try:
                confidence = float(a.get("confidence", 0))
            except (TypeError, ValueError):
                confidence = 0.0  # Missing or malformed confidence counts as none
            
            # Additional safety: never automate tier-3 (or unknown) actions even if AI recommends them
            if tier == ActionTier.NEVER_AUTOMATED:
                # Downgrade to an analyst notification only
                ir_actions.append(IRAction(
                    name="send_analyst_notification",
                    tier=ActionTier.AUTO,
                    target=alert.get("id"),
                    parameters={
                        "message": f"AI recommended '{action_name}' (tier-3 action): requires human decision",
                        "original_action": a,
                    },
                    ai_confidence=confidence,
                    ai_reasoning=f"Downgraded from {action_name}: tier-3 action requires human",
                ))
                continue
            
            # Enforce the confidence floor in code instead of trusting the prompt
            if confidence < MIN_CONFIDENCE:
                logger.info("Dropping %s: confidence %.0f < %d", action_name, confidence, MIN_CONFIDENCE)
                continue
            
            ir_actions.append(IRAction(
                name=action_name,
                tier=tier,
                target=a.get("target"),
                parameters=a.get("parameters") or {},
                ai_confidence=confidence,
                ai_reasoning=a.get("reasoning", ""),
                approval_timeout_minutes=self.APPROVAL_TIMEOUT_MINUTES.get(action_name, 15),
            ))
        
        return ir_actions
    
    async def execute_with_approval(self, action: IRAction) -> dict:
        """Execute an action, getting approval for tier-2 actions."""
        
        if action.tier == ActionTier.AUTO:
            return await self._execute_action(action)
        
        elif action.tier == ActionTier.APPROVAL_REQUIRED:
            # Send approval request to on-call
            approval = await self._request_approval(action)
            if approval:
                return await self._execute_action(action)
            else:
                return {"status": "declined", "action": action.name}
        
        else:  # NEVER_AUTOMATED
            # Should not happen: analyse_and_respond downgrades tier-3 actions
            logger.warning("Blocked tier-3 action %s on %s", action.name, action.target)
            return {"status": "blocked", "reason": "tier-3 action blocked"}
    
    async def _request_approval(self, action: IRAction) -> bool:
        """Send approval request and wait for human response."""
        # Integration point for PagerDuty, Slack, etc.
        # Returns True if approved within timeout
        print(f"APPROVAL REQUIRED: {action.name} on {action.target}")
        print(f"AI Confidence: {action.ai_confidence}%")
        print(f"Reasoning: {action.ai_reasoning}")
        print(f"Approve? (y/N, timeout {action.approval_timeout_minutes}m): ")
        # In production: async wait for webhook response
        return False  # Default to safe (deny)
    
    async def _execute_action(self, action: IRAction) -> dict:
        """Execute an approved action with rollback tracking."""
        print(f"EXECUTING: {action.name} on {action.target}")
        # Integration point for your IR platform
        # Must record rollback information
        return {"status": "executed", "action": action.name, "target": action.target}
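The `_request_approval` stub above always denies. A possible replacement implements the escalation described in the trade-offs: page the primary on-call, then the secondary, then a manager. Any answer ends the wait, and silence from everyone means no execution. This sketch makes three assumptions. A webhook handler (hypothetical wiring for PagerDuty or Slack) pushes `(responder, approved)` tuples onto an `asyncio.Queue`. A `page` coroutine that you supply sends the notification. The names `EscalationStep` and `request_approval_with_escalation` are illustrative.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class EscalationStep:
    responder: str        # hypothetical responder ID, e.g. "primary-oncall"
    wait_seconds: float   # how long to wait after paging this responder


async def request_approval_with_escalation(
    action_name: str,
    steps: list[EscalationStep],
    page: Callable[[str, str], Awaitable[None]],
    decisions: asyncio.Queue,
) -> tuple[bool, Optional[str]]:
    """Page responders in order until one decides; deny if nobody does.

    `decisions` receives (responder, approved) tuples from a webhook handler.
    Because the queue is shared, a late answer from an earlier responder is
    still accepted after escalation.
    """
    for step in steps:
        await page(step.responder, action_name)
        try:
            responder, approved = await asyncio.wait_for(
                decisions.get(), timeout=step.wait_seconds
            )
        except asyncio.TimeoutError:
            continue  # no answer in this window: escalate to the next responder
        return approved, responder
    return False, None  # chain exhausted: fail safe, do not execute
```

For the 5-minute and 15-minute thresholds in the trade-offs, the steps would be the primary at 300 seconds, the secondary at 600 seconds, and then the manager. Use one queue per pending approval so that decisions for different actions cannot mix. A `(False, None)` result is the point at which to create a ticket and re-page.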

Step 3 — Require rollback procedures for every automated action

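# rollback_registry.py
# Every automated action must have a registered rollback procedure
from datetime import datetime, timedelta

from ir_automation import IRAction

# {target} is filled from IRAction.target; any other placeholders come from IRAction.parameters.
# Entries marked "placeholder" stand in for calls to your mail or SIEM platform's API.
ROLLBACK_PROCEDURES = {
    "block_ip_at_firewall": {
        "command": "iptables -D INPUT -s {target} -j DROP",
        # -C (--check) exits non-zero once the rule is gone
        "verify": "! iptables -C INPUT -s {target} -j DROP",
        "max_auto_duration_hours": 24,  # Auto-unblock after 24 hours
    },
    "disable_user_account_temporarily": {
        "command": "usermod -U {target}",  # Unlock the account
        "verify": "passwd -S {target}",    # Status should no longer report the account as locked
        "max_auto_duration_hours": 2,
    },
    "quarantine_email": {
        "command": "move email from quarantine to inbox",  # placeholder
        "verify": "check inbox",                           # placeholder
        "max_auto_duration_hours": 48,
    },
    "add_ip_to_watchlist": {
        "command": "remove_from_watchlist({target})",  # placeholder
        "verify": "check watchlist",                   # placeholder
        "max_auto_duration_hours": 72,
    },
}

def register_action_for_rollback(action: IRAction, execution_timestamp: datetime) -> str:
    """Register an executed action for automatic rollback unless an analyst confirms it.

    Returns the scheduled rollback time as an ISO-8601 string.
    """
    rollback_info = ROLLBACK_PROCEDURES.get(action.name)
    if not rollback_info:
        raise ValueError(f"No rollback procedure for {action.name}")
    
    rollback_at = execution_timestamp + timedelta(
        hours=rollback_info["max_auto_duration_hours"]
    )
    # The target always comes from the action itself; parameters fill any other placeholders
    placeholders = {**action.parameters, "target": action.target}
    
    rollback_record = {
        "action": action.name,
        "target": action.target,
        "parameters": action.parameters,
        "executed_at": execution_timestamp.isoformat(),
        "rollback_at": rollback_at.isoformat(),
        "rollback_command": rollback_info["command"].format(**placeholders),
        "verify_command": rollback_info["verify"].format(**placeholders),
        "status": "pending_confirmation",  # Analyst must confirm to keep the action
    }
    # Integration point: persist rollback_record to your database or job queue
    # so a scheduler can run rollback_command at rollback_at.
    
    print(f"Rollback scheduled: {action.name} will be reversed at {rollback_at}")
    return rollback_at.isoformat()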
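`register_action_for_rollback` leaves persistence and scheduling as an integration point. The scheduler side could be as simple as a periodic sweep like this sketch. It assumes records shaped like `rollback_record` are loaded from wherever you persist them. It also assumes timestamps are timezone-aware throughout, because comparing naive and aware datetimes raises `TypeError`. `due_for_rollback`, `confirm` and the `confirmed_by` field are hypothetical additions.

```python
from datetime import datetime, timezone
from typing import Optional


def due_for_rollback(records: list[dict], now: Optional[datetime] = None) -> list[dict]:
    """Return records past their rollback time that no analyst has confirmed."""
    now = now or datetime.now(timezone.utc)
    due = []
    for rec in records:
        if rec["status"] != "pending_confirmation":
            continue  # confirmed (keep the action) or already rolled back
        if datetime.fromisoformat(rec["rollback_at"]) <= now:
            due.append(rec)
    return due


def confirm(record: dict, analyst: str) -> None:
    """Analyst decision to keep an automated action beyond its TTL."""
    record["status"] = "confirmed"
    record["confirmed_by"] = analyst  # hypothetical audit field
```

The sweep would run each returned record's `rollback_command` and check the matching `verify` step from `ROLLBACK_PROCEDURES`. It would then set the status to something like `rolled_back`, so the next sweep skips the record.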

Step 4 — Test automated IR actions in a staging environment

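#!/bin/bash
# test-ir-automation.sh: verify automated actions and their rollbacks. Run in STAGING only.
# Requires root for iptables. 203.0.113.0/24 is a documentation range (TEST-NET-3).
set -u

if [ "$(id -u)" -ne 0 ]; then
    echo "Run as root"; exit 1
fi

TEST_IP="203.0.113.100"
FAILURES=0

echo "Testing IP block action..."
iptables -I INPUT -s "$TEST_IP" -j DROP

# -C (--check) exits 0 only if the exact rule exists, and avoids the DNS lookups done by -L
if iptables -C INPUT -s "$TEST_IP" -j DROP 2>/dev/null; then
    echo "PASS: IP blocked"
else
    echo "FAIL: IP not blocked"; FAILURES=$((FAILURES + 1))
fi

echo "Testing rollback..."
iptables -D INPUT -s "$TEST_IP" -j DROP

if ! iptables -C INPUT -s "$TEST_IP" -j DROP 2>/dev/null; then
    echo "PASS: Rollback successful"
else
    echo "FAIL: Rollback failed"; FAILURES=$((FAILURES + 1))
fi

echo "IR automation test complete ($FAILURES failure(s))"
# Non-zero exit status lets CI fail the pipeline
[ "$FAILURES" -eq 0 ]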
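The staging script exercises one action end to end. A cheaper check that can run on every commit asserts that every tier-1 and tier-2 action either has a rollback entry or is deliberately exempt, which enforces the rule stated in Step 3. This sketch compares a tier map (action name to `ActionTier` value string) against the rollback registry. The exemption set for actions that only create tickets or pages is an assumption you should review.

```python
# Hypothetical exemptions: these actions only create records or pages, so there is nothing to undo.
NO_ROLLBACK_NEEDED = frozenset({"create_alert_ticket", "send_analyst_notification"})

AUTOMATABLE_TIERS = ("auto", "approval_required")


def missing_rollbacks(tier_map: dict[str, str], rollback_procedures: dict) -> list[str]:
    """Return automatable actions with no rollback entry and no exemption, sorted by name."""
    return sorted(
        name
        for name, tier in tier_map.items()
        if tier in AUTOMATABLE_TIERS
        and name not in rollback_procedures
        and name not in NO_ROLLBACK_NEEDED
    )
```

In CI, `assert missing_rollbacks({n: t.value for n, t in IRAutomationEngine.TIER_MAP.items()}, ROLLBACK_PROCEDURES) == []` would fail the build whenever an automatable action lacks a rollback. Against the `TIER_MAP` and `ROLLBACK_PROCEDURES` shown earlier, it would report two actions. `increase_logging_verbosity` is described as auto-reverting after its TTL. `revoke_api_key` can only be reversed through key rotation. Each needs either a registry entry or an explicit exemption.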

Expected Behaviour

| Action type | Without tier model | With tier model |
|---|---|---|
| Production host isolation | AI executes automatically | Blocked (tier-3); analyst notification only |
| CIDR block | AI executes automatically | Blocked (tier-3); analyst must make the call |
| IP watchlist addition | Requires human approval | AI executes immediately (tier-1) |
| Specific IP block at firewall | AI executes automatically | Human approval required within 15 minutes (tier-2) |
| Tier-2 action not approved in 15 minutes | Action waits indefinitely | Times out; creates ticket; analyst paged again |
| Any automated action | No rollback | Rollback registered; auto-reverses after TTL if not confirmed |

Trade-offs

| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Tier-3 actions never automated | Eliminates production outages caused by false-positive automated actions | Loses the speed advantage for high-impact responses | The same paging path used for tier-2 approvals gets a human decision in under 15 minutes in most cases; the speed loss is acceptable |
| 15-minute approval window | Balances speed and human oversight | At 3 AM, the on-call may not respond within 15 minutes | Configure escalation: if the primary on-call doesn’t respond in 5 minutes, page the secondary; if there is no response in 15 minutes, escalate to a manager |
| Automatic rollback after TTL | Prevents incorrect automated actions from persisting | Rolls back valid actions if the analyst forgets to confirm | Send confirmation reminders ahead of TTL expiry (e.g. at TTL minus 2 hours for the 24-hour IP block) |
| AI instructed not to recommend tier-3 actions | Prevents AI from creating false urgency for irreversible actions | AI may be overly conservative | Review AI recommendations and the tier map quarterly; consider moving some tier-3 actions to tier-2 as confidence in the system grows |

Failure Modes

| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Approval timeout means the action never executes | Genuine attack continues; no response taken | Analyst checks the IR log and finds no action executed despite a tier-2 recommendation | Page a different on-call; execute manually; review the approval timeout configuration |
| Rollback fires on a valid action | A correctly blocked IP is unblocked by automatic rollback | Malicious IP resumes activity; alert re-fires | Extend the TTL before automatic rollback fires; require confirmation to keep actions beyond the TTL |
| Cascading actions from an alert flood | Multiple tier-1 actions executed in response to 100 correlated alerts | IR log shows 100 actions in 5 minutes; watchlist full | Rate-limit automated action execution to N actions per minute; require human approval above the threshold |
| AI misclassifies tier | Model output presents a tier-3 action (e.g. host isolation) as if it only needed approval | The TIER_MAP lookup in analyse_and_respond downgrades it to an analyst notification; the safety check in execute_with_approval blocks anything that slips through and logs a warning | TIER_MAP is the authoritative tier assignment; the AI recommendation never overrides it |
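The rate-limit mitigation for alert floods can be sketched as a sliding-window counter that the engine consults before each automated execution. Over the limit, the action is routed to human approval instead of running. This is one possible shape, not a prescribed implementation. It assumes a single engine process; several instances would need shared state, for example in a database. The class name and the limit values are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta


class AutomatedActionRateLimiter:
    """Allow at most `max_actions` automated executions per sliding `window`."""

    def __init__(self, max_actions: int, window: timedelta = timedelta(minutes=1)):
        self.max_actions = max_actions
        self.window = window
        self._executed: deque[datetime] = deque()

    def try_acquire(self, now: datetime) -> bool:
        """Record and allow an execution, or return False so the caller escalates to a human."""
        # Drop timestamps that have slid out of the window
        while self._executed and now - self._executed[0] >= self.window:
            self._executed.popleft()
        if len(self._executed) >= self.max_actions:
            return False  # over the limit: require human approval instead
        self._executed.append(now)
        return True
```

With `max_actions=3`, a burst of five correlated alerts within one minute would produce three automated actions and two approval requests. Once the earliest executions age out of the window, automation resumes.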