AI & Security Landscape Articles
AI security guides covering Claude for security, LLM threats, agent security, governance, compliance, jailbreak defence, and red teaming.
AI Security and Threat Landscape Guides
AI Agent Observability and Tracing: OpenTelemetry for Agent Runs and Tool Calls
An agent's run is a graph of model calls, tool invocations, and decisions. Observability that maps cleanly to that graph is the difference between debugging and guessing.
AI Model Output Watermarking: Provenance for Generated Text and Code
SynthID, the Aaronson scheme, and lexical watermarks embed signatures in model output. Detection works statistically. None survives heavy editing — useful but bounded.
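To make the "detection works statistically" point concrete, here is a minimal sketch in the spirit of green-list lexical watermarks (a simplified, static variant of the Kirchenbauer-style scheme): count how many tokens fall in the watermark's preferred set and compare against the unwatermarked baseline. The function and parameter names are illustrative, not any vendor's detection API.

```python
import math

def watermark_z_score(tokens: list[int], green: set[int], gamma: float = 0.25) -> float:
    """z-score of the observed green-token count against the unwatermarked null.

    gamma is the fraction of the vocabulary the watermark favours; unwatermarked
    text should contain roughly gamma * len(tokens) green tokens by chance.
    """
    t = len(tokens)
    if t == 0:
        return 0.0
    hits = sum(1 for tok in tokens if tok in green)
    expected = gamma * t
    variance = t * gamma * (1.0 - gamma)
    return (hits - expected) / math.sqrt(variance)

# A z-score of roughly 4 or more is strong evidence of the watermark; heavy
# paraphrasing pushes the green fraction back toward gamma, which is why
# detection does not survive substantial editing.
```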
Continuous AI Red-Teaming Pipelines: Automated Adversarial Testing in CI
Manual red-teaming finds gaps once. Continuous pipelines catch regressions on every model upgrade. The infrastructure exists; most teams haven't wired it up.
Privacy-Preserving ML Inference: Differential Privacy, Confidential Computing, and Training Data Protection
ML inference leaks training data through membership inference, model inversion, and embedding attacks. Differential privacy, TEE-based inference, and output filtering bound the leakage.
C2PA Content Credentials: Cryptographic Provenance for AI-Generated Media in Production
Synthetic media is often indistinguishable from camera output. Content Credentials are the practical defence — signed manifests embedded in the file itself.
MCP Authentication Patterns: OAuth 2.1, Capability Tokens, and Per-Tool Authorization
MCP servers expose tool surfaces to LLM agents. The auth model decides what an agent can do — and most deployments leave it underspecified.
Prompt Cache Security: Side-Channels, Poisoning, and Tenant Isolation in LLM Provider Caches
Provider-side prompt caching speeds up applications by 30-90% — and introduces a new attack surface with timing side-channels and poisoning vectors.
Agent Memory Poisoning: Defending the Persistence Layer of Long-Running LLM Agents
Agents with long-term memory survive across sessions. Anything poisoned into that memory persists. A one-shot prompt injection becomes a permanent behavioural change.
AI-Adaptive Malware: How Modern Payloads Change Behaviour Based on Their Environment and How to Defend Against Them
Modern malware is not the malware of five years ago. AI-generated payloads observe their environment, profile the host, detect sandboxes, adapt their persistence mechanism to the OS they land on, and modify their C2 communication to blend with normal traffic. Every instance is unique. This article covers how adaptive malware works and the defensive controls that defeat it.
Running AI-Powered Security Assessments on Your Own Infrastructure: Using Frontier Models Before Attackers Do
If Anthropic's Mythos can find your vulnerabilities, so can every attacker with API access. The only rational response is to find them first. This article covers how to run systematic AI-powered security assessments across your code, infrastructure-as-code, and runtime configuration.
Defending Against AI-Amplified Social Engineering: Phishing, Voice Cloning, and Deepfake Impersonation
Generative AI has erased the traditional tells of phishing: the grammar is perfect, the context is personalised, executive voices are cloned, and video deepfakes run in real time. This article covers the defensive controls that work when human judgement alone cannot distinguish real from fake.
Mythos and the Vulnerability Classes AI Finds First: Eliminating Your Highest-Risk Attack Surface
Frontier AI models like Anthropic's Mythos find vulnerability classes that traditional scanners miss: logic flaws, implicit trust, hardcoded secrets, configuration drift. The defensive response is not faster patching. It is eliminating these classes before attackers discover them.
Training Data Extraction Prevention: Stopping Models from Leaking Memorised Data
Large language models memorise portions of their training data. Given the right prompt, a model will reproduce training examples verbatim.
Model Extraction Prevention: Detecting and Blocking Model Stealing Through API Queries
Model extraction (model stealing) is an attack where an adversary systematically queries a production ML API to reconstruct a functionally equivalent copy of the model.
Securing AI Agents in Production: Tool-Use Boundaries, Credential Scoping, and Output Verification
AI agents are being deployed with production tool access: shell execution, kubectl, terraform apply, database queries, API calls.
Building an AI Governance Pipeline: Automated Checks from Training to Production
AI governance in most organisations is a manual process. A model is trained, someone writes a document, a committee meets, and approvals are collected.
AI Supply Chain Attack Surface: Models, Datasets, and Inference Dependencies
AI systems introduce a supply chain attack surface that traditional software security does not cover. The three new vectors are models, datasets, and inference dependencies.
EU AI Act Compliance for Infrastructure Teams: Risk Classification, Documentation, and Technical Controls
The EU AI Act entered into force in August 2024, with enforcement timelines staggered through 2027.
MCP Tool Permission Patterns: Least Privilege, Approval Workflows, and Scope Boundaries
MCP servers expose tools that agents invoke. Without fine-grained permissions, every connected agent can call every tool. This article covers least privilege patterns, per-client allowlists, human approval gates, audit logging, multi-tenant isolation, and capability tokens.
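As a minimal sketch of the least-privilege and approval-gate patterns the article describes, assuming a hypothetical authorisation check sitting in front of tool dispatch (ALLOWLISTS, DESTRUCTIVE, and authorize are illustrative names, not part of any MCP SDK):

```python
# Per-client tool allowlists with a human-approval gate for destructive tools.
# Default-deny: unknown clients and unlisted tools are rejected outright.
ALLOWLISTS: dict[str, set[str]] = {
    "ci-agent":      {"read_file", "run_tests", "run_shell"},
    "support-agent": {"read_file", "search_tickets"},
}
DESTRUCTIVE: set[str] = {"run_shell", "delete_record"}

def authorize(client_id: str, tool: str, human_approved: bool = False) -> bool:
    allowed = ALLOWLISTS.get(client_id, set())
    if tool not in allowed:
        return False                      # not in this client's allowlist
    if tool in DESTRUCTIVE and not human_approved:
        return False                      # approval gate before destructive calls
    return True

assert authorize("ci-agent", "run_tests")
assert not authorize("support-agent", "run_shell")
assert not authorize("ci-agent", "run_shell")            # needs explicit approval
assert authorize("ci-agent", "run_shell", human_approved=True)
```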
Claude for Application Security: Finding Logic Vulnerabilities in Source Code
Static application security testing (SAST) tools find pattern-based vulnerabilities effectively. Semgrep matches code against rules.
Auditing AI Actions at Scale: Building Tamper-Proof Logs for Non-Human Actors
AI agents operate at machine speed, generating 10-100x the audit data of human operators.
MCP Transport Security: Securing stdio, SSE, and HTTP Channels for Model Context Protocol
MCP supports three transport types: stdio, SSE, and HTTP. Each has distinct security characteristics. This article covers transport-level hardening for all three, including process isolation, TLS, mTLS, CORS, reverse proxy configuration, and rate limiting.
Claude for Kubernetes Security Auditing: Finding Privilege Escalation Paths Scanners Cannot See
Kubernetes security scanners evaluate resources individually. Tools like kube-bench check node configurations against CIS benchmarks.
LLM Jailbreak Defence: Detecting and Preventing System Prompt Bypasses in Production
LLM jailbreaks are inputs that cause a model to ignore its system prompt, safety training, or usage policies.
Verifying AI Agent Output: Deterministic Checks, Human-in-the-Loop Gates, and Rollback Safety
AI agents generate infrastructure configurations, database migrations, deployment manifests, and shell commands. That output often passes a casual review.
Securing MCP Servers: Authentication, Tool Sandboxing, and Input Validation for Model Context Protocol
The Model Context Protocol (MCP) gives AI agents structured access to tools: filesystem operations, database queries, API calls, shell commands.
Claude for Infrastructure-as-Code Security Review: Terraform, CloudFormation, and Pulumi
Infrastructure-as-Code scanners like Checkov, tflint, and cfn-lint enforce policy through pattern matching.
LLM Prompt Security Patterns: System Prompt Protection, Input Sanitisation, and Context Isolation
LLM applications are vulnerable to prompt injection, system prompt leakage, and cross-user context contamination. This article covers system prompt hardening, input sanitisation, output filtering, and context isolation for multi-tenant deployments.
Algorithmic Auditing: Testing AI Systems for Bias, Fairness, and Safety Before Deployment
AI systems make decisions that affect people: who gets approved for a loan, whose resume gets shortlisted, which content gets flagged.
Claude, Mythos, and the Non-Human Infrastructure Consumer: Writing Hardening Guides for AI Agents
AI models are no longer just tools that engineers use to write code. They are becoming direct infrastructure consumers.
Detecting AI-Generated Attacks: Moving from Signatures to Behavioural Baselines
Signature-based detection (WAF CRS rules, static Falco rules, antivirus signatures) matches "known bad." AI-generated attacks are polymorphic; every instance is unique.
Adversarial Attacks on Embeddings: Poisoning Vector Stores and Manipulating Semantic Search
Embedding-based retrieval powers RAG pipelines, semantic search, recommendation systems, and classification.
AI-Powered Vulnerability Discovery: What Automated Code Analysis Means for Your Patch Cycle
AI models can now discover exploitable vulnerabilities in source code faster than human researchers.
Agent-to-Agent Trust: Authentication, Delegation, and Capability Boundaries in Multi-Agent Systems
Multi-agent systems are moving from research demos to production deployments. A coordinator agent delegates tasks to specialist agents.
Securing LLM Deployments: Model Loading, Runtime Isolation, and Inference Infrastructure
Deploying LLMs in production introduces infrastructure security challenges: model integrity verification, GPU isolation, runtime sandboxing, API authentication, and safe model updates. This article covers the full inference deployment security stack.
The Threat Model Has Changed: Rewriting Security Assumptions for an AI-Augmented World
Every security architecture is built on assumptions about what attackers can do, how fast they can do it, and at what scale.
AI Model Cards in Production: Documenting Capabilities, Limitations, and Security Properties
Every production AI model has boundaries: input domains where it performs well, edge cases where it fails, and security properties that constrain how it can be used.
Hardening the AI Control Plane: Kill Switches, Rate Limits, and Human-in-the-Loop Gates
AI agents with write access to production systems can execute 100+ infrastructure changes per minute.
How AI Is Compressing the Attacker Timeline: What Defenders Need to Change Now
The gap between vulnerability disclosure and weaponised exploit used to be measured in weeks.
Membership Inference Defence: Preventing Attackers from Determining Training Data Inclusion
Membership inference attacks determine whether a specific data record was used to train a model.
Sandboxing AI Agent Tool Use: Filesystem, Network, and Process Isolation for Autonomous Actions
AI agents execute tool calls on real infrastructure: writing files, running shell commands, making HTTP requests, modifying databases.
Claude for Security Detection: How Large Language Models Find What Scanners Miss
Traditional security scanners operate on pattern matching. They check for known CVEs in dependency trees and match regex patterns for hardcoded secrets.
Using AI to Harden Systems: Automated Configuration Review and Remediation
Manual security review of infrastructure-as-code takes 2-4 hours per pull request for complex changes.
AI Credential Delegation: Short-Lived Tokens, Scope Narrowing, and Audit Trails for Agent Access
AI agents need credentials to do useful work: database passwords, API keys, Kubernetes service account tokens, cloud IAM roles.
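A minimal sketch of scope narrowing with short-lived tokens, assuming a hypothetical JWT-based credential broker (mint_agent_token, the claim layout, and the agent identity are illustrative; PyJWT is used only as an example encoder):

```python
import time
import jwt  # PyJWT, used here only as a convenient encoder

def mint_agent_token(signing_key: str, parent_scopes: set[str],
                     requested: set[str], ttl_seconds: int = 300) -> str:
    """Issue a short-lived token whose scopes are the intersection of what the
    parent credential holds and what the agent asked for (scope narrowing)."""
    granted = sorted(parent_scopes & requested)   # never exceed the parent credential
    now = int(time.time())
    claims = {
        "sub": "agent:deploy-bot",                # hypothetical agent identity
        "scope": granted,
        "iat": now,
        "exp": now + ttl_seconds,                 # expiry bounds the blast radius
    }
    return jwt.encode(claims, signing_key, algorithm="HS256")

token = mint_agent_token("dev-only-secret",
                         parent_scopes={"db:read", "db:write", "deploy"},
                         requested={"db:read", "deploy", "billing:read"})
# granted scopes: ["db:read", "deploy"]; "billing:read" is silently dropped
```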
AI Incident Reporting: Detection, Classification, and Response Procedures for AI System Failures
Traditional incident response assumes failures are binary: the service is up or it is down, the response is correct or it throws an error.
Claude for Security Incident Triage: Rapid Analysis of Logs, Alerts, and Blast Radius
When a security alert fires at 2 AM, the on-call engineer faces an information overload problem.