AI & Security Landscape Articles

AI security guides covering Claude for security, LLM threats, agent security, governance, compliance, jailbreak defence, and red teaming.

AI Security and Threat Landscape Guides

intermediate 14 min read

AI Agent Observability and Tracing: OpenTelemetry for Agent Runs and Tool Calls

An agent's run is a graph of model calls, tool invocations, and decisions. Observability that maps cleanly to that graph is the difference between debugging and guessing.

advanced 14 min read

AI Model Output Watermarking: Provenance for Generated Text and Code

SynthID, the Aaronson scheme, and lexical watermarks embed signatures in model output. Detection works statistically. None survives heavy editing — useful but bounded.

advanced 14 min read

Continuous AI Red-Teaming Pipelines: Automated Adversarial Testing in CI

Manual red-teaming finds gaps once. Continuous pipelines find regressions every model upgrade. The infrastructure exists; most teams haven't wired it up.

advanced 14 min read

Privacy-Preserving ML Inference: Differential Privacy, Confidential Computing, and Training Data Protection

ML inference leaks training data through membership inference, model inversion, and embedding attacks. Differential privacy, TEE-based inference, and output filtering bound the leakage.

intermediate 16 min read

C2PA Content Credentials: Cryptographic Provenance for AI-Generated Media in Production

Synthetic media is now indistinguishable from camera output. Content Credentials are the practical defence — signed manifests embedded in the file itself.

intermediate 14 min read

MCP Authentication Patterns: OAuth 2.1, Capability Tokens, and Per-Tool Authorization

MCP servers expose tool surfaces to LLM agents. The auth model decides what an agent can do — and most deployments leave it underspecified.

advanced 14 min read

Prompt Cache Security: Side-Channels, Poisoning, and Tenant Isolation in LLM Provider Caches

Provider-side prompt caching speeds up applications by 30-90% — and introduces a new attack surface with timing side-channels and poisoning vectors.

advanced 18 min read

Agent Memory Poisoning: Defending the Persistence Layer of Long-Running LLM Agents

Agents with long-term memory survive across sessions. Anything poisoned into that memory persists. A one-shot prompt injection becomes a permanent behavioural change.

advanced 26 min read

AI-Adaptive Malware: How Modern Payloads Change Behaviour Based on Their Environment and How to Defend Against Them

A modern virus is not the same as a virus from five years ago. AI-generated payloads observe their environment, profile the host, detect sandboxes, adapt their persistence mechanism to the OS they land on, and modify their C2 communication to blend with normal traffic. Every instance is unique. This article covers how adaptive malware works and the defensive controls that defeat it.

advanced 24 min read

Running AI-Powered Security Assessments on Your Own Infrastructure: Using Frontier Models Before Attackers Do

If Anthropic's Mythos can find your vulnerabilities, so can every attacker with API access. The only rational response is to find them first. This article covers how to run systematic AI-powered security assessments across your code, infrastructure-as-code, and runtime configuration.

intermediate 22 min read

Defending Against AI-Amplified Social Engineering: Phishing, Voice Cloning, and Deepfake Impersonation

Generative AI has erased the traditional indicators of phishing: attackers now produce perfect grammar, personalised context, cloned executive voices, and real-time video deepfakes. This article covers the defensive controls that work when human judgement alone cannot distinguish real from fake.

advanced 22 min read

Mythos and the Vulnerability Classes AI Finds First: Eliminating Your Highest-Risk Attack Surface

Frontier AI models like Anthropic's Mythos find vulnerability classes that traditional scanners miss: logic flaws, implicit trust, hardcoded secrets, configuration drift. The defensive response is not faster patching. It is eliminating these classes before they are discovered.

advanced 16 min read

Training Data Extraction Prevention: Stopping Models from Leaking Memorised Data

Large language models memorise portions of their training data. Given the right prompt, a model will reproduce training examples verbatim.

advanced 16 min read

Model Extraction Prevention: Detecting and Blocking Model Stealing Through API Queries

Model extraction (model stealing) is an attack where an adversary systematically queries a production ML API to reconstruct a functionally equivalent copy of the model.

advanced 20 min read

Securing AI Agents in Production: Tool-Use Boundaries, Credential Scoping, and Output Verification

AI agents are being deployed with production tool access: shell execution, kubectl, terraform apply, database queries, API calls.

advanced 19 min read

Building an AI Governance Pipeline: Automated Checks from Training to Production

AI governance in most organisations is a manual process. A model is trained, someone writes a document, a committee meets, and approvals are collected.

advanced 16 min read

AI Supply Chain Attack Surface: Models, Datasets, and Inference Dependencies

AI systems introduce a supply chain attack surface that traditional software security does not cover. The three new vectors are models, datasets, and inference dependencies.

advanced 18 min read

EU AI Act Compliance for Infrastructure Teams: Risk Classification, Documentation, and Technical Controls

The EU AI Act entered into force in August 2024, with enforcement timelines staggered through 2027.

advanced 19 min read

MCP Tool Permission Patterns: Least Privilege, Approval Workflows, and Scope Boundaries

MCP servers expose tools that agents invoke. Without fine-grained permissions, every connected agent can call every tool. This article covers least privilege patterns, per-client allowlists, human approval gates, audit logging, multi-tenant isolation, and capability tokens.

advanced 22 min read

Claude for Application Security: Finding Logic Vulnerabilities in Source Code

Static application security testing (SAST) tools find pattern-based vulnerabilities effectively. Semgrep matches code against rules.

advanced 18 min read

Auditing AI Actions at Scale: Building Tamper-Proof Logs for Non-Human Actors

AI agents operate at machine speed, generating 10-100x the audit data of human operators.

advanced 18 min read

MCP Transport Security: Securing stdio, SSE, and HTTP Channels for Model Context Protocol

MCP supports three transport types: stdio, SSE, and HTTP. Each has distinct security characteristics. This article covers transport-level hardening for all three, including process isolation, TLS, mTLS, CORS, reverse proxy configuration, and rate limiting.

advanced 22 min read

Claude for Kubernetes Security Auditing: Finding Privilege Escalation Paths Scanners Cannot See

Kubernetes security scanners evaluate resources individually. Tools like kube-bench check node configurations against CIS benchmarks.

advanced 16 min read

LLM Jailbreak Defence: Detecting and Preventing System Prompt Bypasses in Production

LLM jailbreaks are inputs that cause a model to ignore its system prompt, safety training, or usage policies.

advanced 18 min read

Verifying AI Agent Output: Deterministic Checks, Human-in-the-Loop Gates, and Rollback Safety

AI agents generate infrastructure configurations, database migrations, deployment manifests, and shell commands. That output often passes a casual review while still being wrong.

advanced 18 min read

Securing MCP Servers: Authentication, Tool Sandboxing, and Input Validation for Model Context Protocol

The Model Context Protocol (MCP) gives AI agents structured access to tools: filesystem operations, database queries, API calls, shell commands.

intermediate 20 min read

Claude for Infrastructure-as-Code Security Review: Terraform, CloudFormation, and Pulumi

Infrastructure-as-Code scanners like Checkov, tflint, and cfn-lint enforce policy through pattern matching.

advanced 19 min read

LLM Prompt Security Patterns: System Prompt Protection, Input Sanitisation, and Context Isolation

LLM applications are vulnerable to prompt injection, system prompt leakage, and cross-user context contamination. This article covers system prompt hardening, input sanitisation, output filtering, and context isolation for multi-tenant deployments.

advanced 19 min read

Algorithmic Auditing: Testing AI Systems for Bias, Fairness, and Safety Before Deployment

AI systems make decisions that affect people: who gets approved for a loan, whose resume gets shortlisted, which content gets flagged.

intermediate 18 min read

Claude, Mythos, and the Non-Human Infrastructure Consumer: Writing Hardening Guides for AI Agents

AI models are no longer just tools that engineers use to write code. They are becoming direct infrastructure consumers.

advanced 18 min read

Detecting AI-Generated Attacks: Moving from Signatures to Behavioural Baselines

Signature-based detection (WAF CRS rules, static Falco rules, antivirus signatures) matches "known bad." AI-generated attacks are polymorphic: every instance is different.

advanced 16 min read

Adversarial Attacks on Embeddings: Poisoning Vector Stores and Manipulating Semantic Search

Embedding-based retrieval powers RAG pipelines, semantic search, recommendation systems, and classification.

advanced 16 min read

AI-Powered Vulnerability Discovery: What Automated Code Analysis Means for Your Patch Cycle

AI models can now discover exploitable vulnerabilities in source code faster than human researchers.

advanced 18 min read

Agent-to-Agent Trust: Authentication, Delegation, and Capability Boundaries in Multi-Agent Systems

Multi-agent systems are moving from research demos to production deployments. A coordinator agent delegates tasks to specialist agents.

advanced 20 min read

Securing LLM Deployments: Model Loading, Runtime Isolation, and Inference Infrastructure

Deploying LLMs in production introduces infrastructure security challenges: model integrity verification, GPU isolation, runtime sandboxing, API authentication, and safe model updates. This article covers the full inference deployment security stack.

advanced 20 min read

The Threat Model Has Changed: Rewriting Security Assumptions for an AI-Augmented World

Every security architecture is built on assumptions about what attackers can do, how fast they can do it, and at what scale.

intermediate 16 min read

AI Model Cards in Production: Documenting Capabilities, Limitations, and Security Properties

Every production AI model has boundaries: input domains where it performs well, edge cases where it fails, and security properties that constrain how it can be used.

advanced 16 min read

Hardening the AI Control Plane: Kill Switches, Rate Limits, and Human-in-the-Loop Gates

AI agents with write access to production systems can execute 100+ infrastructure changes per minute.

advanced 20 min read

How AI Is Compressing the Attacker Timeline: What Defenders Need to Change Now

The gap between vulnerability disclosure and weaponised exploit used to be measured in weeks.

advanced 16 min read

Membership Inference Defence: Preventing Attackers from Determining Training Data Inclusion

Membership inference attacks determine whether a specific data record was used to train a model.

advanced 18 min read

Sandboxing AI Agent Tool Use: Filesystem, Network, and Process Isolation for Autonomous Actions

AI agents execute tool calls on real infrastructure: writing files, running shell commands, making HTTP requests, modifying databases.

intermediate 18 min read

Claude for Security Detection: How Large Language Models Find What Scanners Miss

Traditional security scanners operate on pattern matching. They check for known CVEs in dependency trees and match regex patterns for hardcoded secrets.

intermediate 14 min read

Using AI to Harden Systems: Automated Configuration Review and Remediation

Manual security review of infrastructure-as-code takes 2-4 hours per pull request for complex changes.

advanced 18 min read

AI Credential Delegation: Short-Lived Tokens, Scope Narrowing, and Audit Trails for Agent Access

AI agents need credentials to do useful work: database passwords, API keys, Kubernetes service account tokens, cloud IAM roles.

advanced 18 min read

AI Incident Reporting: Detection, Classification, and Response Procedures for AI System Failures

Traditional incident response assumes failures are binary: the service is up or it is down, the response is correct or it throws an error.

intermediate 20 min read

Claude for Security Incident Triage: Rapid Analysis of Logs, Alerts, and Blast Radius

When a security alert fires at 2 AM, the on-call engineer faces an information overload problem.