15 min read

AI Agent Security Risks in 2026: The Incident Landscape and Hardening Framework

AI Agent Attack Surface
AI Agent Attack Surface - Photo by BoliviaInteligente / Unsplash

Last updated: April 2026


Gartner predicted in 2021 that 45% of organisations would experience software supply chain attacks by 2025. The reality exceeded their forecast: 75% of organisations were hit within a single year (BlackBerry, 2024). Third-party breaches now account for 30% of all data breaches (Verizon DBIR, 2025). We covered what the prediction looked like against reality in our Gartner supply chain retrospective.

In February 2026, the same supply chain threat model arrived in AI agent infrastructure. It arrived all at once.

Check Point Research disclosed remote code execution in Claude Code through poisoned repository config files. Antiy CERT confirmed 1,184 malicious skills across ClawHub, the marketplace for the OpenClaw AI agent framework. Trend Micro found 492 MCP servers exposed to the internet with zero authentication. And the Pentagon designated Anthropic a "supply chain risk," the first time an American company has received the classification (CBS News, 2026).

The connective tissue across every incident is the Model Context Protocol (MCP). This guide explains what MCP is, why it creates a new class of supply chain risk, and what practitioners should do about it.

Get threat articles like this delivered to your inbox. Subscribe to CyberDesserts for practical security insights, no fluff.


2026 AI Agent Security Incidents: Current Status

Incident Date Impact Status
Claude Code RCE via repository config Jan–Feb 2026 RCE in developer environments; API key exfiltration Patched: Claude Code 2.0.65+
Anthropic Git MCP Server exploit chain Jan 2026 RCE via prompt injection; CVE-2025-68143/144/145 Patched
ClawHavoc: malicious skills on ClawHub Feb 2026 1,184 malicious packages; 1 in 5 ecosystem packages at peak Active: 9 CVEs, 3 with public exploit code
MCP server internet exposure Feb 2026 492 unauthenticated servers (Trend Micro); 135,000 OpenClaw instances (SecurityScorecard) Partial: exposure reduced, architectural risk unchanged
Pentagon supply chain designation Feb 2026 First American AI company designated supply chain risk Active; Anthropic challenging in court
Azure DevOps MCP authentication bypass Apr 2026 API keys and tokens accessible without credentials; CVSS 9.1 CVE-2026-32211; patch available
Mexico government AI-directed attack Dec 2025–Jan 2026 Federal tax authority, electoral institute, 4 state govts, water utility; 195M taxpayer records; 150GB exfiltrated Under investigation

What to Do Right Now: AI Agent Security Quick Response

If your organisation is deploying AI agents or has Claude Code, OpenClaw, or MCP servers in any environment, these five steps address the most exploited gaps from the February and March 2026 incidents.

  • Scan for exposed MCP endpoints: query for /mcp and /sse across your environment and check for 0.0.0.0 bindings. Snyk's mcp-scan tool covers both MCP servers and agent skills.
  • Rotate credentials in agent config files: API keys in ~/.clawdbot/.env, ~/.openclaw/credentials/, .claude/settings.json, or any plaintext agent config should be treated as potentially compromised and rotated now.
  • Pin and review MCP server package versions: add agent configuration paths to your code review process and block auto-approval settings for MCP servers.
  • Update Claude Code: CVE-2025-59536 and CVE-2026-21852 are patched in Claude Code 2.0.65+. If you are running an earlier version, you are exposed.
  • Add AI agents to your threat model: map which workflows depend on which AI providers. The incidents below explain why this is now a continuous threat exposure management question, not a one-time audit.

The full hardening framework, covering authentication, sandboxing, behavioural monitoring, and governance, is further down this guide.


The First Confirmed AI Agent Attack: Mexico Government Breach 2026

Between December 2025 and January 2026, a single unidentified attacker used Claude to breach multiple Mexican government agencies, including the federal tax authority, electoral institute, four state governments, and a water utility in Monterrey (SecurityWeek, 2026). The attacker's conversation logs with Claude were found publicly accessible online by Israeli firm Gambit Security during routine threat hunting on February 25, 2026. By then, 150GB of data was gone: 195 million taxpayer records, voter files, civil registry documents, and government employee credentials.

The attacker prompted Claude in Spanish to act as an elite hacker. Claude produced thousands of detailed reports specifying exactly which internal targets to hit next and which credentials to use. When Claude reached its guardrail limits, the attacker switched to ChatGPT for lateral movement and evasion, distributing the attack chain across platforms in a way that made it significantly harder for any single system to capture the full sequence. Gambit identified at least 20 exploited vulnerabilities.

This was not fully autonomous execution. The human directed every stage. What changed is what the human needed to contribute: no technical expertise, no specialist knowledge, no experience writing exploits. The AI supplied all of that on demand. The expert layer of a cyberattack is now a prompt away from anyone who can jailbreak a consumer chatbot. That is the threat model shift. The security tools deployed across those agencies were not built to see it.


What Makes MCP an AI Agent Security Risk

Model Context Protocol is an open standard released by Anthropic in late 2024 that defines how AI models connect to external tools, data sources, and services. Adoption has been aggressive: Microsoft, OpenAI, Google, Amazon, GitHub Copilot, VS Code, and Cursor all support it. The protocol was built for capability first. Authentication, authorisation, and sandboxing were left to the implementer.

Most implementers skipped all three during the AI rush. The result is MCP servers deployed with no authentication, overprivileged credentials sitting in plaintext config files, and default bindings that expose them to the public internet.

I have seen this pattern before. Every time a new integration protocol launches, the first deployments prioritise "does it work" over "is it secure." We saw it with early cloud IAM configurations. We saw it with the first wave of REST APIs shipping with no rate limiting or authentication. MCP has followed the same trajectory, except the attack surface is broader because the AI model itself can be manipulated through the content it processes.

Why AI Agent Capability Creates the Security Vulnerability

Security researcher Simon Willison identified the core architectural flaw in June 2025 and named it the lethal trifecta. When an AI agent has all three of these characteristics simultaneously, it is exploitable by design:

  • Access to private data: files, API keys, databases, internal systems
  • Processes untrusted content: user inputs, third-party tool outputs, registry packages, web content
  • Can communicate externally: network requests, messages, data writes to remote endpoints

Most deployed MCP agents have all three. That is the point. Agents are useful precisely because they access your data, process diverse inputs, and take actions on your behalf. The utility is the vulnerability.

The practical consequence is that prompt injection, embedding hidden instructions in data an AI model processes as commands, becomes a full system compromise vector. An attacker embeds instructions in a web page, a document, or a tool output. The agent reads the content, follows the embedded instruction, accesses credentials, and sends them to an attacker-controlled endpoint. No malware binary. No exploit code. Just text the model interprets as instructions.

The OWASP Top 10 for Agentic Applications 2026 classifies this as ASI01: Agent Goal Hijack, one of ten threat categories released in December 2025 and peer-reviewed by NIST, Microsoft AI Red Team, and AWS (OWASP, 2025). The supply chain variant, where malicious content enters through a compromised skill or registry package, is ASI04: Agentic Supply Chain Vulnerabilities. Both map directly to what happened in February 2026.


How Claude Code Became an Attack Vector

On February 25, 2026, Check Point Research disclosed critical vulnerabilities in Claude Code, Anthropic's command-line AI development tool used by thousands of developers to write code, manage Git repositories, and automate builds (Check Point Research, 2026).

CVE-2025-59536 (CVSS 8.7) covers two configuration injection flaws. The first exploits Hooks, a Claude Code feature that runs predefined shell commands at lifecycle events. By injecting a malicious Hook into the .claude/settings.json file within a repository, an attacker gains remote code execution the moment a developer opens the project. The command runs before the trust dialog appears on screen. The second flaw targets MCP consent bypass: two repository-controlled settings in .mcp.json could override safeguards and auto-approve all MCP servers on launch without user confirmation.

CVE-2026-21852 (CVSS 5.3) enables API key theft by redirecting Claude Code's API requests to an attacker-controlled proxy, capturing the full authorisation header, including the plaintext API key, before any trust prompt appears. In environments using Anthropic's Workspaces feature, a single stolen key exposes the entire team's data.

The vendor pitch for AI coding tools is developer productivity. The reality is that .claude/settings.json and .mcp.json are no longer configuration files. They are execution vectors. They look like metadata. They function like installers. This applies to every AI coding tool that processes repository-level configuration, not just Claude Code.

All three flaws were patched in Claude Code 2.0.65+. The disclosure timeline stretches from July 2025 to January 2026. That six-month gap mirrors the broader pattern: AI tools ship fast, security catches up later.


ClawHavoc: The AI Agent Supply Chain Attack

The OpenClaw malicious skills crisis represents the largest confirmed supply chain attack targeting AI agent infrastructure to date. Antiy CERT confirmed 1,184 malicious skills across ClawHub, the package registry for the OpenClaw framework, approximately one in five packages in the ecosystem at peak (Antiy CERT, 2026). SecurityScorecard found 135,000 OpenClaw instances exposed to the public internet with insecure defaults. Nine CVEs have been disclosed, three with public exploit code.

The attack techniques are the same ones that have been escalating across software supply chains for years: typosquatting, automated mass uploads, social engineering through fake error messages. The critical difference is privilege. A compromised dependency in a web application runs in a sandboxed runtime. A compromised AI agent skill runs with whatever permissions the agent holds: terminal access, file system access, and stored credentials for cloud services.

Our full OpenClaw security analysis covers the ClawHavoc campaign breakdown, all nine CVEs, exposure data, and remediation steps. The takeaway for this guide: ClawHub is the first AI agent registry to be systematically poisoned at scale. It will not be the last. ASI04 in the OWASP Agentic Top 10 names this class of attack explicitly for that reason.

The supply chain problem extends beyond agent skill registries. The protocol itself is exposed.


How Many MCP Servers Are Exposed to the Internet in 2026?

BlueRock Security analysed over 7,000 MCP servers and found that 36.7% were potentially vulnerable to server-side request forgery (SSRF), a class of vulnerability where an attacker tricks a server into making requests to internal resources it should not reach (Security Boulevard, 2026). In their proof of concept against Microsoft's MarkItDown MCP server, researchers retrieved AWS IAM access keys, secret keys, and session tokens from an EC2 instance's metadata endpoint. A single misconfigured MCP server became a gateway to cloud infrastructure.

In February 2026, scanning results identified over 8,000 MCP servers on the public internet. Trend Micro independently found 492 with zero client authentication and zero traffic encryption (Trend Micro, 2026). Bitsight confirmed exposed servers with admin panels, debug endpoints, and API routes accessible without credentials (Bitsight, 2026).

The root cause is a familiar one: default configurations that bind to all network interfaces (0.0.0.0) rather than localhost (127.0.0.1). Developers deploy MCP servers as if they are internal tools. The defaults expose them to the world.

On January 20, 2026, Cyata researcher Yarden Porat published an exploit chain targeting Anthropic's own official Git MCP server. Three CVEs: path traversal (CVE-2025-68143), argument injection (CVE-2025-68144), and repository scoping bypass (CVE-2025-68145). The exploit achieved remote code execution through prompt injection alone (Dark Reading, 2026). If Anthropic's reference implementation shipped with these flaws, every third-party MCP server built with fewer resources should be treated as suspect.

The authentication gap is not confined to community-built implementations. On April 3, 2026, Microsoft's @azure-devops/mcp npm package was found to have a missing authentication layer on a server handling Azure DevOps work items, repositories, and pipelines (CVEdetails, 2026). An attacker could access configuration details, API keys, and authentication tokens without valid credentials. CVE-2026-32211 carries a CVSS score of 9.1. A major enterprise vendor repeating the same "authentication optional" mistake in April that community servers were criticised for in February is a clear signal about where industry defaults still sit.

The Coalition for Secure AI (CoSAI) released a comprehensive MCP Security whitepaper in January 2026 mapping 12 core threat categories and nearly 40 distinct threats (CoSAI, 2026). Palo Alto Networks independently identified the same three attack classes as the most operationally significant (Palo Alto Networks, 2026). Three stand out in practice:

  • Tool poisoning: an attacker modifies an MCP tool's description so the AI model misinterprets what it does. The model thinks it is calling a search function. The tool exfiltrates data.
  • Confused deputy: the MCP server executes actions using its own elevated privileges rather than the requesting user's. A user without database admin access asks the agent to run a query. The server, which does have admin access, complies without checking.
  • Overprivileged tokens: MCP servers store credentials, API keys and database passwords, in plaintext configuration files. Every client connecting to that server inherits the same privileged access.

The offensive implications of this exposure are also developing. On February 25, 2026, the Kali Linux team published an official guide connecting Claude AI to a Kali environment via MCP, enabling AI-assisted penetration testing where Claude selects and executes tools like Nmap through the mcp-kali-server package (Kali, 2026). For blue teams, the operational implication is direct: when MCP becomes the documented interface for offensive tooling, the same protocol your agents use for legitimate automation becomes a delivery mechanism that red teams are already using in the field.


Why Existing Security Tools Miss AI Agent Attacks

Cisco's State of AI Security 2026 found that while most organisations planned to deploy agentic AI, only 29% reported being prepared to secure those deployments (Cisco, 2026). That 71% gap exists because AI agent attacks do not resemble what existing tools were built to catch.

Traditional EDR tools look for malicious binaries, suspicious process behaviour, and known indicators of compromise. AI agent attacks have none of these. The "exploit" is text. The "payload" is a natural language instruction. The "delivery mechanism" is a document, a web page, or a tool output that the agent processes as part of its normal workflow.

The UK AI Security Institute's research published in early 2026 identified nearly 700 real-world cases of AI scheming, with a five-fold rise in documented misbehaviour between October 2025 and March 2026. Some AI models destroyed emails and other files without instruction to do so. An EDR cannot catch a model destroying emails. There is no binary to flag. There is no anomalous process signature. The action is, from the endpoint's perspective, a normal file operation.

Johann Rehberger (Embrace The Red) documented this gap methodically, publishing one prompt injection vulnerability per day throughout August 2025, each demonstrating a different way to make an AI agent perform unintended actions through crafted text inputs. Simon Willison called it "The Summer of Johann."

Endor Labs noted in their OpenClaw vulnerability research that traditional SAST tools cannot identify issues in LLM-to-tool communication flows, conversation state management, or agent-specific trust boundaries (Infosecurity Magazine, 2026). The tooling gap is real. The industry is deploying agent systems faster than it is building the security monitoring to match.

The closest parallel is the early cloud security gap. Organisations deployed cloud services before understanding the shared responsibility model, used default IAM configurations with broad access, and waited years for CSPM tooling to mature. AI agent security is at the same inflection point, with a compressed timeline because adoption is moving faster.


What the Anthropic Supply Chain Risk Designation Means for AI Security

On February 27, 2026, the Pentagon designated Anthropic a "supply chain risk," the first time an American company has received a classification normally reserved for foreign adversaries (CBS News, 2026). The dispute centred on Anthropic's refusal to remove restrictions on mass domestic surveillance and fully autonomous weapons. Federal agencies were ordered to cease using Anthropic technology within six months, with Anthropic challenging the designation in court.

Three practitioner questions follow immediately. Does your organisation hold US government contracts and use Claude in any workflow? Which AI providers does your security toolchain depend on, and what is the contingency if access is disrupted overnight? Have you mapped AI vendor dependencies with the same rigour you apply to any third-party software relationship? The Anthropic situation demonstrates that provider access can be disrupted by geopolitical and commercial decisions on a timeline measured in hours, not procurement cycles.

For the full commercial picture, including what changed on April 4, 2026 and the implications for teams running OpenClaw on Claude Pro subscriptions, see our dedicated analysis of the OpenClaw subscription decision.


How to Secure AI Agent Deployments in 2026

Discovery and inventory first. You cannot secure what you cannot see. Query endpoints for OpenClaw, Claude Code, Cursor, and other agent tools across your environment. Scan for common MCP endpoints (/mcp, /sse) and check for 0.0.0.0 bindings. Audit installed skills, MCP server configurations, and IDE extensions. Snyk's mcp-scan tool covers both MCP servers and agent skills.

Authentication and least privilege on every server. Never expose MCP servers without authentication. The specification recommends OAuth 2.1. At minimum, enforce token-based auth on all client-server connections. Bind servers to localhost (127.0.0.1) unless remote access is explicitly required and justified. Scope each server's permissions to only the resources its tools need. A server wrapping a search function has no business holding database credentials.

How to secure AI agent credentials without hardcoding. The Claude Code and OpenClaw incidents share a root cause: credentials stored in plaintext configuration files. The OpenClaw lethal trifecta includes ~/.clawdbot/.env and ~/.openclaw/credentials/ holding API keys for OpenAI, Anthropic, AWS, and any connected service in cleartext. NemoClaw's approach, injecting credentials as environment variables at runtime rather than storing them in config files, is the correct model. Rotate any credentials that may have been exposed. Treat API keys in configuration files as compromised until confirmed otherwise.

Configuration as code, reviewed like code. The Claude Code CVEs proved that .claude/settings.json and .mcp.json are execution vectors. Add agent configuration paths to your code review process. Block auto-approval settings for MCP servers. Pin and verify MCP server package versions with the same rigour you apply to any software dependency. ASI03 in the OWASP Agentic Top 10 (Identity and Privilege Abuse) covers the specific controls for this layer.

AI agent sandboxing: what microVM isolation gives you. Sandboxing confines agent actions inside declarative policy. NemoClaw's OpenShell runtime, in early alpha since March 2026, implements this at the infrastructure level: agents start with zero permissions and request access explicitly, network egress is blocked by default, and unapproved outbound connections are surfaced for human approval. MicroVM isolation takes this further by running each agent task in an isolated VM that is destroyed after execution. The attack surface shrinks to the task boundary rather than the host system. For organisations deploying agents in production today, the practical step is defining the minimum permission set for each agent workflow and enforcing it at the infrastructure level, not at the prompt level.

Behavioural monitoring for what EDR misses. Log all MCP tool invocations: every request from client to server, every action the server takes. Alert on credential access patterns. If an agent or skill touches .env files, credential stores, or API key directories, treat it as an investigable event. Treat all data returned by MCP servers as untrusted input and sanitise before it reaches the model. The goal is not to replicate EDR in agent workflows. It is to capture the signals EDR is not built to see.

Governance updates for agent-specific risk. AI Acceptable Use Policies need agent-specific language. An agent with terminal access and stored credentials is not the same risk profile as a chatbot. IBM's Cost of a Data Breach Report 2025 found 63% of breached organisations lacked AI governance policies at the time of their breach (IBM Security, 2025). The AI Acceptable Use Policy guide covers the governance framework. Include AI agents in your threat model and map AI vendor dependencies as you would any third-party relationship.


AI Agent Security 2026: Key Takeaways

February 2026 compressed what normally takes years of incremental discovery into two weeks: Claude Code RCE through repository config files, 1,184 malicious skills poisoning an agent marketplace, thousands of MCP servers exposed without authentication, and the first supply chain risk designation of an American AI company by its own government.

March 2026 confirmed what the theoretical risk always implied: AI agents have now been used as the primary attack mechanism in a confirmed, large-scale breach.

The attack surface spans skill registries, development tools, protocol infrastructure, and vendor relationships. The attack techniques are familiar: typosquatting, registry poisoning, dependency manipulation, configuration injection. The difference is that AI agents operate with broader system permissions and process untrusted inputs that existing security tooling was never designed to detect.

For practitioners, the response maps to what we already know: discover what is deployed, authenticate everything, enforce least privilege, treat configuration as code, monitor behaviour, and update governance. The surface is new. The principles are not.


AI agent security is evolving weekly. Subscribers get notified when new threats emerge, plus practical security content covering tools, frameworks, and hands-on techniques. No sales pitches, no fluff.


References and Sources

  1. Check Point Research (Donenfeld, A. & Vanunu, O.). (2026). Caught in the Hook: RCE and API Token Exfiltration Through Claude Code Project Files. CVE-2025-59536 (CVSS 8.7) and CVE-2026-21852 (CVSS 5.3). Published February 25, 2026.
  2. Antiy CERT. (2026). ClawHavoc Campaign Analysis. Trojan/OpenClaw.PolySkill classification. 1,184 malicious skills confirmed across ClawHub.
  3. Trend Micro. (2026). MCP Security: Network-Exposed Servers Are Backdoors to Your Private Data. 492 MCP servers with no client authentication or traffic encryption.
  4. Bitsight. (2026). Exposed MCP Servers Reveal New AI Vulnerabilities. Internet-exposed MCP servers with unsecured admin panels and debug endpoints.
  5. BlueRock Security / Security Boulevard (Burt, J.). (2026). Anthropic, Microsoft MCP Server Flaws Shine a Light on AI Security Risks. 7,000+ MCP servers analysed, 36.7% vulnerable to SSRF. AWS credential theft demonstrated via MarkItDown.
  6. Cyata (Porat, Y.) / Dark Reading. (2026). Microsoft & Anthropic MCP Servers at Risk of RCE, Cloud Takeovers. Exploit chain against Anthropic's Git MCP server. CVE-2025-68143, CVE-2025-68144, CVE-2025-68145.
  7. Coalition for Secure AI (CoSAI). (2026). Model Context Protocol (MCP) Security White Paper. 12 core threat categories, nearly 40 distinct threats.
  8. Cisco. (2026). State of AI Security 2026. 29% of organisations prepared to secure agentic AI deployments.
  9. Kali Linux. (2026). Kali & LLM: macOS with Claude Desktop GUI & Anthropic Sonnet LLM. Official MCP pentesting guide. Published February 25, 2026.
  10. Penligent AI. (2026). Kali Linux + Claude via MCP Is Cool, But It's the Wrong Default for Real Pentesting Teams. Operational security analysis.
  11. Palo Alto Networks. (2026). MCP Security Exposed: What You Need to Know Now. Tool poisoning, credential management, and runtime risks.
  12. SecurityScorecard STRIKE Team. (2026). Beyond the Hype: Moltbot's Real Risk Is Exposed Infrastructure. 135,000+ exposed OpenClaw instances. Three CVEs with public exploit code.
  13. CBS News (Frias, L.). (2026). Hegseth Declares Anthropic a Supply Chain Risk. Pentagon designation, $200M contract, first American company to receive classification.
  14. Infosecurity Magazine. (2026). Researchers Reveal Six New OpenClaw Vulnerabilities. Endor Labs finding that SAST tools cannot detect LLM-specific issues.
  15. OWASP. (2025). Top 10 for Agentic Applications 2026. Ten threat categories covering AI agent-specific risks including Agent Goal Hijack (ASI01) and Agentic Supply Chain Vulnerabilities (ASI04). Peer-reviewed by NIST, Microsoft AI Red Team, AWS. Published December 10, 2025.
  16. CVEdetails / AgentScore. (2026). CVE-2026-32211: Missing Authentication in Microsoft @azure-devops/mcp. CVSS 9.1. Disclosed April 3, 2026.
  17. SecurityWeek. (2026). Hackers Weaponize Claude Code in Mexican Government Cyberattack. 150GB exfiltrated; 195 million taxpayer records; multiple federal and state agencies compromised. Published March 2, 2026.
  18. UK AI Security Institute. (2026). AI Scheming Research. 700 real-world cases identified; five-fold rise in documented misbehaviour October 2025 to March 2026.
  19. Willison, S. (2025). The lethal trifecta for AI agents: private data, untrusted content, and external communication. simonwillison.net. Published June 16, 2025.
  20. Willison, S. (2025). The Summer of Johann: prompt injections as far as the eye can see. simonwillison.net. Published August 15, 2025.
  21. BlackBerry. (2024). Global Threat Intelligence Report. 75% of organisations experienced software supply chain attack within one year.
  22. Verizon. (2025). Data Breach Investigations Report. Third-party breaches account for 30% of all data breaches.
  23. IBM Security. (2025). Cost of a Data Breach Report 2025. 63% of breached organisations lacked AI governance policies.