14 min read

AI Agent Security Risks in 2026: A Practitioner's Guide

AI Agent Attack Surface
AI Agent Attack Surface - Photo by BoliviaInteligente / Unsplash

Gartner predicted in 2021 that 45% of organisations would experience software supply chain attacks by 2025. The reality exceeded their forecast: 75% of organisations were hit within a single year (BlackBerry, 2024). Third-party breaches now account for 30% of all data breaches (Verizon DBIR, 2025).

In February 2026, the same supply chain threat model arrived in AI agent infrastructure, and it arrived all at once.

Check Point Research disclosed remote code execution in Claude Code through poisoned repository config files (Check Point Research, 2026). Antiy CERT confirmed 1,184 malicious skills across ClawHub, the marketplace for the OpenClaw AI agent framework (Antiy CERT, 2026). Trend Micro found 492 MCP servers exposed to the internet with zero authentication (Trend Micro, 2026). Kali Linux shipped an official AI-assisted pentesting workflow through the same protocol (Kali, 2026). And the Pentagon designated Anthropic a "supply chain risk," the first time an American company has received the classification (CBS News, 2026).

The connective tissue across every incident is the Model Context Protocol (MCP). This guide explains what MCP is, why it creates a new class of supply chain risk, and what practitioners should do about it.

Get threat articles like this delivered to your inbox. Subscribe to CyberDesserts for practical security insights, no fluff.

Subscribe

What Is MCP and Why Is It a Security Problem?

Model Context Protocol (MCP) is an open standard released by Anthropic in late 2024 that defines how AI models connect to external tools, data sources, and services. It is, in practical terms, a universal connector for AI agents: one protocol, many tools.

MCP uses a client-server architecture. The client sits inside a host application, typically an AI assistant like Claude Desktop, an IDE like Cursor, or a coding tool like Claude Code. The client tells the AI model what tools are available. The model decides which tool to use. The request goes to an MCP server, a lightweight program that wraps a specific capability: running terminal commands, querying a database, scanning a network, accessing a file system. The server executes the action and returns the result.

Adoption has been aggressive. Microsoft, OpenAI, Google, Amazon, and dozens of development tools now support MCP. GitHub Copilot, VS Code, Cursor, Autogen, and LangChain all use it. Deployments span financial services, healthcare, and customer support.

The security problem is architectural. Anthropic designed MCP for capability first and left authentication, authorisation, and sandboxing to the implementer. Most implementers skipped all three. The result: MCP servers deployed with no authentication, overprivileged credentials stored in plaintext, and default bindings that expose them to the public internet. Uma Reddy, founder of Uptycs, described the situation plainly: connecting an LLM directly to internal systems without guardrails is leaving your digital front door open (Security Boulevard, 2026).

I have seen this pattern before. Every time a new integration protocol launches, the first deployments prioritise "does it work" over "is it secure." We saw it with early cloud IAM configurations. We saw it with the first wave of REST APIs with no rate limiting or authentication. MCP is following the same trajectory, except the attack surface is broader because the AI model itself can be manipulated through the data it processes.

That manipulation has a name.

What Is the Lethal Trifecta for AI Agents?

Security researcher Simon Willison identified a structural problem with AI agent architectures in June 2025 that applies to every MCP deployment. He calls it the "lethal trifecta." When an AI agent has all three of these characteristics simultaneously, it is exploitable by design:

It has access to private data. The agent reads files, retrieves API keys, queries databases, or connects to internal systems.

It processes untrusted content. The agent handles inputs from sources outside the operator's control: user prompts, third-party tool outputs, web content, or installed skills from community registries.

It can communicate externally. The agent makes network requests, sends messages, or writes data to endpoints beyond the local system.

Most deployed MCP agents have all three. That is the point. Agents are useful precisely because they access your data, process diverse inputs, and take actions on your behalf. The vulnerability is the value proposition.

The practical consequence is that prompt injection, the technique of embedding hidden instructions in data that an AI model processes as commands, becomes a full system compromise vector. An attacker embeds instructions in a web page, a document, or a tool's output. The agent reads the content, follows the embedded instruction, accesses your credentials, and sends them to an attacker-controlled endpoint. No malware binary. No exploit code. Just text the model interprets as instructions.

This is why the threat requires a different detection approach. Endpoint security looks for malicious binaries. Network monitoring looks for known command-and-control patterns. Neither catches a natural language instruction hidden in otherwise legitimate content.

With that architectural context, here is what happened in February 2026.

How Claude Code Became an Attack Vector

On February 25, 2026, Check Point Research disclosed critical vulnerabilities in Claude Code, Anthropic's command-line AI development tool used by thousands of developers to write code, manage Git repositories, and automate builds (Check Point Research, 2026).

CVE-2025-59536 (CVSS 8.7) covers two configuration injection flaws. The first exploits Hooks, a Claude Code feature that runs predefined shell commands at specific lifecycle events (before sending a message, after receiving a response). By injecting a malicious Hook into the .claude/settings.json file within a repository, an attacker gains remote code execution the moment a developer opens the project. The command runs before the trust dialog appears on screen.

The second flaw targets MCP consent bypass. Claude Code uses .mcp.json to configure which MCP servers a project connects to. That file is version-controlled. Check Point found that two repository-controlled settings could override safeguards and auto-approve all MCP servers, triggering execution on launch without user confirmation.

CVE-2026-21852 (CVSS 5.3) enables API key theft. Claude Code communicates with Anthropic's cloud services using an API key transmitted in every request. The ANTHROPIC_BASE_URL environment variable controls where those requests go. It can be overridden in the project configuration. By redirecting it to a proxy, an attacker captures the full authorisation header, including the plaintext API key, before the user ever sees a trust prompt.

In environments using Anthropic's Workspaces feature, where multiple API keys share access to cloud-stored project files, a single stolen key exposes the entire team's data.

The vendor pitch for AI coding tools is developer productivity. The reality is that .claude/settings.json and .mcp.json are no longer configuration files. They are execution vectors. They look like metadata. They function as installers. This applies to every AI coding tool that processes repository-level configuration, not just Claude Code.

All three flaws were patched in Claude Code 2.0.65+. The disclosure timeline stretches from July 2025 to January 2026. That gap matters because it mirrors the broader pattern: AI tools ship fast, and security catches up later.

The Claude Code vulnerabilities demonstrate the risk at the individual developer level. The next incident demonstrates it at ecosystem scale.

ClawHavoc: The AI Agent Supply Chain Attack

The OpenClaw malicious skills crisis represents the largest confirmed supply chain attack targeting AI agent infrastructure to date. Antiy CERT confirmed 1,184 malicious skills across ClawHub, the package registry for the OpenClaw framework, approximately one in five packages in the ecosystem (Antiy CERT, 2026). SecurityScorecard found 135,000 OpenClaw instances exposed to the public internet with insecure defaults. Nine CVEs have been disclosed, three with public exploit code.

The attack techniques are the same ones that have been escalating across software supply chains for years: typosquatting, automated mass uploads, social engineering through fake error messages. The critical difference is privilege. A compromised dependency in a web application runs in a sandboxed runtime. A compromised AI agent skill runs with whatever permissions the agent has been granted: terminal access, file system access, and stored credentials for cloud services.

Our full OpenClaw security analysis covers the ClawHavoc campaign breakdown, all nine CVEs, exposure data, and remediation steps. The takeaway for this guide: ClawHub is the first AI agent registry to be systematically poisoned. It will not be the last.

The supply chain problem extends beyond agent skill registries. The protocol itself is exposed.

How Many MCP Servers Are Exposed to the Internet?

BlueRock Security analysed over 7,000 MCP servers and found that 36.7% were potentially vulnerable to server-side request forgery (SSRF), a class of vulnerability where an attacker tricks a server into making requests to internal resources it should not reach (Security Boulevard, 2026). In their proof of concept against Microsoft's MarkItDown MCP server, researchers retrieved AWS IAM access keys, secret keys, and session tokens from an EC2 instance's metadata endpoint. A single misconfigured MCP server became a gateway to cloud infrastructure.

The scale is significant. In February 2026, scanning results reported on r/cybersecurity identified over 8,000 MCP servers on the public internet. Trend Micro independently found 492 with zero client authentication and zero traffic encryption (Trend Micro, 2026). Bitsight confirmed exposed servers with admin panels, debug endpoints, and API routes accessible without credentials (Bitsight, 2026).

The root cause is a familiar one: default configurations that bind to all network interfaces (0.0.0.0) rather than localhost (127.0.0.1). Developers deploy MCP servers as if they are internal tools, but the defaults expose them to the world.

On January 20, 2026, Cyata researcher Yarden Porat published an exploit chain targeting Anthropic's own official Git MCP server. Three CVEs: path traversal (CVE-2025-68143), argument injection (CVE-2025-68144), and repository scoping bypass (CVE-2025-68145). The exploit achieved remote code execution through prompt injection alone (Dark Reading, 2026). If Anthropic's reference implementation had these flaws, every third-party MCP server built with fewer resources should be treated as suspect.

The Coalition for Secure AI (CoSAI) released a comprehensive MCP Security whitepaper in January 2026 mapping 12 core threat categories and nearly 40 distinct threats (CoSAI, 2026). Three stand out in practice:

Tool poisoning: an attacker modifies an MCP tool's description so the AI model misinterprets what it does. The model thinks it is calling a search function. The tool exfiltrates data.

Confused deputy: the MCP server executes actions using its own elevated privileges rather than the requesting user's. A user without database admin access asks the agent to run a query. The server, which does have admin access, complies without checking.

Overprivileged tokens: MCP servers store credentials such as API keys and database passwords in plaintext configuration files. Every client connecting to that server inherits the same privileged access.

The exposure data establishes the defensive challenge. The next development complicates it: the same protocol is now being used offensively.

AI-Assisted Pentesting: What Kali Linux and MCP Mean for Red Teams

On February 25, 2026, the Kali Linux team published an official guide connecting Claude AI to a Kali environment via MCP (Kali, 2026). A pentester types "Port scan scanme.nmap.org and check if a security.txt file exists." Claude interprets the request, selects Nmap, executes the scan on the Kali host through the mcp-kali-server package, parses the results, and follows up with additional commands if needed. Hassan Aftab documented completing a full web application assessment in roughly 15 minutes, a task he estimated at two to three hours manually.

The architecture connects three layers: Claude Desktop on macOS as the interface, Claude Sonnet 4.5 in the cloud as the AI engine, and a Kali instance running mcp-kali-server via Flask on localhost:5000. Communication runs over SSH with key-based authentication.

The Kali team framed this carefully: it is "a method, not necessarily the best method." That caveat matters. All reconnaissance data, target IPs, open ports, vulnerability findings, routes through Anthropic's cloud-hosted model. For engagements with strict data handling requirements, that may violate client agreements.

Penligent AI published a critique worth reading: the directness that makes this powerful is exactly what makes it a poor default for teams on real assets. When you wire a general-purpose LLM into a privileged execution environment, prompt injection stops being a text output problem and becomes a command execution problem.

For solo researchers in controlled lab environments, this is a genuine productivity multiplier. For production engagements, the missing controls are execution sandboxing, granular audit logging, and output validation before action. Those gaps apply to any MCP-based workflow where the AI agent has command execution authority.

The Kali integration illustrates that MCP is becoming infrastructure, used both defensively and offensively. The final development in February 2026 introduces a different kind of risk entirely.

What the Anthropic Blacklisting Means for Enterprise AI

On February 27, 2026, the Pentagon designated Anthropic a "supply chain risk," the first time an American company has received a classification normally reserved for foreign adversaries like Huawei (CBS News, 2026). President Trump ordered all federal agencies to cease using Anthropic technology within six months. Defence Secretary Hegseth directed that no contractor, supplier, or partner doing business with the US military may conduct commercial activity with Anthropic.

The dispute centred on two red lines Anthropic held: no mass domestic surveillance of Americans and no fully autonomous weapons. The Pentagon demanded access to Claude for "all lawful purposes" without those restrictions. Negotiations broke down after a deadline of 5:01 PM ET on February 27.

This is not a political analysis. It is a practitioner question: if your organisation uses Claude and has any government contracts or subcontracts, what is your exposure?

The supply chain risk designation means companies doing business with the US military must certify they do not use Claude in Pentagon-related work. Anthropic recently stated that eight of the ten largest US companies use Claude. Many hold government contracts. Palantir, which powers its most sensitive military applications with Claude, now needs alternatives. CNN reported the Pentagon acknowledged replacing Claude would be a significant effort since it is the only AI model deployed on classified military networks.

Anthropic is challenging the designation in court. OpenAI announced its own classified network deal hours later, stating it shares the same red lines on surveillance and autonomous weapons (NPR, 2026). Over 430 Google and OpenAI employees signed a solidarity petition.

Three risk scenarios to evaluate now:

Vendor concentration. If your security toolchain, coding workflows, or automation pipelines depend on a single AI provider, this week demonstrated how access can be disrupted overnight. In 20 years of working with enterprise security teams, I have watched organisations build deep dependencies on single vendors. The exit cost is always higher than the procurement team estimated.

Compliance exposure. If you hold US government contracts and use Claude-powered tools anywhere in your workflows, verify whether the designation applies. The legal scope is contested, but general counsels will be asking.

Contingency planning. Document which workflows depend on which AI providers. Identify where a provider becoming unavailable creates operational gaps. This is a continuous threat exposure management problem. Your AI vendor relationships are now part of your threat surface.

Why Existing Security Tools Miss AI Agent Attacks

Cisco's State of AI Security 2026 found that while most organisations planned to deploy agentic AI, only 29% reported being prepared to secure those deployments (Cisco, 2026). That 71% gap exists because the attacks do not resemble what existing tools are designed to catch.

Traditional endpoint detection and response (EDR) tools look for malicious binaries, suspicious process behaviour, and known indicators of compromise. AI agent attacks have none of these. The "exploit" is text. The "payload" is a natural language instruction. The "delivery mechanism" is a document, a web page, or a tool output that the agent processes as part of its normal workflow.

Johann Rehberger (Embrace The Red) published one prompt injection vulnerability per day throughout August 2025, each demonstrating a different way to make an AI agent perform unintended actions through crafted text inputs. Simon Willison called it "The Summer of Johann."

Endor Labs noted in their OpenClaw vulnerability research that traditional static application security testing (SAST) tools cannot identify issues in LLM-to-tool communication flows, conversation state management, or agent-specific trust boundaries (Infosecurity Magazine, 2026). The tooling gap is real. The industry is deploying agent systems faster than it is building the security tools to monitor them.

The closest parallel is the early cloud security gap. Organisations deployed cloud services before understanding the shared responsibility model. Teams used default IAM configurations that gave broad access. It took years for cloud security posture management (CSPM) tools to mature. AI agent security is at that same inflection point, but with a compressed timeline because adoption is moving faster.

Securing AI Agent Deployments: A Practitioner's Framework

The threats described above span skill registries, development tools, protocol infrastructure, offensive tooling, and vendor relationships. The hardening framework maps to each layer.

Discovery and Inventory

You cannot secure what you cannot see. Query endpoints for OpenClaw, Claude Code, Cursor, and other agent tools. Scan your network for common MCP endpoints (/mcp, /sse) and check for 0.0.0.0 bindings. Audit installed skills, MCP server configurations, and IDE extensions. Snyk's mcp-scan tool covers both MCP servers and agent skills.

Authentication and Least Privilege

Never expose MCP servers without authentication. The specification recommends OAuth 2.1. At minimum, enforce token-based auth on all client-server connections. Bind servers to localhost (127.0.0.1) unless remote access is explicitly required. Scope each server's permissions to only the resources its tools need. A server wrapping Nmap has no business holding database credentials.

Configuration as Code

The Claude Code CVEs proved that .claude/settings.json and .mcp.json are execution vectors. Add agent configuration paths to your code review process. Block auto-approval settings for MCP servers. Pin and verify MCP server package versions with the same rigour you apply to any software dependency.

Behavioural Monitoring

Log all MCP tool invocations: every request from client to server, every action the server takes. Alert on credential access patterns. If an agent or skill touches .env files, credential stores, or API key directories, that is investigable. Treat all data returned by MCP servers as untrusted input, sanitise before it reaches the model.

Governance Updates

AI Acceptable Use Policies need agent-specific language. An agent with terminal access and stored credentials is not the same risk profile as a chatbot in a browser tab. The AI Acceptable Use Policy guide covers the governance framework. Include AI agents in your threat model. Map AI vendor dependencies. The Anthropic situation demonstrated that provider access can be disrupted overnight.

Building your security career? Our Cybersecurity Skills Roadmap covers the fundamentals including the AI security skills employers want in 2026.

Summary

February 2026 compressed what normally takes years of incremental discovery into two weeks. Claude Code RCE through repository config files. 1,184 malicious skills poisoning an agent marketplace. Thousands of MCP servers exposed without authentication. AI-assisted pentesting shipped as a default workflow. And the first supply chain risk blacklisting of an American AI company by its own government.

The common thread is MCP becoming the connective tissue for AI agent deployments across the industry. The supply chain threat model that security teams spent years learning to manage, from Gartner's initial predictions to today's reality of 75% of organisations being hit, now extends to agent skills, MCP server packages, repository configuration files, and AI vendor relationships.

The attack techniques are familiar: typosquatting, registry poisoning, social engineering, dependency manipulation. The difference is that AI agents operate with broader system permissions and process untrusted inputs that existing security tooling was never designed to detect.

For practitioners, the response maps to what we already know: discover what is deployed, authenticate everything, enforce least privilege, treat configuration as code, monitor behaviour, and update governance. The surface is new. The principles are not.

AI agent security is evolving weekly. Subscribers get notified when new threats emerge, plus practical security content covering tools, frameworks, and hands-on techniques. No sales pitches, no fluff.


Last updated: March 2026

References and Sources

  1. Check Point Research (Donenfeld, A. & Vanunu, O.). (2026). Caught in the Hook: RCE and API Token Exfiltration Through Claude Code Project Files. CVE-2025-59536 (CVSS 8.7) and CVE-2026-21852 (CVSS 5.3). Published February 25, 2026.
  2. Antiy CERT. (2026). ClawHavoc Campaign Analysis. Trojan/OpenClaw.PolySkill classification. 1,184 malicious skills confirmed across ClawHub.
  3. Trend Micro. (2026). MCP Security: Network-Exposed Servers Are Backdoors to Your Private Data. 492 MCP servers with no client authentication or traffic encryption.
  4. Bitsight. (2026). Exposed MCP Servers Reveal New AI Vulnerabilities. Internet-exposed MCP servers with unsecured admin panels and debug endpoints.
  5. BlueRock Security / Security Boulevard (Burt, J.). (2026). Anthropic, Microsoft MCP Server Flaws Shine a Light on AI Security Risks. 7,000+ MCP servers analysed, 36.7% vulnerable to SSRF. AWS credential theft demonstrated via MarkItDown.
  6. Cyata (Porat, Y.) / Dark Reading. (2026). Microsoft & Anthropic MCP Servers at Risk of RCE, Cloud Takeovers. Exploit chain against Anthropic's Git MCP server. CVE-2025-68143, CVE-2025-68144, CVE-2025-68145.
  7. Coalition for Secure AI (CoSAI). (2026). Model Context Protocol (MCP) Security White Paper. 12 core threat categories, nearly 40 distinct threats.
  8. Cisco. (2026). State of AI Security 2026. 29% of organisations prepared to secure agentic AI deployments.
  9. Kali Linux. (2026). Kali & LLM: macOS with Claude Desktop GUI & Anthropic Sonnet LLM. Official MCP pentesting guide. Published February 25, 2026.
  10. Penligent AI. (2026). Kali Linux + Claude via MCP Is Cool, But It's the Wrong Default for Real Pentesting Teams. Operational security analysis.
  11. Palo Alto Networks. (2026). MCP Security Exposed: What You Need to Know Now. Tool poisoning, credential management, and runtime risks.
  12. SecurityScorecard STRIKE Team. (2026). Beyond the Hype: Moltbot's Real Risk Is Exposed Infrastructure. 135,000+ exposed OpenClaw instances. Three CVEs with public exploit code.
  13. CBS News (Frias, L.). (2026). Hegseth Declares Anthropic a Supply Chain Risk. Pentagon designation, $200M contract, first American company to receive classification.
  14. NPR. (2026). OpenAI Announces Pentagon Deal After Trump Bans Anthropic. OpenAI classified network deal. Same red lines on surveillance and autonomous weapons.
  15. Infosecurity Magazine. (2026). Researchers Reveal Six New OpenClaw Vulnerabilities. Endor Labs finding that SAST tools cannot detect LLM-specific issues.
  16. Red Hat. (2025). Model Context Protocol (MCP): Understanding Security Risks and Controls. Confused deputy problems, command injection, authorisation failures.
  17. OWASP. (2025). Top 10 for LLM Applications. Prompt injection as LLM01. Maintained by 600+ experts from 18 countries.
  18. BlackBerry. (2024). Global Threat Intelligence Report. 75% of organisations experienced software supply chain attack within one year.
  19. Verizon. (2025). Data Breach Investigations Report. Third-party breaches account for 30% of all data breaches.
  20. IBM Security. (2025). Cost of a Data Breach Report 2025. 63% of breached organisations lacked AI governance policies.