Agentic AI Failure Modes Taxonomy Updated by Microsoft
Summary
Microsoft has updated its taxonomy of failure modes in agentic AI systems after a year of red teaming against real-world deployments. The v2.0 framework adds seven new risk categories and expanded mitigations, giving security teams a more practical model for assessing agentic AI threats such as MCP/plugin abuse, goal hijacking, and session context contamination.
Introduction
Microsoft has released a major update to its taxonomy of failure modes in agentic AI systems, reflecting lessons learned from 12 months of red teaming. For security leaders, architects, and IT administrators evaluating AI agents, this matters because the threat model is evolving quickly as plugins, MCP integrations, and computer-use agents move into production.
The new v2.0 taxonomy is more than a theoretical framework. It is based on observed attack patterns in deployed environments and highlights where existing controls are falling short.
What’s new in the updated taxonomy
Microsoft added seven new failure mode categories:
- Agentic supply chain compromise: Malicious instructions delivered through plugins, MCP servers, prompt templates, or third-party integrations.
- Goal hijacking: Adversarial instructions subtly redirect an agent’s objective without fully compromising it.
- Inter-agent trust escalation: A compromised agent abuses weak identity or permission checks in multi-agent workflows.
- Computer Use Agent visual attack: GUI-based agents are manipulated through hidden or adversarial visual content.
- Session context contamination: Early session inputs bias later reasoning across multi-step tasks.
- MCP/plugin abuse: Tool description poisoning, server-side instruction injection, and cross-server override attacks.
- Capability/architecture disclosure: Agents expose internal prompts, schemas, tools, or approval logic that attackers can weaponize.
Key red team findings
Microsoft says several patterns appeared consistently across engagements:
- Human-in-the-loop bypass was one of the most frequently exploited weaknesses.
- Cross-domain prompt injection remained a reliable initial access method.
- Memory poisoning and XPIA often worked together to persist malicious influence.
- Zero-click attack chains were demonstrated in some cases, leading to exfiltration or lateral movement.
- Capability disclosure often enabled deeper exploitation by turning black-box probing into white-box attacks.
Why this matters for IT and security teams
Organizations deploying agentic AI can no longer treat these systems like standard chatbots. Agents interact with tools, memory, external services, and sometimes graphical interfaces, which creates new attack paths that traditional application security models do not fully cover.
For administrators and security teams, the update reinforces that AI governance must include supply chain controls, behavioral monitoring, and stronger trust validation across tools and agents.
Recommended next steps
This quarter, teams should consider:
- Reviewing all MCP servers, plugins, and third-party agent components as part of the software supply chain.
- Verifying signatures, provenance, and dependency inventories for agent-connected tools.
- Testing for prompt injection, memory poisoning, and approval bypass in red team or tabletop exercises.
- Limiting unnecessary disclosure of system prompts, tool schemas, and internal architecture details.
- Adding monitoring that evaluates full-session behavior, not just single prompts or events.
Microsoft’s update is a useful signal for enterprises: if you are deploying agentic AI, your security model needs to evolve just as quickly as the technology.
Need help with Security?
Our experts can help you implement and optimize your Microsoft solutions.
Talk to an ExpertStay updated on Microsoft technologies