Securing AI Agents: MCP Tool Poisoning Risks
Summary
Microsoft Incident Response warns that as AI agents move from reading content to taking actions, poisoned Model Context Protocol (MCP) tool metadata can silently redirect agent behavior and expose sensitive data. The guidance outlines how to detect, contain, and prevent this emerging supply chain risk using controls across Copilot Studio, Entra, Purview, Defender, and Sentinel.
Securing AI agents against MCP tool poisoning
Introduction
As enterprise AI agents evolve from passive assistants into tools that can act on behalf of users, the security stakes are rising quickly. Microsoft Incident Response has highlighted a growing attack pattern where malicious changes to Model Context Protocol (MCP) tool metadata can manipulate an agent into taking unintended actions, including exposing sensitive business data.
This matters for organizations building agents with Microsoft 365 Copilot, Copilot Studio, and Azure AI Foundry, especially where external tools and third-party MCP servers are part of production workflows.
What’s new
Microsoft’s latest guidance focuses on MCP tool poisoning, an attack that targets the agentic AI supply chain rather than a vulnerability in Copilot itself.
How the attack works
- A trusted third-party MCP tool is updated with malicious instructions hidden inside its natural-language description.
- The agent reads that metadata as part of its decision-making context.
- Without visible warning, the agent performs extra steps beyond the user’s request.
- Sensitive data can then be sent to an attacker-controlled endpoint through an otherwise approved tool call.
Microsoft illustrates this with a finance workflow in Copilot Studio, where an invoice validation tool is modified to quietly collect and forward summaries of unpaid invoices.
Why it matters for IT and security teams
The key risk is the trust boundary between agents and external tools. Even when permissions, allowlists, and connectors appear valid, poisoned metadata can alter agent behavior in production.
For administrators, this means:
- Tool descriptions should be treated like system prompts
- MCP servers must be governed as supply chain dependencies
- Runtime monitoring is needed for suspicious tool use and outbound activity
- Human approval may be necessary for high-impact actions
Microsoft-recommended protections
Microsoft maps several controls to this threat:
- Govern the supply chain: Maintain an allowlist of approved MCP publishers and servers, and avoid broad “Allow all” MCP access.
- Inspect tool metadata: Use Prompt Shields in Azure AI Content Safety and Defender for Cloud AI workload protection to inspect prompts, tool outputs, and metadata.
- Guard sensitive actions: Apply Microsoft Purview DLP to inspect outbound tool parameters and block sensitive data exfiltration.
- Use identity controls: Assign agents a non-human identity with Microsoft Entra Agent ID and enforce Conditional Access.
- Correlate telemetry: Forward MCP telemetry to Microsoft Sentinel, review Purview audit logs, and watch for new endpoints in Defender for Cloud Apps.
Next steps
Organizations using agentic AI should review all production MCP integrations now. Start by inventorying approved MCP servers, requiring change review for metadata updates, and adding human-in-the-loop approvals for critical workflows such as finance, external sharing, and account changes.
The broader takeaway is clear: as AI tools move from reading to acting, security controls must extend beyond prompts and permissions to the full agent supply chain.
Need help with Security?
Our experts can help you implement and optimize your Microsoft solutions.
Talk to an ExpertStay updated on Microsoft technologies