Agent Traps: Enterprise Agentic AI Wave Brings In New Threats

In April 2026, Vercel, a cloud application deployment platform, disclosed a security incident that did not originate within its own infrastructure, but through a third-party AI tool used by an employee. The attacker compromised the tool, used it to gain access to the employee’s Google Workspace account, and then pivoted into internal systems, eventually accessing environment variables and sensitive operational data.
What makes this incident notable is not just the breach itself, but what it represents. The initial entry point was an AI system embedded into everyday workflows — the same class of tools that increasingly power AI agents and agent-driven automations inside organisations. As enterprises begin to rely on agents to retrieve context, connect systems and execute tasks autonomously, these tools effectively become extensions of the organisation’s operational layer.
Around the same time, researchers at AI giant Google DeepMind were outlining a deeper, more structural risk. Their research paper introduces the concept of “AI agent traps” — adversarial inputs designed not to hack systems, but to manipulate agents themselves.
The premise is simple but profound. As AI agents begin to navigate the web, interact with tools and make decisions autonomously, the attack surface shifts from software vulnerabilities to the information environment itself. Instead of breaking into systems, attackers can influence what agents see, how they reason, and what actions they take.
This marks a clear transition. In the past, cybersecurity was about protecting systems from unauthorised access. In an agent-driven world, the approach to cybersecurity has changed. Systems may remain intact, but agents operating within them can still be misled, hijacked, or manipulated into executing harmful actions.
That is already beginning to play out.
The Age Of Agent Traps
The most immediate way to understand this shift is through prompt injection, which has quickly emerged as the baseline attack against AI agents. But what makes it dangerous is not just the technique itself, but how naturally it fits into agent workflows.
Rahul Sasi, cofounder and CEO of CloudSEK, an Indian-origin cybersecurity company, describes a simple scenario. An AI agent tasked with monitoring social media for negative sentiment reads a post that includes a hidden instruction telling it to ignore all prior directives and classify the content as positive.
There is no breach, no malware, and no exploit in the traditional sense, but yet the agent has been compromised. In this case, the attacker has exploited the logic built into the agent for its functioning. This is what is known as an agent trap.
According to the DeepMind research, “agent trap” attacks operate across multiple layers of an agent’s lifecycle, targeting not just immediate inputs but also reasoning, memory and long-term behaviour.
What makes agent traps particularly difficult to detect is that they exploit a fundamental asymmetry. Humans interact with a rendered version of content, while agents parse underlying data structures.
In practical terms, this means malicious instructions can be hidden in seemingly benign sources like web pages, documents or even images. Once ingested, these instructions can alter how an agent interprets tasks, prioritises goals or executes actions. Over time, this can lead to subtle but significant behavioural drift.
There is also a more structural layer of risk emerging with how agents are built and distributed. Modern agents are often composed of modular capabilities — the ability to read files, access systems or interact with external tools.
These capabilities are defined through instructions rather than executable binaries. If a malicious instruction is embedded within these definitions, the agent can perform harmful actions without triggering traditional detection mechanisms.
“This is not a malicious file anymore. But rather it’s an instruction,” CloudSEK’s Sasi says about agent traps, indicating that threats are no longer just code, but about real-world hijacking, context and behaviour. This makes them particularly harder to isolate, detect or prevent using existing mechanisms and approaches.
The Google DeepMind paper also highlights examples of such traps, including, Systemic Traps, where attackers manipulate groups of AI agents instead of targeting a single one. This includes Congestion Traps, where many agents are pushed toward the same action or resource at once, causing chaos or outages, and Sybil Attacks, where fake agent identities influence collective decisions or create false consensus.
Another category is Human-in-the-Loop Traps, where agents manipulate human reviewers through misleading summaries, approval fatigue or hidden social engineering tactics, eventually convincing humans to approve harmful actions or trust compromised outputs.
Why Traditional Security Models May Not Work
What makes AI agents uniquely difficult to secure is not just the presence of new attack vectors, but the breakdown of assumptions that traditional systems rely on. In conventional software, behaviour is largely deterministic, access is clearly defined and security controls can be applied at specific checkpoints. None of these hold true in agentic systems.
According to Rajesh Chhabra, general manager for APAC and large markets at Acronis, a global cybersecurity major, AI agents operate with a level of autonomy and interconnectedness that fundamentally changes how risk manifests. Agents continuously ingest external data, make decisions in real time and interact with multiple systems simultaneously.
This creates an environment where threats are no longer isolated events but evolving processes. An agent can be influenced at one point in time, and the effects of that manipulation may only surface later, making attribution and detection significantly harder.
Another challenge lies in establishing what constitutes normal behaviour. Because agents adapt based on context, their outputs are inherently variable. This makes it difficult to detect anomalies using traditional baselines. A compromised agent may still appear to function correctly, while gradually deviating in ways that are not immediately obvious.
At the same time, agents are often granted broad access to maximise utility. They are integrated into workflows, connected to APIs and given permissions to execute actions across systems. This creates a situation where a compromised agent is not just a faulty component, but a high-privilege actor operating within the organisation.
Attempts to mitigate these risks using guardrails — rules that restrict inputs or outputs — have proven limited. CloudSEK’s Sasi compares them to blacklists, where it is impossible to anticipate every possible variation of an attack. In agentic systems, where interactions are multi-step and context-dependent, these controls can be bypassed in ways that are difficult to predict.
The deeper issue is that security is no longer confined to a single layer. It spans the entire lifecycle of the agent, from how it processes inputs to how it stores memory and executes actions.
Balancing Agent Autonomy And Human Control
As enterprises scale the deployment of AI agents, the emerging consensus is not to slow down adoption, but to fundamentally change how these systems are governed.
Srinivasan Raghavan, chief product officer at Nasdaq-listed Freshworks, says the company’s agentic models are designed to operate within clearly defined boundaries, with scoped permissions and strong governance layers. If the system detects that an agent is potentially being manipulated mid-task, it immediately tries to contain the impact and escalate decisions back to human operators.
The emphasis is on maintaining trust through transparency and accountability, ensuring that agents can function effectively without becoming uncontrollable.
A similar approach is being adopted at AI transformation company Sonata Software, where CTO Manu Swami highlights the importance of treating AI agents as governed digital identities. Instead of allowing open-ended autonomy, agents are constrained within specific workflows, with human oversight for high-impact actions and continuous monitoring to detect anomalies.
This has led to the emergence of a more structured approach to agent security, where multiple layers of control are embedded directly into the system:
- Identity and access controls that define what an agent can do
- Policy-based constraints that limit how tasks are executed
- Runtime monitoring to detect deviations in behaviour
- Isolation mechanisms to prevent lateral movement across systems
While not industry-wide standards yet, these first measures are part of a broader shift toward zero-trust architectures, where no entity — human or machine — is trusted by default. In the context of AI agents, this extends beyond access control to include memory, data flows and decision-making processes.
Acronis’s Chhabra notes that organisations must treat all inputs as untrusted and continuously verify their origin and intent. This applies not just to external data, but also to the agent’s own memory and learned context, which can be manipulated over time.
Other companies are revisiting their processes and workflows for how threats are analysed. CloudSEK, for instance, is building systems that model how an AI agent would navigate an organisation’s infrastructure, identifying potential attack paths based on machine-level reasoning rather than human intuition.
It is evident that the nature of security is undergoing a deeper transformation, “Existing LLM guardrails are insufficient for autonomous agents because they were made for static, single turn interactions instead of dynamic.” added Subhajeet Singha, a senior researcher at Acronis’s threat research unit.
Security in this context is also more about maintaining predictable behaviour than preventing unauthorised access. A threat can even come from agents being hijacked to do things beyond their scope. So the security challenge today is balancing the trade-offs between agent autonomy and human control.
The post Agent Traps: Enterprise Agentic AI Wave Brings In New Threats appeared first on Inc42 Media.


Superadmin 










