Back to Blog
ai-agent-securityindirect-prompt-injectionai-pentesttoolingappsec

Best tools for testing indirect prompt injection in AI agents

Compare the best tools for testing indirect prompt injection in AI agents, including Promptfoo, PyRIT, RAMPART, AgentDojo, and garak, with a focus on poisoned context, tool misuse, and regression coverage.

ByClaire Song14 min read
Pen name disclosure: Claire Song is a pen name used by the 0xClaw editorial team for articles on AppSec operations, evidence quality, and remediation workflows. It is a disclosed byline persona rather than a public individual identity.
Quick answer
Infrastructure note

Compare the best tools for testing indirect prompt injection in AI agents, including Promptfoo, PyRIT, RAMPART, AgentDojo, and garak, with a focus on poisoned context, tool misuse, and regression coverage.

Key takeaways
  • Best tools for testing indirect prompt injection in AI agents should explain infrastructure choices in a way that is easy to quote, compare, and operationalize.
  • Tie architecture explanations back to how local execution, governance, and evidence handling work in practice.
  • Use official docs plus product pages so the page can rank for definitions and support AI citation.
Related next steps

Quick answer:

For most teams, Promptfoo is the best tool for testing indirect prompt injection in AI agents because it targets the real problem instead of the toy version. It lets you inject hostile instructions into untrusted variables like retrieved documents, emails, profiles, and tickets, then fail the test if the agent follows them. If your agent browses the web, Promptfoo also has an indirect web attack flow that can check for behavior hijacking and data exfiltration attempts. PyRIT is the better choice when you need a custom attack harness. RAMPART is the best fit once you want indirect prompt injection findings turned into CI tests. AgentDojo matters when you want to measure defenses on realistic agent tasks rather than on one narrow app. garak is useful as supporting coverage, but it is not the main event here.

That ranking comes from one simple distinction: indirect prompt injection in agents is not just a model-alignment issue. It is a system-security issue. OWASP describes prompt injection as a flaw that exploits the fact that instructions and data are processed together without a hard boundary. In an agent, that weakness becomes more dangerous because the model can browse, retrieve, call tools, and do things on your behalf. If the hostile text lands in the right document chunk or tool output, the damage is no longer theoretical.

Indirect prompt injection testing tools for AI agents

Why indirect prompt injection in agents needs different tooling

Direct prompt injection is obvious. A user types "ignore previous instructions" into the main chat box and hopes the model obeys. Indirect prompt injection is the quieter version. The payload is hidden in something the agent treats as content rather than as an attack: a web page, wiki page, support ticket, email, profile, retrieved chunk, or tool result.

That difference matters more in agents than in plain chatbots. A chatbot can still fail badly, but the blast radius is usually the response itself. An agent has more reach. It may fetch another page, query a private store, choose a different tool, send a message, write code, or hit an internal API. Once that happens, grading the final answer alone is not enough.

Promptfoo's indirect prompt injection docs lay out the core pattern cleanly: the payload lives in external content inserted into the prompt, the attacker may be a third party controlling a data source, and common vectors include RAG documents, emails, profiles, and tickets. OWASP frames the underlying risk even more broadly. The model does not naturally separate instructions from data, so a defensive design has to compensate.

I keep seeing teams test this category with the wrong harness. They run a few jailbreak strings against the chat input, get a refusal, and mark the agent "safe." That is not an indirect prompt injection test. A real test has to prove what happened after the poisoned content reached the agent:

  • Did the agent follow the injected instruction?
  • Did it call a tool it should not have called?
  • Did it exfiltrate data through a URL, request body, or hidden parameter?
  • Did it cross a permission boundary while still sounding calm and helpful?

If your tooling cannot answer those questions, it is incomplete for agent security.

Best tools for testing indirect prompt injection in AI agents compared

| Tool | Best fit | What it proves well | Main limitation | | --- | --- | --- | --- | | Promptfoo | Most product teams shipping real agents | Poisoned context variables, browsing-agent attacks, behavior hijacking, exfiltration attempts, team-friendly regressions | Some grading still depends on LLM judgment, and the web attack flow needs Promptfoo Cloud | | PyRIT | Security engineers building custom attack paths | Cross-domain prompt injection, stored hostile artifacts, multi-step workflows, programmable targets | More setup, more engineering time | | RAMPART | Turning incidents into durable engineering tests | Repeatable CI coverage for adversarial and benign scenarios | Better for regression than first-pass discovery | | AgentDojo | Benchmarking defenses and comparing robustness claims | Realistic agent tasks over untrusted data, broader defense evaluation | Not a plug-and-play scanner for your own staging endpoint | | garak | Model-layer probe coverage around the agent | Fast prompt injection and jailbreak pressure on LLM components | Does not model your full agent workflow by itself |

If you want the shortest practical answer, use Promptfoo first, PyRIT when the workflow gets custom, RAMPART once you have a confirmed bug, AgentDojo when you want to keep your claims honest, and garak as a support tool rather than the center of the stack.

Promptfoo is the strongest default for most teams

Promptfoo sits at the top because it matches how indirect prompt injection actually shows up in production systems. Its plugin asks you to identify the untrusted variable, then injects adversarial payloads into that variable and fails the test if the model follows the inserted instructions. The docs call out the right injection points too: context, documents, retrieved_chunks, email_body, bio, notes, ticket_description, and other fields teams really feed into prompts.

That might sound obvious, but it is not. Plenty of security tooling still treats prompt injection like a special flavor of jailbreak prompt. Promptfoo's own deprecated strategy page explicitly says the old "prompt-injection" strategy name was misleading because static jailbreak templates do not cover modern structured-data or indirect prompt injection cases. That is exactly the right correction.

Promptfoo gets even more useful when the agent browses. Its indirect web attack strategy generates realistic web pages with embedded prompt injection payloads, serves those pages, and then checks whether the agent followed the malicious instructions. In exfiltration mode, the system can deterministically track whether the agent made HTTP requests to the exfiltration endpoint. That is much better evidence than a vague "the response looked suspicious."

Where Promptfoo works especially well:

  • RAG-heavy assistants that merge retrieved chunks into prompts
  • support, sales, or operations agents that ingest tickets, notes, or email
  • browsing agents that summarize external pages
  • teams that want artifacts engineers can own without a research-grade setup

Its main weakness is not conceptual. It is operational. The more you depend on hosted web attack generation or LLM grading, the more you need to understand which results are deterministic and which are judgment-based. For most teams that is still a very good trade.

PyRIT is better when you need a custom attack harness

PyRIT is the tool I would hand to a security engineer who wants control, not convenience. The framework docs are a good example. They explicitly call out cross-domain prompt injection attacks where one target stores poisoned content and a later target processes it, and they note that intermediate interactions can be saved to memory when you need to inspect the chain rather than just the final outcome.

That matters because indirect prompt injection often stops looking neat the moment you leave the demo path. Real agents are messy. One retrieves content from a knowledge base. Another opens URLs from tool output. Another mixes email bodies with user instructions and then picks between several tools. The attack path may include uploads, stateful sessions, staging APIs, approval steps, or browser automation. A rigid off-the-shelf harness can become a straightjacket in that environment.

PyRIT is strong when you need to model things like:

  • a poisoned document upload that later reaches the agent through retrieval
  • a browsing agent that reads attacker-controlled HTML
  • a multi-turn workflow where the hostile content persists across steps
  • a custom scoring rule tied to business side effects rather than text alone

The tradeoff is predictable. You get flexibility because you are willing to assemble more pieces yourself. That is a good bargain for a research-minded security team. It is less attractive for a product squad that needs quick coverage and clean ownership.

RAMPART matters because findings need to become tests

The most common failure in agent security is not missing the first bug. It is failing to keep the bug fixed.

Microsoft introduced RAMPART on May 20, 2026 as an open-source framework for encoding adversarial and benign scenarios as repeatable CI tests. That positioning matters. RAMPART is not trying to be the coolest exploitation demo. It is trying to turn red-team lessons and incident learnings into engineering assets that survive beyond the meeting where everyone agreed the bug was serious.

That is exactly what indirect prompt injection needs. These bugs reappear in boring ways:

  • a new retrieval field gets added without review
  • a tool description quietly becomes an injection surface
  • an approval step gets weakened in a refactor
  • a previously blocked action path returns under a different tool schema

RAMPART is useful once you already understand the dangerous workflow and want to enforce a stable expectation around it. This is where many teams should get stricter. If an incident once proved that poisoned email content can push an agent into using the wrong tool, the fix should become a failing test in CI, not a paragraph in a retro deck.

I would not lead with RAMPART for first-pass exploration. I would add it right after the first confirmed issue. That is when it becomes the difference between "we learned something" and "we changed the system."

AgentDojo and garak answer two different questions

These two tools are often grouped together because both sit outside the narrow product-team workflow, but they solve different problems.

AgentDojo is a benchmark. According to the paper, it provides 97 realistic tasks and 629 security test cases for agents that execute tools over untrusted data. That makes it useful when you want to test whether a defense actually generalizes. If your prompt separation trick only works on one internal support bot and collapses on email, travel, or banking-style tasks, a benchmark environment is more likely to expose that.

This is why AgentDojo matters even if you never run it in your day-to-day release workflow. It puts pressure on overconfident claims. A defense that passes one app-specific suite can still be fragile. A benchmark does not make it production-safe, but it is one of the few ways to check whether you are merely overfitting to your own environment.

garak is different. garak focuses on risks that are inherent to LLM deployment, including prompt injection, jailbreaks, and guardrail bypass. That makes it a solid support tool for broad model-side pressure. It is good for quick sweeps, comparing models, and keeping an eye on obvious regressions in the underlying language layer.

What garak does not do on its own is understand your agent's business logic, retrieval topology, or side-effect path. It can tell you the model is susceptible to a family of attacks. It cannot fully tell you whether your browsing assistant just sent a secret to the wrong place unless you wire that context around it.

So the split is simple:

  • Use AgentDojo to benchmark robustness claims.
  • Use garak to widen model-side coverage.
  • Do not confuse either one with a complete indirect prompt injection harness for your shipping agent.

How to build a testing stack that catches real failures

Most teams should stop looking for the single magic tool and build a layered stack instead.

Start with Promptfoo or PyRIT against the real entrypoint. Inject hostile instructions through the same channel the agent really trusts: retrieved chunks, emails, documents, notes, or web content. Then instrument the action layer. The interesting question is not whether the model said something odd. The interesting question is whether it did something it should not have done.

That stack usually looks like this:

  1. Use Promptfoo for fast discovery against named untrusted variables.
  2. Use PyRIT when the workflow includes browser state, uploads, or custom orchestration.
  3. Capture side effects such as tool calls, outbound requests, and approval bypasses.
  4. Turn confirmed issues into regression tests with RAMPART or your existing CI harness.
  5. Run garak against important model components to broaden pressure.
  6. Use AgentDojo when you need a more defensible robustness story.

This is also where category discipline matters. If you only test chat input jailbreaks, you are doing a useful but narrower kind of safety work. If you test poisoned context that changes tool behavior or causes exfiltration, you are doing indirect prompt injection testing for agents.

That distinction affects how you spend time and money. If your current problem is broader application security around the agent, not just prompt injection, the rest of the blog, the product compare hub, current pricing, and the local workflow on download are the natural next pages to review.

What teams still get wrong

The first mistake is grading only the final answer. An agent can claim it behaved safely after it already hit the wrong tool. That is not a hypothetical edge case. It is a common reporting mistake.

The second mistake is pretending indirect prompt injection is just "jailbreaks, but hidden." It is a data-flow problem, a trust-boundary problem, and often a tool-permission problem at the same time.

The third mistake is stopping at discovery. Once you confirm a real workflow bug, lock it into CI. Otherwise the same issue will come back with different field names and a new owner who has never seen the original report.

The fourth mistake is skipping benchmark pressure altogether. Internal harnesses are useful, but they make it easy to believe your defense is better than it is. That is why AgentDojo exists.

The fifth mistake is assuming model-layer coverage alone is enough. garak can tell you something important about the model. It cannot stand in for end-to-end agent validation.

If I had to reduce the whole topic to one sentence, it would be this: an indirect prompt injection test is only convincing when it proves the agent stayed safe after reading hostile content, not when it merely produced a safe-sounding answer.

FAQ

What is the best open-source tool for testing indirect prompt injection in AI agents?

For most teams, Promptfoo is the best open-source starting point because it directly supports indirect prompt injection against untrusted variables and also has a web-browsing attack flow for agents that fetch outside content.

When should I choose PyRIT over Promptfoo?

Choose PyRIT when the workflow is unusual enough that you need a custom harness: uploaded artifacts, browser automation, multi-step orchestration, stateful sessions, or scoring logic tied to business side effects rather than response text.

Is RAMPART a discovery tool or a regression tool?

Mostly a regression tool. You can explore with it, but its real value is converting known attack scenarios and incident lessons into tests that run continuously in CI.

Do I need AgentDojo if I already test my own agent?

Not always, but AgentDojo is useful when you want to validate that a defense generalizes beyond your own product. It is harder to fool yourself when the benchmark includes many realistic tasks and security cases.

Is garak enough by itself for indirect prompt injection?

Usually no. garak helps pressure the model layer, but indirect prompt injection in agents depends on retrieved content, tool use, browsing, permissions, and side effects. That requires a broader harness.

What should a passing indirect prompt injection test prove?

A strong passing test should show that hostile content reached the agent, the agent did not follow the injected instruction, it did not call forbidden tools or leak data, and the system stayed inside its intended permission boundary across repeated runs.

Bottom line

If you want one name, pick Promptfoo. It is the best default tool for testing indirect prompt injection in AI agents today.

If you need finer control over the attack path, use PyRIT. If you want to keep a confirmed issue fixed, add RAMPART. If you need a benchmark to keep your confidence in check, use AgentDojo. If you want broader model-side pressure, keep garak nearby.

The ranking matters less than the mindset. Indirect prompt injection is dangerous because agents treat outside content as fuel for actions. The right tool is the one that follows that fuel all the way to the side effect.

Ready to run your first AI pentest?

Get 0xClaw up and running in under 3 minutes. No infrastructure setup. No cloud dependency.

Continue Reading

More AI Pentest Guides

Continue through the local AI pentesting cluster with related guides on workflow, evidence, comparisons, and remediation.