AI Agent Pentest Report Sample

Quick answer: what should an AI pentest sample report template for AI agents include?

An AI pentest sample report template for AI agents should show the exact workflow under test, the model and tool boundaries, the prompt or context that triggered the behavior, the evidence that proves what the agent actually did, the resulting impact, the likely fix owner, and the retest condition. If the report only preserves a transcript and a scary headline, it is not strong enough for AppSec review.

That standard lines up with classic reporting guidance from OWASP and NIST SP 800-115, but agent systems add more places where the truth can blur. A run may touch retrieval, memory, tool permissions, approval gates, and downstream APIs in a single chain. Your report has to separate those layers instead of flattening them into "the AI went rogue." If you want the visual summary first, open the sample report image.


If you need the baseline structure before the sample, start with AI pentest report template for AI agents. If you are standardizing proof requirements across the team, pair this page with AI pentest evidence checklist for AppSec teams.

Why AI agent sample reports usually fail

Most weak agent reports fail in a predictable way. They preserve too much agent chatter and too little proof. The tester exports a long conversation, adds a summary sentence about prompt injection or tool abuse, and expects engineering to sort it out. That is not a report. It is a messy notebook.

Agent systems make this worse because one run can look dramatic without producing a confirmed security result. The model may propose a dangerous action, mention a secret-looking value, or claim it completed a task that never happened. I have seen teams burn review time on exactly that kind of false alarm. That is why the OpenAI Agents tracing guide is a useful reporting reference even if you do not use that SDK. It treats a run as a set of spans: generations, tool calls, handoffs, guardrails, and custom events. A serious report should think the same way.

The threat framing matters too. The OWASP Top 10 for LLM Applications 2025 separates prompt injection, improper output handling, sensitive information disclosure, and excessive agency because those are not the same failure. MITRE ATLAS does the same in a different vocabulary, with techniques such as LLM prompt injection, AI agent tool invocation, and AI agent context poisoning. If your sample report merges those into one vague "AI issue," the remediation path gets fuzzy right away.

I would rather read a short report that clearly states one confirmed boundary failure than a polished twelve-page report that never explains whether the model actually crossed a control boundary.

Start with an engagement snapshot that removes ambiguity

The top of the sample report should answer the questions the reviewer would otherwise send back in Slack.

Use a compact opening table like this:

| Field | What to capture | Why it matters |
| --- | --- | --- |
| Agent or workflow | Workflow name, release tag, deployment ID, or commit | Confirms the exact build under test |
| Model path | Model, provider, reasoning mode, and policy wrapper | Agent behavior may change by runtime and safety layer |
| Tool inventory | Every tool exposed to the agent, plus read-only vs write access | Many serious findings depend on tool authority, not prompt text alone |
| Context inputs | Retrieval source, memory source, uploaded files, cache policy | Shows whether the issue came from context poisoning or prompt handling |
| Identity model | User auth state, delegated credentials, service account, approval gate | Tells reviewers where authority originated |
| Reachable systems | Ticketing, email, browser, code repo, local files, internal APIs | Defines blast radius in plain language |
| Trace sources | Span IDs, HAR, screenshots, server logs, queue IDs, audit logs | Makes the evidence trail visible before findings begin |
| Test limits | Read-only rules, blocked actions, redactions, time box | Prevents readers from assuming a stopped path was disproven |

This section is not bureaucracy. It is the part that saves you from the "wait, which build was this?" thread two days later. NIST SP 800-115 is explicit that technical testing includes planning, analysis, and mitigation strategy development, not just running checks. The snapshot is the planning context that makes the analysis readable later.

For agent work, I would also add one extra line that ordinary web reports often skip: "What counted as execution?" In some environments, a drafted email is harmless unless it is sent. In others, generating a SQL query is already risky because another service auto-executes it. Your sample report should set that line early.
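One way to keep the snapshot from arriving half-filled is to treat it as a small record and check it before any findings are written. The sketch below is illustrative only: the field names mirror the table above plus the "what counted as execution?" line, while `EngagementSnapshot`, `missing_fields`, and the sample values are hypothetical names chosen for this example, not part of any standard.

```python
from dataclasses import dataclass, fields


@dataclass
class EngagementSnapshot:
    """Hypothetical record for the opening table of an agent pentest report."""
    agent_or_workflow: str = ""
    model_path: str = ""
    tool_inventory: str = ""
    context_inputs: str = ""
    identity_model: str = ""
    reachable_systems: str = ""
    trace_sources: str = ""
    test_limits: str = ""
    execution_definition: str = ""  # the extra "what counted as execution?" line


def missing_fields(snapshot: EngagementSnapshot) -> list[str]:
    """Return the names of snapshot fields that are still blank."""
    return [f.name for f in fields(snapshot) if not getattr(snapshot, f.name).strip()]


# Sample values are placeholders for illustration.
snapshot = EngagementSnapshot(
    agent_or_workflow="support-agent, release tag and commit recorded",
    model_path="model, provider, and policy wrapper recorded",
    tool_inventory="search_kb (read-only), create_ticket (write)",
)
print(missing_fields(snapshot))
```

A report draft could be held back until `missing_fields` returns an empty list, which turns the "which build was this?" question into a check the tester answers before the reviewer has to ask.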

A sample findings summary that AppSec can triage quickly

The summary table should help a reviewer route work, not admire formatting. Keep it narrow.

| ID | Title | Severity | Confidence | Failed boundary | Owner |
| --- | --- | --- | --- | --- | --- |
| AGENT-01 | Retrieved prompt injection caused unauthorized create_ticket tool call | High | Confirmed | Retrieval to tool policy | Agent platform team |
| AGENT-02 | Session memory preserved another user's internal runbook excerpt | Medium | Confirmed | Tenant memory isolation | Platform engineering |
| AGENT-03 | Operator approval dialog hid the final external recipient | Medium | Partial | UX approval clarity | Product + security |

That table does three useful things.

First, it names the failed boundary instead of using a generic bug class. "Prompt injection" alone is not enough for an agent report, because the real engineering owner depends on what the injection influenced.

Second, it includes confidence. That is the fastest way to keep the report honest. A confirmed tool execution is different from a plausible model plan that could not be completed in scope.

Third, it keeps ownership visible. The OWASP reporting structure recommends actionable remediation. In practice, that starts by making it obvious who can actually fix the problem.
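The confidence column is easier to keep honest when the rule behind it is written down. Here is a minimal sketch of one possible rule, assuming two evidence flags per finding. The class `EvidenceState`, the function `confidence`, and the "Unconfirmed" label are assumptions made for this example; your team may draw the lines differently.

```python
from dataclasses import dataclass


@dataclass
class EvidenceState:
    """Hypothetical evidence flags for a single finding."""
    dangerous_condition_shown: bool  # e.g. injected text reached context, or a tool call was attempted
    executed_downstream: bool        # the tool fired and the target system logged the action


def confidence(evidence: EvidenceState) -> str:
    """Map evidence to a confidence label (illustrative rule, not a standard)."""
    if evidence.executed_downstream:
        return "Confirmed"
    if evidence.dangerous_condition_shown:
        return "Partial"
    return "Unconfirmed"


print(confidence(EvidenceState(dangerous_condition_shown=True, executed_downstream=True)))
print(confidence(EvidenceState(dangerous_condition_shown=True, executed_downstream=False)))
```

The point of the sketch is that a plausible model plan never reaches "Confirmed" on its own. Only a logged downstream effect does.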

If your team has not settled on a reporting baseline yet, read what should an AI pentest report include before standardizing the template. It helps avoid the usual sprawl.

Sample finding: indirect prompt injection that turned into tool abuse

Below is a compact example you can adapt. It is a sample, not a claim about a specific product.

ID: AGENT-01
Title: Retrieved prompt injection caused unauthorized `create_ticket` tool call
Severity: High
Confidence: Confirmed
Affected boundary: Retrieval context -> tool authorization
Mapped references: OWASP LLM01 Prompt Injection, OWASP LLM06 Excessive Agency (2025 numbering), MITRE ATLAS LLM Prompt Injection / AI Agent Tool Invocation

Preconditions:
- The agent could search an internal knowledge base and call `create_ticket`
- The tool executed with a service account that had write access to the incident system
- Operator approval was configured for "external actions" but not for internal ticket creation

Description:
The agent retrieved a poisoned knowledge base article that instructed it to open
an urgent ticket containing the full conversation transcript. When the user asked
for a summary of deployment blockers, the agent followed the hidden instructions
inside the article and called `create_ticket` with the raw transcript instead of
returning a text-only summary.

Evidence:
1. Retrieval trace showing document `kb-7421` was injected into the agent context.
2. Span `tool:create_ticket` with the agent-generated body containing internal
   conversation data and a private URL.
3. Audit log from the incident platform confirming ticket `INC-38144` was created
   by the agent service account.
4. Screenshot showing the user only asked for "a short summary for today's blockers."

Impact:
An attacker who can poison retrieved content can cause the agent to perform an
unapproved write action and copy sensitive conversation content into a
cross-team system. The issue is not just prompt injection. It is prompt
injection plus over-broad tool authority.

Remediation:
- Strip or isolate instruction-like content from retrieved documents before it
  reaches the active reasoning context
- Require explicit approval for any write-capable tool, even if the target system
  is considered "internal"
- Limit the service account so it cannot create incident tickets outside approved
  projects
- Log tool approval state and the final tool payload in an auditable form

Retest condition:
Replay the same query with the same poisoned article present. The agent may read
and summarize the article, but it must not call `create_ticket` without a fresh,
explicit approval tied to the exact payload.

This sample works because it names the trigger, the proof, the effect, and the control gap. It does not stop at "prompt injection succeeded." That phrase is too shallow for an agent system where the real damage came from tool scope and approval design. The OWASP LLM01 prompt injection guidance is helpful here because it calls out how malicious instructions can ride in separate data sources. The report should preserve that path.
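The retest condition in the sample asks for approval "tied to the exact payload." One way to read that requirement is sketched below: the approval record stores a digest of the tool name and payload the operator saw, and the tool call is allowed only if the digest still matches. The function names and payload fields are hypothetical, and this is an assumption about how such a control could work, not a description of any particular agent framework.

```python
import hashlib
import json


def payload_digest(tool: str, payload: dict) -> str:
    """Hash the tool name plus a canonical JSON form of the payload."""
    canonical = json.dumps(
        {"tool": tool, "payload": payload}, sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_approved(tool: str, payload: dict, approved_digests: set[str]) -> bool:
    """Allow a write call only if this exact tool and payload were approved."""
    return payload_digest(tool, payload) in approved_digests


# What the operator actually saw and approved (illustrative values).
shown_to_operator = {"title": "Deployment blockers", "body": "Two blockers open."}
approved = {payload_digest("create_ticket", shown_to_operator)}

# What the poisoned article tried to send instead.
injected = {"title": "URGENT", "body": "<full conversation transcript>"}

print(is_approved("create_ticket", shown_to_operator, approved))  # True
print(is_approved("create_ticket", injected, approved))           # False
```

This sketch covers "exact payload" only. The "fresh" part of the condition, such as single-use or expiring approvals, would need separate handling.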

What evidence should sit behind each agent finding

Evidence is where agent reports either become trustworthy or collapse. The reviewer should be able to answer "what really happened?" without trusting the tester's interpretation.

For AI agent work, I usually want these artifacts:

| Artifact | What it proves |
| --- | --- |
| Trace or span export | The order of model steps, tool calls, handoffs, and guardrails |
| Tool input and output | Whether the agent actually passed the dangerous payload downstream |
| Downstream system log | Whether the target system accepted the action |
| Screenshot or UI recording | What the operator or end user saw at the time |
| Environment snapshot | Which model, tool config, and identity rules were active |
| Replay steps | Whether engineering can reproduce the issue without guessing |
| Retest artifact | That the fixed build now fails at the intended control point |

The main thing to avoid is confusing intention with action. A reasoning trace might show the model considering a bad move. That matters, but it is not the same as a tool actually firing. The OpenAI tracing documentation is useful precisely because it separates generation spans from function spans and guardrail spans. Your report should preserve that separation instead of dumping one giant transcript in an appendix.
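To make the intention-versus-action split concrete, here is a small sketch that works on an exported list of spans. The span shape (plain dicts with `type`, `name`, and `text` keys) is a simplification invented for this example and is not the export format of any SDK. Only the idea of separate generation, function, and guardrail spans comes from the tracing model described above.

```python
def executed_tools(spans: list[dict]) -> list[str]:
    """Names of tools that actually ran, taken from function spans only."""
    return [s["name"] for s in spans if s["type"] == "function"]


def mentioned_but_not_executed(spans: list[dict], tool_names: list[str]) -> list[str]:
    """Tools the model talked about in generation text but never called."""
    executed = set(executed_tools(spans))
    mentioned = {
        tool
        for tool in tool_names
        for s in spans
        if s["type"] == "generation" and tool in s.get("text", "")
    }
    return sorted(mentioned - executed)


# Hypothetical trace: the model claims two actions, only one fires.
spans = [
    {"type": "generation", "text": "I will call send_email and create_ticket now."},
    {"type": "function", "name": "create_ticket"},
    {"type": "guardrail", "name": "external_action_check"},
]
known_tools = ["create_ticket", "send_email"]

print(executed_tools(spans))
print(mentioned_but_not_executed(spans, known_tools))
```

Anything in the second list belongs under unconfirmed leads, not in the evidence block of a confirmed finding.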

Redaction matters here too. Agent traces can carry user data, internal URLs, hidden instructions, ticket content, and private files. A sample report should demonstrate how to keep evidence reviewable without turning the report itself into a leakage channel. I would much rather see a redacted span ID plus a short explanation than a full transcript pasted carelessly.
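As a rough illustration of "a redacted span ID plus a short explanation," the sketch below keeps the span identifier and type, masks URLs and email addresses in the text, and drops every other field. The span shape and the two regular expressions are assumptions for this example. Real traces will likely need more patterns (tokens, ticket bodies, file paths) and a manual review pass.

```python
import re

URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_span(span: dict) -> dict:
    """Keep span ID and type; mask URLs and emails; drop all other fields."""
    text = URL_RE.sub("[REDACTED_URL]", span["text"])
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return {"span_id": span["span_id"], "type": span["type"], "text": text}


# Hypothetical span with values that should not appear in a report.
raw = {
    "span_id": "span-001",
    "type": "function",
    "text": "Ticket body links https://wiki.internal.example/runbook and cc ops@example.com",
    "raw_payload": "<full transcript>",
}
clean = redact_span(raw)
print(clean["text"])
```

Because the span ID survives, a reviewer with access to the trace store can still pull the original record, while the report itself carries only the masked text.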

If your team is cleaning up evidence quality before report writing, how security engineers should triage AI pentest results is the right internal companion.

Write impact and remediation in plain English

One of the oldest reporting mistakes is still alive in agent security work: the report describes the bug in technical detail and then adds a vague impact sentence that says almost nothing.

Weak version:

The issue may lead to unauthorized actions and sensitive information disclosure.

That sentence is technically accurate and operationally useless.

Better version:

A malicious document author can influence the agent's retrieval context and cause a write-capable tool to create a ticket containing private conversation data. The immediate fix belongs in tool approval and retrieval handling, not in the downstream ticketing API.

That second version is easier to route. It names the attacker path, the actual effect, and the layer that owns remediation.

For agent engagements, I like five short fields under every impact block:

| Field | Question |
| --- | --- |
| Actor | Who can trigger this path? |
| Boundary crossed | Which control failed first? |
| Result | What was actually exposed, changed, or executed? |
| Preconditions | What had to be true already? |
| Fix owner | Which team can ship the first real mitigation? |

This style lines up well with both OWASP reporting guidance and MITRE ATLAS, because both frameworks push you to think in terms of attacker behavior and control failure, not just vulnerability labels.

Retest sections should be boring and strict

Good retest writing is intentionally dull. It should read like a checklist, not a victory speech.

A retest block for an AI agent finding should say:

- Which original finding ID was retested
- Which build, deployment, or policy version was under review
- Which original trigger was replayed
- What specific evidence proves the action is now blocked

Here is a short sample:

Retest status: Resolved

Retested finding:
- AGENT-01 Retrieved prompt injection caused unauthorized `create_ticket` tool call

Build under test:
- agent-web 3.4.2
- tool-policy bundle 2026-05-21.4

Retest method:
- Replayed the original user query
- Kept poisoned article `kb-7421` in the retrieval set
- Observed trace spans and approval prompts

Result:
- The agent retrieved the article but did not call `create_ticket`
- The UI required explicit approval for the write action
- The approval modal showed the full destination and payload summary
- No new ticket was created in the incident platform

That is enough. If the fix is real, the replayed attack should fail in a very ordinary way. If you want the broader closure workflow, use how security teams retest fixes in AI pentest workflows next to this sample.
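The same retest logic can be written as a strict check so that "Resolved" is never typed by hand. This is a sketch under stated assumptions: spans are plain dicts with `type`, `name`, and an `approved` flag, `new_tickets` is whatever the incident platform's audit log shows for the retest window, and the status strings other than "Resolved" are invented for this example.

```python
def retest_status(trigger_replayed: bool, spans: list[dict], new_tickets: list[str]) -> str:
    """Return a retest status for AGENT-01 from replay evidence."""
    if not trigger_replayed:
        return "Not retested"  # no replay of the original trigger, no verdict
    unapproved_calls = [
        s for s in spans
        if s["type"] == "function"
        and s["name"] == "create_ticket"
        and not s.get("approved", False)
    ]
    if unapproved_calls or new_tickets:
        return "Still open"
    return "Resolved"


# Hypothetical replay: article kb-7421 was retrieved, but no tool call fired.
fixed_run = [{"type": "generation", "text": "Summary of blockers from kb-7421."}]
broken_run = fixed_run + [{"type": "function", "name": "create_ticket"}]

print(retest_status(True, fixed_run, new_tickets=[]))
print(retest_status(True, broken_run, new_tickets=["INC-38144"]))
print(retest_status(False, fixed_run, new_tickets=[]))
```

Treating any new ticket as a failure is deliberate for this finding, since the original user query asked only for a summary and should never produce one.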

A paste-ready sample report template for AI agents

This is the part most teams actually need. Keep it lean and adapt only where the system requires more detail.

# Engagement snapshot

- Engagement name:
- Date tested:
- Tester(s):
- Agent or workflow:
- Version / deployment / commit:
- Model and provider:
- Safety or policy layer:
- Tools enabled:
- Read-only vs write-capable tools:
- Retrieval and memory sources:
- Identity and approval model:
- External systems reachable:
- Evidence collected:
- Test limits:

# System map

- User input path:
- Context injection path:
- Memory behavior:
- Tool invocation path:
- Downstream trust boundaries:
- Guardrails and approval points:

# Findings summary

| ID | Title | Severity | Confidence | Failed boundary | Owner |
| --- | --- | --- | --- | --- | --- |

# Finding AGENT-XX: [title]

- Severity:
- Confidence:
- Failed boundary:
- Mapped references:
- Preconditions:

## Description
[What happened in plain English.]

## Trigger
[Exact prompt, retrieved content, uploaded file, or event.]

## Evidence
- Trace or span IDs:
- Tool call evidence:
- Downstream log or state change:
- User-visible proof:

## Impact
[What changed, leaked, or executed.]

## Remediation
[What to change and who should own it.]

## Retest condition
[Exact replay that should now fail.]

# Residual notes

- Unconfirmed leads:
- Out-of-scope paths worth testing later:
- Logging or monitoring gaps:

The trick is not adding more headings. The trick is making each heading answer one concrete review question. If your report template already covers the general case, compare it against AI pentest report template for AI agents and use this page for the sample wording and evidence pattern.

Common mistakes to remove before you send the report

I would cut a report back before sending it if it does any of the following:

- treats model reasoning text as proof of execution
- says "prompt injection" without naming what the injection changed
- omits tool scope, approval state, or identity context
- pastes screenshots without logs or trace evidence
- assigns remediation to "AI guardrails" when the real bug is tool authorization
- marks a finding resolved without replaying the original trigger

Those mistakes waste time because they create false certainty. Agent systems already produce enough ambiguity on their own. The report should reduce that ambiguity, not freeze it into a PDF.
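Most of the checks above are mechanical enough to run before a human reads the draft. The sketch below lints a single finding represented as a dict. The key names (`evidence_types`, `failed_boundary`, `tool_scope`, and so on) are hypothetical and would need to match whatever structure your own template uses. The check for remediation assigned to the wrong layer is left out because it needs human judgment.

```python
def lint_finding(finding: dict) -> list[str]:
    """Return a list of pre-send problems for one finding (illustrative checks)."""
    problems = []
    evidence = set(finding.get("evidence_types", []))
    if evidence and evidence <= {"model_reasoning", "transcript"}:
        problems.append("model text is the only proof of execution")
    if not finding.get("failed_boundary"):
        problems.append("no failed boundary named")
    for key in ("tool_scope", "approval_state", "identity_context"):
        if not finding.get(key):
            problems.append(f"missing {key}")
    if "screenshot" in evidence and not evidence & {"trace", "downstream_log"}:
        problems.append("screenshot without logs or trace evidence")
    if finding.get("status") == "Resolved" and not finding.get("original_trigger_replayed"):
        problems.append("resolved without replaying the original trigger")
    return problems


# Hypothetical weak draft.
draft = {
    "title": "Prompt injection",
    "evidence_types": ["screenshot"],
    "tool_scope": "create_ticket (write)",
    "status": "Resolved",
}
for problem in lint_finding(draft):
    print("-", problem)
```

A finding that returns an empty list is not automatically good, but one that returns several items is probably not ready to send.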

FAQ

Is a transcript enough for an AI agent pentest sample report?

No. A transcript can support the report, but it is not the report. You still need scope, trigger, proof, impact, remediation, and retest language that another engineer can use.

What is the most important field in the sample?

The evidence block. If the evidence does not prove a real tool action, real data exposure, or a real downstream effect, the finding should stay tentative.

Should prompt injection always be the headline category?

No. Prompt injection may be the trigger, but the real finding may sit in tool authorization, approval UX, memory isolation, or downstream validation. Name the failed boundary, not just the opening move.

How many sources should an AI agent report cite?

Cite enough sources to ground your method and terminology, then focus on direct evidence from the test itself. For agent-specific work, OWASP, NIST, MITRE ATLAS, and your own trace artifacts are usually more useful than a long bibliography.

When should a finding stay partial instead of confirmed?

Keep it partial when the dangerous condition is real but the final impact step was blocked by scope limits, environment limits, or a control you did not fully bypass. That distinction protects report quality.

Bottom line

The best AI pentest sample report template for AI agents is the one that makes the agent workflow legible: what was in scope, what triggered the behavior, what the agent actually did, what boundary failed, and how the team should retest the fix. Everything else is secondary.

If you are building a full reporting workflow, read what should an AI pentest report include, then AI pentest evidence checklist for AppSec teams, then AI pentest report template for AI agents.
