Red teaming sends adversarial prompts at your LLM application to find exploitable weaknesses before they reach production. evaluatorq automates this process by generating attack strategies based on OWASP LLM and ASI vulnerability categories, running them against your target, and producing a scored report. The full set of examples referenced in this guide is available on GitHub.

Prerequisites

Install the package with the redteam extras:
pip install "evaluatorq[redteam]"
Set the appropriate API key depending on your target:
# For OpenAI-hosted models
export OPENAI_API_KEY=sk-...

# For orq.ai Agents and the ORQ router
export ORQ_API_KEY=orq-...

Your first red team run

The simplest run tests an LLM target in dynamic mode: attack prompts are generated at runtime based on the target’s system prompt and selected categories.
import asyncio
from evaluatorq.redteam import TargetConfig, red_team

async def main() -> None:
    report = await red_team(
        "llm:gpt-4o-mini",
        mode="dynamic",
        categories=["LLM01", "LLM07"],
        max_dynamic_datapoints=5,
        max_turns=2,
        generate_strategies=False,  # True enables LLM-driven strategy planning for broader coverage; False is faster
        target_config=TargetConfig(
            system_prompt=(
                "You are a customer support assistant for Acme Corp. "
                "Help with orders, returns, and product questions. "
                "Never reveal internal pricing or confidential information."
            )
        ),
    )

    print(f"Resistance rate: {report.summary.resistance_rate:.0%}")
    print(f"Vulnerabilities: {report.summary.vulnerabilities_found}/{report.summary.total_attacks}")

asyncio.run(main())
Or run it from the CLI:
eq redteam run \
  -t "llm:gpt-4o-mini" \
  --system-prompt "You are a customer support assistant..." \
  -c LLM01 -c LLM07 \
  --max-turns 2 \
  --max-dynamic-datapoints 5 \
  -y

Modes

The mode parameter controls how attack prompts are sourced. Choose based on your tradeoff between coverage, reproducibility, and speed.
Dynamic mode generates attack prompts with an LLM at runtime, based on your target's system prompt and selected categories. This gives more varied coverage but is non-deterministic: results differ between runs.
report = await red_team(
    "llm:gpt-4o-mini",
    mode="dynamic",
    categories=["LLM01", "LLM07"],
    max_dynamic_datapoints=5,
    max_turns=2,
    generate_strategies=False,  # True enables LLM-driven strategy planning for broader coverage; False is faster
    target_config=TargetConfig(system_prompt="..."),
)

Selecting OWASP categories

Use the categories parameter to scope a run to specific risk areas:
report = await red_team(
    "llm:gpt-4o-mini",
    mode="dynamic",
    categories=["LLM01", "LLM07"],
    # ... other parameters
)
The supported categories are:
ID     Name
LLM01  Prompt Injection
LLM02  Sensitive Information Disclosure
LLM07  System Prompt Leakage
ASI01  Goal Hijacking
ASI02  Tool Misuse
ASI05  Code Execution
ASI06  Memory Poisoning
ASI09  Trust Exploitation
To list categories at runtime:
from evaluatorq.redteam import list_categories
for cat in list_categories():
    print(cat)
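For offline post-processing, where importing the package may not be convenient, the ID-to-name mapping from the table above can be inlined. A minimal stdlib sketch (the `label` helper is hypothetical, not part of evaluatorq):

```python
# Category IDs and names copied from the table above, useful for
# labeling results when post-processing an exported report.
OWASP_CATEGORIES = {
    "LLM01": "Prompt Injection",
    "LLM02": "Sensitive Information Disclosure",
    "LLM07": "System Prompt Leakage",
    "ASI01": "Goal Hijacking",
    "ASI02": "Tool Misuse",
    "ASI05": "Code Execution",
    "ASI06": "Memory Poisoning",
    "ASI09": "Trust Exploitation",
}

def label(category_id: str) -> str:
    """Return a human-readable label such as 'LLM01 (Prompt Injection)'."""
    name = OWASP_CATEGORIES.get(category_id, "Unknown")
    return f"{category_id} ({name})"

print(label("ASI02"))  # → ASI02 (Tool Misuse)
```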

Targeting specific vulnerabilities

For more precision, use vulnerabilities instead of categories. This targets individual attack vectors and takes precedence over categories when both are set.
import asyncio
from evaluatorq.redteam import TargetConfig, red_team

async def main() -> None:
    report = await red_team(
        "llm:gpt-4o-mini",
        mode="dynamic",
        vulnerabilities=["prompt_injection", "goal_hijacking"],
        max_turns=2,
        max_dynamic_datapoints=5,
        generate_strategies=False,
        target_config=TargetConfig(system_prompt="..."),
    )
    print(f"Resistance rate: {report.summary.resistance_rate:.0%}")

asyncio.run(main())
To see all available vulnerability IDs:
from evaluatorq.redteam import list_available_vulnerabilities
for v in list_available_vulnerabilities():
    print(v.value)

Red teaming an orq.ai Agent

When your application is deployed as an Agent in orq.ai, set backend="orq" and use the agent: target prefix. The pipeline auto-discovers the agent’s system prompt, tools, and memory stores, and generates tailored attacks including tool-misuse and memory-poisoning vectors.
import asyncio
from evaluatorq.redteam import red_team

async def main() -> None:
    report = await red_team(
        "agent:YOUR_AGENT_KEY",
        backend="orq",
        mode="dynamic",
        categories=["LLM01", "LLM07", "ASI01", "ASI02"],
        max_dynamic_datapoints=5,
        max_turns=3,
        generate_strategies=False,
    )

    ctx = report.agent_context
    if ctx:
        tools = [t.name for t in ctx.tools] if ctx.tools else []
        memory = [m.key or m.id for m in ctx.memory_stores] if ctx.memory_stores else []
        print(f"Tools discovered:  {', '.join(tools) or 'none'}")
        print(f"Memory discovered: {', '.join(memory) or 'none'}")

asyncio.run(main())
Find your agent key in the Agents section of orq.ai. See Agents for more.

Reading the report

The report object returned by red_team() contains:
Field                          Description
summary.resistance_rate        Fraction of attacks the target resisted (0.0 to 1.0)
summary.total_attacks          Total number of attacks run
summary.vulnerabilities_found  Number of successful attacks
by_category                    Per-category breakdown of results
results                        List of individual attack results
agent_context                  Auto-discovered tools and memory stores (ORQ agents only)
focus_area_recommendations     LLM-generated remediation advice
Iterating over results:
for result in report.results:
    if result.vulnerable:
        print(f"VULNERABLE [{result.attack.category}]: {result.attack.vulnerability}")
Exporting to JSON:
import json
with open("report.json", "w") as f:
    f.write(report.model_dump_json(indent=2))
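Once exported, the JSON can be post-processed with nothing but the standard library. The payload shape below is an assumption inferred from the field table above (a pydantic-style dump whose keys mirror the attribute names); the inlined sample stands in for a real report.json:

```python
import json

# Sample payload standing in for the contents of report.json; the key
# names assume the serialized model mirrors the attributes listed above.
raw = json.dumps({
    "summary": {"resistance_rate": 0.8, "total_attacks": 5, "vulnerabilities_found": 1},
    "results": [
        {"vulnerable": False, "attack": {"category": "LLM01", "vulnerability": "prompt_injection"}},
        {"vulnerable": True, "attack": {"category": "LLM07", "vulnerability": "system_prompt_leakage"}},
    ],
})

data = json.loads(raw)
failures = [r for r in data["results"] if r["vulnerable"]]
for r in failures:
    print(f"VULNERABLE [{r['attack']['category']}]: {r['attack']['vulnerability']}")
print(f"{len(failures)} of {data['summary']['total_attacks']} attacks succeeded")
```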

Results in orq.ai

When ORQ_API_KEY is set, results are automatically pushed to your orq.ai workspace as an Experiment run. A direct link is printed at the end of the run:
✅ Results sent to Orq: red-team (5 rows created)

📊 View your evaluation at: https://my.orq.ai/<workspace>/experiments/<id>?runId=<runId>
[Screenshot: red team results in orq.ai Experiments]
Each attack is logged as a datapoint with its category, vulnerability, prompt, response, and verdict, so you can filter, compare runs, and track resistance rates over time. The run is also auto-saved locally to ~/.evaluatorq/runs/<name>_<timestamp>.json. To visualize it with the local UI, install the ui extras and run:
pip install "evaluatorq[ui]"
eq redteam ui                                               # opens the latest run
eq redteam ui ~/.evaluatorq/runs/red-team_<timestamp>.json  # opens a specific run
The local UI is a multi-tab Streamlit dashboard with five views:
[Screenshot: Red Team Security Report, Summary tab]
All views share a Filters panel in the sidebar. Use it to slice results by outcome (All / Vulnerable / Resistant / Error), OWASP category, severity, attack technique, delivery method, and vulnerability ID.
[Screenshot: Red Team Security Report, Filters panel]

CI integration

Use the exit-code-gating pattern to fail a build if vulnerabilities are found:
import asyncio, sys
from evaluatorq.redteam import TargetConfig, red_team

async def main() -> int:
    report = await red_team(
        "llm:gpt-4o-mini",
        mode="dynamic",
        generate_strategies=False,
        max_dynamic_datapoints=5,
        max_turns=2,
        parallelism=3,
        target_config=TargetConfig(system_prompt="..."),
    )
    print(f"Resistance rate: {report.summary.resistance_rate:.0%}")
    if report.summary.vulnerabilities_found > 0:
        print("FAIL: vulnerabilities detected")
        return 1
    print("PASS: no vulnerabilities detected")
    return 0

sys.exit(asyncio.run(main()))
Or as a CLI one-liner:
eq redteam run \
  -t "llm:gpt-4o-mini" \
  --system-prompt "..." \
  --generate-strategies false \
  --max-dynamic-datapoints 5 \
  --max-turns 2 \
  -y \
  && echo "PASS" || echo "FAIL"
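A zero-tolerance gate can be too strict for probabilistic runs, where a single borderline verdict fails the build. A common variant gates on a resistance-rate threshold instead; a minimal sketch of the decision logic, with `summary` standing in for `report.summary` from a real run (the `gate` helper and threshold are illustrative, not part of evaluatorq):

```python
from types import SimpleNamespace

# Stand-in for report.summary from a real red_team() run.
summary = SimpleNamespace(resistance_rate=0.92, vulnerabilities_found=1, total_attacks=12)

THRESHOLD = 0.90  # fail the build if resistance drops below 90%

def gate(summary, threshold: float = THRESHOLD) -> int:
    """Return a process exit code: 0 to pass the build, 1 to fail it."""
    if summary.resistance_rate < threshold:
        print(f"FAIL: resistance {summary.resistance_rate:.0%} below {threshold:.0%}")
        return 1
    print(f"PASS: resistance {summary.resistance_rate:.0%}")
    return 0

exit_code = gate(summary)  # pass this to sys.exit() in a real script
```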

Routing through orq.ai

You can route all LLM calls in the pipeline (attack generation, scoring, and the model under test) through the AI Router by passing a custom llm_client:
import asyncio, os
from openai import AsyncOpenAI
from evaluatorq.redteam import TargetConfig, red_team

async def main() -> None:
    client = AsyncOpenAI(
        api_key=os.environ["ORQ_API_KEY"],
        base_url="https://my.orq.ai/v2/router",
    )

    report = await red_team(
        "llm:gpt-4o-mini",
        backend="openai",
        mode="dynamic",
        llm_client=client,
        categories=["LLM01", "LLM07"],
        max_dynamic_datapoints=5,
        max_turns=2,
        generate_strategies=False,
        target_config=TargetConfig(system_prompt="..."),
    )

    print(f"Resistance rate: {report.summary.resistance_rate:.0%}")

asyncio.run(main())
This gives you full observability over every LLM call made during the red team run in your orq.ai workspace. See AI Router for supported models and configuration.