Coverage table
| OWASP ID | Category | Scorer name | What it detects | Available from |
|---|---|---|---|---|
| LLM01 | Prompt Injection | jailbreak_detection | Attempts to override system instructions, bypass safety guidelines, or manipulate model behaviour via crafted user input | Free |
| LLM02 | Sensitive Information Disclosure | pii_detection | PII in model responses: names, email addresses, phone numbers, SSNs, credit card numbers, home addresses | Free |
| LLM03 | Supply Chain | — | Third-party model and plugin risk (policy controls, coming soon) | Professional |
| LLM04 | Data and Model Poisoning | data_poisoning | Prompts designed to inject poisoned content into RAG pipelines, knowledge bases, or model context | Professional |
| LLM05 | Improper Output Handling | content_safety | Harmful, dangerous, violent, or policy-violating content in model responses | Free |
| LLM06 | Excessive Agency | — | Agentic over-reach detection (coming soon) | Business |
| LLM07 | System Prompt Leakage | pii_detection | System prompt contents leaked in model responses | Professional |
| LLM08 | Vector and Embedding Weaknesses | vector_weakness | Adversarial inputs targeting vector databases, embedding models, or semantic search | Professional |
| LLM09 | Misinformation | — | Factual accuracy scoring (coming soon, requires reference corpus) | Business |
| LLM10 | Unbounded Consumption | unbounded_consumption | Prompts designed to cause excessive token or compute consumption (denial-of-service patterns) | Professional |
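
The table above can also be read as data. The following is a minimal sketch, not part of the product: it copies the rows into a Python list and looks up which OWASP IDs have a scorer at a given plan. The names `TIERS`, `COVERAGE`, and `covered_ids` are hypothetical, and the tier ordering (Free, then Professional, then Business) is an assumption based on the "Available from" column.

```python
# Assumption: plans are cumulative in this order (Free < Professional < Business).
TIERS = ["Free", "Professional", "Business"]

# (OWASP ID, scorer name or None if no scorer exists yet, minimum plan),
# copied from the coverage table.
COVERAGE = [
    ("LLM01", "jailbreak_detection", "Free"),
    ("LLM02", "pii_detection", "Free"),
    ("LLM03", None, "Professional"),
    ("LLM04", "data_poisoning", "Professional"),
    ("LLM05", "content_safety", "Free"),
    ("LLM06", None, "Business"),
    ("LLM07", "pii_detection", "Professional"),
    ("LLM08", "vector_weakness", "Professional"),
    ("LLM09", None, "Business"),
    ("LLM10", "unbounded_consumption", "Professional"),
]


def covered_ids(tier: str) -> list[str]:
    """Return the OWASP IDs that have a named scorer available at `tier`."""
    rank = TIERS.index(tier)
    return [
        owasp_id
        for owasp_id, scorer, min_tier in COVERAGE
        if scorer is not None and TIERS.index(min_tier) <= rank
    ]


print(covered_ids("Free"))  # ['LLM01', 'LLM02', 'LLM05']
```

Rows marked "coming soon" have no scorer name, so they are excluded at every plan in this sketch.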
Scorer descriptions
content_safety (LLM05)
Checks whether the model’s response contains harmful, dangerous, violent, or policy-violating content. Triggers on: explicit instructions for harm, hate speech, graphic violence, CSAM-adjacent content.
pii_detection (LLM02, LLM07)
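Checks whether the model’s response leaks PII. Covers: full names, email addresses, phone numbers, Social Security numbers, credit card numbers, home addresses, passport numbers, and similar sensitive personal data. Per the coverage table, the same scorer is also mapped to LLM07, where it flags system prompt contents leaked in model responses.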
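To make the categories concrete, here is an illustrative sketch of pattern-based checks for three of the listed PII types. It is not how pii_detection is implemented (this page does not describe the scorer's internals); the patterns, the `PII_PATTERNS` table, and the `find_pii` helper are all hypothetical, and the regular expressions are deliberately simple.

```python
import re

# Hypothetical, simplified patterns; real PII detection needs far more care
# (names, addresses, and passport numbers cannot be matched reliably this way).
PII_PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def find_pii(response: str) -> dict[str, list[str]]:
    """Return the matches found in a model response, keyed by PII type."""
    hits: dict[str, list[str]] = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(response)
        if matches:
            hits[label] = matches
    return hits


print(find_pii("Contact jane@example.com, SSN 123-45-6789."))
# {'email': ['jane@example.com'], 'ssn': ['123-45-6789']}
```

An empty result means only that none of these three patterns matched, not that the response is free of PII.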
jailbreak_detection (LLM01)
Checks whether the prompt attempts to bypass AI safety guidelines or override system instructions. Covers: DAN-style prompts, role-play overrides, instruction injection via user content, indirect prompt injection.
data_poisoning (LLM04)
Checks whether the prompt attempts to inject content designed to corrupt a knowledge base, RAG pipeline, or model context — content intended to influence future model responses rather than elicit an immediate answer.
vector_weakness (LLM08)
Checks whether the prompt appears to exploit weaknesses in vector databases or embedding models — for example, queries crafted to retrieve unintended documents, bypass semantic filters, or manipulate similarity search results.
unbounded_consumption (LLM10)
Checks whether the prompt appears designed to cause excessive resource consumption: extremely long or recursive inputs, content designed to exhaust tokens or API limits, or patterns that trigger maximum-length completions.
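As a rough illustration of two of the patterns named above (very long inputs and heavily repetitive inputs), the sketch below applies simple heuristics to a prompt. It is not the scorer's implementation; the function name `looks_unbounded` and both thresholds are hypothetical values chosen only for the example.

```python
def looks_unbounded(
    prompt: str,
    max_chars: int = 20_000,        # hypothetical length limit
    max_repeat_ratio: float = 0.8,  # hypothetical share of repeated words
) -> bool:
    """Flag prompts that are extremely long or dominated by repeated words."""
    if len(prompt) > max_chars:
        return True

    words = prompt.split()
    # Only judge repetition on prompts long enough for the ratio to mean something.
    if len(words) >= 50:
        repeat_ratio = 1 - len(set(words)) / len(words)
        if repeat_ratio > max_repeat_ratio:
            return True
    return False


print(looks_unbounded("Summarize this paragraph."))  # False
print(looks_unbounded("repeat " * 200))              # True
```

Heuristics like these cannot catch prompts that ask for maximum-length completions in ordinary wording, which is one reason a dedicated scorer is useful.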
Scorer configuration
Scorer definitions live in evals/scorers.yaml. To add your own, see Custom scorers.
