Cognisafe scores every LLM request and response against the OWASP LLM Top 10, the industry-standard taxonomy of security risks in LLM-powered applications. Scoring is asynchronous and runs entirely off the proxy hot path, so your users see no added latency; scores appear in the dashboard within seconds of each request. All scores use the 1–5 Likert severity scale.
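To make the scale concrete, here is a minimal sketch of what a single score record might look like once it reaches the dashboard or an export. The field names (`request_id`, `scorer`, `owasp_id`, `severity`) are illustrative assumptions, not the actual schema:

```python
# Hypothetical shape of one score record; field names are illustrative only.
score = {
    "request_id": "req_123",          # the proxied LLM request that was scored
    "scorer": "jailbreak_detection",  # scorer name (see the coverage table below)
    "owasp_id": "LLM01",              # OWASP LLM Top 10 category
    "severity": 4,                    # 1-5 Likert scale: 1 = benign, 5 = severe
}

# Scoring is asynchronous, so a record like this appears seconds after the
# request completes; the proxied call itself is never delayed.
if score["severity"] >= 4:
    print(f"High-severity {score['owasp_id']} finding on {score['request_id']}")
```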
## Coverage table
| OWASP ID | Category | Scorer name | What it detects | Available from |
|---|---|---|---|---|
| LLM01 | Prompt Injection | jailbreak_detection | Attempts to override system instructions, bypass safety guidelines, or manipulate model behaviour via crafted user input | Free |
| LLM02 | Sensitive Information Disclosure | pii_detection | PII in model responses: names, email addresses, phone numbers, SSNs, credit card numbers, home addresses | Free |
| LLM03 | Supply Chain | — | Third-party model and plugin risk (policy controls, coming soon) | Professional |
| LLM04 | Data and Model Poisoning | data_poisoning | Prompts designed to inject poisoned content into RAG pipelines, knowledge bases, or model context | Professional |
| LLM05 | Improper Output Handling | content_safety | Harmful, dangerous, violent, or policy-violating content in model responses | Free |
| LLM06 | Excessive Agency | — | Agentic over-reach detection (coming soon) | Business |
| LLM07 | System Prompt Leakage | pii_detection | System prompt contents leaked in model responses | Professional |
| LLM08 | Vector and Embedding Weaknesses | vector_weakness | Adversarial inputs targeting vector databases, embedding models, or semantic search | Professional |
| LLM09 | Misinformation | — | Factual accuracy scoring (coming soon, requires reference corpus) | Business |
| LLM10 | Unbounded Consumption | unbounded_consumption | Prompts designed to cause excessive token or compute consumption (denial-of-service patterns) | Professional |
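If you export scores for your own reporting, the table above is easy to encode as a lookup from scorer name to OWASP category. A minimal sketch: the mapping mirrors the table, while the function name and fallback value are illustrative:

```python
# Scorer name -> primary OWASP LLM Top 10 ID, copied from the coverage table.
# pii_detection also backs LLM07 (System Prompt Leakage) on Professional plans.
SCORER_TO_OWASP = {
    "jailbreak_detection": "LLM01",
    "pii_detection": "LLM02",
    "data_poisoning": "LLM04",
    "content_safety": "LLM05",
    "vector_weakness": "LLM08",
    "unbounded_consumption": "LLM10",
}

def owasp_category(scorer_name: str) -> str:
    # Fall back to "unmapped" for scorer names the table does not cover.
    return SCORER_TO_OWASP.get(scorer_name, "unmapped")

print(owasp_category("content_safety"))  # LLM05
```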
## Scorer descriptions
### content_safety (LLM05)
Checks whether the model’s response contains harmful, dangerous, violent, or policy-violating content. Triggers on: explicit instructions for harm, hate speech, graphic violence, CSAM-adjacent content.
### pii_detection (LLM02, LLM07)
Checks whether the model’s response leaks PII. Covers: full names, email addresses, phone numbers, Social Security numbers, credit card numbers, home addresses, passport numbers, and similar sensitive personal data.
### jailbreak_detection (LLM01)
Checks whether the prompt attempts to bypass AI safety guidelines or override system instructions. Covers: DAN-style prompts, role-play overrides, instruction injection via user content, indirect prompt injection.
### data_poisoning (LLM04)
Checks whether the prompt attempts to inject content designed to corrupt a knowledge base, RAG pipeline, or model context — content intended to influence future model responses rather than elicit an immediate answer.
### vector_weakness (LLM08)
Checks whether the prompt appears to exploit weaknesses in vector databases or embedding models — for example, queries crafted to retrieve unintended documents, bypass semantic filters, or manipulate similarity search results.
### unbounded_consumption (LLM10)
Checks whether the prompt appears designed to cause excessive resource consumption: extremely long or recursive inputs, content designed to exhaust tokens or API limits, or patterns that trigger maximum-length completions.
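How you respond to these scores is up to you; a common pattern is to alert only when a scorer crosses a severity threshold. The sketch below assumes score records shaped like the earlier example, and the thresholds themselves are illustrative starting points rather than recommended values:

```python
# Per-scorer severity thresholds (on the 1-5 Likert scale) at which to alert.
# These numbers are illustrative starting points, not recommended values.
ALERT_THRESHOLDS = {
    "jailbreak_detection": 4,
    "pii_detection": 3,
    "content_safety": 3,
    "data_poisoning": 4,
    "vector_weakness": 4,
    "unbounded_consumption": 4,
}

def triage(scores):
    """Yield score records whose severity meets or exceeds their scorer's threshold."""
    for record in scores:
        threshold = ALERT_THRESHOLDS.get(record["scorer"])
        if threshold is not None and record["severity"] >= threshold:
            yield record
```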
## Scorer configuration
Scorer definitions live in `evals/scorers.yaml`. See Custom scorers for information on adding your own.
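As a quick way to see which scorers are defined in your deployment, you can load that file and list its entries. This sketch assumes only that `evals/scorers.yaml` parses to a top-level mapping keyed by scorer name, which may not match the actual layout:

```python
import yaml  # third-party: pip install pyyaml

with open("evals/scorers.yaml") as f:
    config = yaml.safe_load(f)

# Assumes the top level is a mapping keyed by scorer name; adjust if the
# file is structured differently in your deployment.
for name in config:
    print(name)
```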
