# OWASP LLM Top 10 coverage

> How Cognisafe maps to the OWASP LLM Top 10.

Cognisafe scores every LLM request and response against the [OWASP LLM Top 10](https://owasp.org/www-project-top-10-for-large-language-model-applications/) — the industry-standard taxonomy of security risks in LLM-powered applications.

Scoring is **asynchronous**: it runs entirely off the proxy hot path, so it adds no latency to your users' requests. Scores appear in the dashboard within seconds of each request.

All scores use the [1–5 Likert severity scale](/safety/severity-scale).
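The Python sketch below shows the general fire-and-forget pattern behind off-path scoring, assuming an `asyncio`-based proxy. It is not Cognisafe's implementation. `call_upstream_llm`, `score_exchange`, and `handle_request` are hypothetical names, and the sleeps stand in for model and scorer latency. The point to notice is that the response is returned to the caller before the scoring task has even started.

```python
import asyncio

# Hypothetical stand-ins for illustration; these are not Cognisafe APIs.
async def call_upstream_llm(prompt: str) -> str:
    await asyncio.sleep(0.01)  # simulated model latency
    return f"echo: {prompt}"

async def score_exchange(prompt: str, response: str, results: list) -> None:
    await asyncio.sleep(0.05)  # simulated scorer latency (slower than the model)
    # Placeholder severity; a real scorer would return a value on the 1-5 scale.
    results.append({"prompt": prompt, "response": response, "severity": 1})

# Strong references to in-flight tasks so they are not garbage-collected early.
background_tasks: set = set()

async def handle_request(prompt: str, results: list) -> str:
    response = await call_upstream_llm(prompt)
    # Schedule scoring without awaiting it: the response goes back to the
    # caller immediately and scoring completes later in the background.
    task = asyncio.create_task(score_exchange(prompt, response, results))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return response

async def demo() -> tuple:
    results: list = []
    response = await handle_request("hello", results)
    scored_before = len(results)  # 0: the scoring task has not run yet
    await asyncio.gather(*background_tasks)  # wait only for the demo's sake
    return response, scored_before, len(results)

if __name__ == "__main__":
    print(asyncio.run(demo()))  # ('echo: hello', 0, 1)
```

In a real proxy you would not await the background tasks per request; the `gather` call in `demo` exists only to show that the score arrives after the response.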

## Coverage table

| OWASP ID | Category                         | Scorer name             | What it detects                                                                                                          | Available from |
| -------- | -------------------------------- | ----------------------- | ------------------------------------------------------------------------------------------------------------------------ | -------------- |
| LLM01    | Prompt Injection                 | `jailbreak_detection`   | Attempts to override system instructions, bypass safety guidelines, or manipulate model behaviour via crafted user input | Free           |
| LLM02    | Sensitive Information Disclosure | `pii_detection`         | PII in model responses: names, email addresses, phone numbers, SSNs, credit card numbers, home addresses                 | Free           |
| LLM03    | Supply Chain                     | —                       | Third-party model and plugin risk (policy controls, coming soon)                                                         | Professional   |
| LLM04    | Data and Model Poisoning         | `data_poisoning`        | Prompts designed to inject poisoned content into RAG pipelines, knowledge bases, or model context                        | Professional   |
| LLM05    | Improper Output Handling         | `content_safety`        | Harmful, dangerous, violent, or policy-violating content in model responses                                              | Free           |
| LLM06    | Excessive Agency                 | —                       | Agentic over-reach detection (coming soon)                                                                               | Business       |
| LLM07    | System Prompt Leakage            | `pii_detection`         | System prompt contents leaked in model responses                                                                         | Professional   |
| LLM08    | Vector and Embedding Weaknesses  | `vector_weakness`       | Adversarial inputs targeting vector databases, embedding models, or semantic search                                      | Professional   |
| LLM09    | Misinformation                   | —                       | Factual accuracy scoring (coming soon, requires reference corpus)                                                        | Business       |
| LLM10    | Unbounded Consumption            | `unbounded_consumption` | Prompts designed to cause excessive token or compute consumption (denial-of-service patterns)                            | Professional   |
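If you want to reason about coverage programmatically (for example, when generating a compliance report), the table can be transcribed into a lookup structure. This Python sketch is hypothetical: `OWASP_COVERAGE`, `PLAN_ORDER`, and `active_categories` are illustrative names, and it assumes that each plan includes everything available on the plans below it.

```python
from typing import NamedTuple, Optional

# Assumption: plans are strictly ordered and higher plans include lower ones.
PLAN_ORDER = {"Free": 0, "Professional": 1, "Business": 2}

class Coverage(NamedTuple):
    category: str
    scorer: Optional[str]  # None means no scorer yet ("coming soon")
    min_plan: str

# Transcribed row by row from the coverage table.
OWASP_COVERAGE = {
    "LLM01": Coverage("Prompt Injection", "jailbreak_detection", "Free"),
    "LLM02": Coverage("Sensitive Information Disclosure", "pii_detection", "Free"),
    "LLM03": Coverage("Supply Chain", None, "Professional"),
    "LLM04": Coverage("Data and Model Poisoning", "data_poisoning", "Professional"),
    "LLM05": Coverage("Improper Output Handling", "content_safety", "Free"),
    "LLM06": Coverage("Excessive Agency", None, "Business"),
    "LLM07": Coverage("System Prompt Leakage", "pii_detection", "Professional"),
    "LLM08": Coverage("Vector and Embedding Weaknesses", "vector_weakness", "Professional"),
    "LLM09": Coverage("Misinformation", None, "Business"),
    "LLM10": Coverage("Unbounded Consumption", "unbounded_consumption", "Professional"),
}

def active_categories(plan: str) -> list[str]:
    """OWASP IDs that have a live scorer and are included in `plan`."""
    level = PLAN_ORDER[plan]
    return [
        owasp_id
        for owasp_id, cov in OWASP_COVERAGE.items()
        if cov.scorer is not None and PLAN_ORDER[cov.min_plan] <= level
    ]
```

For example, `active_categories("Free")` returns `["LLM01", "LLM02", "LLM05"]`. Under this sketch's assumptions, Business returns the same list as Professional, because both Business-tier rows (LLM06, LLM09) have no scorer yet.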

## Scorer descriptions

### `content_safety` (LLM05)

Checks whether the model's **response** contains harmful, dangerous, violent, or policy-violating content. Triggers on: explicit instructions for harm, hate speech, graphic violence, CSAM-adjacent content.

### `pii_detection` (LLM02, LLM07)

Checks whether the model's **response** leaks PII. Covers: full names, email addresses, phone numbers, Social Security numbers, credit card numbers, home addresses, passport numbers, and similar sensitive personal data. The same scorer also flags system prompt contents leaked in model responses (LLM07).
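To make the pattern-like PII categories concrete, here is a naive regex pre-filter in Python for email addresses, US-format SSNs, and credit card numbers, with a Luhn checksum to discard random digit runs. It is an illustration only, not how `pii_detection` works internally; `find_pii` and `luhn_valid` are hypothetical names.

```python
import re

EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # US format: 123-45-6789
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")

def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

def find_pii(text: str) -> dict[str, list[str]]:
    """Return matches per PII kind, omitting kinds with no matches."""
    found = {
        "email": EMAIL_RE.findall(text),
        "ssn": SSN_RE.findall(text),
        "credit_card": [m for m in CARD_CANDIDATE_RE.findall(text) if luhn_valid(m)],
    }
    return {kind: hits for kind, hits in found.items() if hits}
```

Categories such as full names and home addresses have no reliable surface pattern, so a regex pass like this one misses them entirely.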

### `jailbreak_detection` (LLM01)

Checks whether the **prompt** attempts to bypass AI safety guidelines or override system instructions. Covers: DAN-style prompts, role-play overrides, instruction injection via user content, indirect prompt injection.
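A keyword pre-filter for a few of the override styles listed above might look like the following Python sketch. It is deliberately simplistic and hypothetical: the pattern set and `jailbreak_signals` are illustrative, not part of Cognisafe. Real jailbreaks paraphrase freely, and indirect injection arrives inside documents the model is asked to process rather than in the user's own words, so fixed patterns alone cannot cover this category.

```python
import re

# Hypothetical, deliberately small pattern set for illustration.
OVERRIDE_PATTERNS = {
    # "Ignore all previous instructions", "disregard the system prompt", ...
    "instruction_override": re.compile(
        r"\b(ignore|disregard|forget)\b.{0,40}\b(previous|prior|above|system)\b"
        r".{0,20}\b(instructions?|prompt|rules)\b",
        re.IGNORECASE,
    ),
    # DAN-style personas: "DAN" is matched case-sensitively so the name "Dan"
    # is not flagged; the expanded phrase is matched case-insensitively.
    "dan_persona": re.compile(r"\bDAN\b|(?i:\bdo anything now\b)"),
    # Role-play framing that asks the model to drop its guidelines.
    "roleplay_override": re.compile(
        r"\b(pretend|act as if|you are now)\b.{0,60}\b(no|without)\b"
        r".{0,20}\b(rules|restrictions|filters|guidelines)\b",
        re.IGNORECASE,
    ),
}

def jailbreak_signals(prompt: str) -> list[str]:
    """Return the names of all patterns that match the prompt."""
    return [name for name, pattern in OVERRIDE_PATTERNS.items() if pattern.search(prompt)]
```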

### `data_poisoning` (LLM04)

Checks whether the **prompt** attempts to inject content designed to corrupt a knowledge base, RAG pipeline, or model context — content intended to influence future model responses rather than elicit an immediate answer.

### `vector_weakness` (LLM08)

Checks whether the **prompt** appears to exploit weaknesses in vector databases or embedding models — for example, queries crafted to retrieve unintended documents, bypass semantic filters, or manipulate similarity search results.

### `unbounded_consumption` (LLM10)

Checks whether the **prompt** appears designed to cause excessive resource consumption: extremely long or recursive inputs, content designed to exhaust tokens or API limits, or patterns that trigger maximum-length completions.
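The resource-exhaustion patterns above can be approximated with cheap heuristics. This Python sketch checks raw length, repetitiveness (measured as a `zlib` compression ratio, since highly repetitive text compresses extremely well), and phrasing that asks for unbounded output. The thresholds, the regex, and the names `check_consumption` and `ConsumptionReport` are hypothetical, chosen for illustration rather than taken from Cognisafe.

```python
import re
import zlib
from dataclasses import dataclass, field

# Hypothetical thresholds, for illustration only.
MAX_PROMPT_CHARS = 20_000
MIN_COMPRESSION_RATIO = 0.1  # compressed/original size; very low means very repetitive
MIN_BYTES_FOR_RATIO = 1_000
EXHAUSTION_RE = re.compile(
    r"\b(repeat|continue|write)\b.{0,30}\b(forever|infinitely|indefinitely|until you run out)\b"
    r"|\b(as long as possible|maximum length|\d{5,} words)\b",
    re.IGNORECASE,
)

@dataclass
class ConsumptionReport:
    reasons: list[str] = field(default_factory=list)

    @property
    def flagged(self) -> bool:
        return bool(self.reasons)

def check_consumption(prompt: str) -> ConsumptionReport:
    report = ConsumptionReport()
    if len(prompt) > MAX_PROMPT_CHARS:
        report.reasons.append("prompt_too_long")
    raw = prompt.encode("utf-8")
    # Skip short inputs: zlib's fixed overhead dominates the ratio there.
    if len(raw) >= MIN_BYTES_FOR_RATIO:
        ratio = len(zlib.compress(raw)) / len(raw)
        if ratio < MIN_COMPRESSION_RATIO:
            report.reasons.append("highly_repetitive")
    if EXHAUSTION_RE.search(prompt):
        report.reasons.append("requests_unbounded_output")
    return report
```

For example, `check_consumption("hello " * 5000)` reports both `prompt_too_long` and `highly_repetitive`, while an ordinary request such as "Summarise this article in three bullet points." is not flagged.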

## Scorer configuration

Scorer definitions live in `evals/scorers.yaml`. See [Custom scorers](/safety/custom-scorers) for information on adding your own.
