Ollama (self-hosted)

How it works

Ollama exposes an OpenAI-compatible HTTP API on port 11434, under the /v1 path. Because the Cognisafe proxy speaks the same protocol, you can observe all Ollama calls by pointing the proxy’s UPSTREAM_URL at your Ollama instance, with no changes to Ollama itself. This is particularly useful in air-gapped environments: all traffic stays on your internal network, and if you use local scoring, the content sent for safety scoring never leaves your infrastructure either.

Proxy configuration

Set UPSTREAM_URL on the Cognisafe proxy to your Ollama instance:

# Proxy service env vars
UPSTREAM_URL=http://localhost:11434

If Ollama runs on a different host:

UPSTREAM_URL=http://ollama-host.internal:11434
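
Before starting the proxy, it can help to confirm that the value you plan to use for UPSTREAM_URL is well-formed and that something answers there. The following is a minimal sketch using only the Python standard library. The helper names parse_upstream and upstream_is_up are hypothetical and not part of Cognisafe, and the check assumes the Ollama server answers a GET on its root path with HTTP 200 when it is running.

```python
import urllib.request
from urllib.parse import urlsplit


def parse_upstream(url):
    """Return (host, port) for an UPSTREAM_URL value, or raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"UPSTREAM_URL must look like http://host:port, got {url!r}")
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def upstream_is_up(url, timeout=3.0):
    """Return True if a GET on the upstream root answers with HTTP 200."""
    parse_upstream(url)  # fail early on a malformed value
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, refused connections, and timeouts are OSError subclasses
        return False


# Usage (hypothetical host; run from the machine that hosts the proxy):
# print(upstream_is_up("http://ollama-host.internal:11434"))
```

Run the check from the same machine or container as the proxy, since localhost resolves relative to wherever the proxy process runs.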

SDK setup

Use patch_openai(). Ollama’s API is OpenAI-compatible, so the standard OpenAI client works against it:

import cognisafe
from openai import OpenAI

cognisafe.configure(
    api_key="csk_your_key_here",
    project_id="my-app",
    proxy_url="http://localhost:8080",  # Cognisafe proxy
)
cognisafe.patch_openai()

# No real OpenAI API key needed — Ollama ignores it
client = OpenAI(api_key="ollama")

response = client.chat.completions.create(
    model="llama3.2",   # any model you have pulled in Ollama
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)

print(response.choices[0].message.content)
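
If the call above fails, one way to narrow down the cause is to send the same chat request straight to Ollama, bypassing both the SDK and the proxy. The sketch below uses only the standard library and targets Ollama’s OpenAI-compatible /v1/chat/completions endpoint. The function names build_chat_request and send_chat are hypothetical, and the URL assumes Ollama is listening on localhost:11434.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt, api_key="ollama"):
    """Build a POST to the OpenAI-style chat completions endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),  # a request with data is sent as POST
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # placeholder; Ollama ignores it
        },
    )


def send_chat(request, timeout=60.0):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Usage (needs a running Ollama with the model already pulled):
# req = build_chat_request("http://localhost:11434", "llama3.2", "Why is the sky blue?")
# print(send_chat(req))
```

If this direct request succeeds but the patched client does not, the problem is more likely in the proxy configuration than in Ollama.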

Air-gapped safety scoring

By default, Cognisafe’s safety worker uses gpt-4o-mini (via OPENAI_API_KEY) to score requests. In an air-gapped environment, you have two options:

Disable scoring: leave OPENAI_API_KEY unset. The worker then skips scoring and records score_label: "unscored". Requests are still logged, and cost and latency data are still captured.

Use a local scoring model: configure an Ollama-backed scoring model by pointing the safety worker at a local OpenAI-compatible endpoint:

# safety_worker env vars
OPENAI_API_KEY=ollama        # any non-empty value
SCORER_MODEL=llama3.2        # local model name
# Override the OpenAI base URL used by PyRIT:
OPENAI_BASE_URL=http://localhost:11434/v1

Models like llama3.2 or mistral pulled into Ollama work well as scoring models for content_safety and pii_detection. For jailbreak_detection, larger models (70B+) produce more reliable results.
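
The two options can be summarized as a small decision function. This is an illustrative sketch of the behavior described above, not the safety worker’s actual code: resolve_scorer_config and the uses_openai_cloud flag are hypothetical, and the default base URL is an assumption based on the OpenAI client’s usual default.

```python
from urllib.parse import urlsplit

DEFAULT_MODEL = "gpt-4o-mini"                   # default scorer named in this page
DEFAULT_BASE_URL = "https://api.openai.com/v1"  # assumed default when no override is set


def resolve_scorer_config(env):
    """Map safety-worker env vars to an effective scoring configuration."""
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        # Option 1: no key, so scoring is skipped and requests are labeled unscored.
        return {"enabled": False, "score_label": "unscored"}
    base_url = env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL)
    return {
        "enabled": True,
        "model": env.get("SCORER_MODEL", DEFAULT_MODEL),
        "base_url": base_url,
        # True means scoring traffic would leave an air-gapped network.
        "uses_openai_cloud": urlsplit(base_url).hostname == "api.openai.com",
    }


# Usage: pass os.environ, or a plain dict when experimenting.
# print(resolve_scorer_config({"OPENAI_API_KEY": "ollama",
#                              "SCORER_MODEL": "llama3.2",
#                              "OPENAI_BASE_URL": "http://localhost:11434/v1"}))
```

A check like this makes one pitfall visible: setting OPENAI_API_KEY without also overriding OPENAI_BASE_URL would leave the scorer pointed at the hosted OpenAI endpoint.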

Supported Ollama models

Any model available in the Ollama library works — Cognisafe does not constrain the model field. The proxy passes model through to Ollama unchanged.

# Pull a model before use
ollama pull llama3.2
ollama pull mistral
ollama pull phi4
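
Because the model name is passed through unchanged, a request for a model that has not been pulled will fail at Ollama rather than at the proxy. The sketch below checks for a model up front using Ollama’s /api/tags endpoint, which lists locally available models. The helper names model_is_pulled and fetch_tags are hypothetical, and the sketch assumes a name without a tag refers to the latest tag.

```python
import json
import urllib.request


def model_is_pulled(tags_payload, model):
    """Return True if `model` appears in an /api/tags response payload."""
    # Assumption: an untagged name such as "llama3.2" means "llama3.2:latest".
    wanted = model if ":" in model else f"{model}:latest"
    names = {entry.get("name") for entry in tags_payload.get("models", [])}
    return wanted in names


def fetch_tags(ollama_url, timeout=5.0):
    """Fetch the list of locally available models from a running Ollama."""
    url = ollama_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Usage (needs a running Ollama):
# tags = fetch_tags("http://localhost:11434")
# print(model_is_pulled(tags, "llama3.2"))
```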
