How it works
Ollama exposes an OpenAI-compatible HTTP API on port 11434 by default. Because the Cognisafe proxy speaks the same protocol, you can observe all Ollama calls by pointing the proxy’s UPSTREAM_URL at your Ollama instance, with no changes to Ollama itself.
This is particularly useful in air-gapped environments: all traffic stays on your internal network, and if you use local scoring, Cognisafe’s safety scoring also stays inside your infrastructure.
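To make the shared protocol concrete, the sketch below builds (but does not send) an OpenAI-style chat completion request aimed at Ollama's /v1/chat/completions endpoint. The helper name build_chat_request is hypothetical, and the base URL assumes Ollama is listening on its default port on the local machine.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed address: a local Ollama instance on the default port.
req = build_chat_request("http://localhost:11434", "llama3.2", "Why is the sky blue?")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen(req) would send it. The body has the same shape the OpenAI client produces, which is why a proxy that understands one can sit in front of the other.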
Proxy configuration
Set UPSTREAM_URL on the Cognisafe proxy to your Ollama instance:
# Proxy service env vars
UPSTREAM_URL=http://localhost:11434
If Ollama runs on a different host:
UPSTREAM_URL=http://ollama-host.internal:11434
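Before starting the proxy, it can help to confirm that the UPSTREAM_URL value is well formed and that the host accepts connections. The following is a minimal sketch, not part of Cognisafe; parse_upstream and is_reachable are hypothetical helper names, and the check is a plain TCP connect rather than an Ollama-specific health probe.

```python
import socket
from urllib.parse import urlsplit


def parse_upstream(url: str) -> tuple[str, int]:
    """Return (host, port) for an UPSTREAM_URL value, or raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not a valid http(s) URL: {url!r}")
    # Fall back to the scheme's standard port when none is given.
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


host, port = parse_upstream("http://ollama-host.internal:11434")
print(host, port)
```

Calling is_reachable(host, port) from the machine that runs the proxy shows whether the Ollama port is open from there. A common mistake this catches is omitting the http:// scheme, which parse_upstream rejects.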
SDK setup
Use patch_openai() — the OpenAI client speaks the same protocol as Ollama’s API:
import cognisafe
from openai import OpenAI

cognisafe.configure(
    api_key="csk_your_key_here",
    project_id="my-app",
    proxy_url="http://localhost:8080",  # Cognisafe proxy
)
cognisafe.patch_openai()

# No real OpenAI API key is needed; Ollama ignores it
client = OpenAI(api_key="ollama")
response = client.chat.completions.create(
    model="llama3.2",  # any model you have pulled in Ollama
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response.choices[0].message.content)
Air-gapped safety scoring
By default, Cognisafe’s safety worker uses gpt-4o-mini (via OPENAI_API_KEY) to score requests. In an air-gapped environment, you have two options:
- Disable scoring: if OPENAI_API_KEY is not set, the worker falls back gracefully with score_label: "unscored". Requests are still logged and cost/latency data is captured.
- Use a local scoring model: configure an Ollama-backed scoring model by pointing the safety worker at a local OpenAI-compatible endpoint:
# safety_worker env vars
# Any non-empty value works
OPENAI_API_KEY=ollama
# Local model name
SCORER_MODEL=llama3.2
# Override the OpenAI base URL used by PyRIT:
OPENAI_BASE_URL=http://localhost:11434/v1
Models like llama3.2 or mistral pulled into Ollama work well as scoring models for content_safety and pii_detection. For jailbreak_detection, larger models (70B+) produce more reliable results.
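The two options above amount to a small decision based on environment variables. The sketch below illustrates that logic only; it is not Cognisafe's actual worker code, resolve_scorer_config is a hypothetical name, and the assumption that an unset SCORER_MODEL falls back to gpt-4o-mini is inferred from the default described above.

```python
import os
from typing import Mapping, Optional


def resolve_scorer_config(env: Mapping[str, str]) -> Optional[dict]:
    """Return scorer settings, or None when scoring should be skipped."""
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        # No key: requests would be logged with score_label "unscored".
        return None
    return {
        "api_key": api_key,
        # Assumed default, matching the hosted scorer named in this guide.
        "model": env.get("SCORER_MODEL", "gpt-4o-mini"),
        # None means the client library's default endpoint is used.
        "base_url": env.get("OPENAI_BASE_URL"),
    }


config = resolve_scorer_config(os.environ)
print("unscored" if config is None else f"scoring with {config['model']}")
```

With the local-scoring variables set, the result carries the Ollama base URL and model name. With no key set, the function returns None and the caller would skip scoring.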
Supported Ollama models
Any model available in the Ollama library works — Cognisafe does not constrain the model field. The proxy passes model through to Ollama unchanged.
# Pull a model before use
ollama pull llama3.2
ollama pull mistral
ollama pull phi4
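Because the model name is passed through unchanged, a request for a model that has not been pulled only fails once it reaches Ollama. One way to catch this earlier is to compare the name against Ollama's /api/tags listing, as in the sketch below. The helper names are hypothetical, the base URL assumes a local instance on the default port, and the sample payload is illustrative rather than real output.

```python
import json
import urllib.request


def pulled_models(tags_payload: dict) -> set[str]:
    """Collect model names from an /api/tags response body."""
    names = set()
    for entry in tags_payload.get("models", []):
        name = entry.get("name", "")
        if not name:
            continue
        names.add(name)
        # A bare name such as "llama3.2" refers to the ":latest" tag.
        if name.endswith(":latest"):
            names.add(name[: -len(":latest")])
    return names


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of locally available models from Ollama."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/api/tags", timeout=5) as resp:
        return json.load(resp)


# Illustrative payload in the shape Ollama returns; not real output.
sample = {"models": [{"name": "llama3.2:latest"}, {"name": "mistral:7b"}]}
print("phi4" in pulled_models(sample))
```

Against a running instance, pulled_models(fetch_tags()) gives the set to check before sending a request.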