Configuration

AIProxyGuard uses a YAML configuration file. Environment variables are supported with ${VAR} or ${VAR:-default} syntax.

Minimal Config

server:
  port: 8080

upstreams:
  openai:
    url: "https://api.openai.com"

Full Reference

# Server settings
server:
  host: "0.0.0.0"           # Bind address
  port: 8080                 # Listen port
  workers: 2                 # Number of workers (currently unused, reserved)

# Upstream LLM providers
upstreams:
  openai:
    url: "https://api.openai.com"
    timeout: 60s                      # Request timeout
    auth_header: "Authorization"      # Header containing API key
  anthropic:
    url: "https://api.anthropic.com"
    timeout: 60s
    auth_header: "x-api-key"
  openrouter:
    url: "https://openrouter.ai/api"
    timeout: 120s
    auth_header: "Authorization"
  ollama:
    url: "http://localhost:11434"
    timeout: 300s
    auth_header: null                 # Ollama doesn't require auth

# Scanner settings
scanner:
  enabled: true              # Master switch for all scanning
  regex: true                # Enable regex pattern matching
  heuristics: true           # Enable heuristic detection (base64, encoding, etc.)
  ml_classifier: false       # ML classifier (not yet implemented)
  response:                  # Response scanning settings
    enabled: false           # Scan responses for sensitive data
    mode: "buffered"         # "passthrough", "buffered", or "full"
    buffer_size: 1024        # Chars to buffer before scanning (buffered mode)
    categories: []           # Categories to scan for (empty = all)

# Policy engine
policy:
  default_action: "block"    # Default action: "allow", "log", "warn", "block"
  categories:
    prompt_injection:
      action: "block"
      threshold: 0.8         # Confidence threshold (0.0-1.0)
    jailbreak:
      action: "block"
      threshold: 0.7
    encoding_evasion:
      action: "warn"
      threshold: 0.6
  allowlists:                # Bypass scanning for specific clients
    - client_id: "internal-service-*"
      categories: ["prompt_injection"]

# Signature location
signatures:
  path: "/app/signatures"    # Path to signature YAML files

# Security settings
security:
  failure_mode: "open"       # "open" = allow on error, "closed" = block on error
  scanner_timeout_ms: 100    # Max scanner execution time before timeout
  upstream_timeout_s: 60     # Upstream request timeout
  max_request_size: 10485760   # 10 MB max request body
  max_response_size: 52428800  # 50 MB max response body
  expose_details: false      # Never expose signature patterns to clients

# Prometheus metrics
metrics:
  enabled: true
  path: "/metrics"

# Structured logging
logging:
  level: "info"              # "debug", "info", "warning", "error"
  format: "json"             # "json" or "text"
  redact_keys: true          # Redact API keys in logs

# Client identity resolution
identity:
  method: "ip"               # "ip", "header", "token", "mtls"
  header_name: "X-Client-ID" # Header to extract client ID from
  fallback_header: null      # Fallback header if primary is missing
  trust_xff: false           # Trust X-Forwarded-For for IP resolution
  hash_token: true           # Hash tokens for privacy

# Control plane - fleet registration and management
control_plane:
  enabled: false                                              # Enable fleet registration
  url: "${AIPROXYGUARD_CONTROL_PLANE_URL:-https://aiproxyguard.com}"
  api_key: "${AIPROXYGUARD_CONTROL_PLANE_API_KEY}"           # Required when enabled
  heartbeat_interval: 60                                      # Seconds between heartbeats
  sync_signatures: true                                       # Auto-sync signatures from control plane
  report_telemetry: true                                      # Report detection metrics

# TLS interception (optional, advanced)
tls:
  enabled: false
  ca_cert: "/etc/aiproxyguard/ca.crt"
  ca_key: "/etc/aiproxyguard/ca.key"
  cert_cache_size: 1000
  cert_validity_days: 30

Environment Variables

Use ${VAR} or ${VAR:-default} syntax:

upstreams:
  openai:
    url: "${OPENAI_BASE_URL:-https://api.openai.com}"

control_plane:
  api_key: "${CONTROL_PLANE_API_KEY}"
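The ${VAR} and ${VAR:-default} forms resolve much like shell parameter expansion: the variable's value if set, otherwise the default (or empty). A sketch of that resolution, illustrative only and not the proxy's actual parser:

```python
import os
import re

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")

def expand_env(value: str, env=os.environ) -> str:
    """Replace ${VAR} with the variable's value, and ${VAR:-default}
    with the default when the variable is unset."""
    def repl(m: re.Match) -> str:
        name, default = m.group(1), m.group(2)
        if name in env:
            return env[name]
        return default if default is not None else ""
    return _PLACEHOLDER.sub(repl, value)

print(expand_env("${OPENAI_BASE_URL:-https://api.openai.com}", env={}))
# https://api.openai.com
```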

Policy Actions

Action   Behavior
allow    Pass through without scanning
log      Scan and log detections, allow request
warn     Scan and log detections with warning, allow request
block    Scan and block if detection confidence >= threshold
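The policy.allowlists entries in the full reference bypass scanning for matching clients. Assuming glob-style wildcard matching for client_id patterns (the * in internal-service-* suggests it, though the source doesn't specify the exact semantics), the bypass check could be sketched as:

```python
from fnmatch import fnmatchcase

def scanning_bypassed(client_id: str, category: str, allowlists: list[dict]) -> bool:
    """Illustrative sketch: True if this client/category pair is
    allowlisted. Wildcards are assumed to be glob-style."""
    for entry in allowlists:
        if fnmatchcase(client_id, entry["client_id"]):
            # An empty category list is read here as "all categories".
            cats = entry.get("categories") or []
            if not cats or category in cats:
                return True
    return False

allowlists = [{"client_id": "internal-service-*", "categories": ["prompt_injection"]}]
print(scanning_bypassed("internal-service-7", "prompt_injection", allowlists))  # True
print(scanning_bypassed("external-client", "prompt_injection", allowlists))    # False
```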

Threshold vs Sensitivity

You can configure detection strictness using either threshold or sensitivity:

Threshold (Technical)

The threshold parameter sets the minimum confidence score required to trigger an action. Lower threshold = more strict (catches more).

policy:
  categories:
    prompt_injection:
      action: "block"
      threshold: 0.8  # Only block if confidence >= 80%

Sensitivity (Intuitive)

The sensitivity parameter is an intuitive alternative where higher values = more strict. Internally converted to threshold via threshold = 1 - sensitivity.

policy:
  categories:
    prompt_injection:
      action: "block"
      sensitivity: 0.9  # High sensitivity = catch more attacks (threshold = 0.1)

Sensitivity   Threshold   Behavior
1.0           0.0         Block everything detected (most strict)
0.9           0.1         Very aggressive - catch almost everything
0.7           0.3         Aggressive - good for high-security
0.5           0.5         Balanced (default)
0.3           0.7         Conservative - fewer false positives
0.0           1.0         Only 100% confidence detections (least strict)

When both are provided, sensitivity takes precedence.
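The conversion and precedence rules can be condensed into a small helper (an illustrative sketch, not the policy engine's actual code; the rounding is only to keep floating-point output tidy):

```python
def effective_threshold(category_cfg: dict, default: float = 0.5) -> float:
    """Resolve the confidence threshold for a category.
    `sensitivity` wins over `threshold` when both are present,
    via threshold = 1 - sensitivity."""
    if "sensitivity" in category_cfg:
        return round(1.0 - category_cfg["sensitivity"], 6)
    return category_cfg.get("threshold", default)

print(effective_threshold({"sensitivity": 0.9}))                    # 0.1
print(effective_threshold({"threshold": 0.8, "sensitivity": 0.9}))  # 0.1 (sensitivity wins)
print(effective_threshold({}))                                      # 0.5
```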

Cloud Policy Sync: When connected to the control plane, you can configure sensitivity per-category in the cloud portal under Policies > Detection Rules. Changes sync to all fleet instances.

Failure Modes

Mode     Behavior
open     On scanner error/timeout, allow the request
closed   On scanner error/timeout, block the request

Use open for availability-focused deployments, closed for security-focused.
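The two modes amount to different answers to "the scanner failed, now what?". A sketch of the assumed control flow (illustrative names, not the proxy's actual code):

```python
def admit_request(scan, request, failure_mode: str = "open") -> bool:
    """Return True if the request should be forwarded upstream.
    `scan` returns True when it detects an attack, and raises on
    error or timeout (e.g. exceeding scanner_timeout_ms)."""
    try:
        return not scan(request)
    except Exception:
        # open = availability first, closed = security first
        return failure_mode == "open"

def broken_scanner(_):
    raise TimeoutError("scanner exceeded scanner_timeout_ms")

print(admit_request(broken_scanner, "hello", failure_mode="open"))    # True
print(admit_request(broken_scanner, "hello", failure_mode="closed"))  # False
```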

Response Scanning Modes

Mode          Behavior
passthrough   Forward response chunks immediately, scan asynchronously
buffered      Buffer N chars before scanning, then stream
full          Buffer entire response, scan, then return
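Buffered mode can be pictured as: accumulate the first buffer_size characters, scan that prefix once, then stream the remainder untouched. A sketch of that behavior (an assumption about how buffered mode works internally, with illustrative names):

```python
def buffered_stream(chunks, scan, buffer_size=1024):
    """Yield response chunks, holding back roughly the first
    `buffer_size` characters until `scan` has cleared them.
    Raises if the scan flags the buffered prefix."""
    buffer = ""
    chunks = iter(chunks)
    for chunk in chunks:
        buffer += chunk
        if len(buffer) >= buffer_size:
            break
    if scan(buffer):
        raise ValueError("sensitive data detected in response")
    yield buffer
    yield from chunks  # remaining chunks stream through untouched

out = "".join(buffered_stream(["abc", "def", "ghi"], scan=lambda s: False, buffer_size=4))
print(out)  # abcdefghi
```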

Control Plane (Fleet Registration)

The control plane enables centralized fleet management, automatic signature updates, and telemetry reporting.

Enabling Fleet Registration

Option 1: Environment Variables (Recommended for Docker)

docker run -d -p 8080:8080 \
  -e AIPROXYGUARD_CONTROL_PLANE_ENABLED=true \
  -e AIPROXYGUARD_CONTROL_PLANE_URL=https://aiproxyguard.com \
  -e AIPROXYGUARD_CONTROL_PLANE_API_KEY=your-api-key-here \
  ainvirion/aiproxyguard:latest

Option 2: Config File

control_plane:
  enabled: true
  url: "https://aiproxyguard.com"
  api_key: "your-api-key-here"
  heartbeat_interval: 60
  sync_signatures: true
  report_telemetry: true

Control Plane Settings

Setting              Description                            Default
enabled              Enable fleet registration              false
url                  Control plane API URL                  https://aiproxyguard.com
api_key              Your API key (required when enabled)   -
heartbeat_interval   Seconds between heartbeats             60
sync_signatures      Auto-download new signatures           true
report_telemetry     Report detection events                true

What Happens When Enabled

  1. Registration: On startup, the proxy registers with the control plane, sending instance metadata (hostname, OS, version)
  2. Heartbeats: Periodic heartbeats report status and check for updates
  3. Signature Sync: New detection signatures are automatically downloaded and hot-reloaded
  4. Policy Sync: Policy changes from the control plane are applied without restart
  5. Telemetry: Detection events (counts, categories) are reported for analytics

Getting an API Key

API keys are free to create:

  1. Sign up at aiproxyguard.com
  2. Navigate to Settings > API Keys
  3. Click Create API Key
  4. Enable the fleet scope for control plane features (signature sync, telemetry)
  5. Copy your API key (starts with apg_)

Updating or Rotating API Keys

If your API key is revoked, expired, or you need to rotate it, you must update the configuration and restart the proxy.

What happens when the API key is invalid:

The proxy detects 401/403 errors and stops retrying:

{"level": "error", "message": "API key invalid or revoked. Control plane features disabled. Update your API key in the config and restart the proxy."}
{"level": "info", "message": "Heartbeat loop stopped due to invalid API key. Proxy continues in offline mode."}

The proxy continues running in offline mode with:

  • Bundled free-tier signatures
  • Bundled free-tier ML model
  • Local configuration (no cloud sync)

To update the API key:

Option 1: Environment Variable (Docker)

# Update the environment variable and restart
docker stop aiproxyguard
docker run -d --name aiproxyguard -p 8080:8080 \
  -e AIPROXYGUARD_CONTROL_PLANE_API_KEY=your-new-api-key \
  ainvirion/aiproxyguard:latest

# Or with docker-compose
docker-compose down
# Edit .env or docker-compose.yml with new key
docker-compose up -d

Option 2: Config File (Volume Mount)

# 1. Edit the mounted config file
vim /path/to/config.yaml
# Update: api_key: "your-new-api-key"

# 2. Restart the container
docker restart aiproxyguard

Option 3: Kubernetes

# Update the secret
kubectl create secret generic aiproxyguard-secrets \
  --from-literal=api-key=your-new-api-key \
  --dry-run=client -o yaml | kubectl apply -f -

# Restart the deployment
kubectl rollout restart deployment/aiproxyguard
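To feed that secret into the container, the Deployment can reference it as an environment variable. This is the standard Kubernetes secretKeyRef pattern; the secret and key names match the kubectl command above, while the container name is an assumption:

```yaml
# Excerpt from the aiproxyguard Deployment spec
spec:
  template:
    spec:
      containers:
        - name: aiproxyguard
          env:
            - name: AIPROXYGUARD_CONTROL_PLANE_API_KEY
              valueFrom:
                secretKeyRef:
                  name: aiproxyguard-secrets
                  key: api-key
```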

Note: A restart is required because the API key is loaded at startup. Hot-reload of API keys may be added in a future version.

Cloud Policies (Control Plane)

When connected to the control plane, policies are managed centrally and synced to all fleet instances. Cloud policies override the local policy: section of your configuration.

Detection Thresholds

Each detection category has a configurable threshold (0.0-1.0) that controls sensitivity:

Threshold   Behavior                                                        Use Case
0.3         Aggressive - catches more attacks, higher false positive risk   High-security environments
0.5         Balanced - good accuracy with minimal false positives           Most deployments (default)
0.7         Conservative - prioritizes avoiding false positives             User-facing applications
0.9         Very conservative - only high-confidence detections             When false positives are unacceptable

Default Cloud Policy Thresholds

Category              Default Threshold   Action
prompt-injection      0.5                 block
jailbreak             0.5                 block
pii                   0.5                 warn
data_exfil            0.5                 block
harmful_content       0.5                 block
encoding-bypass       0.7                 block
delimiter-injection   0.7                 block
indirect-injection    0.7                 block
unicode-evasion       0.7                 block
role-manipulation     0.7                 block

Tuning for Your Use Case

High Recall (catch more attacks):

  • Lower thresholds to 0.3-0.4
  • Accept some false positives
  • Good for internal tools, security-critical apps

High Precision (minimize false positives):

  • Keep thresholds at 0.5-0.7
  • Some attacks may pass through
  • Good for user-facing chatbots, customer support

Balanced:

  • Use defaults (0.5 for common attacks, 0.7 for evasion techniques)
  • Monitor metrics and adjust per-category as needed

Modifying Thresholds

Thresholds are configured in the cloud portal under Policies > Detection Rules. Changes sync to all fleet instances within 60 seconds.

Rate Limiting (DDoS Protection)

AIProxyGuard includes an iptables-based rate limiting script for DDoS protection. This runs at the host level and protects the proxy from excessive requests.

Enabling Rate Limiting

The rate limiting script is located at deploy/rate-limit.sh. It uses Linux iptables with the hashlimit module for per-IP rate limiting.

Requirements:

  • Linux host with iptables
  • Root/sudo access
  • Docker (uses DOCKER-USER chain for compatibility)

Environment Variables

Variable               Default      Description
RATE_LIMIT_ENABLED     false        Enable rate limiting
RATE_LIMIT_PORT        8080         Port to protect
RATE_LIMIT_RATE        100/minute   Requests per interval (e.g., 100/minute, 10/second)
RATE_LIMIT_BURST       50           Burst allowance before limiting kicks in
RATE_LIMIT_CONN        100          Max concurrent connections per IP
RATE_LIMIT_WHITELIST   (empty)      Comma-separated IPs/CIDRs to exclude (e.g., 10.0.0.1,192.168.1.0/24)
RATE_LIMIT_BLOCKLIST   (empty)      Comma-separated IPs to always block

Usage

Option 1: Run directly on host

# Enable rate limiting
sudo RATE_LIMIT_ENABLED=true \
     RATE_LIMIT_PORT=8080 \
     RATE_LIMIT_RATE=100/minute \
     RATE_LIMIT_BURST=50 \
     ./deploy/rate-limit.sh

# Disable rate limiting
sudo RATE_LIMIT_ENABLED=false ./deploy/rate-limit.sh

Option 2: Docker entrypoint (privileged mode)

docker run -d --name aiproxyguard \
  --privileged \
  --cap-add=NET_ADMIN \
  -p 8080:8080 \
  -e RATE_LIMIT_ENABLED=true \
  -e RATE_LIMIT_RATE=100/minute \
  -e RATE_LIMIT_BURST=50 \
  -e RATE_LIMIT_WHITELIST=10.0.0.0/8 \
  ainvirion/aiproxyguard:latest

Option 3: Systemd service

# /etc/systemd/system/aiproxyguard-ratelimit.service
[Unit]
Description=AIProxyGuard Rate Limiting
After=docker.service

[Service]
Type=oneshot
Environment="RATE_LIMIT_ENABLED=true"
Environment="RATE_LIMIT_PORT=8080"
Environment="RATE_LIMIT_RATE=100/minute"
ExecStart=/opt/aiproxyguard/deploy/rate-limit.sh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

Rate Limit Tuning

Use Case           Rate          Burst   Conn
Public API         30/minute     10      20
Internal service   500/minute    100     200
High-traffic app   1000/minute   200     500
Development        100/minute    50      100

Viewing Active Rules

# List all rules in DOCKER-USER chain
sudo iptables -L DOCKER-USER -n -v

# List hashlimit stats
cat /proc/net/ipt_hashlimit/aiproxyguard_*

Clearing Rules

sudo RATE_LIMIT_ENABLED=false ./deploy/rate-limit.sh

Docker Volume Mounts

docker run -d -p 8080:8080 \
  -v $(pwd)/config.yaml:/etc/aiproxyguard/config.yaml \
  -v $(pwd)/signatures:/app/signatures \
  ainvirion/aiproxyguard:latest
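The same mounts in docker-compose form (equivalent to the docker run command above; the service name is an assumption):

```yaml
services:
  aiproxyguard:
    image: ainvirion/aiproxyguard:latest
    ports:
      - "8080:8080"
    volumes:
      - ./config.yaml:/etc/aiproxyguard/config.yaml
      - ./signatures:/app/signatures
```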

Example Configs

Minimal (OpenAI only)

server:
  port: 8080
upstreams:
  openai:
    url: "https://api.openai.com"
scanner:
  enabled: true

Production (Multiple providers, strict policy)

server:
  host: "0.0.0.0"
  port: 8080

upstreams:
  openai:
    url: "https://api.openai.com"
    auth_header: "Authorization"
  anthropic:
    url: "https://api.anthropic.com"
    auth_header: "x-api-key"

scanner:
  enabled: true
  regex: true
  heuristics: true
  response:
    enabled: true
    mode: "buffered"

policy:
  default_action: "block"
  categories:
    prompt_injection:
      action: "block"
      threshold: 0.7
    jailbreak:
      action: "block"
      threshold: 0.7

security:
  failure_mode: "closed"
  scanner_timeout_ms: 50
  max_request_size: 1048576  # 1 MB

metrics:
  enabled: true

logging:
  level: "info"
  format: "json"
  redact_keys: true

Local Development (Ollama)

server:
  port: 8080

upstreams:
  ollama:
    url: "http://localhost:11434"
    timeout: 300s

scanner:
  enabled: true
  regex: true
  heuristics: true

policy:
  default_action: "warn"  # Log but don't block during development

logging:
  level: "debug"
  format: "text"