Overview
LiteLLM acts as a centralized LLM and embedding gateway for Knowledge Stack. All AI-powered features route through LiteLLM, which provides per-tenant API key management, spend tracking, and model aliasing.
For an overview of what LiteLLM does and why it is used, see LLM Gateway (LiteLLM).
Architecture
All LLM and embedding traffic from the API, Worker, and Agent services routes through the LiteLLM proxy:
```
+----------+     +----------+     +----------+
|   API    |     |  Worker  |     |  Agent   |
+----+-----+     +----+-----+     +----+-----+
     |                |                |
     +----------------+----------------+
                      |
               +------v------+
               |   LiteLLM   |  :4000
               |    Proxy    |
               +------+------+
                      |
             +--------+--------+
             v                 v
          OpenAI          PostgreSQL
```
Prerequisites
LiteLLM requires its own PostgreSQL database, which it uses to store keys, teams, and spend data.
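For example, on an existing PostgreSQL instance you might provision it like this (host, role, and password are illustrative; use your own conventions and a secret manager for real credentials):

```bash
# Create a dedicated role and database for LiteLLM (illustrative names)
psql -h postgres.internal -U admin -c "CREATE USER litellm WITH PASSWORD 'change-me';"
psql -h postgres.internal -U admin -c "CREATE DATABASE litellm OWNER litellm;"
```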
Deployment Steps
1. Create Secrets
Create a secrets configuration with the following values (one illustrative way to create them is sketched after the table):
| Secret | Description |
|---|---|
| master-key | LiteLLM admin master key (you generate this) |
| db-host | PostgreSQL host |
| db-username | Database username |
| db-password | Database password |
| LLM_PROVIDER_OPENAI_API_KEY | Your OpenAI API key (or other provider key) |
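How these secrets are stored is platform-specific; as one sketch for Kubernetes (the secret name litellm-secrets is hypothetical, and passing literal values on the command line is only suitable for testing):

```bash
kubectl create secret generic litellm-secrets \
  --namespace your-namespace \
  --from-literal=master-key='generate-a-strong-random-key' \
  --from-literal=db-host='postgres.internal' \
  --from-literal=db-username='litellm' \
  --from-literal=db-password='change-me' \
  --from-literal=LLM_PROVIDER_OPENAI_API_KEY='sk-...'
```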
2. Configure Model Aliases
Set up model aliases that map logical names to provider models (a sample model_list follows the table):
| Logical Model | Description |
|---|---|
| general-purpose | General-purpose LLM tasks |
| ingestion-chunk-enrichment | Document processing enrichment |
| text-embedding-3-small | Vector embeddings |
| agent-general-purpose | AI assistant conversations |
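In LiteLLM, these aliases are declared as model_list entries in the proxy's config file. A minimal sketch, assuming OpenAI as the provider (the upstream model choices are illustrative placeholders; the os.environ/ prefix tells LiteLLM to resolve the key from the environment at runtime):

```yaml
model_list:
  - model_name: general-purpose
    litellm_params:
      model: openai/gpt-4o          # illustrative upstream model
      api_key: os.environ/LLM_PROVIDER_OPENAI_API_KEY
  - model_name: ingestion-chunk-enrichment
    litellm_params:
      model: openai/gpt-4o-mini     # illustrative upstream model
      api_key: os.environ/LLM_PROVIDER_OPENAI_API_KEY
  - model_name: text-embedding-3-small
    litellm_params:
      model: openai/text-embedding-3-small
      api_key: os.environ/LLM_PROVIDER_OPENAI_API_KEY
  - model_name: agent-general-purpose
    litellm_params:
      model: openai/gpt-4o          # illustrative upstream model
      api_key: os.environ/LLM_PROVIDER_OPENAI_API_KEY
```

Note that the service configuration in step 4 references aliases with a provider prefix (e.g. openai/general-purpose), so name the aliases to match however your services request them.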
3. Deploy LiteLLM
Docker Compose
Add LiteLLM to your Docker Compose stack:
```yaml
litellm:
  image: ghcr.io/berriai/litellm-database:latest
  ports:
    - "4000:4000"
  environment:
    - LITELLM_MASTER_KEY=your-master-key
    - DATABASE_URL=postgresql://user:pass@postgres:5432/litellm
  depends_on:
    - postgres
```
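Then bring the service up (and re-run after configuration changes):

```bash
docker compose up -d litellm
```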
Kubernetes (Helm)
For Kubernetes deployments, use the official litellm-helm chart. Configure model routing in your Helm values file, with API keys referenced from environment variables resolved at runtime.
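As a rough sketch of the values file, assuming the chart exposes a proxy_config block for the proxy configuration (key names can vary between chart versions, so verify against the chart's own documentation; the upstream model is a placeholder):

```yaml
# values.yaml (illustrative)
proxy_config:
  model_list:
    - model_name: general-purpose
      litellm_params:
        model: openai/gpt-4o
        api_key: os.environ/LLM_PROVIDER_OPENAI_API_KEY
```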
4. Configure Knowledge Stack Services
Point your Knowledge Stack services to the LiteLLM proxy (a Compose-style example follows the table):
| Service | Environment Variable | Value |
|---|---|---|
| All services | LITELLM_PROXY_URL | http://litellm:4000 |
| API | GP_LLM_API_URL | http://litellm:4000/v1 |
| API | GP_LLM_MODEL | openai/general-purpose |
| API | EMBEDDING_API_BASE_URL | http://litellm:4000/v1 |
| Worker | EMBEDDING_API_BASE_URL | http://litellm:4000/v1 |
| Worker | ENRICHMENT_LLM_API_BASE_URL | http://litellm:4000/v1 |
| Worker | ENRICHMENT_MODEL | openai/ingestion-chunk-enrichment |
| Agent | GP_LLM_API_URL | http://litellm:4000/v1 |
| Agent | GP_LLM_MODEL | openai/agent-general-purpose |
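In Docker Compose terms, for example, the API service's environment would gain entries like these (the service name api is illustrative; values are taken from the table above):

```yaml
api:
  environment:
    - LITELLM_PROXY_URL=http://litellm:4000
    - GP_LLM_API_URL=http://litellm:4000/v1
    - GP_LLM_MODEL=openai/general-purpose
    - EMBEDDING_API_BASE_URL=http://litellm:4000/v1
```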
5. Verify the Deployment
Check that LiteLLM is running and healthy:
```bash
# Docker
curl http://localhost:4000/health/liveliness

# Kubernetes
kubectl exec -n your-namespace deploy/litellm -- \
  python -c "import urllib.request; print(urllib.request.urlopen('http://localhost:4000/health/liveliness').read())"
```
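Once the proxy reports healthy, you can also send a test completion through it, using the master key and the model name your services are configured with (here openai/general-purpose from step 4; for anything beyond smoke testing, use a scoped per-tenant key rather than the master key):

```bash
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/general-purpose", "messages": [{"role": "user", "content": "ping"}]}'
```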
Upgrading
To upgrade LiteLLM, update the image version and redeploy. The upgrade is idempotent — re-running the deployment with the same or new configuration is safe.
If you update secrets, redeploy both LiteLLM and the dependent services.
Important Notes
- The LiteLLM master key must match the LITELLM_MASTER_KEY configured on the API service
- LiteLLM listens on port 4000 and should only be accessible within your cluster (no public ingress)
- Only the API service provisions teams and keys; Worker and Agent services use per-tenant keys at runtime