Overview

Knowledge Stack routes all LLM and embedding requests through a LiteLLM proxy. This gateway provides three key capabilities:
  1. Per-tenant cost tracking — Every LLM request is attributed to the tenant that triggered it, giving you full visibility into AI usage costs
  2. Budget enforcement — You can set spending limits per tenant to control costs
  3. Model routing — Services reference logical model names (like general-purpose or ingestion-chunk-enrichment) that map to specific provider models in your configuration

How It Works

All AI-powered features in Knowledge Stack — document ingestion, the AI assistant, embeddings, and general-purpose LLM calls — route through the LiteLLM proxy instead of calling LLM providers directly.
API / Worker / Agent
        |
   LiteLLM Proxy
        |
   LLM Provider (e.g., OpenAI)
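
Because the proxy exposes an OpenAI-compatible API, any standard client can target it. Here is a minimal sketch using the openai Python SDK, assuming the GP_LLM_API_URL endpoint described under Self-Hosted Configuration below and a per-tenant virtual key (both values are illustrative):

from openai import OpenAI

# Point the OpenAI-compatible client at the LiteLLM proxy, not the provider.
client = OpenAI(
    base_url="http://litellm:4000/v1",   # GP_LLM_API_URL
    api_key="sk-tenant-virtual-key",     # per-tenant virtual key, not a provider key
)

# A logical model name; LiteLLM resolves it to the configured provider model.
response = client.chat.completions.create(
    model="general-purpose",
    messages=[{"role": "user", "content": "Summarize this document."}],
)
print(response.choices[0].message.content)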

Per-Tenant Virtual Keys

When a new tenant is created, Knowledge Stack automatically provisions:
  • A LiteLLM team mapped to the tenant
  • An ingestion key for document processing (no budget limit)
  • An agent key for AI assistant usage (configurable budget, default $5)

This separation ensures that ingestion workloads do not consume the agent budget, and each tenant's spending is tracked independently.
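
Under the hood, this provisioning can be expressed as calls to LiteLLM's admin API using the master key. A rough sketch, assuming LiteLLM's /team/new and /key/generate endpoints (Knowledge Stack performs these steps automatically; the tenant ID, key aliases, and budget value are illustrative):

import requests

LITELLM_PROXY_URL = "http://litellm:4000"
HEADERS = {"Authorization": "Bearer sk-master-key"}  # LITELLM_MASTER_KEY
tenant_id = "tenant-123"

# 1. Create a LiteLLM team mapped to the tenant.
team = requests.post(f"{LITELLM_PROXY_URL}/team/new",
                     headers=HEADERS,
                     json={"team_alias": tenant_id}).json()

# 2. Ingestion key for document processing: no budget limit.
requests.post(f"{LITELLM_PROXY_URL}/key/generate",
              headers=HEADERS,
              json={"team_id": team["team_id"],
                    "key_alias": f"{tenant_id}-ingestion"})

# 3. Agent key for AI assistant usage: capped budget (default $5).
requests.post(f"{LITELLM_PROXY_URL}/key/generate",
              headers=HEADERS,
              json={"team_id": team["team_id"],
                    "key_alias": f"{tenant_id}-agent",
                    "max_budget": 5.00})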

Model Name Mapping

Your services use logical model names that are mapped to actual provider models in the LiteLLM configuration:
Logical Model Name           Typical Use
general-purpose              General LLM tasks in the API
ingestion-chunk-enrichment   Document processing enrichment
text-embedding-3-small       Vector embeddings
agent-general-purpose        AI assistant conversations
You can change the underlying provider model without modifying any application code — just update the LiteLLM configuration.
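
For example, the mapping might look like this in LiteLLM's model_list configuration (the provider models shown are placeholders; substitute whatever your deployment uses):

model_list:
  - model_name: general-purpose
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
  - model_name: text-embedding-3-small
    litellm_params:
      model: openai/text-embedding-3-small
      api_key: os.environ/OPENAI_API_KEY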

Tenant Usage and Quotas

You can monitor tenant LLM usage through the tenant API:
GET /v1/tenants/{tenant_id}/usage
This returns current spending against the tenant’s budget limits.
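
For example, with Python's requests (the host, auth scheme, and response fields are assumptions; consult your deployment's API reference):

import requests

resp = requests.get(
    "https://api.example.com/v1/tenants/tenant-123/usage",
    headers={"Authorization": "Bearer <api-token>"},  # auth scheme assumed
)
resp.raise_for_status()
print(resp.json())  # current spend vs. budget limits; exact fields may vary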

Self-Hosted Configuration

If you are self-hosting Knowledge Stack, you need to deploy a LiteLLM proxy instance alongside your other services.

Environment Variables

Configure these on your Knowledge Stack services to point to LiteLLM:
Variable                 Value                    Description
LITELLM_PROXY_URL        http://litellm:4000      LiteLLM proxy base URL
LITELLM_MASTER_KEY       Your master key          Admin key for team/key provisioning
GP_LLM_API_URL           http://litellm:4000/v1   LLM API endpoint
EMBEDDING_API_BASE_URL   http://litellm:4000/v1   Embedding API endpoint
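
Put together, the relevant portion of a service's environment might look like this (the master key value is a placeholder):

LITELLM_PROXY_URL=http://litellm:4000
LITELLM_MASTER_KEY=sk-your-master-key
GP_LLM_API_URL=http://litellm:4000/v1
EMBEDDING_API_BASE_URL=http://litellm:4000/v1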

Prerequisites

LiteLLM requires its own PostgreSQL database for storing team, key, and usage data:
CREATE DATABASE litellm;
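
LiteLLM reads its connection string from the DATABASE_URL environment variable on the proxy container; for example (host and credentials are placeholders):

DATABASE_URL=postgresql://litellm:password@postgres:5432/litellm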

Deployment

See the LiteLLM Deployment Guide for detailed instructions on deploying and configuring LiteLLM for your self-hosted installation.