Overview

Knowledge Stack routes all LLM and embedding requests through a LiteLLM proxy. This gateway provides three key capabilities:
  1. Per-tenant cost tracking — Every LLM request is attributed to the tenant that triggered it, giving you full visibility into AI usage costs
  2. Budget enforcement — You can set spending limits per tenant to control costs
  3. Model routing — Services reference logical model names (like general-purpose or ingestion-chunk-enrichment) that map to specific provider models in your configuration

How It Works

All AI-powered features in Knowledge Stack — document ingestion, the AI assistant, embeddings, and general-purpose LLM calls — route through the LiteLLM proxy instead of calling LLM providers directly.
API / Worker / Agent
        |
   LiteLLM Proxy
        |
   LLM Provider (e.g., OpenAI)
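
Because the proxy exposes an OpenAI-compatible API, any standard client can target it. Here is a minimal sketch using the openai Python SDK, assuming the GP_LLM_API_URL endpoint described under Self-Hosted Configuration below and a per-tenant virtual key (both values are illustrative):

from openai import OpenAI

# Point the OpenAI-compatible client at the LiteLLM proxy, not the provider.
client = OpenAI(
    base_url="http://litellm:4000/v1",   # GP_LLM_API_URL
    api_key="sk-tenant-virtual-key",     # per-tenant virtual key, not a provider key
)

# A logical model name; LiteLLM resolves it to the configured provider model.
response = client.chat.completions.create(
    model="general-purpose",
    messages=[{"role": "user", "content": "Summarize this document."}],
)
print(response.choices[0].message.content)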

Per-Tenant Virtual Keys

When a new tenant is created, Knowledge Stack automatically provisions:
  • A LiteLLM team mapped to the tenant
  • An ingestion key for document processing (no budget limit)
  • An agent key for AI assistant usage (configurable budget, default $5)

This separation ensures that ingestion workloads do not consume the agent budget, and each tenant's spending is tracked independently.
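
Under the hood, this provisioning can be expressed as calls to LiteLLM's admin API using the master key. A rough sketch, assuming LiteLLM's /team/new and /key/generate endpoints (Knowledge Stack performs these steps automatically; the tenant ID, key aliases, and budget value are illustrative):

import requests

LITELLM_PROXY_URL = "http://litellm:4000"
HEADERS = {"Authorization": "Bearer sk-master-key"}  # LITELLM_MASTER_KEY
tenant_id = "tenant-123"

# 1. Create a LiteLLM team mapped to the tenant.
team = requests.post(f"{LITELLM_PROXY_URL}/team/new",
                     headers=HEADERS,
                     json={"team_alias": tenant_id}).json()

# 2. Ingestion key for document processing: no budget limit.
requests.post(f"{LITELLM_PROXY_URL}/key/generate",
              headers=HEADERS,
              json={"team_id": team["team_id"],
                    "key_alias": f"{tenant_id}-ingestion"})

# 3. Agent key for AI assistant usage: capped budget (default $5).
requests.post(f"{LITELLM_PROXY_URL}/key/generate",
              headers=HEADERS,
              json={"team_id": team["team_id"],
                    "key_alias": f"{tenant_id}-agent",
                    "max_budget": 5.00})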

Model Name Mapping

Your services use logical model names that are mapped to actual provider models in the LiteLLM configuration:
Logical Model Name           Typical Use
general-purpose              General LLM tasks in the API
ingestion-chunk-enrichment   Document processing enrichment
text-embedding-3-small       Vector embeddings
agent-general-purpose        AI assistant conversations
You can change the underlying provider model without modifying any application code — just update the LiteLLM configuration.
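
For example, the mapping might look like this in LiteLLM's model_list configuration (the provider models shown are placeholders; substitute whatever your deployment uses):

model_list:
  - model_name: general-purpose
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
  - model_name: text-embedding-3-small
    litellm_params:
      model: openai/text-embedding-3-small
      api_key: os.environ/OPENAI_API_KEY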

Tenant Usage and Quotas

You can monitor tenant LLM usage through the tenant API:
GET /v1/tenants/{tenant_id}/usage
This returns current spending against the tenant’s budget limits.
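
For example, with Python's requests (the host, auth scheme, and response fields are assumptions; consult your deployment's API reference):

import requests

resp = requests.get(
    "https://api.example.com/v1/tenants/tenant-123/usage",
    headers={"Authorization": "Bearer <api-token>"},  # auth scheme assumed
)
resp.raise_for_status()
print(resp.json())  # current spend vs. budget limits; exact fields may vary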

Self-Hosted Configuration

If you are self-hosting Knowledge Stack, you need to deploy a LiteLLM proxy instance alongside your other services.

Environment Variables

Configure these on your Knowledge Stack services to point to LiteLLM:
Variable                 Value                    Description
LITELLM_PROXY_URL        http://litellm:4000      LiteLLM proxy base URL
LITELLM_MASTER_KEY       Your master key          Admin key for team/key provisioning
GP_LLM_API_URL           http://litellm:4000/v1   LLM API endpoint
EMBEDDING_API_BASE_URL   http://litellm:4000/v1   Embedding API endpoint
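
Put together, the relevant portion of a service's environment might look like this (the master key value is a placeholder):

LITELLM_PROXY_URL=http://litellm:4000
LITELLM_MASTER_KEY=sk-your-master-key
GP_LLM_API_URL=http://litellm:4000/v1
EMBEDDING_API_BASE_URL=http://litellm:4000/v1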

Prerequisites

LiteLLM requires its own PostgreSQL database for storing team, key, and usage data:
CREATE DATABASE litellm;
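
LiteLLM reads its connection string from the DATABASE_URL environment variable on the proxy container; for example (host and credentials are placeholders):

DATABASE_URL=postgresql://litellm:password@postgres:5432/litellm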

Deployment

See the LiteLLM Deployment Guide for detailed instructions on deploying and configuring LiteLLM for your self-hosted installation.