

Overview

Knowledge Stack’s AI assistant is powered by a fleet of stateless agents that can search your knowledge base, process information, and stream responses in real time. Each agent invocation is independent and self-contained — no state persists between conversations.

Key Properties

| Property | Description |
| --- | --- |
| Stateless | Each invocation is independent. No state persists between runs. |
| Sandboxed | Agents run in isolated containers with restricted network access. |
| Low latency | A pool of warm containers provides sub-second dispatch for agent requests. |
| Real-time streaming | Responses stream token-by-token to your client as they are generated. |
| Scalable | The agent pool scales horizontally. In production, 10 workers handle up to 50 concurrent agent conversations. |

How It Works

When you send a message to the AI assistant, the following sequence occurs:
  1. You send a message via POST /v1/threads/{id}/run
  2. The API returns immediately with 202 Accepted and a workflow_id
  3. An agent worker picks up the task from the queue
  4. The agent retrieves context — your thread history and any memory documents you have configured
  5. The agent processes your request, potentially searching your knowledge base or using other tools
  6. Responses stream in real time via SSE (Server-Sent Events) as the agent generates them
  7. The final message is saved to your thread
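
The request in step 1 can be sketched as follows. The base URL, thread ID, and request body here are illustrative assumptions, not the documented schema:

```python
import json
import urllib.request

# Hypothetical base URL, thread ID, and payload shape.
url = "https://api.knowledgestack.ai/v1/threads/th_123/run"
body = json.dumps({"message": "Summarize the onboarding docs"}).encode()

req = urllib.request.Request(
    url,
    data=body,
    method="POST",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
)
# Sending this (urllib.request.urlopen(req)) should return 202 Accepted
# with a workflow_id in the response body; the call is omitted here so
# the sketch runs offline.
print(req.get_method(), req.full_url)
```

The 202 status reflects the asynchronous design: the API hands the task to the queue and returns before the agent has produced any output.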

Performance Targets

| Phase | Target Latency |
| --- | --- |
| Request acceptance and dispatch | < 100 ms |
| Thread history retrieval | < 200 ms |
| Document context retrieval | < 500 ms |
| First token from LLM | < 1 second |
| Full response | 5-30 seconds |

Agent Capabilities

What Agents Can Do

Agents have access to a focused set of tools for working with your knowledge base:
| Tool | Description |
| --- | --- |
| Search documents | Semantic search across your knowledge base |
| Read document sections | Read the full content of a specific section |
| Browse folders | List the contents of a folder |
| Get thread context | Access metadata about the current conversation |
| Save notes | Save intermediate findings to the thread |

What Agents Cannot Do

By design, agents have a restricted scope:
  • Agents can only read from your knowledge base and write to threads
  • Agents cannot modify documents, folders, or any other content
  • Agents cannot access external networks or services beyond the knowledge base and LLM provider

Memory Documents

You can influence agent behavior by providing memory documents: regular documents in your knowledge base whose content is injected into the agent’s system prompt as context. This means you can:
  • Give the agent domain-specific instructions
  • Define response formatting guidelines
  • Provide background context that the agent should always consider
Changes to memory documents take effect on the next agent invocation — no redeployment needed.

Streaming Responses

Agent responses are streamed in real time using Server-Sent Events (SSE). See the Real-Time Notifications documentation for details on connecting to the streaming endpoint and handling events.
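A minimal client-side sketch of consuming such a stream is shown below. The event name `token` and the payload format are assumptions for illustration, not the documented event schema:

```python
# Minimal SSE parser sketch: yields (event, data) pairs from raw SSE text.
def parse_sse(stream: str):
    event, data = "message", []
    for line in stream.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "":  # a blank line terminates one event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []

# Sample stream as it might arrive from the streaming endpoint.
sample = "event: token\ndata: Hello\n\nevent: token\ndata: world\n\n"
tokens = [d for e, d in parse_sse(sample) if e == "token"]
print("".join(tokens))
```

In practice you would feed the parser from the HTTP response body incrementally rather than from a complete string, appending tokens to the UI as each event arrives.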

Security

Agent execution includes multiple layers of security:
| Control | Description |
| --- | --- |
| Network isolation | Containers can only reach the Knowledge Stack API, Redis, and the LLM provider |
| No persistent state | Containers are stateless; no data persists between invocations |
| Read-only filesystem | Container filesystems are read-only (with a small temporary directory) |
| Resource limits | CPU and memory caps prevent runaway resource usage |
| Token budgets | Maximum token and step limits prevent unbounded LLM spending |
| Scoped permissions | Agents operate with the permissions of the requesting user |
| Audit trail | All tool calls are logged for accountability |

Configuration

If you are self-hosting Knowledge Stack, you can configure the agent system:
| Setting | Default | Description |
| --- | --- | --- |
| default_model | openai:gpt-4o | The default LLM model for agent conversations |
| max_steps | 10 | Maximum tool-use steps per agent invocation |
| max_tokens | 4096 | Maximum tokens in the agent response |
| max_concurrent_activities | 5 | Maximum concurrent agent runs per worker |
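
If your deployment keeps these settings in code or a config file, the shape might look like the following sketch (the variable name and layout are assumptions; only the keys and defaults come from the table above):

```python
# Hypothetical agent settings mirroring the documented defaults.
agent_config = {
    "default_model": "openai:gpt-4o",
    "max_steps": 10,                 # tool-use steps per invocation
    "max_tokens": 4096,              # cap on the agent's response
    "max_concurrent_activities": 5,  # concurrent runs per worker
}
```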

Scaling

| Environment | Workers | Concurrent Agents |
| --- | --- | --- |
| Development | 1 | 2 |
| Staging | 3 | 15 |
| Production | 10 | 50 |
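
The staging and production figures match each worker running the default max_concurrent_activities of 5; the lower development figure suggests a smaller per-worker cap there (an assumption on our part). A quick sanity check of the arithmetic:

```python
# Total concurrency = workers * per-worker concurrent runs.
# Staging and production use the default of 5 (max_concurrent_activities);
# the development per-worker value of 2 is an assumed override.
environments = {
    "development": {"workers": 1, "per_worker": 2},
    "staging": {"workers": 3, "per_worker": 5},
    "production": {"workers": 10, "per_worker": 5},
}
totals = {
    name: env["workers"] * env["per_worker"]
    for name, env in environments.items()
}
print(totals)
```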