Documentation Index
Fetch the complete documentation index at: https://docs.knowledgestack.ai/llms.txt
Use this file to discover all available pages before exploring further.
Overview
Knowledge Stack’s AI assistant is powered by a fleet of stateless agents that can search your knowledge base, process information, and stream responses in real time. Each agent invocation is independent and self-contained — no state persists between conversations.
Key Properties
| Property | Description |
|---|---|
| Stateless | Each invocation is independent. No state persists between runs. |
| Sandboxed | Agents run in isolated containers with restricted network access. |
| Low latency | A pool of warm containers provides sub-second dispatch for agent requests. |
| Real-time streaming | Responses stream token-by-token to your client as they are generated. |
| Scalable | The agent pool scales horizontally. In production, 10 workers handle up to 50 concurrent agent conversations. |
How It Works
When you send a message to the AI assistant, the following sequence occurs:
- You send a message via `POST /v1/threads/{id}/run`
- The API returns immediately with `202 Accepted` and a `workflow_id`
- An agent worker picks up the task from the queue
- The agent retrieves context — your thread history and any memory documents you have configured
- The agent processes your request, potentially searching your knowledge base or using other tools
- Responses stream in real time via SSE (Server-Sent Events) as the agent generates them
- The final message is saved to your thread
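The dispatch handshake in the first two steps can be sketched as follows. This is illustrative only: the request body shape and the `message` field name are assumptions, and only the endpoint path and the `202 Accepted` + `workflow_id` behavior come from this page.

```python
# Illustrative helpers for the dispatch handshake (not a published client).

def build_run_request(thread_id: str, message: str) -> dict:
    """Describe the POST /v1/threads/{id}/run call."""
    return {
        "method": "POST",
        "path": f"/v1/threads/{thread_id}/run",
        "body": {"message": message},  # field name is an assumption
    }

def parse_dispatch(status_code: int, body: dict) -> str:
    """The API replies 202 Accepted with a workflow_id; extract it."""
    if status_code != 202:
        raise RuntimeError(f"unexpected status: {status_code}")
    return body["workflow_id"]
```

Because the API returns before the agent finishes, your client should treat the `workflow_id` as a handle and consume the actual response over the streaming channel.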
| Phase | Target Latency |
|---|---|
| Request acceptance and dispatch | < 100ms |
| Thread history retrieval | < 200ms |
| Document context retrieval | < 500ms |
| First token from LLM | < 1 second |
| Full response | 5-30 seconds |
Agent Capabilities
What Agents Can Do
Agents have access to a focused set of tools for working with your knowledge base:
| Tool | Description |
|---|---|
| Search documents | Semantic search across your knowledge base |
| Read document sections | Read the full content of a specific section |
| Browse folders | List the contents of a folder |
| Get thread context | Access metadata about the current conversation |
| Save notes | Save intermediate findings to the thread |
What Agents Cannot Do
By design, agents have a restricted scope:
- Agents can read from your knowledge base, but can write only to threads
- Agents cannot modify documents, folders, or any other content
- Agents cannot access external networks or services beyond the knowledge base and LLM provider
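This read/write scope could be pictured as a simple allow-list check. The tool identifiers below are illustrative stand-ins for the five tools listed above, not Knowledge Stack's internal names.

```python
# Hypothetical scope check mirroring the restrictions described above.
# Tool names are assumptions based on the capability table.
READ_TOOLS = {
    "search_documents",
    "read_document_section",
    "browse_folders",
    "get_thread_context",
}
THREAD_WRITE_TOOLS = {"save_notes"}  # the only write target is the thread

def is_allowed(tool: str) -> bool:
    """Agents may read the knowledge base and write only to threads."""
    return tool in READ_TOOLS or tool in THREAD_WRITE_TOOLS
```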
Memory Documents
You can influence agent behavior by providing memory documents — regular documents in your knowledge base that are injected as context for the agent’s system prompt.
This means you can:
- Give the agent domain-specific instructions
- Define response formatting guidelines
- Provide background context that the agent should always consider
Changes to memory documents take effect on the next agent invocation — no redeployment needed.
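For example, a memory document might look like this; the content below is purely illustrative:

```markdown
# Assistant Guidelines (memory document)

- Always answer in British English.
- When citing a document, include its folder path.
- Our fiscal year starts in April; interpret "Q1" accordingly.
```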
Streaming Responses
Agent responses are streamed in real time using Server-Sent Events (SSE). See the Real-Time Notifications documentation for details on connecting to the streaming endpoint and handling events.
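Any SSE client can consume the stream. As a minimal sketch, here is how a single raw SSE event block (the `event:` and `data:` fields defined by the SSE format) can be parsed; the `token` event name in the test is an assumption, not a documented event type.

```python
def parse_sse_event(raw: str) -> dict:
    """Parse one SSE event block into its event name and data payload."""
    event = {"event": "message", "data": []}  # "message" is the SSE default
    for line in raw.splitlines():
        if line.startswith("event:"):
            event["event"] = line[len("event:"):].strip()
        elif line.startswith("data:"):
            # Multiple data: lines are joined with newlines per the SSE spec.
            event["data"].append(line[len("data:"):].strip())
    event["data"] = "\n".join(event["data"])
    return event
```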
Security
Agent execution includes multiple layers of security:
| Control | Description |
|---|---|
| Network isolation | Containers can only reach the Knowledge Stack API, Redis, and the LLM provider |
| No persistent state | Containers are stateless — no data persists between invocations |
| Read-only filesystem | Container filesystems are read-only (with a small temporary directory) |
| Resource limits | CPU and memory caps prevent runaway resource usage |
| Token budgets | Maximum token and step limits prevent unbounded LLM spending |
| Scoped permissions | Agents operate with the permissions of the requesting user |
| Audit trail | All tool calls are logged for accountability |
Configuration
If you are self-hosting Knowledge Stack, you can configure the agent system:
| Setting | Default | Description |
|---|---|---|
| `default_model` | `openai:gpt-4o` | The default LLM model for agent conversations |
| `max_steps` | 10 | Maximum tool-use steps per agent invocation |
| `max_tokens` | 4096 | Maximum tokens in the agent response |
| `max_concurrent_activities` | 5 | Maximum concurrent agent runs per worker |
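A hypothetical self-hosted configuration fragment using these settings; the file layout is an assumption, and only the setting names and defaults come from the table above:

```yaml
# Illustrative agent configuration (exact format may differ per deployment)
agent:
  default_model: openai:gpt-4o
  max_steps: 10
  max_tokens: 4096
  max_concurrent_activities: 5
```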
Scaling
| Environment | Workers | Concurrent Agents |
|---|---|---|
| Development | 1 | 2 |
| Staging | 3 | 15 |
| Production | 10 | 50 |
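The staging and production figures are consistent with workers multiplied by `max_concurrent_activities` at its default of 5 (3 × 5 = 15, 10 × 5 = 50); the development environment presumably lowers the per-worker cap to reach 2. A quick sanity check:

```python
MAX_CONCURRENT_ACTIVITIES = 5  # default per-worker cap (see Configuration)

def capacity(workers: int, per_worker: int = MAX_CONCURRENT_ACTIVITIES) -> int:
    """Total concurrent agent conversations an environment can handle."""
    return workers * per_worker
```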