Documentation Index
Fetch the complete documentation index at: https://docs.knowledgestack.ai/llms.txt
Use this file to discover all available pages before exploring further.
Overview
Knowledge Stack’s AI assistant is powered by a fleet of stateless agents that can search your knowledge base, process information, and stream responses in real time. Each agent invocation is independent and self-contained — no state persists between conversations.
Key Properties
| Property | Description |
|---|---|
| Stateless | Each invocation is independent. No state persists between runs. |
| Sandboxed | Agents run in isolated containers with restricted network access. |
| Low latency | A pool of warm containers provides sub-second dispatch for agent requests. |
| Real-time streaming | Responses stream token-by-token to your client as they are generated. |
| Scalable | The agent pool scales horizontally. In production, 10 workers handle up to 50 concurrent agent conversations. |
How It Works
When you send a message to the AI assistant, the following sequence occurs:
- You send a message via `POST /v1/threads/{id}/run`
- The API returns immediately with `202 Accepted` and a `workflow_id`
- An agent worker picks up the task from the queue
- The agent retrieves context — your thread history and any memory documents you have configured
- The agent processes your request, potentially searching your knowledge base or using other tools
- Responses stream in real time via SSE (Server-Sent Events) as the agent generates them
- The final message is saved to your thread
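The dispatch handshake in the first two steps can be sketched as follows. This is illustrative only: the request body shape and the `message` field name are assumptions, and only the endpoint path and the `202 Accepted` + `workflow_id` behavior come from this page.

```python
# Illustrative helpers for the dispatch handshake (not a published client).

def build_run_request(thread_id: str, message: str) -> dict:
    """Describe the POST /v1/threads/{id}/run call."""
    return {
        "method": "POST",
        "path": f"/v1/threads/{thread_id}/run",
        "body": {"message": message},  # field name is an assumption
    }

def parse_dispatch(status_code: int, body: dict) -> str:
    """The API replies 202 Accepted with a workflow_id; extract it."""
    if status_code != 202:
        raise RuntimeError(f"unexpected status: {status_code}")
    return body["workflow_id"]
```

Because the API returns before the agent finishes, your client should treat the `workflow_id` as a handle and consume the actual response over the streaming channel.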
| Phase | Target Latency |
|---|---|
| Request acceptance and dispatch | < 100ms |
| Thread history retrieval | < 200ms |
| Document context retrieval | < 500ms |
| First token from LLM | < 1 second |
| Full response | 5-30 seconds |
Agent Capabilities
What Agents Can Do
Agents have access to a focused set of tools for working with your knowledge base:
| Tool | Description |
|---|---|
| Search documents | Semantic search across your knowledge base |
| Read document sections | Read the full content of a specific section |
| Browse folders | List the contents of a folder |
| Get thread context | Access metadata about the current conversation |
| Save notes | Save intermediate findings to the thread |
What Agents Cannot Do
By design, agents have a restricted scope:
- Agents can read from your knowledge base, but can write only to threads
- Agents cannot modify documents, folders, or any other content
- Agents cannot access external networks or services beyond the knowledge base and LLM provider
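This read/write scope could be pictured as a simple allow-list check. The tool identifiers below are illustrative stand-ins for the five tools listed above, not Knowledge Stack's internal names.

```python
# Hypothetical scope check mirroring the restrictions described above.
# Tool names are assumptions based on the capability table.
READ_TOOLS = {
    "search_documents",
    "read_document_section",
    "browse_folders",
    "get_thread_context",
}
THREAD_WRITE_TOOLS = {"save_notes"}  # the only write target is the thread

def is_allowed(tool: str) -> bool:
    """Agents may read the knowledge base and write only to threads."""
    return tool in READ_TOOLS or tool in THREAD_WRITE_TOOLS
```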
Memory Documents
You can influence agent behavior by providing memory documents — regular documents in your knowledge base that are injected as context for the agent’s system prompt.
This means you can:
- Give the agent domain-specific instructions
- Define response formatting guidelines
- Provide background context that the agent should always consider
Changes to memory documents take effect on the next agent invocation — no redeployment needed.
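For example, a memory document might look like this; the content below is purely illustrative:

```markdown
# Assistant Guidelines (memory document)

- Always answer in British English.
- When citing a document, include its folder path.
- Our fiscal year starts in April; interpret "Q1" accordingly.
```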
Streaming Responses
Agent responses are streamed in real time using Server-Sent Events (SSE). See the Real-Time Notifications documentation for details on connecting to the streaming endpoint and handling events.
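Any SSE client can consume the stream. As a minimal sketch, here is how a single raw SSE event block (the `event:` and `data:` fields defined by the SSE format) can be parsed; the `token` event name in the test is an assumption, not a documented event type.

```python
def parse_sse_event(raw: str) -> dict:
    """Parse one SSE event block into its event name and data payload."""
    event = {"event": "message", "data": []}  # "message" is the SSE default
    for line in raw.splitlines():
        if line.startswith("event:"):
            event["event"] = line[len("event:"):].strip()
        elif line.startswith("data:"):
            # Multiple data: lines are joined with newlines per the SSE spec.
            event["data"].append(line[len("data:"):].strip())
    event["data"] = "\n".join(event["data"])
    return event
```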
Security
Agent execution includes multiple layers of security:
| Control | Description |
|---|---|
| Network isolation | Containers can only reach the Knowledge Stack API, Redis, and the LLM provider |
| No persistent state | Containers are stateless — no data persists between invocations |
| Read-only filesystem | Container filesystems are read-only (with a small temporary directory) |
| Resource limits | CPU and memory caps prevent runaway resource usage |
| Token budgets | Maximum token and step limits prevent unbounded LLM spending |
| Scoped permissions | Agents operate with the permissions of the requesting user |
| Audit trail | All tool calls are logged for accountability |
Configuration
If you are self-hosting Knowledge Stack, you can configure the agent system:
| Setting | Default | Description |
|---|---|---|
| `default_model` | `openai:gpt-4o` | The default LLM model for agent conversations |
| `max_steps` | 10 | Maximum tool-use steps per agent invocation |
| `max_tokens` | 4096 | Maximum tokens in the agent response |
| `max_concurrent_activities` | 5 | Maximum concurrent agent runs per worker |
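A hypothetical self-hosted configuration fragment using these settings; the file layout is an assumption, and only the setting names and defaults come from the table above:

```yaml
# Illustrative agent configuration (exact format may differ per deployment)
agent:
  default_model: openai:gpt-4o
  max_steps: 10
  max_tokens: 4096
  max_concurrent_activities: 5
```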
Scaling
| Environment | Workers | Concurrent Agents |
|---|---|---|
| Development | 1 | 2 |
| Staging | 3 | 15 |
| Production | 10 | 50 |
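The staging and production figures are consistent with workers multiplied by `max_concurrent_activities` at its default of 5 (3 × 5 = 15, 10 × 5 = 50); the development environment presumably lowers the per-worker cap to reach 2. A quick sanity check:

```python
MAX_CONCURRENT_ACTIVITIES = 5  # default per-worker cap (see Configuration)

def capacity(workers: int, per_worker: int = MAX_CONCURRENT_ACTIVITIES) -> int:
    """Total concurrent agent conversations an environment can handle."""
    return workers * per_worker
```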