Documentation Index
Fetch the complete documentation index at: https://docs.knowledgestack.ai/llms.txt
Use this file to discover all available pages before exploring further.
What the Agent Does
The agent worker handles everything involved in generating a response to your message:
- Authenticates — obtains credentials to act on your behalf, ensuring all knowledge base access respects your permissions
- Loads context — fetches your conversation history to understand follow-up questions
- Runs the AI agent — the LLM decides which tools to use, retrieves information, and generates a response
- Streams in real-time — as the agent thinks and writes, events are published to your browser immediately
- Saves the response — the complete message with citations is persisted to the database
- Closes the stream — sends completion signals so your browser knows the response is finished
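The worker's responsibilities above can be sketched as a minimal pipeline. This is an illustrative sketch, not the platform's actual code; names like `publish`, `persist_message`, and `authenticate` are stand-ins for the real internals:

```python
from dataclasses import dataclass, field

@dataclass
class Stream:
    """Collects published events; stands in for the real event bus."""
    events: list = field(default_factory=list)

    def publish(self, event_type, **data):
        self.events.append({"type": event_type, **data})

def handle_user_message(stream, run_agent, persist_message, authenticate, load_history):
    """Illustrative worker pipeline: auth, context, run, save, close."""
    token = authenticate()                        # act on behalf of the user
    history = load_history(limit=10)              # recent conversation context
    stream.publish("message_start")               # open the stream
    response = run_agent(history, token, stream)  # publishes text_delta/step events
    persist_message(response)                     # save response with citations
    stream.publish("message_end")                 # completion signals
    stream.publish("done")
    return response
```

Each step runs in order, and every event the agent produces flows through the same `publish` path that delivers it to your browser.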
Agent Capabilities
The AI agent is built on a framework that manages:
- LLM communication — handles model requests, response parsing, and retries
- Streaming — provides real-time event iteration as the model generates output
- Conversation history — maintains context across messages in a thread
- Tool execution — the agentic loop where the LLM decides to call tools, receives results, and continues reasoning
- Structured output — validates agent responses
Everything outside the LLM interaction — workflow orchestration, streaming delivery, message persistence, authentication — is handled by the platform.
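The agentic loop at the heart of tool execution can be sketched as follows. This is a simplified illustration, assuming a stubbed `call_model` function that returns either a tool request or a final answer; the real framework's interfaces will differ:

```python
def agentic_loop(call_model, tools, user_message, max_steps=8):
    """Illustrative agentic loop: the model either requests a tool call or
    returns a final answer; tool results are fed back into the conversation."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = call_model(messages)      # stub for LLM communication
        if reply.get("tool"):             # model decided to call a tool
            result = tools[reply["tool"]](**reply.get("args", {}))
            messages.append({"role": "tool", "name": reply["tool"],
                             "content": result})
            continue                      # let the model reason over the result
        return reply["content"]           # final answer
    raise RuntimeError("agent exceeded max_steps without finishing")
```

The loop continues until the model stops requesting tools, which is what allows multi-step retrieval before a final answer is produced.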
Request Processing
When a message arrives, the worker executes this flow:
- Initialize — prepare the agent and generate a unique message ID
- Authenticate — obtain a token to access the API on behalf of the user
- Load context — fetch the user’s message and recent conversation history (up to 10 messages)
- Open the stream — publish a message_start event
- Run the agent — the AI agent processes the request with streaming:
  - Each text token is published as a text_delta event
  - Each tool call and result is published as a step event
  - The agent sends heartbeats every 10 seconds to signal it is still working
- Process citations — resolve inline chunk references into structured citation objects
- Save the message — persist the complete response with citations and tool steps
- Close the stream — publish message_end and done events; clear the streaming flag
This entire flow runs inside a durable workflow, so if the worker crashes at any point, the work is rescheduled to another worker. See Reliability and Durability for details.
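The citation-processing step above resolves inline chunk references into structured citation objects. A minimal sketch, assuming a hypothetical inline marker format like `[chunk:<id>]` (the real marker syntax is not specified here):

```python
import re

# Assumed inline reference format "[chunk:<id>]"; the real syntax may differ.
CHUNK_REF = re.compile(r"\[chunk:([\w-]+)\]")

def process_citations(text, lookup_chunk):
    """Replace inline chunk references with numbered markers and return
    structured citation objects, one per unique chunk."""
    citations, order = [], {}

    def replace(match):
        chunk_id = match.group(1)
        if chunk_id not in order:
            order[chunk_id] = len(order) + 1
            chunk = lookup_chunk(chunk_id)
            citations.append({"n": order[chunk_id], "chunk_id": chunk_id,
                              "title": chunk["title"]})
        return f"[{order[chunk_id]}]"

    return CHUNK_REF.sub(replace, text), citations
```

Repeated references to the same chunk reuse one citation number, so the saved message carries a deduplicated citations list.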
Streaming Protocol
The agent publishes events that follow an industry-standard streaming protocol:
Event Lifecycle
```
message_start -> text_start -> text_delta* -> text_end -> (citations)? -> message_end -> done
```
With tool calls interleaved as step events:
```
message_start -> text_start -> text_delta* -> text_end
              -> step(call) -> step(result)
              -> text_start -> text_delta* -> text_end
              -> message_end -> done
```
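A client can check an incoming event sequence against this grammar with a small state machine. This is a hedged sketch based only on the lifecycle shown above, with events represented as plain strings:

```python
def validate_lifecycle(events):
    """Check an event sequence against the streaming grammar:
    message_start, then text blocks (text_start text_delta* text_end)
    interleaved with step events, optional citations, message_end, done."""
    it = iter(events)
    if next(it, None) != "message_start":
        return False
    in_text = False  # are we inside an open text content block?
    for ev in it:
        if ev == "text_start" and not in_text:
            in_text = True
        elif ev in ("text_delta", "text_end") and in_text:
            in_text = (ev == "text_delta")
        elif ev in ("step", "citations") and not in_text:
            pass                          # allowed between text blocks
        elif ev == "message_end" and not in_text:
            return next(it, None) == "done" and next(it, None) is None
        else:
            return False                  # event out of place
    return False                          # stream ended without message_end/done
```

Rejecting out-of-place events (for example, a text_delta outside an open text block) makes client-side rendering bugs easier to catch early.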
| Event | Description |
|---|---|
| message_start | New assistant message beginning |
| text_start | Opens a text content block |
| text_delta | Incremental text fragment |
| text_end | Closes the text block |
| step | Tool call or result snapshot |
| citations | Final citations list |
| message_end | Response complete |
| done | Stream finished |
Reconnection
If a client disconnects and reconnects:
- If the message is still streaming, the client resumes from where it left off with missed events replayed
- If the message is no longer streaming, the client receives a message_not_streaming event and can fetch the complete message via REST
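The two reconnection branches can be sketched as a small server-side decision function. Field names like `streaming` and `entries` are illustrative assumptions, not the platform's actual storage model:

```python
def on_reconnect(thread, last_entry_id):
    """Illustrative reconnection handling: replay missed events while the
    message is still streaming, otherwise tell the client to fetch via REST."""
    if thread["streaming"]:
        # Replay everything published after the client's last seen entry.
        missed = [e for e in thread["entries"] if e["id"] > last_entry_id]
        return {"action": "replay", "events": missed}
    return {"action": "message_not_streaming"}
```

Because replay is keyed off the last seen entry ID, a client that reconnects mid-response receives only the events it missed, not the whole stream again.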
Conversation History
The agent loads your recent conversation history to understand context for follow-up questions.
| Message Role | How It Is Used |
|---|---|
| Your messages | Provided as conversation context |
| Agent responses | Provided as conversation context |
| System messages | Handled through agent instructions (not included in history) |
History is loaded with a configurable depth (default: 10 messages) and converted into a format the AI framework understands. The agent sees your questions and its previous answers, enabling natural multi-turn conversations.
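History loading can be sketched as follows, assuming illustrative stored roles `user` and `agent` and the common `user`/`assistant` role convention on the model side; the real role names and message shape may differ:

```python
def build_history(messages, depth=10):
    """Convert stored thread messages into model-ready history: keep the most
    recent `depth` user/agent messages and drop system messages (those are
    carried in the agent's instructions instead)."""
    role_map = {"user": "user", "agent": "assistant"}
    kept = [m for m in messages if m["role"] in role_map]
    return [{"role": role_map[m["role"]], "content": m["content"]}
            for m in kept[-depth:]]
```

Trimming to the most recent messages keeps the prompt bounded while still giving the model enough context for follow-up questions.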
API Endpoints
Send a Message
POST /v1/threads/{thread_id}/user_message
Sends your message and starts the agent. Returns 202 Accepted immediately with a workflow_id.
- Returns 409 Conflict if the agent is already processing a message on this thread
Request:

```json
{
  "input_text": "What is the retention policy?"
}
```
Response (202):

```json
{
  "workflow_id": "agent-{thread_id}"
}
```
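A client-side sketch of calling this endpoint and handling both documented status codes. The `post` callable is a stand-in for a real HTTP client, so the example stays self-contained:

```python
def send_message(post, thread_id, text):
    """Illustrative client for POST /v1/threads/{thread_id}/user_message.
    `post` is a stand-in HTTP call returning (status_code, body_dict)."""
    status, body = post(f"/v1/threads/{thread_id}/user_message",
                        json={"input_text": text})
    if status == 202:
        return body["workflow_id"]   # accepted: now open the SSE stream
    if status == 409:
        raise RuntimeError("agent is already processing a message on this thread")
    raise RuntimeError(f"unexpected status {status}")
```

Since the endpoint returns immediately with 202, the workflow_id is the handle you hold while the response streams in separately.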
Stream the Response
GET /v1/threads/{thread_id}/stream
Opens an SSE connection to receive real-time agent output.
Query parameters:
| Param | Type | Description |
|---|---|---|
| last_message_id | UUID (optional) | Message ID to resume from |
| last_entry_id | string (optional) | Stream entry ID to resume from |
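On the client, this comes down to building the stream URL (with resume parameters when reconnecting) and decoding SSE `data:` lines into events. A minimal sketch, assuming each `data:` line carries one JSON-encoded event:

```python
import json
from urllib.parse import urlencode

def stream_url(thread_id, last_message_id=None, last_entry_id=None):
    """Build the SSE URL, adding resume parameters when reconnecting."""
    params = {}
    if last_message_id:
        params["last_message_id"] = last_message_id
    if last_entry_id:
        params["last_entry_id"] = last_entry_id
    query = f"?{urlencode(params)}" if params else ""
    return f"/v1/threads/{thread_id}/stream{query}"

def parse_sse(lines):
    """Minimal SSE decoding: each 'data:' line carries one JSON event;
    blank lines and other fields are ignored for simplicity."""
    return [json.loads(line[len("data:"):].strip())
            for line in lines if line.startswith("data:")]
```

A production client would use a proper EventSource/SSE library; this only shows how the resume parameters and event payloads fit together.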
Security Model
The agent operates under your permissions. When the worker processes your message:
- It obtains a token scoped to your user identity and tenant
- All knowledge base access goes through the API’s authorization layer
- The agent can only see and search content you have permission to access
This means the agent respects the same access controls as the rest of the platform — it cannot access documents in folders you do not have permission to view.