Custom workflow definitions let you automate multi-step processing pipelines over your knowledge base. A workflow definition describes what to process (source documents), how to process it (instructions), where to write output, and which runner executes the work.
Concepts
| Term | Description |
|---|---|
| Workflow definition | A reusable template that specifies sources, instructions, output locations, and runner configuration |
| Workflow run | A single execution of a definition, triggered on demand |
| Self-hosted runner | Your own HTTP endpoint that Knowledge Stack calls with the run payload |
1. Create a workflow definition
curl -X POST https://api-staging.knowledgestack.ai/v1/workflow-definitions \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Summarise Policy Documents",
    "description": "Generates executive summaries for all documents in the policy folder.",
    "runner_type": "SELF_HOSTED",
    "runner_config": {
      "url": "https://runner.yourcompany.com/workflow",
      "webhook_secret": "<your-webhook-secret>"
    },
    "source_path_part_ids": ["<policy-folder-path-part-id>"],
    "instruction_path_part_ids": ["<instructions-doc-path-part-id>"],
    "output_path_part_ids": ["<summaries-folder-path-part-id>"],
    "max_run_duration_seconds": 600
  }'
Request fields
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Definition name (max 255 characters) |
| description | string | No | Human-readable description of what this workflow does |
| runner_type | string | Yes | Must be SELF_HOSTED, the only supported runner type |
| runner_config | object | No | Configuration for the self-hosted runner (see below) |
| source_path_part_ids | UUID[] | Yes | Path parts whose content the workflow reads (1–20 IDs) |
| instruction_path_part_ids | UUID[] | Yes | Path parts containing processing instructions (1–20 IDs) |
| output_path_part_ids | UUID[] | Yes | Path parts where the runner writes results (1–20 IDs) |
| template_path_part_id | UUID | No | Optional template document to use during processing |
| max_run_duration_seconds | integer | No | Maximum allowed run time in seconds (60–86400, default 300) |
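The limits in the table above can be checked locally before a request is sent. The following is a minimal sketch and not part of the Knowledge Stack API: `validate_definition` and `id_count_ok` are hypothetical helper names, ID lists are passed as space-separated strings, and rejecting an empty name is an assumption (the table only states the 255-character maximum).

```shell
#!/usr/bin/env bash
# Hypothetical client-side checks mirroring the documented request limits.
# These helpers are illustrative and are not provided by Knowledge Stack.

# Succeeds when a space-separated ID list holds between 1 and 20 entries.
id_count_ok() {
  # Word splitting is intentional: each ID becomes one positional parameter.
  # shellcheck disable=SC2086
  set -- $1
  [ "$#" -ge 1 ] && [ "$#" -le 20 ]
}

# Usage: validate_definition NAME DURATION SOURCE_IDS INSTRUCTION_IDS OUTPUT_IDS
# DURATION may be empty, in which case the server default applies.
validate_definition() {
  local name="$1" duration="$2" sources="$3" instructions="$4" outputs="$5"

  # Assumption: an empty name is rejected; the docs only give the maximum.
  if [ -z "$name" ] || [ "${#name}" -gt 255 ]; then
    echo "name must be 1-255 characters" >&2
    return 1
  fi

  if [ -n "$duration" ]; then
    # Reject non-digits and anything longer than five digits (86400 has five).
    case "$duration" in
      *[!0-9]*|??????*)
        echo "max_run_duration_seconds must be an integer between 60 and 86400" >&2
        return 1 ;;
    esac
    if [ "$duration" -lt 60 ] || [ "$duration" -gt 86400 ]; then
      echo "max_run_duration_seconds must be between 60 and 86400" >&2
      return 1
    fi
  fi

  id_count_ok "$sources"      || { echo "source_path_part_ids needs 1-20 IDs" >&2; return 1; }
  id_count_ok "$instructions" || { echo "instruction_path_part_ids needs 1-20 IDs" >&2; return 1; }
  id_count_ok "$outputs"      || { echo "output_path_part_ids needs 1-20 IDs" >&2; return 1; }
}
```

A call such as `validate_definition "Summarise Policy Documents" 600 "$SOURCE_IDS" "$INSTRUCTION_IDS" "$OUTPUT_IDS"` returns a non-zero status and prints the reason to stderr when a limit is violated. The server remains the authority on what it accepts.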
Self-hosted runner configuration
When runner_type is SELF_HOSTED, Knowledge Stack calls your endpoint with the run payload. The runner_config object controls how it connects:
| Field | Description |
|---|---|
| url | HTTPS URL of your runner endpoint (must be a valid URI, max 2083 characters) |
| webhook_secret | Secret string used to verify the webhook signature |
The webhook_secret is write-only — it is never returned in API responses. Store it securely on your runner.
Response (201)
{
  "id": "wd1a2b3c-...",
  "name": "Summarise Policy Documents",
  "runner_type": "SELF_HOSTED",
  "max_run_duration_seconds": 600,
  "source_path_part_ids": ["..."],
  "instruction_path_part_ids": ["..."],
  "output_path_part_ids": ["..."],
  "created_at": "2024-10-15T09:00:00Z",
  "updated_at": "2024-10-15T09:00:00Z"
}
2. Invoke a workflow
Trigger a run of an existing definition. Knowledge Stack captures a snapshot of the current path configuration and dispatches the run to your self-hosted runner.
curl -X POST https://api-staging.knowledgestack.ai/v1/workflow-definitions/<definition-id>/invoke \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "idempotency_key": "run-2024-10-15-001"
  }'
Request fields
| Field | Type | Description |
|---|---|---|
| idempotency_key | string | Optional. Prevents duplicate runs from retries: if a run with this key already exists, the existing run is returned instead of a new one being created (max 255 characters) |
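Because a repeated key returns the existing run, a client can retry a failed invoke request without risking a duplicate run, provided every attempt reuses the same key. The sketch below derives a deterministic key and lets curl retry transient failures. `make_idempotency_key` and `invoke_workflow` are hypothetical helper names. The prefix-plus-date format follows the example keys in this guide but is a convention chosen for illustration, as is the restriction to letters, digits, dots, underscores, and hyphens (which keeps the key safe to embed in JSON).

```shell
#!/usr/bin/env bash
# Hypothetical helpers; neither function is part of the Knowledge Stack API.

# Usage: make_idempotency_key PREFIX [YYYY-MM-DD]
# Prints PREFIX-DATE, defaulting to today's UTC date.
make_idempotency_key() {
  local prefix="$1" day="${2:-$(date -u +%Y-%m-%d)}"
  local key="${prefix}-${day}"

  # Assumption: restrict keys to characters that need no JSON escaping.
  case "$key" in
    *[!A-Za-z0-9._-]*)
      echo "idempotency key contains unsupported characters" >&2
      return 1 ;;
  esac
  # Documented limit: 255 characters.
  if [ "${#key}" -gt 255 ]; then
    echo "idempotency key exceeds 255 characters" >&2
    return 1
  fi
  printf '%s\n' "$key"
}

# Usage: invoke_workflow DEFINITION_ID KEY   (expects API_KEY in the environment)
# curl's --retry re-sends the request on transient errors. Reusing KEY means a
# request that reaches the server twice should still map to a single run.
invoke_workflow() {
  local definition_id="$1" key="$2"
  curl -sS --fail --retry 3 -X POST \
    "https://api-staging.knowledgestack.ai/v1/workflow-definitions/${definition_id}/invoke" \
    -H "Authorization: Bearer ${API_KEY}" \
    -H "Content-Type: application/json" \
    -d "{\"idempotency_key\": \"${key}\"}"
}
```

For a nightly job, `invoke_workflow "$DEFINITION_ID" "$(make_idempotency_key nightly)"` produces the same key for every attempt made on the same UTC day.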
Response (202)
{
  "id": "wr1a2b3c-...",
  "workflow_definition_id": "wd1a2b3c-...",
  "status": "PENDING",
  "runner_type": "SELF_HOSTED",
  "started_at": "2024-10-15T09:05:00Z",
  "completed_at": null,
  "run_snapshot": {
    "workflow_name": "Summarise Policy Documents",
    "sources": [...],
    "instructions": [...],
    "outputs": [...]
  },
  "error": null,
  "created_at": "2024-10-15T09:05:00Z"
}
Save the id of the returned run to monitor its progress.
3. Monitor workflow runs
List runs for a definition
curl -X GET "https://api-staging.knowledgestack.ai/v1/workflow-definitions/<definition-id>/runs?limit=20&offset=0" \
  -H "Authorization: Bearer <your-api-key>"
Get a specific run
curl -X GET "https://api-staging.knowledgestack.ai/v1/workflow-runs/<run-id>" \
  -H "Authorization: Bearer <your-api-key>"
Run statuses
| Status | Description |
|---|---|
| PENDING | Run is queued and waiting to be dispatched to the runner |
| RUNNING | Runner has received the payload and is actively processing |
| COMPLETED | Runner reported success via the callback endpoint |
| FAILED | Runner reported failure, or the run exceeded max_run_duration_seconds |
Poll GET /v1/workflow-runs/{run_id} until the status reaches COMPLETED or FAILED. Check the error field on failure for a description of what went wrong.
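The polling advice above can be wrapped in a small helper that stops on a terminal status or after a fixed number of attempts. This is a sketch under stated assumptions: `wait_for_run` and `fetch_run_status` are hypothetical names, the attempt limit and the exit codes (0 for COMPLETED, 1 for FAILED, 2 for giving up) are choices made for illustration, and `fetch_run_status` expects `API_KEY` and `RUN_ID` in the environment and `jq` on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical polling helpers; not part of the Knowledge Stack API.

# Prints the current status of the run identified by $RUN_ID.
fetch_run_status() {
  curl -sS "https://api-staging.knowledgestack.ai/v1/workflow-runs/${RUN_ID}" \
    -H "Authorization: Bearer ${API_KEY}" | jq -r '.status'
}

# Usage: wait_for_run FETCH_COMMAND MAX_ATTEMPTS DELAY_SECONDS
# FETCH_COMMAND is any command or function that prints the run status.
# Returns 0 on COMPLETED, 1 on FAILED, 2 if no terminal status was seen.
wait_for_run() {
  local fetch="$1" max_attempts="$2" delay="$3"
  local attempt=1 status

  while [ "$attempt" -le "$max_attempts" ]; do
    # A failed fetch is treated as "not finished yet" and retried.
    status=$("$fetch") || status=""
    case "$status" in
      COMPLETED) echo "$status"; return 0 ;;
      FAILED)    echo "$status"; return 1 ;;
    esac
    attempt=$((attempt + 1))
    # Sleep only when another attempt will follow.
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$delay"
    fi
  done

  echo "TIMEOUT"
  return 2
}
```

For example, `wait_for_run fetch_run_status 60 10` checks every 10 seconds for up to 60 attempts. Passing the fetch command as an argument keeps the loop independent of curl, so it can be exercised with a stub.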
4. Callback endpoint
Your self-hosted runner must call the callback endpoint when it finishes processing. This transitions the run from RUNNING to COMPLETED or FAILED.
# Called by your runner, not by API consumers directly
curl -X POST https://api-staging.knowledgestack.ai/v1/workflow-runs/<run-id>/callback \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "status": "COMPLETED"
  }'
Request fields
| Field | Type | Required | Description |
|---|---|---|---|
| status | string | Yes | Final status: COMPLETED or FAILED |
| error | string | No | Error description when status is FAILED (max 8192 characters) |
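A runner written as a shell script could build the callback body as sketched below. `callback_body` and `send_callback` are hypothetical helper names, and `send_callback` expects `API_KEY` in the environment. The sketch makes several simplifying assumptions: the error text is only sent when the status is FAILED, newlines and tabs are flattened to spaces, the text is cut to the documented 8192-character limit before escaping, and only backslashes and double quotes are escaped (other control characters are not handled).

```shell
#!/usr/bin/env bash
# Hypothetical runner-side helpers; not part of the Knowledge Stack API.

# Usage: callback_body STATUS [ERROR_TEXT]
# Prints a JSON body for the callback endpoint.
callback_body() {
  local status="$1" error="${2:-}"

  case "$status" in
    COMPLETED|FAILED) ;;
    *) echo "status must be COMPLETED or FAILED" >&2; return 1 ;;
  esac

  # The error field only applies to failed runs.
  if [ "$status" = "COMPLETED" ] || [ -z "$error" ]; then
    printf '{"status":"%s"}\n' "$status"
    return 0
  fi

  # Flatten newlines/tabs, cap at 8192 characters, then escape \ and ".
  # Simplification: other control characters are passed through unescaped.
  error=$(printf '%s' "$error" | tr '\n\r\t' '   ' | cut -c1-8192 \
    | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"status":"%s","error":"%s"}\n' "$status" "$error"
}

# Usage: send_callback RUN_ID STATUS [ERROR_TEXT]
send_callback() {
  local run_id="$1" body
  body=$(callback_body "$2" "${3:-}") || return 1
  curl -sS --fail -X POST \
    "https://api-staging.knowledgestack.ai/v1/workflow-runs/${run_id}/callback" \
    -H "Authorization: Bearer ${API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$body"
}
```

A failing runner would then call something like `send_callback "$RUN_ID" FAILED "source folder was empty"`. Where `jq` is available, building the body with `jq -n --arg` is the more robust way to escape arbitrary text.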
5. Cancel a run
To stop an in-progress run, delete the underlying workflow by its workflow_id. The workflow_id is returned in the run response, although the sample response in step 2 does not show this field.
curl -X DELETE "https://api-staging.knowledgestack.ai/v1/workflows/<workflow-id>" \
  -H "Authorization: Bearer <your-api-key>"
Cancellation is best-effort. If your self-hosted runner has already begun processing, it may not stop immediately. Your runner should handle a late callback gracefully.
Document version workflows
In addition to custom workflow definitions, Knowledge Stack runs an ingestion workflow for every document version. You can monitor and rerun these directly.
List all document ingestion workflows
curl -X GET "https://api-staging.knowledgestack.ai/v1/workflows/document_versions?limit=20" \
  -H "Authorization: Bearer <your-api-key>"
Tenant admins see all workflows; members see only workflows for document versions they have read access to.
Get a specific document ingestion workflow
curl -X GET "https://api-staging.knowledgestack.ai/v1/workflows/document_versions/<workflow-id>" \
  -H "Authorization: Bearer <your-api-key>"
The response includes live pipeline execution status, along with persisted fields like status, error, and chunks_processed.
Rerun a document ingestion workflow
If a document ingestion failed or you need to reprocess a version (for example after a configuration change), rerun it without re-uploading the file:
curl -X POST "https://api-staging.knowledgestack.ai/v1/workflows/document_versions/<workflow-id>" \
  -H "Authorization: Bearer <your-api-key>"
Knowledge Stack reuses the existing file in storage — no re-upload needed.
End-to-end example
Create a workflow definition
# Expects API_KEY, WEBHOOK_SECRET, SOURCE_FOLDER_ID, INSTRUCTIONS_DOC_ID,
# and OUTPUT_FOLDER_ID to be set in the environment.
DEFINITION=$(curl -sX POST https://api-staging.knowledgestack.ai/v1/workflow-definitions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Nightly Report Generator",
    "runner_type": "SELF_HOSTED",
    "runner_config": {
      "url": "https://runner.yourcompany.com/nightly",
      "webhook_secret": "'"$WEBHOOK_SECRET"'"
    },
    "source_path_part_ids": ["'"$SOURCE_FOLDER_ID"'"],
    "instruction_path_part_ids": ["'"$INSTRUCTIONS_DOC_ID"'"],
    "output_path_part_ids": ["'"$OUTPUT_FOLDER_ID"'"],
    "max_run_duration_seconds": 3600
  }')
DEFINITION_ID=$(printf '%s' "$DEFINITION" | jq -r '.id')

# Invoke the workflow; the idempotency key makes a retry return the same run.
RUN=$(curl -sX POST "https://api-staging.knowledgestack.ai/v1/workflow-definitions/$DEFINITION_ID/invoke" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"idempotency_key": "nightly-2024-10-15"}')
RUN_ID=$(printf '%s' "$RUN" | jq -r '.id')

# Poll every 10 seconds until the run reaches a terminal status.
while true; do
  STATUS=$(curl -s "https://api-staging.knowledgestack.ai/v1/workflow-runs/$RUN_ID" \
    -H "Authorization: Bearer $API_KEY" | jq -r '.status')
  echo "Status: $STATUS"
  if [[ "$STATUS" == "COMPLETED" || "$STATUS" == "FAILED" ]]; then
    break
  fi
  sleep 10
done
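Splicing shell variables directly into a JSON string works for simple values but produces invalid JSON if a value contains a double quote or backslash. One alternative, sketched here, is to let `jq` build the request body. `definition_body` is a hypothetical helper name, and the sketch assumes a single ID per path list to match the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the Knowledge Stack API.
# Builds the create-definition body with jq so every value is escaped.
# Usage: definition_body NAME RUNNER_URL WEBHOOK_SECRET SOURCE_ID INSTRUCTION_ID OUTPUT_ID DURATION
definition_body() {
  jq -n \
    --arg name "$1" \
    --arg url "$2" \
    --arg secret "$3" \
    --arg src "$4" \
    --arg instr "$5" \
    --arg out "$6" \
    --argjson duration "$7" \
    '{
      name: $name,
      runner_type: "SELF_HOSTED",
      runner_config: {url: $url, webhook_secret: $secret},
      source_path_part_ids: [$src],
      instruction_path_part_ids: [$instr],
      output_path_part_ids: [$out],
      max_run_duration_seconds: $duration
    }'
}
```

The result can be passed straight to curl, for example `-d "$(definition_body "Nightly Report Generator" "https://runner.yourcompany.com/nightly" "$WEBHOOK_SECRET" "$SOURCE_FOLDER_ID" "$INSTRUCTIONS_DOC_ID" "$OUTPUT_FOLDER_ID" 3600)"`.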