Custom workflow definitions let you automate multi-step processing pipelines over your knowledge base. A workflow definition describes what to process (source documents), how to process it (instructions), where to write output, and which runner executes the work.

Concepts

| Term | Description |
| --- | --- |
| Workflow definition | A reusable template that specifies sources, instructions, output locations, and runner configuration |
| Workflow run | A single execution of a definition, triggered on demand |
| Self-hosted runner | Your own HTTP endpoint that Knowledge Stack calls with the run payload |

1. Create a workflow definition

curl -X POST https://api-staging.knowledgestack.ai/v1/workflow-definitions \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Summarise Policy Documents",
    "description": "Generates executive summaries for all documents in the policy folder.",
    "runner_type": "SELF_HOSTED",
    "runner_config": {
      "url": "https://runner.yourcompany.com/workflow",
      "webhook_secret": "<your-webhook-secret>"
    },
    "source_path_part_ids": ["<policy-folder-path-part-id>"],
    "instruction_path_part_ids": ["<instructions-doc-path-part-id>"],
    "output_path_part_ids": ["<summaries-folder-path-part-id>"],
    "max_run_duration_seconds": 600
  }'
Request fields
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Definition name (max 255 characters) |
| description | string | No | Human-readable description of what this workflow does |
| runner_type | string | Yes | Must be SELF_HOSTED, the only supported runner type |
| runner_config | object | No | Configuration for the self-hosted runner (see below) |
| source_path_part_ids | UUID[] | Yes | Path parts whose content the workflow reads (1–20 IDs) |
| instruction_path_part_ids | UUID[] | Yes | Path parts containing processing instructions (1–20 IDs) |
| output_path_part_ids | UUID[] | Yes | Path parts where the runner writes results (1–20 IDs) |
| template_path_part_id | UUID | No | Optional template document to use during processing |
| max_run_duration_seconds | integer | No | Maximum allowed run time in seconds (60–86400, default 300) |
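
The limits in the table above can be checked locally before a request is sent. The following is a minimal sketch in POSIX shell; the function names are hypothetical helpers, not part of the Knowledge Stack API, and the limits are copied from the request fields table.

```shell
# Hypothetical client-side checks; limits taken from the request fields table.

# Succeeds when between 1 and 20 IDs are passed as arguments.
valid_id_count() {
  [ "$#" -ge 1 ] && [ "$#" -le 20 ]
}

# Succeeds when the value is an integer between 60 and 86400 (seconds).
valid_duration() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;  # reject empty or non-numeric input
  esac
  [ "$1" -ge 60 ] && [ "$1" -le 86400 ]
}

# Succeeds when the name is non-empty and at most 255 characters long.
valid_name() {
  [ -n "$1" ] && [ "${#1}" -le 255 ]
}
```

For example, `valid_duration 600 && valid_id_count "$SOURCE_FOLDER_ID"` could gate the create request, so that an out-of-range value is caught before it reaches the API.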

Self-hosted runner configuration

When runner_type is SELF_HOSTED, Knowledge Stack calls your endpoint with the run payload. The runner_config object controls how it connects:
| Field | Description |
| --- | --- |
| url | HTTPS URL of your runner endpoint (must be a valid URI, max 2083 characters) |
| webhook_secret | Secret string used to verify the webhook signature |
The webhook_secret is write-only — it is never returned in API responses. Store it securely on your runner.
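
This page does not specify how the webhook signature is computed or which header carries it. Purely as an illustration, the sketch below assumes a hex-encoded HMAC-SHA256 over the raw request body, which is a common convention; check the actual scheme before relying on it. The function names are hypothetical and the `openssl` command-line tool is assumed to be installed.

```shell
# ASSUMPTION: the signature is a hex-encoded HMAC-SHA256 of the raw request body.
# This scheme is not confirmed by the Knowledge Stack documentation.

# Prints the hex HMAC-SHA256 of a body. $1: secret, $2: raw request body.
# Note: passing the secret as an argument exposes it to the process list.
hmac_sha256_hex() {
  printf '%s' "$2" | openssl dgst -sha256 -hmac "$1" | awk '{print $NF}'
}

# Succeeds when the received signature matches the computed one.
# $1: secret, $2: raw request body, $3: signature received with the request
signature_matches() {
  [ "$(hmac_sha256_hex "$1" "$2")" = "$3" ]
}
```

A runner would typically reject any request for which `signature_matches` fails. A production runner would also want a constant-time comparison, which plain shell string comparison does not provide.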
Response (201)
{
  "id": "wd1a2b3c-...",
  "name": "Summarise Policy Documents",
  "runner_type": "SELF_HOSTED",
  "max_run_duration_seconds": 600,
  "source_path_part_ids": ["..."],
  "instruction_path_part_ids": ["..."],
  "output_path_part_ids": ["..."],
  "created_at": "2024-10-15T09:00:00Z",
  "updated_at": "2024-10-15T09:00:00Z"
}

2. Invoke a workflow

Trigger a run of an existing definition. Knowledge Stack captures a snapshot of the current path configuration and dispatches the run to your self-hosted runner.
curl -X POST https://api-staging.knowledgestack.ai/v1/workflow-definitions/<definition-id>/invoke \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "idempotency_key": "run-2024-10-15-001"
  }'
Request fields
| Field | Type | Description |
| --- | --- | --- |
| idempotency_key | string | Optional. Prevents duplicate runs from retries: if a run with this key already exists, the existing run is returned instead of a new one being created (max 255 characters) |
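
Because a retry is only deduplicated when it carries the same key, it helps to derive the key deterministically rather than generate it per request. A minimal sketch, assuming one run per job per day; the helper name and key format are hypothetical:

```shell
# Hypothetical helper: builds a deterministic key such as "nightly-2024-10-15".
# $1: job prefix, $2: date string
idempotency_key() {
  # Keys are limited to 255 characters, so clip anything longer.
  printf '%.255s' "$1-$2"
}

# Usage sketch (needs network access, $API_KEY and $DEFINITION_ID):
# KEY=$(idempotency_key nightly "$(date -u +%F)")
# curl -sX POST "https://api-staging.knowledgestack.ai/v1/workflow-definitions/$DEFINITION_ID/invoke" \
#   -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
#   -d "{\"idempotency_key\": \"$KEY\"}"
```

Clipping can make two very long keys collide, so short prefixes are the safer choice.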
Response (202)
{
  "id": "wr1a2b3c-...",
  "workflow_definition_id": "wd1a2b3c-...",
  "status": "PENDING",
  "runner_type": "SELF_HOSTED",
  "started_at": "2024-10-15T09:05:00Z",
  "completed_at": null,
  "run_snapshot": {
    "workflow_name": "Summarise Policy Documents",
    "sources": [...],
    "instructions": [...],
    "outputs": [...]
  },
  "error": null,
  "created_at": "2024-10-15T09:05:00Z"
}
Save the id of the returned run to monitor its progress.

3. Monitor workflow runs

List runs for a definition

curl -X GET "https://api-staging.knowledgestack.ai/v1/workflow-definitions/<definition-id>/runs?limit=20&offset=0" \
  -H "Authorization: Bearer <your-api-key>"

Get a specific run

curl -X GET "https://api-staging.knowledgestack.ai/v1/workflow-runs/<run-id>" \
  -H "Authorization: Bearer <your-api-key>"

Run statuses

| Status | Description |
| --- | --- |
| PENDING | Run is queued and waiting to be dispatched to the runner |
| RUNNING | Runner has received the payload and is actively processing |
| COMPLETED | Runner reported success via the callback endpoint |
| FAILED | Runner reported failure, or the run exceeded max_run_duration_seconds |
Poll GET /v1/workflow-runs/{run_id} until the status reaches COMPLETED or FAILED. Check the error field on failure for a description of what went wrong.
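
The polling described above can be bounded so that a client does not wait forever. The sketch below is one possible shape, in POSIX shell: `poll_run` and `is_terminal` are hypothetical helper names, the status lookup is passed in as a command so it can be swapped for a `curl` and `jq` wrapper, and `TIMEOUT` is a local marker, not an API status.

```shell
# Succeeds when the status is one of the two terminal states.
is_terminal() {
  case "$1" in
    COMPLETED|FAILED) return 0 ;;
    *) return 1 ;;
  esac
}

# Polls until a terminal status is seen or the attempts run out.
# $1: command that prints the current run status (hypothetical wrapper)
# $2: maximum number of attempts
# $3: seconds to sleep between attempts
poll_run() {
  fetch=$1
  max_attempts=$2
  interval=$3
  attempt=1
  while :; do
    current=$("$fetch")
    if is_terminal "$current"; then
      printf '%s\n' "$current"
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      break
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  # Not an API status: signals that polling gave up locally.
  printf 'TIMEOUT\n'
  return 1
}
```

A `fetch` wrapper could be a function that calls GET /v1/workflow-runs/{run_id} and prints `.status`. Choosing `max_attempts` times `interval` slightly above max_run_duration_seconds gives the run time to reach FAILED on its own.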

4. Callback endpoint

Your self-hosted runner must call the callback endpoint when it finishes processing. This transitions the run from RUNNING to COMPLETED or FAILED.
# Called by your runner, not by API consumers directly
curl -X POST https://api-staging.knowledgestack.ai/v1/workflow-runs/<run-id>/callback \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "status": "COMPLETED"
  }'
Request fields
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| status | string | Yes | Final status: COMPLETED or FAILED |
| error | string | No | Error description when status is FAILED (max 8192 characters) |
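
On the runner side, the two callback fields can be derived from the outcome of the processing step. A minimal sketch in POSIX shell; `callback_status` and `clip_error` are hypothetical helper names, and treating exit code 0 as success is an assumption about your runner, not something the API defines.

```shell
# Maps a process exit code to a final status the callback accepts.
# ASSUMPTION: the runner's processing step exits 0 on success.
callback_status() {
  if [ "$1" -eq 0 ]; then
    printf 'COMPLETED'
  else
    printf 'FAILED'
  fi
}

# Clips an error message so it stays within the 8192-character limit
# of the error field (the cut is byte- or character-based depending on the shell).
clip_error() {
  printf '%.8192s' "$1"
}

# Usage sketch (process_documents is a placeholder for your own command):
# process_documents 2>err.log
# STATUS=$(callback_status "$?")
# ERROR=$(clip_error "$(cat err.log)")
```

The resulting values still need to be JSON-escaped before being sent, for example with `jq -n --arg`.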

5. Cancel a run

To stop an in-progress run, delete the underlying workflow by its workflow_id. The workflow_id is returned in the run response.
curl -X DELETE "https://api-staging.knowledgestack.ai/v1/workflows/<workflow-id>" \
  -H "Authorization: Bearer <your-api-key>"
Cancellation is best-effort. If your self-hosted runner has already begun processing, it may not stop immediately. Your runner should handle a late callback gracefully.

Document version workflows

In addition to custom workflow definitions, Knowledge Stack runs an ingestion workflow for every document version. You can monitor and rerun these directly.

List all document ingestion workflows

curl -X GET "https://api-staging.knowledgestack.ai/v1/workflows/document_versions?limit=20" \
  -H "Authorization: Bearer <your-api-key>"
Tenant admins see all workflows; members see only workflows for document versions they have read access to.

Get a specific document ingestion workflow

curl -X GET "https://api-staging.knowledgestack.ai/v1/workflows/document_versions/<workflow-id>" \
  -H "Authorization: Bearer <your-api-key>"
The response includes live pipeline execution status, along with persisted fields like status, error, and chunks_processed.

Rerun a document ingestion workflow

If a document ingestion failed or you need to reprocess a version (for example after a configuration change), rerun it without re-uploading the file:
curl -X POST "https://api-staging.knowledgestack.ai/v1/workflows/document_versions/<workflow-id>" \
  -H "Authorization: Bearer <your-api-key>"
Knowledge Stack reuses the existing file in storage — no re-upload needed.

End-to-end example

Step 1: Create a workflow definition

# Shell variables are double-quoted where they are spliced into the JSON body
# so that values containing spaces are not split by the shell.
DEFINITION=$(curl -sX POST https://api-staging.knowledgestack.ai/v1/workflow-definitions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Nightly Report Generator",
    "runner_type": "SELF_HOSTED",
    "runner_config": {
      "url": "https://runner.yourcompany.com/nightly",
      "webhook_secret": "'"$WEBHOOK_SECRET"'"
    },
    "source_path_part_ids": ["'"$SOURCE_FOLDER_ID"'"],
    "instruction_path_part_ids": ["'"$INSTRUCTIONS_DOC_ID"'"],
    "output_path_part_ids": ["'"$OUTPUT_FOLDER_ID"'"],
    "max_run_duration_seconds": 3600
  }')
DEFINITION_ID=$(echo "$DEFINITION" | jq -r '.id')

Step 2: Invoke the workflow

RUN=$(curl -sX POST "https://api-staging.knowledgestack.ai/v1/workflow-definitions/$DEFINITION_ID/invoke" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"idempotency_key": "nightly-2024-10-15"}')
RUN_ID=$(echo "$RUN" | jq -r '.id')

Step 3: Poll until complete

# Check the run every 10 seconds until it reaches a terminal status.
while true; do
  STATUS=$(curl -s "https://api-staging.knowledgestack.ai/v1/workflow-runs/$RUN_ID" \
    -H "Authorization: Bearer $API_KEY" | jq -r '.status')
  echo "Status: $STATUS"
  if [[ "$STATUS" == "COMPLETED" || "$STATUS" == "FAILED" ]]; then
    break
  fi
  sleep 10
done