Knowledge Stack provides two complementary search mechanisms: chunk search for deep content retrieval from ingested documents, and folder/item search for browsing the folder hierarchy by name. Use them together to build powerful retrieval pipelines.

Chunk search

POST /v1/chunks/search performs semantic or keyword search across the chunks produced during ingestion. This is the primary interface for retrieval-augmented generation (RAG) and content discovery.

Basic query

curl -X POST https://api-staging.knowledgestack.ai/v1/chunks/search \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the data retention policies?",
    "top_k": 10
  }'
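The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_search_request` and `search_chunks` are hypothetical and not part of any official client, and error handling is left out.

```python
import json
import urllib.request

BASE_URL = "https://api-staging.knowledgestack.ai"


def build_search_request(api_key: str, query: str, top_k: int = 10) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/chunks/search request."""
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/chunks/search",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def search_chunks(api_key: str, query: str, top_k: int = 10) -> list:
    """Send the request and decode the JSON array of scored chunks."""
    request = build_search_request(api_key, query, top_k)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.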

Full request reference

{
  "query": "string (required)",
  "search_type": "dense_only | full_text",
  "parent_path_ids": ["<path-part-uuid>"],
  "tag_ids": ["<tag-uuid>"],
  "chunk_types": ["TEXT", "TABLE", "IMAGE", "HTML", "UNKNOWN"],
  "ingestion_time_after": "2024-01-01T00:00:00Z",
  "active_version_only": true,
  "top_k": 5,
  "score_threshold": 0.3,
  "with_document": false
}
Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| query | string | | The search query (required, min 1 character) |
| search_type | string | dense_only | Search algorithm to use (see Search types) |
| parent_path_ids | UUID[] | tenant root | Limit search to specific folder paths (non-chunk path parts) |
| tag_ids | UUID[] | | Filter by tags. Chunks must have all specified tags (AND logic) |
| chunk_types | string[] | | Limit to specific chunk types (TEXT, TABLE, IMAGE, HTML, UNKNOWN) |
| ingestion_time_after | datetime | | Only return chunks ingested after this ISO 8601 timestamp |
| active_version_only | boolean | true | When true, only return chunks from the active document version |
| top_k | integer | 5 | Number of results to return (1–50) |
| score_threshold | number | 0.3 | Minimum similarity score (0.0–1.0). Results below this are excluded |
| with_document | boolean | false | When true, includes document_id and document_version_id in each result |
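
The limits in the field reference can be checked on the client before a request is sent. The sketch below is one way to do that; `build_search_payload` is a hypothetical helper, and whether the server rejects out-of-range values in the same way is an assumption.

```python
from typing import Any, Dict, Optional, Sequence

CHUNK_TYPES = {"TEXT", "TABLE", "IMAGE", "HTML", "UNKNOWN"}
SEARCH_TYPES = {"dense_only", "full_text"}


def build_search_payload(
    query: str,
    *,
    search_type: str = "dense_only",
    parent_path_ids: Optional[Sequence[str]] = None,
    tag_ids: Optional[Sequence[str]] = None,
    chunk_types: Optional[Sequence[str]] = None,
    ingestion_time_after: Optional[str] = None,
    active_version_only: bool = True,
    top_k: int = 5,
    score_threshold: float = 0.3,
    with_document: bool = False,
) -> Dict[str, Any]:
    """Validate arguments against the documented limits and return the JSON body."""
    if len(query) < 1:
        raise ValueError("query must contain at least 1 character")
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"search_type must be one of {sorted(SEARCH_TYPES)}")
    if not 1 <= top_k <= 50:
        raise ValueError("top_k must be between 1 and 50")
    if not 0.0 <= score_threshold <= 1.0:
        raise ValueError("score_threshold must be between 0.0 and 1.0")
    unknown = set(chunk_types or ()) - CHUNK_TYPES
    if unknown:
        raise ValueError(f"unknown chunk types: {sorted(unknown)}")

    payload: Dict[str, Any] = {
        "query": query,
        "search_type": search_type,
        "active_version_only": active_version_only,
        "top_k": top_k,
        "score_threshold": score_threshold,
        "with_document": with_document,
    }
    # Optional filters are left out when unset so the documented defaults apply.
    if parent_path_ids:
        payload["parent_path_ids"] = list(parent_path_ids)
    if tag_ids:
        payload["tag_ids"] = list(tag_ids)
    if chunk_types:
        payload["chunk_types"] = list(chunk_types)
    if ingestion_time_after:
        payload["ingestion_time_after"] = ingestion_time_after
    return payload
```

For example, `build_search_payload("annual revenue", top_k=10, chunk_types=["TABLE"])` returns a body restricted to table chunks, while `top_k=51` raises `ValueError` before any request is made.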

Search types

| Value | Description |
| --- | --- |
| dense_only | Semantic vector search using dense embeddings. Best for conceptual queries and paraphrasing |
| full_text | BM25 keyword search. Best for exact term matching, product names, and identifiers |

For most user-facing queries, dense_only (the default) gives the best results. Use full_text when your users search for specific codes, names, or technical terms that must match exactly.
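If the search type has to be chosen automatically, a simple client-side heuristic can route identifier-like queries to full_text. The rule below is purely illustrative and is not part of the API: `choose_search_type` is a hypothetical helper that treats quoted queries and tokens mixing letters and digits as exact-match lookups.

```python
def _looks_like_identifier(token: str) -> bool:
    """True for tokens that mix letters and digits, such as 'INV-2024-001'."""
    has_digit = any(ch.isdigit() for ch in token)
    has_alpha = any(ch.isalpha() for ch in token)
    return has_digit and has_alpha


def choose_search_type(query: str) -> str:
    """Pick 'full_text' for exact-match style queries, else 'dense_only'."""
    q = query.strip()
    # A query wrapped in double quotes is treated as an exact phrase.
    if len(q) >= 2 and q[0] == q[-1] == '"':
        return "full_text"
    # Codes and identifiers usually need literal keyword matching.
    if any(_looks_like_identifier(token) for token in q.split()):
        return "full_text"
    return "dense_only"
```

A natural-language question such as "What are the data retention policies?" stays on dense_only under this rule, while "INV-2024-001 status" is routed to full_text. Real traffic may call for a different rule.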

Search results

The endpoint returns an array of ScoredChunkResponse objects ordered by relevance score:
[
  {
    "score": 0.87,
    "chunk_id": "c1d2e3f4-...",
    "content": "Data is retained for 7 years in accordance with...",
    "chunk_type": "TEXT",
    "path_part_id": "a1b2c3d4-...",
    "materialized_path": "/shared/compliance/policy-v2/section-3/chunk-12",
    "document_id": "d1e2f3...",
    "document_version_id": "v1a2b3..."
  }
]
document_id and document_version_id are only populated when you set "with_document": true in the request.
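When with_document is enabled, results can be grouped by source document on the client, for example to show one entry per document in a UI. This is a small sketch over the response shape shown above; `group_by_document` is a hypothetical helper, and results without a document_id are collected under the key `None`.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def group_by_document(results: List[dict]) -> Dict[Optional[str], List[dict]]:
    """Group scored chunks by document_id, keeping the original ranking order."""
    grouped: Dict[Optional[str], List[dict]] = defaultdict(list)
    for result in results:
        # document_id is only populated when the request set with_document to true.
        grouped[result.get("document_id")].append(result)
    return dict(grouped)
```

Because dictionaries preserve insertion order, the first key is the document that contains the highest-ranked chunk.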

Filtering by folder

To restrict search to a specific folder (and all its contents), provide the folder’s path_part_id in parent_path_ids:
curl -X POST https://api-staging.knowledgestack.ai/v1/chunks/search \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "annual revenue",
    "parent_path_ids": ["<finance-folder-path-part-id>"],
    "top_k": 5
  }'
To search multiple folders simultaneously, include multiple IDs in the array. Results are pooled and ranked together.

Filtering by tags

Tags let you apply custom metadata to path parts and then filter search results by those tags:
curl -X POST https://api-staging.knowledgestack.ai/v1/chunks/search \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "service level agreement",
    "tag_ids": ["<tag-uuid-1>", "<tag-uuid-2>"],
    "top_k": 10
  }'
All specified tags must be present on a chunk for it to appear in results (AND logic).
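The AND semantics amount to a subset test: every requested tag must appear among the chunk's tags. The snippet below only illustrates that rule on the client side; the actual filtering happens on the server, and `matches_all_tags` is a hypothetical name.

```python
from typing import Iterable


def matches_all_tags(chunk_tag_ids: Iterable[str], required_tag_ids: Iterable[str]) -> bool:
    """True when every required tag is present on the chunk (AND logic)."""
    return set(required_tag_ids).issubset(chunk_tag_ids)
```

A chunk tagged t1, t2, and t3 matches a filter of t1 and t2, while a chunk tagged only t1 does not.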

Expanding context with neighbors

Search results return individual chunks, which may be too short to provide full context. Use GET /v1/chunks/{chunk_id}/neighbors to retrieve the chunks immediately before and after a result in the original document structure.
curl -X GET "https://api-staging.knowledgestack.ai/v1/chunks/<chunk-id>/neighbors" \
  -H "Authorization: Bearer <your-api-key>"
The response includes before and after arrays of ChunkResponse objects. This is useful for building a context window around a matched passage before sending it to an LLM.
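A context window can then be assembled by joining the neighbors around the matched chunk. This sketch assumes that each ChunkResponse exposes a `content` field like the search results do, and that both arrays are already in document order; neither detail is confirmed above. `build_context` is a hypothetical helper.

```python
from typing import Dict, List


def build_context(match_content: str, neighbors: Dict[str, List[dict]], separator: str = "\n\n") -> str:
    """Join preceding chunks, the matched chunk, and following chunks into one passage."""
    # Assumption: each neighbor object carries its text under "content".
    before = [chunk["content"] for chunk in neighbors.get("before", [])]
    after = [chunk["content"] for chunk in neighbors.get("after", [])]
    return separator.join(before + [match_content] + after)
```

The resulting string can be passed to an LLM as a single passage in place of the isolated chunk.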

Subtree chunks via path

To retrieve all chunks under a specific path node (a folder, document, or section), use GET /v1/path-parts/{path_part_id}/subtree_chunks. This returns a SubtreeChunksResponse grouping chunks by their shared path parts and tags.
curl -X GET "https://api-staging.knowledgestack.ai/v1/path-parts/<path-part-id>/subtree_chunks" \
  -H "Authorization: Bearer <your-api-key>"
{
  "groups": [
    {
      "chunk_ids": ["c1...", "c2..."],
      "path_part_ids": ["pp1...", "pp2..."],
      "tag_ids": ["t1..."]
    }
  ]
}
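The grouped response can be flattened or re-indexed on the client. The following sketch works on the response shape shown above; both helper names are hypothetical.

```python
from typing import Dict, List


def all_chunk_ids(subtree: dict) -> List[str]:
    """Return every chunk ID in the subtree once, in first-seen order."""
    ids = [cid for group in subtree.get("groups", []) for cid in group.get("chunk_ids", [])]
    return list(dict.fromkeys(ids))  # dict.fromkeys removes duplicates, keeps order


def chunk_ids_by_tag(subtree: dict) -> Dict[str, List[str]]:
    """Index chunk IDs by each tag attached to their group."""
    index: Dict[str, List[str]] = {}
    for group in subtree.get("groups", []):
        for tag_id in group.get("tag_ids", []):
            index.setdefault(tag_id, []).extend(group.get("chunk_ids", []))
    return index
```

An index like this makes it possible to answer "which chunks under this folder carry tag X" without issuing another request.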

Folder and item search

GET /v1/folders/search searches for folders and documents by name within your folder tree. It is useful for building browse-and-select UIs or locating a document when you know its name but not its ID.
curl -X GET "https://api-staging.knowledgestack.ai/v1/folders/search?q=compliance&limit=20" \
  -H "Authorization: Bearer <your-api-key>"
Query parameters

| Parameter | Type | Description |
| --- | --- | --- |
| q | string | Name search term |
| parent_path_part_id | UUID | Restrict search to a specific folder subtree |
| part_type | string | Filter by FOLDER or DOCUMENT |
| limit | integer | Results per page (default 20, max 100) |
| offset | integer | Pagination offset |

The response is a paginated list of FolderResponse and DocumentResponse objects depending on what matches.
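Paging through all matches with limit and offset might look like the sketch below. The envelope of the paginated response is not shown above, so extracting the item list is left to a caller-supplied `fetch_page(limit, offset)` function. That callback, both helper names, and the lower bound of 1 on limit are assumptions.

```python
from typing import Callable, Iterator, List, Optional
from urllib.parse import urlencode

BASE_URL = "https://api-staging.knowledgestack.ai"


def _check_limit(limit: int) -> None:
    # Max 100 is documented; the lower bound of 1 is an assumption.
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")


def folder_search_url(
    q: str,
    limit: int = 20,
    offset: int = 0,
    parent_path_part_id: Optional[str] = None,
    part_type: Optional[str] = None,
) -> str:
    """Build the GET /v1/folders/search URL with URL-encoded query parameters."""
    _check_limit(limit)
    params = {"q": q, "limit": limit, "offset": offset}
    if parent_path_part_id:
        params["parent_path_part_id"] = parent_path_part_id
    if part_type:
        params["part_type"] = part_type
    return f"{BASE_URL}/v1/folders/search?{urlencode(params)}"


def iter_pages(fetch_page: Callable[[int, int], List[dict]], limit: int = 20) -> Iterator[dict]:
    """Yield items page by page until a short (or empty) page signals the end."""
    _check_limit(limit)
    offset = 0
    while True:
        items = fetch_page(limit, offset)
        yield from items
        if len(items) < limit:
            break
        offset += limit
```

In practice `fetch_page` would call `folder_search_url`, send the request with the Authorization header, and return the list of items from the decoded response.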

Realistic example: RAG pipeline

The following shows a complete chunk retrieval flow you might use in a retrieval-augmented generation pipeline:
# 1. Search for relevant chunks
curl -X POST https://api-staging.knowledgestack.ai/v1/chunks/search \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is our refund policy for enterprise customers?",
    "search_type": "dense_only",
    "parent_path_ids": ["'"$DOCS_FOLDER_PATH_PART_ID"'"],
    "top_k": 5,
    "score_threshold": 0.4,
    "with_document": true
  }'

# 2. Expand context around the top result (set TOP_CHUNK_ID to the chunk_id of the first result from step 1)
curl -X GET "https://api-staging.knowledgestack.ai/v1/chunks/$TOP_CHUNK_ID/neighbors" \
  -H "Authorization: Bearer $API_KEY"
top_k is capped at 50. For broader recall, consider running multiple searches with different parent_path_ids or lowering score_threshold rather than raising top_k.
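
When several searches are run for broader recall, their result lists need to be merged before use. A minimal sketch follows, assuming every list uses the ScoredChunkResponse shape and that scores are comparable across the searches. That is a reasonable assumption when all searches use the same search_type, but it may not hold when mixing dense_only and full_text. `merge_results` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Optional


def merge_results(result_lists: Iterable[List[dict]], limit: Optional[int] = None) -> List[dict]:
    """Merge result lists, keep the best-scoring copy of each chunk, sort by score."""
    best: Dict[str, dict] = {}
    for results in result_lists:
        for result in results:
            chunk_id = result["chunk_id"]
            # Keep only the highest-scoring occurrence of a duplicated chunk.
            if chunk_id not in best or result["score"] > best[chunk_id]["score"]:
                best[chunk_id] = result
    merged = sorted(best.values(), key=lambda r: r["score"], reverse=True)
    return merged if limit is None else merged[:limit]
```

Deduplicating by chunk_id matters because the same chunk can be returned by more than one search, for example when folder filters overlap.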