Search

Knowledge Stack provides two complementary search mechanisms: chunk search, which retrieves content from ingested documents, and folder/item search, which finds folders and documents in the folder hierarchy by name. Used together, they let you locate a folder or document by name and then retrieve content from within it.

Chunk search

POST /v1/chunks/search performs semantic or keyword search across the chunks produced during ingestion. This is the primary interface for retrieval-augmented generation (RAG) and content discovery.

Basic query

curl -X POST https://api-staging.knowledgestack.ai/v1/chunks/search \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the data retention policies?",
    "top_k": 10
  }'

Full request reference

{
  "query": "string (required)",
  "search_type": "dense_only | full_text",
  "parent_path_ids": ["<path-part-uuid>"],
  "tag_ids": ["<tag-uuid>"],
  "chunk_types": ["TEXT", "TABLE", "IMAGE", "HTML", "UNKNOWN"],
  "ingestion_time_after": "2024-01-01T00:00:00Z",
  "active_version_only": true,
  "top_k": 5,
  "score_threshold": 0.3,
  "with_document": false
}

Fields

Field	Type	Default	Description
`query`	string	—	The search query (required, min 1 character)
`search_type`	string	`dense_only`	Search algorithm to use (see Search types)
`parent_path_ids`	UUID[]	tenant root	Limit search to these path parts and everything beneath them. IDs must refer to non-chunk path parts, such as folders
`tag_ids`	UUID[]	—	Filter by tags. Chunks must have all specified tags (AND logic)
`chunk_types`	string[]	—	Limit to specific chunk types (`TEXT`, `TABLE`, `IMAGE`, `HTML`, `UNKNOWN`)
`ingestion_time_after`	datetime	—	Only return chunks ingested after this ISO 8601 timestamp
`active_version_only`	boolean	`true`	When `true`, only return chunks from the active document version
`top_k`	integer	`5`	Number of results to return (1–50)
`score_threshold`	number	`0.3`	Minimum similarity score (0.0–1.0). Results below this are excluded
`with_document`	boolean	`false`	When `true`, includes `document_id` and `document_version_id` in each result
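
The ranges in the table above can be checked on the client before a request is sent. The following is a minimal sketch in Python, not part of any official SDK: `build_search_payload` is a hypothetical helper name, and the choice to reject timezone-naive timestamps is an assumption made here to keep the ISO 8601 output unambiguous.

```python
from datetime import datetime, timezone
from typing import Any, Optional, Sequence

SEARCH_TYPES = {"dense_only", "full_text"}
CHUNK_TYPES = {"TEXT", "TABLE", "IMAGE", "HTML", "UNKNOWN"}


def build_search_payload(
    query: str,
    *,
    search_type: str = "dense_only",
    parent_path_ids: Optional[Sequence[str]] = None,
    tag_ids: Optional[Sequence[str]] = None,
    chunk_types: Optional[Sequence[str]] = None,
    ingestion_time_after: Optional[datetime] = None,
    active_version_only: bool = True,
    top_k: int = 5,
    score_threshold: float = 0.3,
    with_document: bool = False,
) -> dict[str, Any]:
    """Check arguments against the documented ranges and return a JSON-ready dict."""
    if len(query) < 1:
        raise ValueError("query must contain at least 1 character")
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"search_type must be one of {sorted(SEARCH_TYPES)}")
    if not 1 <= top_k <= 50:
        raise ValueError("top_k must be between 1 and 50")
    if not 0.0 <= score_threshold <= 1.0:
        raise ValueError("score_threshold must be between 0.0 and 1.0")
    unknown = set(chunk_types or ()) - CHUNK_TYPES
    if unknown:
        raise ValueError(f"unknown chunk_types: {sorted(unknown)}")

    payload: dict[str, Any] = {
        "query": query,
        "search_type": search_type,
        "active_version_only": active_version_only,
        "top_k": top_k,
        "score_threshold": score_threshold,
        "with_document": with_document,
    }
    # Optional filters are left out when unset so the server-side defaults apply.
    if parent_path_ids:
        payload["parent_path_ids"] = list(parent_path_ids)
    if tag_ids:
        payload["tag_ids"] = list(tag_ids)
    if chunk_types:
        payload["chunk_types"] = list(chunk_types)
    if ingestion_time_after is not None:
        # Assumption of this sketch: naive datetimes are rejected rather than guessed at.
        if ingestion_time_after.tzinfo is None:
            raise ValueError("ingestion_time_after must be timezone-aware")
        utc = ingestion_time_after.astimezone(timezone.utc)
        payload["ingestion_time_after"] = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    return payload
```

Serializing the returned dict with `json.dumps` gives a body suitable for POST /v1/chunks/search.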

Search types

Value	Description
`dense_only`	Semantic vector search using dense embeddings. Best for conceptual queries and paraphrasing
`full_text`	BM25 keyword search. Best for exact term matching, product names, and identifiers

For most user-facing queries, dense_only (the default) gives the best results. Use full_text when your users search for specific codes, names, or technical terms that must match exactly.
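
If your application needs to choose between the two modes automatically, one option is a small client-side heuristic. The sketch below is purely illustrative and not something the API provides: `pick_search_type` and its rules (a quoted query, or a single token containing a digit, is treated as an exact-match lookup) are assumptions that you would tune for your own data.

```python
import re

# Hypothetical rule: one whitespace-free token that contains a digit,
# for example an invoice number such as "INV-2024-0042".
_IDENTIFIER = re.compile(r"^\S*\d\S*$")


def pick_search_type(query: str) -> str:
    """Return "full_text" for identifier-like or quoted queries, else "dense_only"."""
    q = query.strip()
    # A query wrapped in double quotes is treated as a request for exact terms.
    if len(q) >= 2 and q[0] == '"' and q[-1] == '"':
        return "full_text"
    if _IDENTIFIER.match(q):
        return "full_text"
    return "dense_only"
```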

Search results

The endpoint returns an array of ScoredChunkResponse objects ordered by relevance score:

[
  {
    "score": 0.87,
    "chunk_id": "c1d2e3f4-...",
    "content": "Data is retained for 7 years in accordance with...",
    "chunk_type": "TEXT",
    "path_part_id": "a1b2c3d4-...",
    "materialized_path": "/shared/compliance/policy-v2/section-3/chunk-12",
    "document_id": "d1e2f3...",
    "document_version_id": "v1a2b3..."
  }
]

document_id and document_version_id are populated only when the request sets "with_document": true. The example above shows a response to such a request.
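
On the client, it can help to parse the response into typed objects that treat the two document fields as optional. The following is a minimal sketch in Python based only on the fields in the example above; the `ScoredChunk` class and `parse_search_results` function are hypothetical names, not part of an official SDK.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScoredChunk:
    score: float
    chunk_id: str
    content: str
    chunk_type: str
    path_part_id: str
    materialized_path: str
    # Only populated when the request set "with_document": true.
    document_id: Optional[str] = None
    document_version_id: Optional[str] = None


def parse_search_results(body: str) -> list[ScoredChunk]:
    """Parse the JSON array returned by POST /v1/chunks/search."""
    return [
        ScoredChunk(
            score=float(item["score"]),
            chunk_id=item["chunk_id"],
            content=item["content"],
            chunk_type=item["chunk_type"],
            path_part_id=item["path_part_id"],
            materialized_path=item["materialized_path"],
            # .get() tolerates the fields being absent or null.
            document_id=item.get("document_id"),
            document_version_id=item.get("document_version_id"),
        )
        for item in json.loads(body)
    ]
```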

Filtering by folder

To restrict search to a specific folder (and all its contents), provide the folder’s path_part_id in parent_path_ids:

curl -X POST https://api-staging.knowledgestack.ai/v1/chunks/search \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "annual revenue",
    "parent_path_ids": ["<finance-folder-path-part-id>"],
    "top_k": 5
  }'

To search multiple folders simultaneously, include multiple IDs in the array. Results are pooled and ranked together.

Filtering by tags

Tags let you apply custom metadata to path parts and then filter search results by those tags:

curl -X POST https://api-staging.knowledgestack.ai/v1/chunks/search \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "service level agreement",
    "tag_ids": ["<tag-uuid-1>", "<tag-uuid-2>"],
    "top_k": 10
  }'

All specified tags must be present on a chunk for it to appear in results (AND logic).
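
The AND semantics amount to a subset check: every requested tag must be among the chunk's tags. The snippet below only illustrates that rule in Python; the filtering itself happens on the server, and `matches_all_tags` is a hypothetical name.

```python
from typing import Iterable


def matches_all_tags(chunk_tag_ids: Iterable[str], required_tag_ids: Iterable[str]) -> bool:
    """True when every required tag is present on the chunk (AND logic)."""
    return set(required_tag_ids) <= set(chunk_tag_ids)
```

For example, a chunk tagged `legal` and `sla` matches a filter on both tags, while a chunk tagged only `legal` does not.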

Expanding context with neighbors

Search results return individual chunks, which may be too short to provide full context. Use GET /v1/chunks/{chunk_id}/neighbors to retrieve the chunks immediately before and after a result in the original document structure.

curl -X GET "https://api-staging.knowledgestack.ai/v1/chunks/<chunk-id>/neighbors" \
  -H "Authorization: Bearer <your-api-key>"

The response includes before and after arrays of ChunkResponse objects. This is useful for building a context window around a matched passage before sending it to an LLM.
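
A context window can then be assembled by joining the neighbors around the matched chunk. This is a minimal sketch in Python under two assumptions that the reference above does not confirm: each ChunkResponse exposes its text in a `content` field, and both arrays are already in document order. `build_context` is a hypothetical helper name.

```python
from typing import Any, Mapping


def build_context(
    match_content: str,
    neighbors: Mapping[str, Any],
    separator: str = "\n\n",
) -> str:
    """Join preceding chunks, the matched chunk, and following chunks into one string."""
    # Assumed field name: "content" on each ChunkResponse.
    before = [chunk["content"] for chunk in neighbors.get("before", [])]
    after = [chunk["content"] for chunk in neighbors.get("after", [])]
    return separator.join([*before, match_content, *after])
```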

Subtree chunks via path

To retrieve all chunks under a specific path node (a folder, document, or section), use GET /v1/path-parts/{path_part_id}/subtree_chunks. This returns a SubtreeChunksResponse grouping chunks by their shared path parts and tags.

curl -X GET "https://api-staging.knowledgestack.ai/v1/path-parts/<path-part-id>/subtree_chunks" \
  -H "Authorization: Bearer <your-api-key>"

{
  "groups": [
    {
      "chunk_ids": ["c1...", "c2..."],
      "path_part_ids": ["pp1...", "pp2..."],
      "tag_ids": ["t1..."]
    }
  ]
}
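
Because the response is grouped, a per-chunk lookup usually requires flattening it. The sketch below, in Python, inverts the groups into a mapping from chunk ID to tag IDs. It assumes that the `tag_ids` of a group apply to every chunk in that group, which is how the grouping is described above; `tags_by_chunk` is a hypothetical name.

```python
from typing import Any, Mapping


def tags_by_chunk(response: Mapping[str, Any]) -> dict[str, list[str]]:
    """Map each chunk ID in a SubtreeChunksResponse to the tag IDs of its group."""
    result: dict[str, list[str]] = {}
    for group in response.get("groups", []):
        tag_ids = list(group.get("tag_ids", []))
        for chunk_id in group.get("chunk_ids", []):
            # Copy the list so callers can modify one entry without affecting others.
            result[chunk_id] = list(tag_ids)
    return result
```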

Folder and item search

GET /v1/folders/search searches for folders and documents by name within your folder tree. It is useful for building browse-and-select UIs or locating a document when you know its name but not its ID.

curl -X GET "https://api-staging.knowledgestack.ai/v1/folders/search?q=compliance&limit=20" \
  -H "Authorization: Bearer <your-api-key>"

Query parameters

Parameter	Type	Description
`q`	string	Name search term
`parent_path_part_id`	UUID	Restrict search to a specific folder subtree
`part_type`	string	Filter by `FOLDER` or `DOCUMENT`
`limit`	integer	Results per page (default 20, max 100)
`offset`	integer	Pagination offset

The response is a paginated list whose items are FolderResponse or DocumentResponse objects, depending on what matched.
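
When paging through results, the query string can be built programmatically so that values are URL-encoded and the documented limits are enforced. This is a minimal sketch in Python; `folder_search_url` is a hypothetical helper, and the lower bound of 1 for `limit` is an assumption, since only the default and maximum are documented.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api-staging.knowledgestack.ai"


def folder_search_url(
    q: str,
    *,
    parent_path_part_id: Optional[str] = None,
    part_type: Optional[str] = None,
    limit: int = 20,
    offset: int = 0,
) -> str:
    """Build a GET /v1/folders/search URL with encoded query parameters."""
    if not 1 <= limit <= 100:  # minimum of 1 is assumed, maximum is documented
        raise ValueError("limit must be between 1 and 100")
    if offset < 0:
        raise ValueError("offset must not be negative")
    if part_type not in (None, "FOLDER", "DOCUMENT"):
        raise ValueError("part_type must be FOLDER or DOCUMENT")

    params: dict[str, object] = {"q": q, "limit": limit, "offset": offset}
    if parent_path_part_id is not None:
        params["parent_path_part_id"] = parent_path_part_id
    if part_type is not None:
        params["part_type"] = part_type
    return f"{BASE_URL}/v1/folders/search?{urlencode(params)}"
```

To fetch the next page, call the function again with `offset` increased by `limit`.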

Realistic example: RAG pipeline

The following shows a complete chunk retrieval flow you might use in a retrieval-augmented generation pipeline:

# 1. Search for relevant chunks and keep the raw JSON response
RESULTS=$(curl -s -X POST https://api-staging.knowledgestack.ai/v1/chunks/search \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is our refund policy for enterprise customers?",
    "search_type": "dense_only",
    "parent_path_ids": ["'"$DOCS_FOLDER_PATH_PART_ID"'"],
    "top_k": 5,
    "score_threshold": 0.4,
    "with_document": true
  }')

# 2. Take the chunk_id of the first result (results are ordered by relevance score; requires jq)
TOP_CHUNK_ID=$(printf '%s' "$RESULTS" | jq -r '.[0].chunk_id')

# 3. Expand context around the top result
curl -X GET "https://api-staging.knowledgestack.ai/v1/chunks/$TOP_CHUNK_ID/neighbors" \
  -H "Authorization: Bearer $API_KEY"

top_k is capped at 50. For broader recall, run several searches with different parent_path_ids, or lower score_threshold so that fewer candidates are filtered out, rather than trying to raise top_k past the cap.
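
Running several searches means the result lists have to be combined on the client. The following is a minimal sketch in Python of one way to do that: deduplicate by `chunk_id`, keep the highest score seen for each chunk, and sort by score. `merge_results` is a hypothetical helper, and the approach assumes that scores from the different searches are on a comparable scale, which is more plausible when every search uses the same search_type.

```python
from typing import Any, Optional


def merge_results(
    *result_lists: list[dict[str, Any]],
    limit: Optional[int] = None,
) -> list[dict[str, Any]]:
    """Merge several search result arrays into one list ranked by score."""
    best: dict[str, dict[str, Any]] = {}
    for results in result_lists:
        for item in results:
            chunk_id = item["chunk_id"]
            # Keep only the highest-scoring occurrence of each chunk.
            if chunk_id not in best or item["score"] > best[chunk_id]["score"]:
                best[chunk_id] = item
    merged = sorted(best.values(), key=lambda r: r["score"], reverse=True)
    return merged if limit is None else merged[:limit]
```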
