feat: improve docs search by cjbell · Pull Request #1277 · knocklabs/docs

cjbell · 2026-01-22T16:23:58Z

Description

This PR introduces a new indexing strategy for the docs, which will now include headings and content within a page in addition to the page title/tags that we previously indexed.

Note: right now we're not indexing API content within this result set, but I can easily change that if we'd like!

Phase 1 of Algolia search improvements:

- Add EnhancedDocsSearchItem type with new fields:
  - pageTitle: Always the parent page title
  - description: From frontmatter (page-level only)
  - content: Text content (truncated ~2000 chars)
  - headingLevel: 0 for page, 2 for H2, 3 for H3
  - isPageLevel: True if page-level record (not a heading)
- Create scripts/indexDocsForSearch.ts:
  - Parses all MDX/MD content files
  - Extracts frontmatter using remark
  - Creates page-level records with intro content
  - Extracts H2/H3 headings with surrounding content
  - Creates heading-level records with anchor links
  - Batches uploads to Algolia (1000 per batch)
  - Gracefully handles missing Algolia credentials
- Update package.json:
  - Add 'index-docs' script
  - Run new indexer in prebuild before index-apis

This enables:

- Deep linking to specific sections via #anchor URLs
- Better relevance for specific queries
- Smaller, more focused search records
- Content-based search (not just titles)

Co-authored-by: chris <chris@knock.app>
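For readers who have not opened the diff, here is a minimal sketch of what the record shape and batching described above could look like. The interface is assembled only from the field list in the commit message; `objectID`, `title`, `truncateContent`, `toBatches`, and both constants are illustrative names, not necessarily what scripts/indexDocsForSearch.ts uses.

```typescript
// Hypothetical record shape, based on the fields named in the commit message.
interface EnhancedDocsSearchItem {
  objectID: string; // assumed: Algolia records are keyed by objectID
  title: string; // assumed: page title or heading title
  pageTitle: string; // always the parent page title
  description?: string; // from frontmatter, page-level records only
  content: string; // text content, truncated
  headingLevel: 0 | 2 | 3; // 0 for page, 2 for H2, 3 for H3
  isPageLevel: boolean; // true for page-level records
}

const MAX_CONTENT_CHARS = 2000; // "~2000 chars" per the description
const UPLOAD_BATCH_SIZE = 1000; // "1000 per batch" per the description

// Cut content to a fixed maximum length.
function truncateContent(text: string, max: number = MAX_CONTENT_CHARS): string {
  return text.length <= max ? text : text.slice(0, max);
}

// Split a list of records into upload-sized batches.
function toBatches<T>(items: T[], size: number = UPLOAD_BATCH_SIZE): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch would then be handed to whatever upload call the script makes; that call is left out here because the diff for it is not shown in this thread.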

cursor · 2026-01-22T16:23:59Z

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.

vercel · 2026-01-22T16:24:04Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
docs	Ready	Preview, Comment	Feb 24, 2026 3:58am

Co-authored-by: chris <chris@knock.app>

scripts/indexDocsForSearch.ts

components/ui/Autocomplete.tsx

cursor · 2026-01-23T22:19:36Z

scripts/indexDocsForSearch.ts

+    matches.push({
+      index: match.index,
+      level: match[1].length,
+      title: match[2].trim(),


Heading titles retain raw markdown formatting characters

Medium Severity

Heading titles captured by the regex at line 183 are stored directly without cleaning markdown formatting. While heading content is properly cleaned via extractTextContent() at line 200, the title field never is. Headings like "## Using `config` variables" or "## **Important** notes" will display with literal backticks, asterisks, or link syntax visible in search results. The title needs the same markdown cleanup applied to it.
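One way the title cleanup could look is sketched below. `cleanHeadingTitle` is a hypothetical helper, not a function from the PR, and it only covers the cases the comment names (backticks, asterisks, link syntax). Underscore emphasis is deliberately left alone so identifiers such as `my_var_name` are not mangled.

```typescript
// Hypothetical helper: strip common inline markdown from a heading title.
function cleanHeadingTitle(raw: string): string {
  return raw
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1") // images -> alt text (before links)
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1") // links -> link text
    .replace(/`([^`]+)`/g, "$1") // inline code -> bare text
    .replace(/\*\*(.+?)\*\*/g, "$1") // bold -> bare text
    .replace(/\*(.+?)\*/g, "$1") // italic -> bare text
    .trim();
}
```

In the reviewed code this would presumably wrap the captured title, e.g. `title: cleanHeadingTitle(match[2])`, in place of `match[2].trim()`.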

Additional Locations (1)

scripts/indexDocsForSearch.ts#L205-L206

samseely · 2026-01-27T20:06:31Z

@cjbell should i be able to test this on preview link? I'm trying it out but not getting results by querying for headers or content

Resolve conflicts:

- package.json: Combined split-specs from main with index-docs from this branch
- Autocomplete.tsx: Used main's refactored content variable pattern, added EnhancedDocsSearchItem import, and added pageTitle display for heading-level search results only (not page-level to avoid redundant title display)

Co-authored-by: Chris Bell <chris@cjbell.co>

scripts/indexDocsForSearch.ts

Move image removal before link removal to prevent images from being partially processed. Since images use ![alt](url) which contains the link pattern [alt](url), the link regex was matching first and leaving behind '!alt text' in the indexed content.

Co-authored-by: Chris Bell <chris@cjbell.co>
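The ordering problem described in this commit can be reproduced with a small sketch. The two regexes and both function names below are illustrative stand-ins for whatever extractTextContent() uses internally.

```typescript
// Illustrative patterns; the real script's regexes may differ.
const IMAGE_PATTERN = /!\[[^\]]*\]\([^)]*\)/g;
const LINK_PATTERN = /\[([^\]]+)\]\([^)]*\)/g;

// Original order: the link pattern matches the "[alt](url)" part of an image
// first, so the leading "!" and the alt text are left behind.
function stripLinksThenImages(markdown: string): string {
  return markdown.replace(LINK_PATTERN, "$1").replace(IMAGE_PATTERN, "");
}

// Fixed order: images are removed whole before links are unwrapped.
function stripImagesThenLinks(markdown: string): string {
  return markdown
    .replace(IMAGE_PATTERN, "")
    .replace(LINK_PATTERN, "$1")
    .replace(/[ \t]+/g, " "); // collapse the gap left by a removed image
}
```

Running both on `"See ![alt text](img.png) and [docs](/docs)."` shows the difference: the first keeps a stray `!alt text`, the second does not.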

cursor

Cursor Bugbot has reviewed your changes and found 4 potential issues.


cursor · 2026-02-24T04:04:59Z

scripts/indexDocsForSearch.ts

+    isPageLevel: true,
+  };
+  await queueItem(pageRecord);
+  pageCount++;


Old page indexing not removed, causing duplicate records

High Severity

The new indexDocsForSearch.ts script creates page-level records with objectID: "page-${urlPath}", but the old indexing mechanism in lib/content.server.ts (generateAlgoliaIndex) still runs during getStaticProps in pages/[...slug].tsx and creates records with objectID: frontmatter.id (the raw path). Both write to the same Algolia index. Since the objectIDs differ, every docs page gets two records, resulting in duplicate search results.
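The duplication follows from how records are keyed: saving a record under an existing objectID replaces it, while a new objectID adds a record. The sketch below models that with an in-memory map rather than the Algolia client; `saveToFakeIndex`, `SearchRecord`, and the sample path are hypothetical.

```typescript
// In-memory stand-in for an index keyed by objectID (not the Algolia client).
type SearchRecord = { objectID: string; title: string };

function saveToFakeIndex(index: Map<string, SearchRecord>, records: SearchRecord[]): void {
  for (const record of records) {
    index.set(record.objectID, record); // same objectID overwrites, new one adds
  }
}

const samplePath = "/guides/example"; // hypothetical docs page

// Both indexers run: the IDs differ, so one page ends up with two records.
const bothIndexers = new Map<string, SearchRecord>();
saveToFakeIndex(bothIndexers, [{ objectID: samplePath, title: "Example" }]); // old: raw path
saveToFakeIndex(bothIndexers, [{ objectID: `page-${samplePath}`, title: "Example" }]); // new
const recordsWithBoth = bothIndexers.size;

// Only the new indexer runs: one record per page.
const newIndexerOnly = new Map<string, SearchRecord>();
saveToFakeIndex(newIndexerOnly, [{ objectID: `page-${samplePath}`, title: "Example" }]);
const recordsWithNewOnly = newIndexerOnly.size;
```

Removing the old generateAlgoliaIndex call, or having both paths agree on one objectID scheme, would each avoid the second record; which option fits this repo is a decision for the PR author.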

Additional Locations (1)

types.ts#L51-L66

cursor · 2026-02-24T04:04:59Z

scripts/indexDocsForSearch.ts

+  return filePath
+    .replace(CONTENT_DIR, "")
+    .replace(/\.mdx?$/, "")
+    .replace("/index", "")


Non-anchored /index replacement corrupts URL paths

Medium Severity

filePathToUrlPath uses .replace("/index", "") which is a plain string replace matching the first occurrence of /index anywhere in the path. If any content file lives under a directory or has a filename containing "index" as a substring (e.g., /indexing/, /reindex), the path would be silently corrupted. This needs to be anchored to the end of the path (e.g., a regex like /\/index$/).
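A quick sketch of the difference, assuming a simplified version of the function: only the three replace calls visible in the diff are reproduced, and the `CONTENT_DIR` value and both function names are placeholders.

```typescript
const CONTENT_DIR = "content"; // placeholder; the real constant is defined in the script

// As in the diff: plain-string replace removes the first "/index" anywhere.
function filePathToUrlPathUnanchored(filePath: string): string {
  return filePath
    .replace(CONTENT_DIR, "")
    .replace(/\.mdx?$/, "")
    .replace("/index", "");
}

// Suggested fix: only strip a trailing "/index" path segment.
function filePathToUrlPathAnchored(filePath: string): string {
  return filePath
    .replace(CONTENT_DIR, "")
    .replace(/\.mdx?$/, "")
    .replace(/\/index$/, "");
}
```

For `content/guides/indexing/overview.mdx` the unanchored version yields `/guidesing/overview`, while the anchored version yields `/guides/indexing/overview`. Both return `/guides` for `content/guides/index.mdx`.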

cursor · 2026-02-24T04:04:59Z

scripts/indexDocsForSearch.ts

+  });
+
+  return files;
+}


Duplicate utility function already exists in codebase

Low Severity

getAllFilesInDir is reimplemented here, but an identical exported function already exists in lib/content.server.ts. This duplication means bug fixes or behavior changes to one copy won't propagate to the other.

cursor · 2026-02-24T04:04:59Z

scripts/indexDocsForSearch.ts

+      level: match[1].length,
+      title: match[2].trim(),
+    });
+  }


Heading regex matches inside fenced code blocks

Medium Severity

extractHeadings runs the heading regex /^(#{2,3})\s+(.+)$/gm on the full content before code blocks are stripped. Lines like ## some comment inside fenced code blocks (common in shell/Python examples) will be falsely detected as headings. This splits the real heading's content at the wrong boundary and creates spurious heading-level search records. The extractTextContent function already strips code blocks, but it's only applied to the content between the already-detected headings, not before heading detection runs.
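One possible fix is to blank out fenced code blocks before running the heading regex, keeping every character offset unchanged so that `match.index` still points into the original content. The sketch below is an assumption about how that could be done: `maskFencedCodeBlocks` and `extractHeadingsOutsideCode` are hypothetical names, and only backtick fences that start at column 0 are handled.

```typescript
interface HeadingMatch {
  index: number;
  level: number;
  title: string;
}

// Replace every non-newline character inside ``` fences with a space.
// Length and line breaks are preserved, so offsets stay valid.
function maskFencedCodeBlocks(content: string): string {
  return content.replace(/^```[\s\S]*?^```[^\n]*$/gm, (block: string) =>
    block.replace(/[^\n]/g, " ")
  );
}

// Same heading regex as in the review comment, run on the masked content.
function extractHeadingsOutsideCode(content: string): HeadingMatch[] {
  const masked = maskFencedCodeBlocks(content);
  const headingRegex = /^(#{2,3})\s+(.+)$/gm;
  const matches: HeadingMatch[] = [];
  let match: RegExpExecArray | null;
  while ((match = headingRegex.exec(masked)) !== null) {
    matches.push({
      index: match.index,
      level: match[1].length,
      title: match[2].trim(),
    });
  }
  return matches;
}
```

Because the masked string has the same length as the original, the returned indices can still be used to slice the unmasked content between headings. An unclosed fence is left untouched by this sketch.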

vercel bot deployed to Preview January 22, 2026 16:25 View deployment

style: Format indexDocsForSearch.ts with Prettier

50b75c9

Co-authored-by: chris <chris@knock.app>

vercel bot deployed to Preview January 22, 2026 16:29 View deployment

chore: frontend improvements

a75ca86

cjbell changed the title from "Docs search content indexing" to "feat: improve docs search" on Jan 23, 2026

cjbell marked this pull request as ready for review January 23, 2026 22:09

cjbell requested review from MikeCarbone and mattmikolay January 23, 2026 22:09

aikido-pr-checks bot reviewed Jan 23, 2026

scripts/indexDocsForSearch.ts

vercel bot deployed to Preview January 23, 2026 22:09 View deployment

aikido-pr-checks bot reviewed Jan 23, 2026

scripts/indexDocsForSearch.ts

chore: fix lint

aef305e

vercel bot deployed to Preview January 23, 2026 22:13 View deployment

cursor bot reviewed Jan 23, 2026


cjbell requested a review from samseely January 27, 2026 19:50

vercel bot deployed to Preview February 24, 2026 03:45 View deployment

cursor bot reviewed Feb 24, 2026

scripts/indexDocsForSearch.ts (outdated)

vercel bot deployed to Preview February 24, 2026 03:58 View deployment

cursor bot reviewed Feb 24, 2026


cjbell wants to merge 6 commits into main from cursor/docs-search-content-indexing-c357

3 participants
