Updated script to add sub-headings to markdown code snippets #3182
Walkthrough
Adds two exported utilities ...
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 5 passed
4215876 to 4e92bf7
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@data/onPostBuild/transpileMdxToMarkdown.ts`:
- Around line 13-15: In function getLanguageDisplayName, change the early return
to use braces to satisfy the eslint "curly" rule: replace the line "if (!lang)
return '';" with a block-style conditional "if (!lang) { return ''; }" so the
function (and its parameter lang) uses a braced return before continuing to the
split/capitalize logic.
- Around line 27-31: The current codeTagRegex only matches bare
"<Code>...</Code>" so tags with attributes like "<Code fixed="true">...</Code>"
are skipped; update the codeTagRegex definition (the variable named codeTagRegex
used in the replace call) to allow an opening <Code> tag followed by optional
attributes by requiring a word boundary after "Code" and permitting any
characters except ">" until the closing ">", keep the case-insensitive and
global flags and keep the inner capture non-greedy so the replace callback (the
function handling innerContent) still receives the correct content.
- Around line 33-46: The regex in codeBlockRegex only uses \w+ and thus misses
language identifiers with hyphens, plus signs, dots, or other non-word chars;
update codeBlockRegex to capture any non-newline/backtick language token (e.g.,
/```([^\n`]+)\n[\s\S]*?```/g) and change the innerContent.replace callback
signature to accept the match and the lang (remove the redundant fullMatch
parameter) so getLanguageDisplayName(lang) receives the full language token;
modify occurrences of codeBlockRegex, innerContent.replace, and the callback
used to build transformedContent accordingly.
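Taken together, the three comments above could look roughly like the sketch below. It is an illustration, not the PR's actual code: the function and variable names come from the review comments, but the bodies, the `#### ` heading prefix followed by a blank line, and the choice to keep the original `<Code>` tags around the result are assumptions. The fence delimiter is built with `repeat` only so the pattern can be shown inside a fenced block; the resulting pattern is the same as the regex literal quoted in the third comment.

```typescript
// Sketch only: names follow the review comments; the bodies are assumptions.
const FENCE = '`'.repeat(3);

// Opening <Code> tag with optional attributes. The \b keeps <CodeBlock> from matching.
const codeTagRegex = /<Code\b[^>]*>([\s\S]*?)<\/Code>/gi;

// Fenced block whose language token may contain hyphens, plus signs, or dots.
const codeBlockRegex = new RegExp(FENCE + '([^\\n`]+)\\n[\\s\\S]*?' + FENCE, 'g');

function getLanguageDisplayName(lang: string): string {
  // Braced early return, as the eslint "curly" rule expects.
  if (!lang) {
    return '';
  }
  // Split on underscores and capitalize each part (assumed logic).
  return lang
    .split('_')
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join(' ');
}

function addLanguageSubheadingsToCodeBlocks(content: string): string {
  return content.replace(codeTagRegex, (tagMatch: string, innerContent: string) => {
    // The callback takes only the match and the captured language token.
    const transformedContent = innerContent.replace(
      codeBlockRegex,
      (match: string, lang: string) => `#### ${getLanguageDisplayName(lang.trim())}\n\n${match}`,
    );
    // Assumption: keep the original opening and closing tags around the body.
    const closeLength = '</Code>'.length;
    const openTag = tagMatch.slice(0, tagMatch.length - closeLength - innerContent.length);
    return openTag + transformedContent + tagMatch.slice(tagMatch.length - closeLength);
  });
}
```

With this sketch, `realtime_javascript` becomes `Realtime Javascript`, a fence tagged `objective-c` inside `<Code fixed="true">` gets a `#### Objective-c` heading, and fenced blocks outside `<Code>` tags are left untouched.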
Pull request overview
This PR adds language-specific subheadings to code snippets within <Code> tags in markdown documentation to make it easier for LLMs to identify which language each code snippet belongs to. The transformation converts code blocks from having implicit language identifiers in the fence syntax to explicit markdown headings.
Changes:
- Added `getLanguageDisplayName()` function to convert language identifiers to display names (e.g., `realtime_javascript` → `Realtime Javascript`)
- Added `addLanguageSubheadingsToCodeBlocks()` function to transform code blocks within `<Code>` tags by prepending language headings
- Integrated the new transformation as Stage 12 in the MDX-to-Markdown transpilation pipeline
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| data/onPostBuild/transpileMdxToMarkdown.ts | Implements two new functions for language display name formatting and code block transformation; integrates the transformation into the pipeline after template variable replacement |
| data/onPostBuild/transpileMdxToMarkdown.test.ts | Adds comprehensive test coverage for the new functions including various language formats, multiple code blocks, and content preservation |
| data/onPostBuild/snapshots/transpileMdxToMarkdown.test.ts.snap | Updates snapshot to reflect the new heading structure in transformed code blocks |
@coderabbitai review
✅ Actions performed: Review triggered.
kennethkalmer left a comment
I like this @sacOO7, good stuff! I have some questions though:
- Should we remove the language from the fenced code block for token efficiency?
- [nitpick] Do you think it is possible that we dynamically determine what header level the markdown should use? In `messages.md` we jump from `##` to `####` in the first block; it would be nice to go to `###` instead. Likewise, if we have some other page that already has `####` with code snippets inside, we should then be going to `#####`.
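One way to pick the heading level dynamically, as the nitpick above suggests, is to look at the nearest heading before the `<Code>` block and go one level deeper. The helper below is hypothetical: the name `subheadingPrefixAt` and the fallback to `####` when no heading precedes the block are assumptions, not part of this PR. It skips lines inside fenced blocks so that `#` comments in code are not mistaken for headings.

```typescript
// Hypothetical helper, not part of the PR: returns the heading prefix to use
// for a language sub-heading inserted at `offset` in `markdown`.
function subheadingPrefixAt(markdown: string, offset: number): string {
  const fence = '`'.repeat(3);
  let inFence = false;
  let level = 0; // level of the nearest preceding heading, 0 if none

  for (const line of markdown.slice(0, offset).split('\n')) {
    // Toggle fence state so "# comment" lines inside code are ignored.
    if (line.trimStart().startsWith(fence)) {
      inFence = !inFence;
      continue;
    }
    if (inFence) {
      continue;
    }
    const match = /^(#{1,6})\s/.exec(line);
    if (match) {
      level = match[1].length;
    }
  }

  // Assumption: fall back to level 4 when nothing precedes the block,
  // and never go deeper than level 6, the deepest markdown heading.
  const next = level === 0 ? 4 : Math.min(level + 1, 6);
  return '#'.repeat(next);
}
```

For a page where a `##` heading precedes the first `<Code>` block this returns `###`, and for a block nested under an existing `####` heading it returns `#####`.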
@kennethkalmer good question. I had thought about both points.
I’ll definitely double-check and update accordingly 👍
Into ===>
Compared the `old` and `new` markdown for `docs/chat/rooms/messages.md` using Claude; it said:
Old markdown:
⚠️ Relies on language identifiers in code fences (```javascript), which can be less prominent
⚠️ No explicit labels - must parse the code fence syntax
⚠️ Easier to miss - languages blend together without clear separators
✅ Cleaner, more concise
✅ Standard markdown syntax
New markdown:
✅ Explicit language headers (#### Javascript, #### React) make it immediately clear what each snippet is
✅ Hierarchical structure is unambiguous - headers act as labels
✅ Easy to extract - can search for #### pattern followed by language name
✅ Self-documenting - no need to infer from context