feat(seo): add sitemap, robots.txt, and meta tags for Google indexing #245
- Add robots.txt with sitemap reference
- Add sitemap.xml for Google indexing
- Add OpenGraph tags for social sharing
- Add Twitter card meta tags
- Add JSON-LD structured data
- Add canonical URL
TODO: Create og-image.png (1200x630)
- Created og-image.svg as template (1200x630)
- Need to convert to PNG for full social platform support
- Updated meta tags to reference PNG
Consistent domain: opencodeintel.com across all files
@DevanshuNEU is attempting to deploy a commit to the Dev's projects Team on Vercel. A member of the Team first needs to authorize it.
Walkthrough
Adds lightweight planning docs (IDEAS.md, TODO.md) and enriches site SEO metadata and structured data.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
Pre-merge checks: ✅ 4 passed
🧹 Nitpick comments (3)
frontend/index.html (1)
47-66: Minor URL inconsistency in JSON-LD `url` field.
Line 55 uses `"url": "https://opencodeintel.com"` (no trailing slash), while the canonical URL (Line 18) and OG URL (Line 22) use `https://opencodeintel.com/` (with trailing slash). While functionally equivalent, consistency across all URL references avoids any ambiguity for search engines normalizing your canonical URL.
Suggested fix
- "url": "https://opencodeintel.com",
+ "url": "https://opencodeintel.com/",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@frontend/index.html` around lines 47 - 66, Update the JSON-LD script's "url" field to match the canonical/OG URL by adding the trailing slash so it reads "https://opencodeintel.com/"; locate the <script type="application/ld+json"> block and modify the "url" property inside that JSON object to include the trailing slash for consistency.
frontend/public/sitemap.xml (1)
1-15: Static sitemap will become stale; consider automating generation.
Hardcoded `lastmod` dates (Line 5, 11) and manually maintained URL entries will drift out of sync as pages are added or updated. For a small site this is fine short-term, but consider generating the sitemap at build time (e.g., via a Vite plugin or a simple script) to keep it accurate.
Also, including `/login` in the sitemap is unusual: search engines generally gain little value from indexing authentication pages. Consider whether it truly needs to be discoverable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@frontend/public/sitemap.xml` around lines 1 - 15, The sitemap.xml uses hardcoded <lastmod> dates and includes an authentication page (/login); replace this static file with a build-time generated sitemap (e.g., a Vite plugin or a simple Node script run at build) that enumerates your public routes, sets <lastmod> from each page’s file mtime or CMS metadata, and optionally computes changefreq/priority, and remove or exclude /login from the generated entries (or keep it out and instead use noindex on auth pages). Update the build pipeline to write/update frontend/public/sitemap.xml during CI/build so it stays accurate.
frontend/public/robots.txt (1)
4-11: Consider disallowing `/api` paths if the backend is served from the same origin.
The authenticated routes are correctly blocked. If the API is served under the same domain (e.g., `/api/...`), consider adding `Disallow: /api/` to prevent crawlers from hitting API endpoints, which can generate unnecessary load and index non-HTML responses.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@frontend/public/robots.txt` around lines 4 - 11, The robots.txt currently disallows authenticated routes but doesn't block API endpoints; if your backend serves APIs from the same origin, update the robots.txt to include a rule like "Disallow: /api/" so crawlers won't request or index API routes; modify the robots.txt entries (the User-agent block where Disallow: /dashboard, /settings, /repos/, /callback are defined) to add Disallow: /api/ (or the correct API prefix) to prevent crawler access to API endpoints.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@frontend/index.html`:
- Around line 47-66: Update the JSON-LD script's "url" field to match the
canonical/OG URL by adding the trailing slash so it reads
"https://opencodeintel.com/"; locate the <script type="application/ld+json">
block and modify the "url" property inside that JSON object to include the
trailing slash for consistency.
In `@frontend/public/robots.txt`:
- Around line 4-11: The robots.txt currently disallows authenticated routes but
doesn't block API endpoints; if your backend serves APIs from the same origin,
update the robots.txt to include a rule like "Disallow: /api/" so crawlers won't
request or index API routes — modify the robots.txt entries (the User-agent
block where Disallow: /dashboard, /settings, /repos/, /callback are defined) to
add Disallow: /api/ (or the correct API prefix) to prevent crawler access to API
endpoints.
In `@frontend/public/sitemap.xml`:
- Around line 1-15: The sitemap.xml uses hardcoded <lastmod> dates and includes
an authentication page (/login); replace this static file with a build-time
generated sitemap (e.g., a Vite plugin or a simple Node script run at build)
that enumerates your public routes, sets <lastmod> from each page’s file mtime
or CMS metadata, and optionally computes changefreq/priority, and remove or
exclude /login from the generated entries (or keep it out and instead use
noindex on auth pages). Update the build pipeline to write/update
frontend/public/sitemap.xml during CI/build so it stays accurate.
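The build-time generation suggested in the sitemap prompt could look roughly like the sketch below. It is only an illustration, not the project's actual script: the `SitemapRoute` shape, the `buildSitemap` name, and the default exclusion list are assumptions, and the route list would have to come from the real router or docs index. The function returns the XML as a string, so a build step would still need to write it to `frontend/public/sitemap.xml`.

```typescript
// Hypothetical route description; the real project would supply its own public routes.
interface SitemapRoute {
  path: string;        // e.g. "/docs/quickstart"
  lastModified?: Date; // e.g. taken from the source file's mtime
  priority?: number;   // 0.0 to 1.0
}

const SITE_ORIGIN = "https://opencodeintel.com";

// Escape the five XML special characters so paths are safe inside <loc>.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Builds the sitemap XML, skipping any path listed in `excluded` (assumed default: /login).
function buildSitemap(routes: SitemapRoute[], excluded: string[] = ["/login"]): string {
  const entries = routes
    .filter((route) => excluded.indexOf(route.path) === -1)
    .map((route) => {
      const lines = [`    <loc>${escapeXml(SITE_ORIGIN + route.path)}</loc>`];
      if (route.lastModified) {
        // YYYY-MM-DD in UTC, one of the date formats the sitemap protocol accepts.
        lines.push(`    <lastmod>${route.lastModified.toISOString().slice(0, 10)}</lastmod>`);
      }
      if (route.priority !== undefined) {
        lines.push(`    <priority>${route.priority.toFixed(1)}</priority>`);
      }
      return `  <url>\n${lines.join("\n")}\n  </url>`;
    });

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n") + "\n";
}
```

Omitting `lastModified` leaves out `<lastmod>` entirely, which avoids publishing a stale date when no reliable modification time is available.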
- Add trailing slash to JSON-LD url for consistency
- Add /api/ to robots.txt disallow rules (defensive)
- Remove /login from sitemap.xml (auth pages shouldn't be indexed)
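As a rough illustration of the trailing-slash point, the sketch below compares the three URL declarations the review mentions. The `declaredUrls` list and both function names are hypothetical, and the values show the state before the fix. A strict string comparison flags the JSON-LD entry, while parsing with the standard `URL` class shows the two spellings resolve to the same address, which matches the review's "functionally equivalent" remark.

```typescript
// Hypothetical snapshot of the three declarations before the fix.
const declaredUrls: Array<[string, string]> = [
  ["canonical", "https://opencodeintel.com/"],
  ["og:url", "https://opencodeintel.com/"],
  ["json-ld url", "https://opencodeintel.com"],
];

// Returns the labels whose raw string differs from the first entry.
function findMismatches(entries: Array<[string, string]>): string[] {
  if (entries.length === 0) return [];
  const expected = entries[0][1];
  return entries.filter(([, value]) => value !== expected).map(([label]) => label);
}

// URL parsing normalizes an empty path to "/", so both spellings are equal once parsed.
function sameAfterParsing(a: string, b: string): boolean {
  return new URL(a).href === new URL(b).href;
}

console.log(findMismatches(declaredUrls)); // [ 'json-ld url' ]
console.log(sameAfterParsing("https://opencodeintel.com", "https://opencodeintel.com/")); // true
```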
Additional Comments (1)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: frontend/public/sitemap.xml
Line: 3:14
Comment:
Missing many public pages that should be indexed. Consider adding `/docs`, `/signup`, and other public documentation routes like `/docs/quickstart`, `/docs/mcp-setup`, etc.
How can I resolve this? If you propose a fix, please make it concise.
🧹 Nitpick comments (4)
frontend/index.html (3)
47-65: Consider using `"@type": "WebApplication"` instead of `"SoftwareApplication"`.
Since OpenCodeIntel is a web-based tool, `WebApplication` (a subtype of `SoftwareApplication` in Schema.org) is a more precise fit and may yield better rich results for browser-based applications.
Proposed fix
- "@type": "SoftwareApplication",
+ "@type": "WebApplication",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@frontend/index.html` around lines 47 - 65, Update the JSON-LD block inside the <script type="application/ld+json"> for OpenCodeIntel by changing the "@type" value from "SoftwareApplication" to "WebApplication" (i.e., replace "@type": "SoftwareApplication" with "@type": "WebApplication") so the structured data accurately reflects a browser-based application; keep the rest of the properties (name "OpenCodeIntel", applicationCategory, operatingSystem, description, url, offers, author) unchanged.
37-39: Redundant `robots` meta directives.
`index, follow` is the default behavior for all crawlers, so these tags are technically unnecessary. They don't hurt anything, but they add no value either; only include them if you plan to change them to `noindex` for certain pages in the future.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@frontend/index.html` around lines 37 - 39, The meta tags <meta name="robots" content="index, follow" /> and <meta name="googlebot" content="index, follow" /> are redundant since "index, follow" is the default; remove these two meta elements from the HTML (or comment them out) unless you intend to set non-default directives like "noindex" later. Look for the <meta name="robots"...> and <meta name="googlebot"...> entries in the document and delete or disable them.
18-18: SPA limitation: static meta tags only cover the homepage.
Since this is a React SPA (`src/main.tsx`), these static OG/meta tags in `index.html` will be served for all routes. Social crawlers and search engines that don't execute JavaScript will see the same metadata regardless of the URL. This is fine today since the sitemap only lists the root, but if you later add more pages to the sitemap, you'll need server-side rendering or a prerendering service (e.g., `react-snap`, `prerender.io`) for per-page meta tags.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@frontend/index.html` at line 18, The current static OG/meta tags in index.html are served for all SPA routes (see index.html and src/main.tsx), so crawlers that don't run JS will always see homepage metadata; fix by implementing per-route metadata via either (A) prerendering: add a prerender step (e.g., react-snap or prerender.io) and configure it to snapshot all sitemap routes so each route gets its own static meta tags, or (B) server-side rendering: introduce an SSR entry that uses react-helmet (or similar) to render per-route meta tags on the server before sending HTML. Choose one approach, update the build/deploy pipeline to run the prerender or SSR renderer for all routes listed in the sitemap, and remove reliance on the single static meta block in index.html.
frontend/public/robots.txt (1)
8-12: Inconsistent trailing-slash usage in Disallow rules.
`/repos/` and `/api/` use a trailing slash (prefix-matches only paths under that directory), while `/dashboard`, `/settings`, and `/callback` omit it. In robots.txt, `Disallow: /dashboard` prefix-matches anything starting with `/dashboard`, including a hypothetical `/dashboards` route. Adding trailing slashes makes the intent clearer and more precise.
Proposed fix
-Disallow: /dashboard
-Disallow: /settings
+Disallow: /dashboard/
+Disallow: /settings/
 Disallow: /repos/
-Disallow: /callback
+Disallow: /callback/
 Disallow: /api/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@frontend/public/robots.txt` around lines 8 - 12, The Disallow rules are inconsistent: /repos/ and /api/ use trailing slashes while /dashboard, /settings, and /callback do not, causing unintended prefix-matching; update the Disallow entries for "Disallow: /dashboard", "Disallow: /settings", and "Disallow: /callback" to include a trailing slash (i.e. "/dashboard/", "/settings/", "/callback/") so all rules consistently and precisely target those directories like the existing "Disallow: /repos/" and "Disallow: /api/" entries.
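The prefix-matching behavior described in this comment can be checked with a small sketch. It is a simplification that assumes plain prefix rules only: real robots.txt matching also handles `*`, `$`, and Allow/Disallow precedence, none of which is modelled here, and `isDisallowed` is a hypothetical helper name. One thing the sketch makes visible is that `Disallow: /dashboard/` no longer covers the bare `/dashboard` path, which matters if the app serves that URL without a trailing slash.

```typescript
// Simplified matcher: a path is blocked if it starts with any non-empty Disallow value.
// Wildcards ("*"), end anchors ("$"), and Allow rules are intentionally not modelled.
function isDisallowed(path: string, disallowRules: string[]): boolean {
  return disallowRules.some((rule) => rule !== "" && path.startsWith(rule));
}

const withoutSlash = ["/dashboard"];
const withSlash = ["/dashboard/"];

// Without the trailing slash, an unrelated route sharing the prefix is also blocked.
console.log(isDisallowed("/dashboards", withoutSlash)); // true
console.log(isDisallowed("/dashboard", withoutSlash)); // true

// With the trailing slash, only paths under the directory match.
console.log(isDisallowed("/dashboards", withSlash)); // false
console.log(isDisallowed("/dashboard/settings", withSlash)); // true
console.log(isDisallowed("/dashboard", withSlash)); // false
```

If both the bare path and everything beneath it should be blocked without catching `/dashboards`, listing both `/dashboard/` and an end-anchored rule for the bare path is one option, though support for `$` varies by crawler.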
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@frontend/index.html`:
- Around line 47-65: Update the JSON-LD block inside the <script
type="application/ld+json"> for OpenCodeIntel by changing the "@type" value from
"SoftwareApplication" to "WebApplication" (i.e., replace "@type":
"SoftwareApplication" with "@type": "WebApplication") so the structured data
accurately reflects a browser-based application; keep the rest of the properties
(name "OpenCodeIntel", applicationCategory, operatingSystem, description, url,
offers, author) unchanged.
- Around line 37-39: The meta tags <meta name="robots" content="index, follow"
/> and <meta name="googlebot" content="index, follow" /> are redundant since
"index, follow" is the default; remove these two meta elements from the HTML (or
comment them out) unless you intend to set non-default directives like "noindex"
later—look for the <meta name="robots"...> and <meta name="googlebot"...>
entries in the document and delete or disable them.
- Line 18: The current static OG/meta tags in index.html are served for all SPA
routes (see index.html and src/main.tsx), so crawlers that don't run JS will
always see homepage metadata; fix by implementing per-route metadata via either
(A) prerendering: add a prerender step (e.g., react-snap or prerender.io) and
configure it to snapshot all sitemap routes so each route gets its own static
meta tags, or (B) server-side rendering: introduce an SSR entry that uses
react-helmet (or similar) to render per-route meta tags on the server before
sending HTML. Choose one approach, update the build/deploy pipeline to run the
prerender or SSR renderer for all routes listed in the sitemap, and remove
reliance on the single static meta block in index.html.
In `@frontend/public/robots.txt`:
- Around line 8-12: The Disallow rules are inconsistent: /repos/ and /api/ use
trailing slashes while /dashboard, /settings, and /callback do not, causing
unintended prefix-matching; update the Disallow entries for "Disallow:
/dashboard", "Disallow: /settings", and "Disallow: /callback" to include a
trailing slash (i.e. "/dashboard/", "/settings/", "/callback/") so all rules
consistently and precisely target those directories like the existing "Disallow:
/repos/" and "Disallow: /api/" entries.
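For the per-route metadata item in the prompt above (option A, prerendering), the core idea can be sketched without any particular tool: take the built HTML template and emit one copy per route with its own tags. Everything here is an assumption for illustration, including the `RouteMeta` shape, the function names, and the `<!--route-meta-->` placeholder, which the current `index.html` does not contain. Tools such as react-snap work differently (they snapshot the rendered page), so this is a stand-in for the concept rather than their API.

```typescript
// Hypothetical per-route metadata; real values would come from the docs or router config.
interface RouteMeta {
  path: string;
  title: string;
  description: string;
}

const SITE_ORIGIN = "https://opencodeintel.com";
const PLACEHOLDER = "<!--route-meta-->"; // assumed marker added to the template's <head>

// Escape characters that are unsafe in HTML text and double-quoted attribute values.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Renders the tags that should differ per route: title, description, canonical, og:url.
function renderHeadTags(meta: RouteMeta): string {
  const url = escapeHtml(SITE_ORIGIN + meta.path);
  return [
    `<title>${escapeHtml(meta.title)}</title>`,
    `<meta name="description" content="${escapeHtml(meta.description)}" />`,
    `<link rel="canonical" href="${url}" />`,
    `<meta property="og:url" content="${url}" />`,
    `<meta property="og:title" content="${escapeHtml(meta.title)}" />`,
  ].join("\n    ");
}

// Replaces the placeholder; a function replacer avoids "$" in titles being read as a pattern.
function injectHead(template: string, meta: RouteMeta): string {
  return template.replace(PLACEHOLDER, () => renderHeadTags(meta));
}
```

A build script would call `injectHead` once per sitemap route and write each result to the path the host serves for that route, so crawlers that do not execute JavaScript still receive route-specific tags.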
…temap
robots.txt:
- Fix /github/callback (was /callback)
- Remove non-existent /repos/ and /settings
- Add /login and /signup to disallow
sitemap.xml:
- Add all 18 public documentation pages
- Remove hardcoded lastmod dates
- Set appropriate priority levels
- Remove IDEAS.md and TODO.md from tracking
- Add to .gitignore to prevent future commits
The latest updates on your projects: 1 Skipped Deployment
Summary
SEO setup for better Google visibility and social sharing.
Changes
TODO after merge
`og-image.svg` to `og-image.png` (1200x630)
Summary by CodeRabbit
Greptile Summary
Added comprehensive SEO infrastructure including `robots.txt`, `sitemap.xml`, OpenGraph meta tags, Twitter cards, and JSON-LD structured data for improved Google indexing and social sharing.
Major changes:
- `robots.txt` properly configured with crawler directives for public/private routes
- `sitemap.xml` includes all 18 public documentation pages with appropriate priorities
- `index.html` enhanced with meta tags, OpenGraph properties, Twitter cards, and schema.org structured data
- `og-image.png` (81KB) and `og-image.svg` created for social sharing previews
- `IDEAS.md` and `TODO.md` added as internal planning documents (consider moving to GitHub Issues or private docs folder)
Confidence Score: 4/5
Important Files Changed
Flowchart
flowchart TD
    A[Search Engine Crawler] -->|Reads| B[robots.txt]
    B -->|Allow /| C[Public Routes]
    B -->|Disallow /dashboard, /api, etc| D[Private Routes]
    A -->|Fetches| E[sitemap.xml]
    E -->|Lists 18 URLs| F[Documentation Pages]
    A -->|Crawls| G[index.html]
    G -->|Extracts| H[Meta Tags]
    G -->|Extracts| I[OpenGraph Data]
    G -->|Extracts| J[JSON-LD Schema]
    H --> K[Google Search Results]
    I --> L[Social Media Previews]
    L -->|Uses| M[og-image.png]
    J --> N[Rich Search Results]
    style B fill:#6366f1,color:#fff
    style E fill:#6366f1,color:#fff
    style G fill:#6366f1,color:#fff
    style M fill:#8b5cf6,color:#fff
Last reviewed commit: 1e85e79