
feat(seo): add sitemap, robots.txt, and meta tags for Google indexing#245

Merged
DevanshuNEU merged 7 commits into
OpenCodeIntel:mainfrom
DevanshuNEU:feature/seo-setup
Feb 17, 2026

@DevanshuNEU DevanshuNEU commented Feb 17, 2026

Collaborator

Summary

SEO setup for better Google visibility and social sharing.

Changes

  • robots.txt - allows crawling, disallows private routes
  • sitemap.xml - lists public pages for Google indexing
  • index.html - added OpenGraph tags, Twitter cards, JSON-LD structured data
  • og-image.svg - template for social sharing image (needs PNG conversion)

TODO after merge

  • Convert og-image.svg to og-image.png (1200x630)
  • Add domain to Google Search Console
  • Submit sitemap URL
  • Test with Facebook/Twitter/LinkedIn debuggers
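The first TODO item can be checked mechanically once the PNG exists. Below is a minimal sketch, assuming only the standard PNG layout (an 8-byte signature followed by the IHDR chunk); the helper names `png_dimensions` and `is_og_size` are made up for illustration and are not part of this PR.

```python
import struct

# Fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple:
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    # Bytes 8-11 hold the chunk length, bytes 12-15 the chunk type.
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are two big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def is_og_size(data: bytes, expected=(1200, 630)) -> bool:
    """True when the image has the 1200x630 size named in the TODO."""
    return png_dimensions(data) == expected
```

Only the first 24 bytes of the file are needed, so reading `open(path, "rb").read(24)` for wherever og-image.png ends up is enough input for `is_og_size`.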

Summary by CodeRabbit

  • Documentation
    • Added a lightweight ideas backlog and a launch-priority roadmap with milestone-based progress tracking and completed items.
  • Chores
    • Expanded page metadata and structured data to improve SEO and social sharing.
    • Added crawler directives and a sitemap to control indexing and publish site discovery.

Greptile Summary

Added comprehensive SEO infrastructure including robots.txt, sitemap.xml, OpenGraph meta tags, Twitter cards, and JSON-LD structured data for improved Google indexing and social sharing.

Major changes:

  • robots.txt properly configured with crawler directives for public/private routes
  • sitemap.xml includes all 18 public documentation pages with appropriate priorities
  • index.html enhanced with meta tags, OpenGraph properties, Twitter cards, and schema.org structured data
  • og-image.png (81KB) and og-image.svg created for social sharing previews
  • IDEAS.md and TODO.md added as internal planning documents (consider moving to GitHub Issues or private docs folder)

Confidence Score: 4/5

  • Safe to merge with minor organizational considerations
  • SEO implementation is technically sound with proper meta tags, valid sitemap, and correct robots.txt directives. Score reduced by 1 point due to internal planning files (IDEAS.md, TODO.md) being committed to the public repo, which is generally not a best practice
  • IDEAS.md and TODO.md should be evaluated for whether they belong in the public repository

Important Files Changed

Filename | Overview
frontend/public/robots.txt | Added robots.txt with crawler directives, but route paths don't match actual routing structure
frontend/public/sitemap.xml | Comprehensive sitemap with all public documentation routes correctly mapped
frontend/index.html | Added comprehensive SEO meta tags, OpenGraph, Twitter cards, and JSON-LD structured data
IDEAS.md | Internal planning document for ideas and backlog - typically not committed to repos
TODO.md | Internal planning document for launch priorities - typically not committed to repos

Flowchart

flowchart TD
    A[Search Engine Crawler] -->|Reads| B[robots.txt]
    B -->|Allow /| C[Public Routes]
    B -->|Disallow /dashboard, /api, etc| D[Private Routes]
    
    A -->|Fetches| E[sitemap.xml]
    E -->|Lists 18 URLs| F[Documentation Pages]
    
    A -->|Crawls| G[index.html]
    G -->|Extracts| H[Meta Tags]
    G -->|Extracts| I[OpenGraph Data]
    G -->|Extracts| J[JSON-LD Schema]
    
    H --> K[Google Search Results]
    I --> L[Social Media Previews]
    L -->|Uses| M[og-image.png]
    J --> N[Rich Search Results]
    
    style B fill:#6366f1,color:#fff
    style E fill:#6366f1,color:#fff
    style G fill:#6366f1,color:#fff
    style M fill:#8b5cf6,color:#fff

Last reviewed commit: 1e85e79

- Add robots.txt with sitemap reference
- Add sitemap.xml for Google indexing
- Add OpenGraph tags for social sharing
- Add Twitter card meta tags
- Add JSON-LD structured data
- Add canonical URL

TODO: Create og-image.png (1200x630)
- Created og-image.svg as template (1200x630)
- Need to convert to PNG for full social platform support
- Updated meta tags to reference PNG
Consistent domain: opencodeintel.com across all files

vercel Bot commented Feb 17, 2026


@DevanshuNEU is attempting to deploy a commit to the Dev's projects Team on Vercel.

A member of the Team first needs to authorize it.


coderabbitai Bot commented Feb 17, 2026


Warning

Rate limit exceeded

@DevanshuNEU has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 14 minutes and 33 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Walkthrough

Adds lightweight planning docs (IDEAS.md, TODO.md), enriches site SEO metadata and structured data in frontend/index.html, and adds crawler directives and a sitemap (frontend/public/robots.txt, frontend/public/sitemap.xml) for indexing control.

Changes

Cohort | File(s) | Summary
Documentation & Planning | IDEAS.md, TODO.md | New markdown backlog and launch-priority checklist files. IDEAS.md holds idea checklists; TODO.md lists P0/P1/P2 launch tasks across Frontend, DevOps/Auth, Product and a Completed section.
Frontend SEO & Structured Data | frontend/index.html | Replaced a single meta description with expanded head metadata: title, meta title/description/keywords/author, canonical URL, Open Graph and Twitter tags, robots/googlebot directives, and a JSON-LD SoftwareApplication schema script.
Crawler & Indexing | frontend/public/robots.txt, frontend/public/sitemap.xml | Added robots.txt allowing all user-agents but disallowing /dashboard, /settings, /repos/, /callback, /api/; added sitemap.xml with homepage entry and lastmod metadata.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰
I nibble notes in IDEAS, neat and bright,
I tick the TODOs by soft moonlight,
Meta tags shimmer, robots take their place,
Sitemap guides the web’s wide-open space,
A tiny rabbit cheers the code’s new face. ✨

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title accurately describes the main changes: adding SEO infrastructure (sitemap, robots.txt, and meta tags) for Google indexing, which aligns with the core modifications across frontend/index.html, robots.txt, and sitemap.xml.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Merge Conflict Detection | ✅ Passed | No merge conflicts detected when merging into main
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.



@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (3)
frontend/index.html (1)

47-66: Minor URL inconsistency in JSON-LD url field.

Line 55 uses "url": "https://opencodeintel.com" (no trailing slash), while the canonical URL (Line 18) and OG URL (Line 22) use https://opencodeintel.com/ (with trailing slash). While functionally equivalent, consistency across all URL references avoids any ambiguity for search engines normalizing your canonical URL.

Suggested fix
-      "url": "https://opencodeintel.com",
+      "url": "https://opencodeintel.com/",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/index.html` around lines 47 - 66, Update the JSON-LD script's "url"
field to match the canonical/OG URL by adding the trailing slash so it reads
"https://opencodeintel.com/"; locate the <script type="application/ld+json">
block and modify the "url" property inside that JSON object to include the
trailing slash for consistency.
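The consistency this comment asks for can be checked with a short script. The sketch below assumes the three URLs live in a `<link rel="canonical">`, a `<meta property="og:url">`, and the `"url"` key of a single JSON-LD block, as the review describes; the class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class HeadUrlCollector(HTMLParser):
    """Collects the canonical URL, og:url, and JSON-LD "url" from a page."""

    def __init__(self):
        super().__init__()
        self.urls = {}
        self._in_json_ld = False
        self._json_ld_chunks = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and attr.get("rel") == "canonical":
            self.urls["canonical"] = attr.get("href")
        elif tag == "meta" and attr.get("property") == "og:url":
            self.urls["og:url"] = attr.get("content")
        elif tag == "script" and attr.get("type") == "application/ld+json":
            self._in_json_ld = True

    def handle_data(self, data):
        # Script content can arrive in several chunks, so buffer it.
        if self._in_json_ld:
            self._json_ld_chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            payload = json.loads("".join(self._json_ld_chunks))
            self.urls["json-ld"] = payload.get("url")
            self._json_ld_chunks = []


def collect_urls(html: str) -> dict:
    parser = HeadUrlCollector()
    parser.feed(html)
    parser.close()
    return parser.urls


def urls_are_consistent(html: str) -> bool:
    """True when every URL found is the exact same string."""
    return len(set(collect_urls(html).values())) <= 1
```

The comparison is a strict string match on purpose, so a missing trailing slash counts as a mismatch even though most crawlers would normalize it.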
frontend/public/sitemap.xml (1)

1-15: Static sitemap will become stale — consider automating generation.

Hardcoded lastmod dates (Lines 5 and 11) and manually maintained URL entries will drift out of sync as pages are added or updated. For a small site this is fine short-term, but consider generating the sitemap at build time (e.g., via a Vite plugin or a simple script) to keep it accurate.

Also, including /login in the sitemap is unusual — search engines generally gain little value from indexing authentication pages. Consider whether it truly needs to be discoverable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/public/sitemap.xml` around lines 1 - 15, The sitemap.xml uses
hardcoded <lastmod> dates and includes an authentication page (/login); replace
this static file with a build-time generated sitemap (e.g., a Vite plugin or a
simple Node script run at build) that enumerates your public routes, sets
<lastmod> from each page’s file mtime or CMS metadata, and optionally computes
changefreq/priority, and remove or exclude /login from the generated entries (or
keep it out and instead use noindex on auth pages). Update the build pipeline to
write/update frontend/public/sitemap.xml during CI/build so it stays accurate.
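A build-time generator along these lines could look roughly like the following. This is a standalone Python sketch rather than the Vite plugin or Node script the comment mentions, and the function name, the route-to-date mapping, and the `exclude` parameter are assumptions made for illustration. In a real build the dates would come from file mtimes or CMS metadata.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Dict, Iterable, Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(
    base_url: str,
    routes: Dict[str, Optional[date]],
    exclude: Iterable[str] = (),
) -> str:
    """Render sitemap XML; `routes` maps a path to its last-modified date."""
    # Emit the sitemap namespace as the default (un-prefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    skipped = set(exclude)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for path, modified in sorted(routes.items()):
        if path in skipped:
            continue  # e.g. auth pages such as /login
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(url, f"{{{SITEMAP_NS}}}loc")
        loc.text = base_url.rstrip("/") + path
        if modified is not None:
            # Omit <lastmod> entirely when no trustworthy date is known.
            lastmod = ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod")
            lastmod.text = modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"
```

Leaving out `<lastmod>` when the date is unknown avoids publishing a stale hardcoded value, which is the drift the comment warns about.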
frontend/public/robots.txt (1)

4-11: Consider disallowing /api paths if the backend is served from the same origin.

The authenticated routes are correctly blocked. If the API is served under the same domain (e.g., /api/...), consider adding Disallow: /api/ to prevent crawlers from hitting API endpoints, which can generate unnecessary load and index non-HTML responses.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/public/robots.txt` around lines 4 - 11, The robots.txt currently
disallows authenticated routes but doesn't block API endpoints; if your backend
serves APIs from the same origin, update the robots.txt to include a rule like
"Disallow: /api/" so crawlers won't request or index API routes — modify the
robots.txt entries (the User-agent block where Disallow: /dashboard, /settings,
/repos/, /callback are defined) to add Disallow: /api/ (or the correct API
prefix) to prevent crawler access to API endpoints.

- Add trailing slash to JSON-LD url for consistency
- Add /api/ to robots.txt disallow rules (defensive)
- Remove /login from sitemap.xml (auth pages shouldn't be indexed)

@greptile-apps greptile-apps Bot left a comment


7 files reviewed, 2 comments


Comment thread frontend/public/robots.txt Outdated

greptile-apps Bot commented Feb 17, 2026

Additional Comments (1)

frontend/public/sitemap.xml
Missing many public pages that should be indexed. Consider adding /docs, /signup, and other public documentation routes like /docs/quickstart, /docs/mcp-setup, etc.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (4)
frontend/index.html (3)

47-65: Consider using "@type": "WebApplication" instead of "SoftwareApplication".

Since OpenCodeIntel is a web-based tool, WebApplication (a subtype of SoftwareApplication in Schema.org) is a more precise fit and may yield better rich results for browser-based applications.

Proposed fix
-      "@type": "SoftwareApplication",
+      "@type": "WebApplication",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/index.html` around lines 47 - 65, Update the JSON-LD block inside
the <script type="application/ld+json"> for OpenCodeIntel by changing the
"@type" value from "SoftwareApplication" to "WebApplication" (i.e., replace
"@type": "SoftwareApplication" with "@type": "WebApplication") so the structured
data accurately reflects a browser-based application; keep the rest of the
properties (name "OpenCodeIntel", applicationCategory, operatingSystem,
description, url, offers, author) unchanged.
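To make the proposed type change concrete, here is a sketch that builds the JSON-LD block with a configurable `@type`. Only the name and URL come from this thread. The `applicationCategory` and `operatingSystem` values are placeholders, since the PR's actual values are not shown here, and the function name is hypothetical.

```python
import json


def json_ld_script(schema_type: str = "WebApplication") -> str:
    """Return a <script> tag holding JSON-LD for the application."""
    data = {
        "@context": "https://schema.org",
        "@type": schema_type,
        "name": "OpenCodeIntel",
        "url": "https://opencodeintel.com/",
        # Placeholder values: the real ones are not visible in this thread.
        "applicationCategory": "DeveloperApplication",
        "operatingSystem": "Any",
    }
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(data, indent=2)
        + "\n</script>"
    )
```

Calling `json_ld_script("SoftwareApplication")` reproduces the original type, so the two variants can be compared in a structured-data testing tool before committing to one.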

37-39: Redundant robots meta directives.

index, follow is the default behavior for all crawlers, so these tags are technically unnecessary. They don't hurt anything, but they add no value either — only include them if you plan to change them to noindex for certain pages in the future.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/index.html` around lines 37 - 39, The meta tags <meta name="robots"
content="index, follow" /> and <meta name="googlebot" content="index, follow" />
are redundant since "index, follow" is the default; remove these two meta
elements from the HTML (or comment them out) unless you intend to set
non-default directives like "noindex" later—look for the <meta name="robots"...>
and <meta name="googlebot"...> entries in the document and delete or disable
them.

18-18: SPA limitation: static meta tags only cover the homepage.

Since this is a React SPA (src/main.tsx), these static OG/meta tags in index.html will be served for all routes. Social crawlers and search engines that don't execute JavaScript will see the same metadata regardless of the URL. This is fine today since the sitemap only lists the root, but if you later add more pages to the sitemap, you'll need server-side rendering or a prerendering service (e.g., react-snap, prerender.io) for per-page meta tags.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/index.html` at line 18, The current static OG/meta tags in
index.html are served for all SPA routes (see index.html and src/main.tsx), so
crawlers that don't run JS will always see homepage metadata; fix by
implementing per-route metadata via either (A) prerendering: add a prerender
step (e.g., react-snap or prerender.io) and configure it to snapshot all sitemap
routes so each route gets its own static meta tags, or (B) server-side
rendering: introduce an SSR entry that uses react-helmet (or similar) to render
per-route meta tags on the server before sending HTML. Choose one approach,
update the build/deploy pipeline to run the prerender or SSR renderer for all
routes listed in the sitemap, and remove reliance on the single static meta
block in index.html.
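Neither prerendering nor SSR fits in a short snippet, so the sketch below shows only the underlying idea: each route gets its own head tags, filled in at build time. It is plain string templating, not react-snap, prerender.io, or react-helmet, and the template, function name, and example values are all assumptions.

```python
from html import escape
from string import Template

# Hypothetical per-route head fragment; the PR's index.html has no placeholders.
HEAD_TEMPLATE = Template(
    "<title>$title</title>\n"
    '<meta name="description" content="$description" />\n'
    '<link rel="canonical" href="$canonical" />\n'
    '<meta property="og:title" content="$title" />\n'
    '<meta property="og:url" content="$canonical" />'
)


def render_head(base_url: str, path: str, title: str, description: str) -> str:
    """Fill the head fragment for one route, escaping values for HTML."""
    return HEAD_TEMPLATE.substitute(
        title=escape(title),
        description=escape(description),
        canonical=escape(base_url.rstrip("/") + path),
    )
```

A build step could call `render_head` once per sitemap route and write the result into a per-route copy of index.html, which is roughly what a prerendering tool automates.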
frontend/public/robots.txt (1)

8-12: Inconsistent trailing-slash usage in Disallow rules.

/repos/ and /api/ use a trailing slash (prefix-matches only paths under that directory), while /dashboard, /settings, and /callback omit it. In robots.txt, Disallow: /dashboard prefix-matches anything starting with /dashboard, including a hypothetical /dashboards route. Adding trailing slashes makes the intent clearer and more precise, with one caveat: Disallow: /dashboard/ no longer matches the bare /dashboard URL, so keep the slash-less rule if that exact path must stay blocked.

Proposed fix
-Disallow: /dashboard
-Disallow: /settings
+Disallow: /dashboard/
+Disallow: /settings/
 Disallow: /repos/
-Disallow: /callback
+Disallow: /callback/
 Disallow: /api/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/public/robots.txt` around lines 8 - 12, The Disallow rules are
inconsistent: /repos/ and /api/ use trailing slashes while /dashboard,
/settings, and /callback do not, causing unintended prefix-matching; update the
Disallow entries for "Disallow: /dashboard", "Disallow: /settings", and
"Disallow: /callback" to include a trailing slash (i.e. "/dashboard/",
"/settings/", "/callback/") so all rules consistently and precisely target those
directories like the existing "Disallow: /repos/" and "Disallow: /api/" entries.
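The prefix-matching behavior discussed here is easy to see in a few lines. The sketch below is a deliberately simplified matcher, assuming plain prefix comparison only: it ignores Allow rules and the `*` and `$` wildcards that real crawlers support, and `is_disallowed` is a hypothetical helper.

```python
from typing import Iterable


def is_disallowed(path: str, disallow_rules: Iterable[str]) -> bool:
    """Plain prefix match of a URL path against Disallow values."""
    # An empty Disallow value blocks nothing, so skip it.
    return any(rule and path.startswith(rule) for rule in disallow_rules)


without_slash = ["/dashboard"]
with_slash = ["/dashboard/"]

# Rule without the slash: blocks the route, its children, and look-alikes.
blocks_route = is_disallowed("/dashboard", without_slash)
blocks_lookalike = is_disallowed("/dashboards", without_slash)

# Rule with the slash: blocks children only, not the bare route itself.
blocks_child = is_disallowed("/dashboard/settings", with_slash)
blocks_bare_route = is_disallowed("/dashboard", with_slash)
```

Under this model `blocks_bare_route` is false, so switching to `/dashboard/` narrows the rule in both directions. It stops matching `/dashboards`, but it also stops matching `/dashboard`.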

…temap

robots.txt:
- Fix /github/callback (was /callback)
- Remove non-existent /repos/ and /settings
- Add /login and /signup to disallow

sitemap.xml:
- Add all 18 public documentation pages
- Remove hardcoded lastmod dates
- Set appropriate priority levels
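Since this commit changes both files at once, a quick cross-check that no sitemap URL is also disallowed can catch contradictions such as listing /login while blocking it. The sketch below assumes plain prefix matching and a rule list inferred from the commit message above; the exact final robots.txt is not shown in this thread, and `blocked_sitemap_urls` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def blocked_sitemap_urls(sitemap_xml: str, disallow_rules: Iterable[str]) -> List[str]:
    """Return sitemap URLs whose path starts with any Disallow prefix."""
    rules = [rule for rule in disallow_rules if rule]
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    blocked = []
    for loc in root.findall("sm:url/sm:loc", NS):
        url = (loc.text or "").strip()
        # A URL with no path component is treated as the site root.
        path = urlparse(url).path or "/"
        if any(path.startswith(rule) for rule in rules):
            blocked.append(url)
    return blocked
```

An empty result means the two files agree. Any returned URL should be dropped from the sitemap or unblocked in robots.txt.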

@greptile-apps greptile-apps Bot left a comment


7 files reviewed, 2 comments


Comment thread IDEAS.md Outdated
Comment thread TODO.md Outdated
- Remove IDEAS.md and TODO.md from tracking
- Add to .gitignore to prevent future commits

vercel Bot commented Feb 17, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project | Deployment | Actions | Updated (UTC)
opencodeintel | Ignored | Preview | Feb 17, 2026 2:50am

@DevanshuNEU DevanshuNEU merged commit b42c6af into OpenCodeIntel:main Feb 17, 2026
7 checks passed