feat(seo): add sitemap, robots.txt, and meta tags for Google indexing by DevanshuNEU · Pull Request #245 · OpenCodeIntel/opencodeintel

DevanshuNEU · 2026-02-17T02:08:12Z

Summary

SEO setup for better Google visibility and social sharing.

Changes

robots.txt - allows crawling, disallows private routes
sitemap.xml - lists public pages for Google indexing
index.html - added OpenGraph tags, Twitter cards, JSON-LD structured data
og-image.svg - template for social sharing image (needs PNG conversion)

TODO after merge

Convert og-image.svg to og-image.png (1200x630)
Add domain to Google Search Console
Submit sitemap URL
Test with Facebook/Twitter/LinkedIn debuggers

Summary by CodeRabbit

Documentation
- Added a lightweight ideas backlog and a launch-priority roadmap with milestone-based progress tracking and completed items.
Chores
- Expanded page metadata and structured data to improve SEO and social sharing.
- Added crawler directives and a sitemap to control indexing and publish site discovery.

Greptile Summary

Added comprehensive SEO infrastructure including robots.txt, sitemap.xml, OpenGraph meta tags, Twitter cards, and JSON-LD structured data for improved Google indexing and social sharing.

Major changes:

robots.txt properly configured with crawler directives for public/private routes
sitemap.xml includes all 18 public documentation pages with appropriate priorities
index.html enhanced with meta tags, OpenGraph properties, Twitter cards, and schema.org structured data
og-image.png (81KB) and og-image.svg created for social sharing previews
IDEAS.md and TODO.md added as internal planning documents (consider moving to GitHub Issues or private docs folder)

Confidence Score: 4/5

Safe to merge with minor organizational considerations.
SEO implementation is technically sound with proper meta tags, valid sitemap, and correct robots.txt directives. Score reduced by 1 point because internal planning files (IDEAS.md, TODO.md) were committed to the public repo, which is generally not a best practice.
IDEAS.md and TODO.md should be evaluated for whether they belong in the public repository.

Important Files Changed

Filename	Overview
frontend/public/robots.txt	Added robots.txt with crawler directives, but route paths don't match actual routing structure
frontend/public/sitemap.xml	Comprehensive sitemap with all public documentation routes correctly mapped
frontend/index.html	Added comprehensive SEO meta tags, OpenGraph, Twitter cards, and JSON-LD structured data
IDEAS.md	Internal planning document for ideas and backlog - typically not committed to repos
TODO.md	Internal planning document for launch priorities - typically not committed to repos

Flowchart

flowchart TD
    A[Search Engine Crawler] -->|Reads| B[robots.txt]
    B -->|Allow /| C[Public Routes]
    B -->|Disallow /dashboard, /api, etc| D[Private Routes]
    
    A -->|Fetches| E[sitemap.xml]
    E -->|Lists 18 URLs| F[Documentation Pages]
    
    A -->|Crawls| G[index.html]
    G -->|Extracts| H[Meta Tags]
    G -->|Extracts| I[OpenGraph Data]
    G -->|Extracts| J[JSON-LD Schema]
    
    H --> K[Google Search Results]
    I --> L[Social Media Previews]
    L -->|Uses| M[og-image.png]
    J --> N[Rich Search Results]
    
    style B fill:#6366f1,color:#fff
    style E fill:#6366f1,color:#fff
    style G fill:#6366f1,color:#fff
    style M fill:#8b5cf6,color:#fff

Last reviewed commit: 1e85e79

- Add robots.txt with sitemap reference
- Add sitemap.xml for Google indexing
- Add OpenGraph tags for social sharing
- Add Twitter card meta tags
- Add JSON-LD structured data
- Add canonical URL

TODO: Create og-image.png (1200x630)

vercel · 2026-02-17T02:08:16Z

@DevanshuNEU is attempting to deploy a commit to the Dev's projects Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2026-02-17T02:08:31Z

Warning

Rate limit exceeded

@DevanshuNEU has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 14 minutes and 33 seconds before requesting another review.

Walkthrough

Adds lightweight planning docs (IDEAS.md, TODO.md), enriches site SEO metadata and structured data in frontend/index.html, and adds crawler directives and a sitemap (frontend/public/robots.txt, frontend/public/sitemap.xml) for indexing control.

Changes

Cohort / File(s)	Summary
Documentation & Planning `IDEAS.md`, `TODO.md`	New markdown backlog and launch-priority checklist files. `IDEAS.md` holds idea checklists; `TODO.md` lists P0/P1/P2 launch tasks across Frontend, DevOps/Auth, Product and a Completed section.
Frontend SEO & Structured Data `frontend/index.html`	Replaced a single meta description with expanded head metadata: title, meta title/description/keywords/author, canonical URL, Open Graph and Twitter tags, robots/googlebot directives, and a JSON-LD SoftwareApplication schema script.
Crawler & Indexing `frontend/public/robots.txt`, `frontend/public/sitemap.xml`	Added `robots.txt` allowing all user-agents but disallowing `/dashboard`, `/settings`, `/repos/`, `/callback`, `/api/`; added `sitemap.xml` with homepage entry and lastmod metadata.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

chore: pre-launch polish - branding, favicon, README rewrite #234: Modifies frontend/index.html head metadata (title/meta tags and related SEO entries), directly related to the SEO/metadata changes in this PR.

Poem

🐰
I nibble notes in IDEAS, neat and bright,
I tick the TODOs by soft moonlight,
Meta tags shimmer, robots take their place,
Sitemap guides the web’s wide-open space,
A tiny rabbit cheers the code’s new face. ✨

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main changes: adding SEO infrastructure (sitemap, robots.txt, and meta tags) for Google indexing, which aligns with the core modifications across frontend/index.html, robots.txt, and sitemap.xml.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Merge Conflict Detection	✅ Passed	✅ No merge conflicts detected when merging into `main`
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

coderabbitai

🧹 Nitpick comments (3)

frontend/index.html (1)
47-66: Minor URL inconsistency in JSON-LD url field.

Line 55 uses "url": "https://opencodeintel.com" (no trailing slash), while the canonical URL (Line 18) and OG URL (Line 22) use https://opencodeintel.com/ (with trailing slash). While functionally equivalent, consistency across all URL references avoids any ambiguity for search engines normalizing your canonical URL.
Suggested fix
-      "url": "https://opencodeintel.com",
+      "url": "https://opencodeintel.com/",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/index.html` around lines 47 - 66, Update the JSON-LD script's "url"
field to match the canonical/OG URL by adding the trailing slash so it reads
"https://opencodeintel.com/"; locate the <script type="application/ld+json">
block and modify the "url" property inside that JSON object to include the
trailing slash for consistency.
frontend/public/sitemap.xml (1)
1-15: Static sitemap will become stale — consider automating generation.

Hardcoded lastmod dates (Line 5, 11) and manually maintained URL entries will drift out of sync as pages are added or updated. For a small site this is fine short-term, but consider generating the sitemap at build time (e.g., via a Vite plugin or a simple script) to keep it accurate.

Also, including /login in the sitemap is unusual — search engines generally gain little value from indexing authentication pages. Consider whether it truly needs to be discoverable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/public/sitemap.xml` around lines 1 - 15, The sitemap.xml uses
hardcoded <lastmod> dates and includes an authentication page (/login); replace
this static file with a build-time generated sitemap (e.g., a Vite plugin or a
simple Node script run at build) that enumerates your public routes, sets
<lastmod> from each page’s file mtime or CMS metadata, and optionally computes
changefreq/priority, and remove or exclude /login from the generated entries (or
keep it out and instead use noindex on auth pages). Update the build pipeline to
write/update frontend/public/sitemap.xml during CI/build so it stays accurate.
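
The build-time generation suggested above could look roughly like the sketch below. It is a minimal illustration, not the project's actual script: `SitemapRoute`, `buildSitemap`, `EXCLUDED_PATHS`, and the priority values are hypothetical names and numbers, and the route list is assumed to be maintained by hand or derived from the router. Writing the returned string to frontend/public/sitemap.xml is left to the build step.

```typescript
// Hypothetical route entry; the real list would come from the app's router.
interface SitemapRoute {
  path: string;      // e.g. "/docs/quickstart"
  lastmod?: string;  // ISO date (YYYY-MM-DD), e.g. taken from file mtime at build time
  priority?: number; // 0.0 to 1.0
}

// Paths that should never be listed (auth pages), per the review comment.
const EXCLUDED_PATHS: string[] = ["/login"];

// Escape the five characters that are special in XML text content.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build the sitemap document as a string; optional fields are emitted only when set.
function buildSitemap(baseUrl: string, routes: SitemapRoute[]): string {
  const origin = baseUrl.replace(/\/+$/, ""); // drop trailing slashes before joining
  const entries = routes
    .filter((route) => EXCLUDED_PATHS.indexOf(route.path) === -1)
    .map((route) => {
      const lines = [`    <loc>${escapeXml(origin + route.path)}</loc>`];
      if (route.lastmod !== undefined) {
        lines.push(`    <lastmod>${escapeXml(route.lastmod)}</lastmod>`);
      }
      if (route.priority !== undefined) {
        lines.push(`    <priority>${route.priority.toFixed(1)}</priority>`);
      }
      return `  <url>\n${lines.join("\n")}\n  </url>`;
    });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
    "",
  ].join("\n");
}

// Example call with illustrative routes; "/login" is filtered out.
const sitemapXml = buildSitemap("https://opencodeintel.com", [
  { path: "/", priority: 1.0 },
  { path: "/docs", priority: 0.8 },
  { path: "/docs/quickstart", priority: 0.6 },
  { path: "/login" },
]);
```

Keeping the generator a pure function makes it easy to call from either a Vite plugin hook or a standalone script, whichever the build pipeline ends up using.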
frontend/public/robots.txt (1)
4-11: Consider disallowing /api paths if the backend is served from the same origin.

The authenticated routes are correctly blocked. If the API is served under the same domain (e.g., /api/...), consider adding Disallow: /api/ to prevent crawlers from hitting API endpoints, which can generate unnecessary load and index non-HTML responses.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/public/robots.txt` around lines 4 - 11, The robots.txt currently
disallows authenticated routes but doesn't block API endpoints; if your backend
serves APIs from the same origin, update the robots.txt to include a rule like
"Disallow: /api/" so crawlers won't request or index API routes — modify the
robots.txt entries (the User-agent block where Disallow: /dashboard, /settings,
/repos/, /callback are defined) to add Disallow: /api/ (or the correct API
prefix) to prevent crawler access to API endpoints.

greptile-apps

7 files reviewed, 2 comments

greptile-apps · 2026-02-17T02:15:17Z

Additional Comments (1)

frontend/public/sitemap.xml
Missing many public pages that should be indexed. Consider adding /docs, /signup, and other public documentation routes like /docs/quickstart, /docs/mcp-setup, etc.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI

This is a comment left during a code review.
Path: frontend/public/sitemap.xml
Line: 3:14

Comment:
Missing many public pages that should be indexed. Consider adding `/docs`, `/signup`, and other public documentation routes like `/docs/quickstart`, `/docs/mcp-setup`, etc.

How can I resolve this? If you propose a fix, please make it concise.

coderabbitai

🧹 Nitpick comments (4)

frontend/index.html (3)
47-65: Consider using "@type": "WebApplication" instead of "SoftwareApplication".

Since OpenCodeIntel is a web-based tool, WebApplication (a subtype of SoftwareApplication in Schema.org) is a more precise fit and may yield better rich results for browser-based applications.
Proposed fix
-      "@type": "SoftwareApplication",
+      "@type": "WebApplication",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/index.html` around lines 47 - 65, Update the JSON-LD block inside
the <script type="application/ld+json"> for OpenCodeIntel by changing the
"@type" value from "SoftwareApplication" to "WebApplication" (i.e., replace
"@type": "SoftwareApplication" with "@type": "WebApplication") so the structured
data accurately reflects a browser-based application; keep the rest of the
properties (name "OpenCodeIntel", applicationCategory, operatingSystem,
description, url, offers, author) unchanged.
37-39: Redundant robots meta directives.

index, follow is the default behavior for all crawlers, so these tags are technically unnecessary. They don't hurt anything, but they add no value either — only include them if you plan to change them to noindex for certain pages in the future.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/index.html` around lines 37 - 39, The meta tags <meta name="robots"
content="index, follow" /> and <meta name="googlebot" content="index, follow" />
are redundant since "index, follow" is the default; remove these two meta
elements from the HTML (or comment them out) unless you intend to set
non-default directives like "noindex" later—look for the <meta name="robots"...>
and <meta name="googlebot"...> entries in the document and delete or disable
them.
18-18: SPA limitation: static meta tags only cover the homepage.

Since this is a React SPA (src/main.tsx), these static OG/meta tags in index.html will be served for all routes. Social crawlers and search engines that don't execute JavaScript will see the same metadata regardless of the URL. This is fine today since the sitemap only lists the root, but if you later add more pages to the sitemap, you'll need server-side rendering or a prerendering service (e.g., react-snap, prerender.io) for per-page meta tags.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/index.html` at line 18, The current static OG/meta tags in
index.html are served for all SPA routes (see index.html and src/main.tsx), so
crawlers that don't run JS will always see homepage metadata; fix by
implementing per-route metadata via either (A) prerendering: add a prerender
step (e.g., react-snap or prerender.io) and configure it to snapshot all sitemap
routes so each route gets its own static meta tags, or (B) server-side
rendering: introduce an SSR entry that uses react-helmet (or similar) to render
per-route meta tags on the server before sending HTML. Choose one approach,
update the build/deploy pipeline to run the prerender or SSR renderer for all
routes listed in the sitemap, and remove reliance on the single static meta
block in index.html.
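
As a rough illustration of what either approach has to produce, the sketch below renders a per-route set of head tags as a string. It does not use react-snap, prerender.io, or react-helmet; `RouteMeta`, `renderHeadTags`, and `SITE_ORIGIN` are hypothetical names, and the title and description values are placeholders rather than the site's real copy.

```typescript
// Hypothetical per-route metadata; real values would live next to the route definitions.
interface RouteMeta {
  title: string;
  description: string;
}

const SITE_ORIGIN = "https://opencodeintel.com";

// Escape characters that would break an HTML attribute or text node.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Render the head tags a prerender or SSR step would inject for one route,
// so crawlers that do not run JavaScript see route-specific metadata.
function renderHeadTags(path: string, meta: RouteMeta): string {
  const url = escapeAttr(SITE_ORIGIN + path);
  const title = escapeAttr(meta.title);
  const description = escapeAttr(meta.description);
  return [
    `<title>${title}</title>`,
    `<meta name="description" content="${description}" />`,
    `<link rel="canonical" href="${url}" />`,
    `<meta property="og:url" content="${url}" />`,
    `<meta property="og:title" content="${title}" />`,
    `<meta property="og:description" content="${description}" />`,
  ].join("\n");
}

// Example with placeholder copy for one documentation route.
const quickstartHead = renderHeadTags("/docs/quickstart", {
  title: "Quickstart (placeholder title)",
  description: "Placeholder description for the quickstart page.",
});
```

The point of the sketch is only that the canonical URL, `og:url`, title, and description all vary with the route; how the string reaches the served HTML depends on which of the two approaches is chosen.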
frontend/public/robots.txt (1)
8-12: Inconsistent trailing-slash usage in Disallow rules.

/repos/ and /api/ use a trailing slash (prefix-matches only paths under that directory), while /dashboard, /settings, and /callback omit it. In robots.txt, Disallow: /dashboard prefix-matches anything starting with /dashboard, including a hypothetical /dashboards route. Adding trailing slashes makes the intent clearer and more precise.
Proposed fix
-Disallow: /dashboard
-Disallow: /settings
+Disallow: /dashboard/
+Disallow: /settings/
 Disallow: /repos/
-Disallow: /callback
+Disallow: /callback/
 Disallow: /api/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/public/robots.txt` around lines 8 - 12, The Disallow rules are
inconsistent: /repos/ and /api/ use trailing slashes while /dashboard,
/settings, and /callback do not, causing unintended prefix-matching; update the
Disallow entries for "Disallow: /dashboard", "Disallow: /settings", and
"Disallow: /callback" to include a trailing slash (i.e. "/dashboard/",
"/settings/", "/callback/") so all rules consistently and precisely target those
directories like the existing "Disallow: /repos/" and "Disallow: /api/" entries.
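
The prefix-matching behavior described in this comment can be checked with a small sketch. `isDisallowed` is a hypothetical helper that does plain prefix comparison only; real crawlers also handle `*` and `$` wildcards, `Allow` rules, and longest-match precedence, none of which are modeled here.

```typescript
// Simplified robots.txt matcher: a path is blocked if any non-empty
// Disallow value is a prefix of it. Wildcards and Allow rules are ignored.
function isDisallowed(path: string, disallowRules: string[]): boolean {
  return disallowRules.some((rule) => rule.length > 0 && path.indexOf(rule) === 0);
}

// Without a trailing slash the rule also catches sibling routes.
const siblingBlocked = isDisallowed("/dashboards", ["/dashboard"]);      // true

// With a trailing slash only paths under the directory match...
const siblingAllowed = !isDisallowed("/dashboards", ["/dashboard/"]);    // true
const nestedBlocked = isDisallowed("/dashboard/usage", ["/dashboard/"]); // true

// ...but the bare route itself no longer matches.
const bareRouteBlocked = isDisallowed("/dashboard", ["/dashboard/"]);    // false
```

One consequence worth weighing before applying the proposed fix: if the app serves the page at `/dashboard` with no trailing slash, `Disallow: /dashboard/` leaves that exact URL crawlable, so the slash-less form may be the safer choice for single-page routes.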

…temap

robots.txt:
- Fix /github/callback (was /callback)
- Remove non-existent /repos/ and /settings
- Add /login and /signup to disallow

sitemap.xml:
- Add all 18 public documentation pages
- Remove hardcoded lastmod dates
- Set appropriate priority levels

greptile-apps

7 files reviewed, 2 comments

vercel · 2026-02-17T02:50:01Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
opencodeintel	Ignored	Preview	Feb 17, 2026 2:50am

DevanshuNEU added 4 commits February 16, 2026 20:45

feat(seo): add OG image template (SVG)

f752628

- Created og-image.svg as template (1200x630)
- Need to convert to PNG for full social platform support
- Updated meta tags to reference PNG

fix(seo): change domain from .dev to .com

19aef51

Consistent domain: opencodeintel.com across all files

feat: added png format for the og-image

7563375

coderabbitai Bot reviewed Feb 17, 2026

fix(seo): address review nitpicks

8dbf51b

- Add trailing slash to JSON-LD url for consistency
- Add /api/ to robots.txt disallow rules (defensive)
- Remove /login from sitemap.xml (auth pages shouldn't be indexed)

greptile-apps Bot reviewed Feb 17, 2026

Comment thread frontend/public/robots.txt Outdated

coderabbitai Bot reviewed Feb 17, 2026

greptile-apps Bot reviewed Feb 17, 2026

Comment thread IDEAS.md Outdated

Comment thread TODO.md Outdated

chore: remove internal planning files from repo

b33fec1

- Remove IDEAS.md and TODO.md from tracking
- Add to .gitignore to prevent future commits

DevanshuNEU merged commit b42c6af into OpenCodeIntel:main Feb 17, 2026
7 checks passed
