feat(categorize): enrich URL-only bookmarks before categorization #23

Closed
rafaelreis-r wants to merge 1 commit into viperrcrypto:main from rafaelreis-r:pr/categorize-url-enrichment

Conversation

@rafaelreis-r
Contributor

Bookmarks that only contain a URL (text shorter than 20 chars) are skipped as trivial. This PR fetches their page title/description via a lightweight og-tag scraper and uses that as the enrichment text, improving categorization quality with no schema changes.
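
For illustration, the enrichment step described above might look roughly like the sketch below. It is not the PR's actual code: the names `is_trivial`, `enrichment_from_html`, and `MIN_TEXT_LENGTH` are hypothetical, the language is an assumption, and the fallback order (og tags first, then `<title>` and the plain `description` meta tag) is a guess at what "lightweight og-tag scraper" means. Only the 20-character threshold comes from the description.

```python
from html.parser import HTMLParser

# Threshold from the PR description: shorter text is treated as trivial.
MIN_TEXT_LENGTH = 20


class _MetaParser(HTMLParser):
    """Collects the <title> text and <meta> property/name values of a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            a = dict(attrs)
            key = a.get("property") or a.get("name")
            content = a.get("content")
            if key and content:
                # Keep the first occurrence of each key.
                self.meta.setdefault(key.lower(), content.strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def is_trivial(text):
    """True when the bookmark text is too short to categorize on its own."""
    return len(text.strip()) < MIN_TEXT_LENGTH


def enrichment_from_html(html):
    """Build enrichment text from og tags, falling back to <title>/description."""
    parser = _MetaParser()
    parser.feed(html)
    title = parser.meta.get("og:title") or parser.title.strip()
    desc = parser.meta.get("og:description") or parser.meta.get("description", "")
    return " - ".join(part for part in (title, desc) if part)
```

The sketch deliberately leaves out the HTTP fetch, so it only shows how already-downloaded HTML could be turned into enrichment text for a bookmark that `is_trivial` flags.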

@viperrcrypto
Owner

Closing - this permanently mutates bookmark text in the DB with no rollback path, and the spoofed Googlebot User-Agent is a concern. The URL enrichment idea is good but needs a non-destructive approach (e.g. storing enriched text in a separate field).
