Skip to content

SCIX-884 feat(links): encode DOI/OpenURL special chars#879

Open
thostetler wants to merge 1 commit into
adsabs:masterfrom
thostetler:SCIX-884
Open

SCIX-884 feat(links): encode DOI/OpenURL special chars#879
thostetler wants to merge 1 commit into
adsabs:masterfrom
thostetler:SCIX-884

Conversation

@thostetler
Copy link
Copy Markdown
Member

DOI identifiers containing special characters (#, ?, spaces, brackets) were being passed raw
into URL paths and OpenURL query strings, causing broken links and bad redirects on
abstract pages.

  • Added encodeDOIPath utility that percent-encodes DOI values but preserves / as a
    legitimate path separator
  • Applied encoding in createUrlByType (gateway resolver links) and buildSameAs (JSON-LD
    structured data)
  • Fixed getOpenUrl to use encodeURIComponent per field instead of the blunt encodeURI
    wrapper, which was under-encoding values
  • Removed the heuristic in normalizeAbsPath that silently dropped the last path segment when
    a DOI had more than two components; all segments are now preserved since there is no
    reliable way to distinguish a trailing view token from a legitimate DOI suffix

Copilot AI review requested due to automatic review settings June 2, 2026 18:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens DOI/OpenURL link generation and abstract-path normalization by correctly percent-encoding special characters that were previously breaking URLs (e.g., #, ?, spaces, brackets), while also adjusting middleware behavior to preserve all identifier segments.

Risk summary: Medium. The changes touch edge middleware rewrite logic (routing) and URL construction used broadly on abstract pages. The behavior is well-motivated and has added/updated tests, but a subtle encoding-detection bug could still cause incorrect rewrites for some encoded inputs.

Changes:

  • Added encodeDOIPath() and applied it to DOI URLs (doi.org JSON-LD sameAs) and gateway resolver links.
  • Fixed OpenURL generation to encode each query parameter value with encodeURIComponent (instead of encodeURI on the whole URL).
  • Updated /abs/... middleware normalization to preserve all path segments for ambiguous multi-segment identifiers and expanded test coverage.

Findings (priority order)

  • medium — Case-sensitive detection of already-encoded identifiers can cause incorrect rewrites

    • Impact: Requests containing lowercase percent-escapes (e.g. %2f) may be treated as not encoded, leading to double-encoding or unintended rewrites/redirects for some /abs/... paths.
    • Location: src/middleware.ts (logic around hasEncodedId, comment stored on the changed hunk).
    • Minimal fix: Use a case-insensitive check (e.g. /%2f/i) instead of includes('%2F').
    • Confidence: high
  • low — Misleading test name/comment around % encoding behavior

    • Impact: Future maintainers may assume encodeDOIPath is idempotent with pre-encoded inputs when it currently is not.
    • Location: src/utils/common/__tests__/encodeDOI.test.ts
    • Minimal fix: Rename the test (and comment) to match current behavior or change implementation semantics.
    • Confidence: high

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/utils/common/encodeDOI.ts Adds encodeDOIPath utility for DOI-safe URL-path encoding while preserving /.
src/utils/common/tests/encodeDOI.test.ts Adds unit coverage for DOI path encoding edge cases.
src/middlewares/tests/middleware-helpers.test.ts Updates tests to reflect new /abs normalization behavior and uses test consistently.
src/middleware/tests/rewrite.test.ts Expands rewrite tests to cover special-character DOIs and unknown trailing segments.
src/middleware.ts Adjusts /abs normalization to preserve all identifier segments when view token is ambiguous.
src/components/Metatags/json-ld-abstract/identifiers.ts Encodes DOI values when building JSON-LD sameAs doi.org URLs.
src/components/Metatags/json-ld-abstract/tests/identifier.test.ts Adds test ensuring DOI sameAs URLs encode special characters.
src/components/AbstractSources/openUrlGenerator.ts Encodes OpenURL query parameter values with encodeURIComponent per field.
src/components/AbstractSources/linkGenerator.ts Applies DOI path encoding when constructing gateway resolver links.
src/components/AbstractSources/tests/openUrlGenerator.test.ts Adds OpenURL test asserting # is encoded in DOI values.
src/components/AbstractSources/tests/linkGenerator.test.ts Updates expected OpenURL output to match per-field encoding behavior.
src/components/AbstractSources/tests/createUrlByType.test.ts Adds focused tests for createUrlByType DOI encoding behavior.

Comment thread src/utils/common/__tests__/encodeDOI.test.ts Outdated
Comment thread src/middleware.ts
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 2, 2026

Codecov Report

❌ Patch coverage is 90.36145% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.2%. Comparing base (6a15785) to head (436ce91).

Files with missing lines Patch % Lines
src/lib/serverside/absCanonicalization.ts 87.9% 5 Missing ⚠️
src/middleware.ts 81.3% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           master    #879     +/-   ##
========================================
+ Coverage    61.2%   61.2%   +0.1%     
========================================
  Files         348     349      +1     
  Lines       41373   41439     +66     
  Branches     1826    1835      +9     
========================================
+ Hits        25297   25359     +62     
- Misses      16033   16037      +4     
  Partials       43      43             
Files with missing lines Coverage Δ
src/components/AbstractSources/linkGenerator.ts 89.9% <100.0%> (+2.9%) ⬆️
src/components/AbstractSources/openUrlGenerator.ts 82.5% <100.0%> (ø)
...omponents/Metatags/json-ld-abstract/identifiers.ts 92.7% <100.0%> (+0.1%) ⬆️
src/utils/common/encodeDOI.ts 100.0% <100.0%> (ø)
src/middleware.ts 91.5% <81.3%> (-0.3%) ⬇️
src/lib/serverside/absCanonicalization.ts 68.2% <87.9%> (+4.0%) ⬆️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants