SCIX-884 feat(links): encode DOI/OpenURL special chars#879
Conversation
There was a problem hiding this comment.
Pull request overview
This PR hardens DOI/OpenURL link generation and abstract-path normalization by correctly percent-encoding special characters that were previously breaking URLs (e.g., #, ?, spaces, brackets), while also adjusting middleware behavior to preserve all identifier segments.
Risk summary: Medium. The changes touch edge middleware rewrite logic (routing) and URL construction used broadly on abstract pages. The behavior is well-motivated and has added/updated tests, but a subtle encoding-detection bug could still cause incorrect rewrites for some encoded inputs.
Changes:
- Added
encodeDOIPath()and applied it to DOI URLs (doi.org JSON-LDsameAs) and gateway resolver links. - Fixed OpenURL generation to encode each query parameter value with
encodeURIComponent(instead ofencodeURIon the whole URL). - Updated
/abs/...middleware normalization to preserve all path segments for ambiguous multi-segment identifiers and expanded test coverage.
Findings (priority order)
-
medium — Case-sensitive detection of already-encoded identifiers can cause incorrect rewrites
- Impact: Requests containing lowercase percent-escapes (e.g.
%2f) may be treated as not encoded, leading to double-encoding or unintended rewrites/redirects for some/abs/...paths. - Location:
src/middleware.ts(logic aroundhasEncodedId, comment stored on the changed hunk). - Minimal fix: Use a case-insensitive check (e.g.
/%2f/i) instead ofincludes('%2F'). - Confidence: high
- Impact: Requests containing lowercase percent-escapes (e.g.
-
low — Misleading test name/comment around
%encoding behavior- Impact: Future maintainers may assume
encodeDOIPathis idempotent with pre-encoded inputs when it currently is not. - Location:
src/utils/common/__tests__/encodeDOI.test.ts - Minimal fix: Rename the test (and comment) to match current behavior or change implementation semantics.
- Confidence: high
- Impact: Future maintainers may assume
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/utils/common/encodeDOI.ts | Adds encodeDOIPath utility for DOI-safe URL-path encoding while preserving /. |
| src/utils/common/tests/encodeDOI.test.ts | Adds unit coverage for DOI path encoding edge cases. |
| src/middlewares/tests/middleware-helpers.test.ts | Updates tests to reflect new /abs normalization behavior and uses test consistently. |
| src/middleware/tests/rewrite.test.ts | Expands rewrite tests to cover special-character DOIs and unknown trailing segments. |
| src/middleware.ts | Adjusts /abs normalization to preserve all identifier segments when view token is ambiguous. |
| src/components/Metatags/json-ld-abstract/identifiers.ts | Encodes DOI values when building JSON-LD sameAs doi.org URLs. |
| src/components/Metatags/json-ld-abstract/tests/identifier.test.ts | Adds test ensuring DOI sameAs URLs encode special characters. |
| src/components/AbstractSources/openUrlGenerator.ts | Encodes OpenURL query parameter values with encodeURIComponent per field. |
| src/components/AbstractSources/linkGenerator.ts | Applies DOI path encoding when constructing gateway resolver links. |
| src/components/AbstractSources/tests/openUrlGenerator.test.ts | Adds OpenURL test asserting # is encoded in DOI values. |
| src/components/AbstractSources/tests/linkGenerator.test.ts | Updates expected OpenURL output to match per-field encoding behavior. |
| src/components/AbstractSources/tests/createUrlByType.test.ts | Adds focused tests for createUrlByType DOI encoding behavior. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #879 +/- ##
========================================
+ Coverage 61.2% 61.2% +0.1%
========================================
Files 348 349 +1
Lines 41373 41439 +66
Branches 1826 1835 +9
========================================
+ Hits 25297 25359 +62
- Misses 16033 16037 +4
Partials 43 43
🚀 New features to boost your workflow:
|
…rewrite for multi-segment DOIs
DOI identifiers containing special characters (#, ?, spaces, brackets) were being passed raw
into URL paths and OpenURL query strings, causing broken links and bad redirects on
abstract pages.
legitimate path separator
structured data)
wrapper, which was under-encoding values
a DOI had more than two components; all segments are now preserved since there is no
reliable way to distinguish a trailing view token from a legitimate DOI suffix