Skip to content

Fix path/branch extraction removing every /blob/ and /tree/ marker, not just the leading one#149

Open
patchwright wants to merge 1 commit into
nephila:masterfrom
patchwright:fix-blob-tree-prefix-strip
Open

Fix path/branch extraction removing every /blob/ and /tree/ marker, not just the leading one#149
patchwright wants to merge 1 commit into
nephila:masterfrom
patchwright:fix-blob-tree-prefix-strip

Conversation

@patchwright

Copy link
Copy Markdown

Description

path and branch are extracted from the matched URL by removing the leading
/blob/, /tree/, /-/blob/ or /-/tree/ marker. The code did this with
str.replace(marker, ""), which removes every occurrence of the marker, not
just the leading one. When the file path or branch name legitimately contains the
same segment again, the result is silently corrupted.

For example:

>>> import giturlparse
>>> giturlparse.parse("https://github.com/owner/repo/blob/main/src/blob/utils.py").path
'main/srcutils.py'        # expected: 'main/src/blob/utils.py'
>>> giturlparse.parse("https://github.com/owner/repo/tree/feature/tree/x").branch
'featurex'                # expected: 'feature/tree/x'

The leading marker is already guaranteed by the preceding startswith(...) check,
so the fix is to slice off exactly that prefix instead of calling replace:

data["path"] = data["path_raw"][len("/blob/"):]

Slicing is used (rather than str.removeprefix) to remain compatible with the
declared python_requires = >=3.8.

Fixed in both the GitHub and GitLab platforms.

References

No existing issue.

Checklist

  • Code lint checked via inv lint (ruff, black, isort all clean on the changed files)
  • Tests added (regression cases for nested /blob/ paths and /tree/ branch names on both GitHub and GitLab; verified to fail on the previous code and pass with the fix)
  • Changelog fragment added under changes/

GitHubPlatform and GitLabPlatform built `path`/`branch` with
str.replace(marker, ""), which removes *every* occurrence of the
marker rather than only the leading one. A URL whose file path or
branch name contained the same segment again was silently corrupted:

    parse('.../blob/main/src/blob/utils.py').path  -> 'main/srcutils.py'
    parse('.../tree/feature/tree/x').branch        -> 'featurex'

The leading marker is already guaranteed by the preceding startswith
check, so slice it off instead. Slicing (rather than str.removeprefix)
keeps the declared Python 3.8 compatibility. Adds regression tests for
nested /blob/ paths and /tree/ branch names on both GitHub and GitLab.
@protoroto

Copy link
Copy Markdown
Member

@patchwright Hi! Thanks for pointing this out. In order to merge this (and to make the CI pass), I've created #150 this issue. Would you be so kind to:

  • rename the changes/149.bugfix in changes/150.bugfix (to mimick the issue number)
  • rename your branch to bugfix/issue-150-fix-blob-tree-prefix-strip
    Then the CI will not complain and I'll merge and release this as soon as possible.
    Thanks again for contributing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants