Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,13 @@
Unreleased
----------

* Respect `AzureBlobFileSystem.protocol` tuple when removing protocols from fully-qualified
paths provided to `AzureBlobFileSystem` methods.


2025.8.0
--------

- Added "adlfs" to library's default user agent
- Fix issue where ``AzureBlobFile`` did not respect ``location_mode`` parameter
from parent ``AzureBlobFileSystem`` when using SAS credentials and connecting to
Expand Down
12 changes: 10 additions & 2 deletions adlfs/spec.py
Original file line number Diff line number Diff line change
Expand Up @@ -261,7 +261,7 @@ class AzureBlobFileSystem(AsyncFileSystem):
... 'account_name': ACCOUNT_NAME, 'account_key': ACCOUNT_KEY,})
"""

protocol = "abfs"
protocol = ("abfs", "az", "abfss")

def __init__(
self,
Expand Down Expand Up @@ -398,7 +398,15 @@ def _strip_protocol(cls, path: str):

STORE_SUFFIX = ".dfs.core.windows.net"
logger.debug(f"_strip_protocol for {path}")
if not path.startswith(("abfs://", "az://", "abfss://")):
if isinstance(cls.protocol, str):
# The protocol can be either a string or a tuple of strings.
# While the default protocol is a tuple, a user can override the protocol to be a string.
# So, this handles the case where the user has overridden the protocol, restricting
# supported protocols to a single string.
protocol_startswith = (f"{cls.protocol}://",)
Comment thread
kyleknap marked this conversation as resolved.
else:
protocol_startswith = tuple(f"{proto}://" for proto in cls.protocol)
if not path.startswith(protocol_startswith):
path = path.lstrip("/")
path = "abfs://" + path
ops = infer_storage_options(path)
Expand Down
39 changes: 37 additions & 2 deletions adlfs/tests/test_spec.py
Original file line number Diff line number Diff line change
Expand Up @@ -1776,15 +1776,15 @@ def test_find_with_prefix(storage):
]


@pytest.mark.parametrize("proto", [None, "abfs://", "az://"])
@pytest.mark.parametrize("proto", [None, "abfs://", "az://", "abfss://"])
@pytest.mark.parametrize("path", ["container/file", "container/file?versionid=1234"])
def test_strip_protocol(proto, path):
assert (
AzureBlobFileSystem._strip_protocol(f"{proto}{path}" if proto else path) == path
)


@pytest.mark.parametrize("proto", ["", "abfs://", "az://"])
@pytest.mark.parametrize("proto", ["", "abfs://", "az://", "abfss://"])
@pytest.mark.parametrize("key", ["file", "dir/file"])
@pytest.mark.parametrize("version_aware", [True, False])
@pytest.mark.parametrize("version_id", [None, "1970-01-01T00:00:00.0000000Z"])
Expand All @@ -1805,6 +1805,41 @@ def test_split_path(storage, proto, key, version_aware, version_id):
)


@pytest.mark.parametrize("unsupported_proto", ["unsupported", "wasb", "wasbs"])
def test_can_update_default_supported_protocols(storage, mocker, unsupported_proto):
mocker.patch.object(
AzureBlobFileSystem,
"protocol",
AzureBlobFileSystem.protocol + (unsupported_proto,),
)
Comment on lines +1810 to +1814
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if protocol here is meant to be overridden, lets not use mock here since thats specifically for testing.

Would be great to show how to override the protocol

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer if we stick with using mock.patch.object, mainly if we modify directly the AzureBlobFileSystem class without restoring state, the protocol mutation lives on for all test cases that are ran after it. By using the mocker fixture, we guarantee that the test automatically undoes the patch.

Technically, we can get around this by adding a fixture that restores the original AzureBlobFileSystem.protocol on each test case, but that would mean additional scaffolding when there is already the mocker fixture and pytest's builtin in monkeypatch fixture.

Other option, I'm fine with is defining a subclass of AzureBlobFileSystem in the test body and override its protocol property in order to avoid lingering state.

Thoughts?

Copy link
Copy Markdown

@kevinjqliu kevinjqliu Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense! just a nit. i think just having some docs around overriding protocol would be good. the PR description is very helpful

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can set the .protocol on the instance instead of the class. That's the only real use case for "overriding" that is likely to happen.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martindurant I actually went with the approach of just overriding the protocol on the instance originally, but I realized it would not work because _strip_protocol() is a class method and will only ever reference the AzureBlobFileSystem class and not the actual instance with the overridden protocol.

So unfortunately, being able to override the protocol is not as clean as I hoped in this comment: #493 (comment) in just being able to override the protocol property on an instance... Instead, as the code sits now, it either requires users to override the protocol through direct monkeypatching of the class or subclassing.

I did explore what it would take to be able to override protocol property on instances. However, it seemed it would require a fairly complicated given how prevalent _strip_protocol() is used in fsspec implementations and did not seem worth the effort. But maybe something we can consider in the future if we see other projects/users wanting instance level overrides?

Thoughts?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK yes, you are right I think.

fs = AzureBlobFileSystem(
account_name=storage.account_name,
connection_string=CONN_STR,
skip_instance_cache=True,
)
assert unsupported_proto in fs.protocol
assert (
fs._strip_protocol(f"{unsupported_proto}://container/file") == "container/file"
)
assert fs.split_path(f"{unsupported_proto}://container/file") == (
"container",
"file",
None,
)


def test_can_restrict_protocol_to_single_string(storage, mocker):
mocker.patch.object(AzureBlobFileSystem, "protocol", "abfs")
fs = AzureBlobFileSystem(
account_name=storage.account_name,
connection_string=CONN_STR,
skip_instance_cache=True,
)
assert fs.protocol == "abfs"
assert fs._strip_protocol("abfs://container/file") == "container/file"
assert fs.split_path("abfs://container/file") == ("container", "file", None)


async def test_details_versioned(storage):
from azure.storage.blob import BlobProperties

Expand Down