[Storage] Strip sensitive auth info on cross-domain redirect #47541
weirongw23-msft wants to merge 8 commits into
Pull request overview
This PR introduces a Storage pipeline policy intended to prevent credential/SAS leakage by stripping sensitive authentication headers and query parameters when an HTTP redirect crosses domains, and wires that policy into the Blob, Queue, File Share, and Data Lake pipelines.
Changes:
- Added `StorageSensitiveHeaderCleanupPolicy` to scrub sensitive headers and remove `sig` from the URL query when `RedirectPolicy` flags a cross-domain redirect.
- Inserted the new policy into the sync/async pipeline construction for Blob, Queue, File Share, and Data Lake clients.
- Added a Blob unit test covering the redirect-cleanup behavior.
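As a rough illustration of the behavior summarized above, here is a minimal standalone sketch. It is not the policy from this PR: the function names, the hostname comparison used to decide "cross-domain", and the lower-cased header matching are all assumptions made for the example. Only the `sig` parameter and the header names come from the diff.

```python
from urllib.parse import urlparse, urlunparse

# Assumed defaults for illustration; compared in lower case (an assumption, not the PR's behavior).
SENSITIVE_HEADERS = {"authorization", "x-ms-copy-source-authorization"}
BLOCKED_QUERY_PARAMS = {"sig"}


def is_cross_domain(original_url, redirect_url):
    # Assumption: "cross-domain" means the hostnames differ.
    return urlparse(original_url).hostname != urlparse(redirect_url).hostname


def scrub_for_redirect(original_url, redirect_url, headers):
    """Return (url, headers) with sensitive data removed on a cross-domain redirect."""
    if not is_cross_domain(original_url, redirect_url):
        return redirect_url, dict(headers)
    clean_headers = {
        name: value
        for name, value in headers.items()
        if name.lower() not in SENSITIVE_HEADERS
    }
    parsed = urlparse(redirect_url)
    # Split manually so the surviving parameters keep their original encoding.
    kept = [
        pair
        for pair in parsed.query.split("&")
        if pair and pair.split("=", 1)[0] not in BLOCKED_QUERY_PARAMS
    ]
    return urlunparse(parsed._replace(query="&".join(kept))), clean_headers
```

A same-host redirect passes through unchanged; a redirect to a different host loses `Authorization` and `sig` but keeps everything else.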
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| sdk/storage/azure-storage-queue/azure/storage/queue/_shared/policies.py | Adds StorageSensitiveHeaderCleanupPolicy implementation for Queue. |
| sdk/storage/azure-storage-queue/azure/storage/queue/_shared/base_client.py | Wires the new cleanup policy into the Queue sync pipeline. |
| sdk/storage/azure-storage-queue/azure/storage/queue/_shared/base_client_async.py | Wires the new cleanup policy into the Queue async pipeline. |
| sdk/storage/azure-storage-file-share/azure/storage/fileshare/_shared/policies.py | Adds StorageSensitiveHeaderCleanupPolicy implementation for File Share. |
| sdk/storage/azure-storage-file-share/azure/storage/fileshare/_shared/base_client.py | Wires the new cleanup policy into the File Share sync pipeline. |
| sdk/storage/azure-storage-file-share/azure/storage/fileshare/_shared/base_client_async.py | Wires the new cleanup policy into the File Share async pipeline. |
| sdk/storage/azure-storage-file-datalake/azure/storage/filedatalake/_shared/policies.py | Adds StorageSensitiveHeaderCleanupPolicy implementation for Data Lake. |
| sdk/storage/azure-storage-file-datalake/azure/storage/filedatalake/_shared/base_client.py | Wires the new cleanup policy into the Data Lake sync pipeline. |
| sdk/storage/azure-storage-file-datalake/azure/storage/filedatalake/_shared/base_client_async.py | Wires the new cleanup policy into the Data Lake async pipeline. |
| sdk/storage/azure-storage-blob/azure/storage/blob/_shared/policies.py | Adds StorageSensitiveHeaderCleanupPolicy implementation for Blob. |
| sdk/storage/azure-storage-blob/azure/storage/blob/_shared/base_client.py | Wires the new cleanup policy into the Blob sync pipeline. |
| sdk/storage/azure-storage-blob/azure/storage/blob/_shared/base_client_async.py | Wires the new cleanup policy into the Blob async pipeline. |
| sdk/storage/azure-storage-blob/tests/test_sensitive_redirect.py | Adds unit coverage for redirect-based sensitive header/query cleanup (Blob only). |
    # Clean up request query parameters
    parsed = urlparse(request.http_request.url)
    kept = [
        pair
        for pair in parsed.query.split("&")
        if pair and pair.split("=", 1)[0] not in self._blocked_query_params
    ]
    request.http_request.url = urlunparse(parsed._replace(query="&".join(kept)))
I am okay with using parse_qsl, but I don't think we should re-encode. A lot of existing SAS features (e.g. Directory-Level SAS on Blob FNS) are predicated on us not re-encoding, because they are encoding-sensitive.
Headers can be case sensitive and the blocked header list is also case sensitive.
Hmm yeah this is a tricky one since parse_qsl will decode so we must re-encode when we build the string back. But yeah, I remember those new SAS query params are picky about encoding... I think it's probably safer to go back to something like you had before with manual parsing. Since we are the ones building the URLs we can be somewhat certain that there is less funny business like encoded separators or empty params.
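To make the encoding concern concrete, the sketch below compares the two approaches using only the standard library. The URL and its parameter values are made up for illustration and are not a real SAS token; the function names are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def strip_params_manual(url, blocked):
    """Drop blocked parameters by splitting on '&'; surviving pairs are left byte-for-byte intact."""
    parsed = urlparse(url)
    kept = [
        pair
        for pair in parsed.query.split("&")
        if pair and pair.split("=", 1)[0] not in blocked
    ]
    return urlunparse(parsed._replace(query="&".join(kept)))


def strip_params_reencoded(url, blocked):
    """Drop blocked parameters via parse_qsl, which decodes values, then re-encode with urlencode."""
    parsed = urlparse(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parsed.query, keep_blank_values=True)
        if key not in blocked
    ]
    return urlunparse(parsed._replace(query=urlencode(kept)))


# Illustrative URL: lower-case percent escapes and an unescaped '/' in a value.
url = "https://account.example.net/c/b?sr=d&se=2030-01-01T00%3a00%3a00Z&sp=r/w&sig=abc%2Bdef"

print(strip_params_manual(url, {"sig"}))
print(strip_params_reencoded(url, {"sig"}))
```

The manual version leaves `%3a` and `r/w` exactly as they were, while the round trip through `parse_qsl` and `urlencode` rewrites them as `%3A` and `r%2Fw`. Any server-side check that depends on the exact original string could then fail.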
        return True


    class StorageSensitiveHeaderCleanupPolicy(SansIOHTTPPolicy[HTTPRequestType, HTTPResponseType]):
Probably don't need any of this HTTP type stuff, just inherit from plain SansIOHTTPPolicy like all of our other policies.
    DEFAULT_SENSITIVE_HEADERS = {
        "Authorization", "x-ms-authorization-auxiliary", "x-ms-copy-source", "x-ms-copy-source-authorization",
        "x-ms-rename-source"
    }
Is this how black wants this formatted? I would prefer one per line.
black formatted it like this automatically, but I will put one header per line, and it should stay that way through future reformats.
        self,  # pylint: disable=unused-argument
        *,
        blocked_redirect_headers: Optional[List[str]] = None,
        blocked_query_params: Optional[List[str]] = None,
Let's call this blocked_redirect_query_params. I was originally going to suggest getting rid of all these customization options, since this is our policy and we should just hardcode the list. BUT because we pass kwargs in base_client, users could pass these when constructing a Storage client, which I think is good flexibility to have if a customer wants to customize this at all. Because of that, the keyword name needs to be specific to redirects, since it gets passed to our constructor.
Sounds good, done :)
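The naming concern can be seen in a small standalone sketch of kwargs forwarding. The class and function below are hypothetical stand-ins, not the SDK's `base_client` or the policy from this PR; `retry_total` and `skoid` are used only as example values.

```python
from typing import Any, List, Optional


class CleanupPolicySketch:
    """Hypothetical stand-in for the policy under review; not the SDK class."""

    DEFAULT_SENSITIVE_HEADERS = ["Authorization", "x-ms-copy-source-authorization"]
    DEFAULT_BLOCKED_QUERY_PARAMS = ["sig"]

    def __init__(
        self,
        *,
        blocked_redirect_headers: Optional[List[str]] = None,
        blocked_redirect_query_params: Optional[List[str]] = None,
        **kwargs: Any,  # unrelated client options are accepted and ignored here
    ) -> None:
        if blocked_redirect_headers is None:
            blocked_redirect_headers = self.DEFAULT_SENSITIVE_HEADERS
        if blocked_redirect_query_params is None:
            blocked_redirect_query_params = self.DEFAULT_BLOCKED_QUERY_PARAMS
        self.blocked_headers = set(blocked_redirect_headers)
        self.blocked_query_params = set(blocked_redirect_query_params)


def create_pipeline_policies(**kwargs: Any) -> list:
    # Assumed shape: the client forwards all of its keyword arguments to each policy.
    return [CleanupPolicySketch(**kwargs)]


# The same kwargs bag carries unrelated options, so the keyword must be unambiguous.
policy = create_pipeline_policies(
    blocked_redirect_query_params=["sig", "skoid"],
    retry_total=3,
)[0]
print(sorted(policy.blocked_query_params))  # → ['sig', 'skoid']
```

Because every policy reads from one shared set of keyword arguments, a generic name such as `blocked_query_params` says nothing about which policy consumes it, while the redirect-specific name does.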