Change package URL output to Google Maven format#310
Conversation
📝 WalkthroughWalkthroughReworked download URL resolution: Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/fosslight_dependency/_package_manager.py (1)
580-590: Use a module-level Session, cache on artifact key, and narrow exception handling.
get_google_maven_urlis called during resolution for each artifact, issuing a blocking HTTP HEAD (5s timeout, redirects followed) with no connection reuse. For projects with many Android/Maven dependencies, this causes repeated TLS handshakes to the same host (dl.google.com). A couple of straightforward improvements:
- Reuse a
requests.Session(module-level) to maintain persistent TCP connections across calls, yielding significant latency reduction for repeated requests to the same host.- Wrap the function with
functools.lru_cacheon(mvnrepo_url, group_id, artifact_id, version)to avoid re-probing identical keys across modules/configurations.- Narrow the blanket
except Exceptiontorequests.RequestExceptionto avoid silently swallowing unrelated errors (Ruff BLE001).♻️ Proposed change
+from functools import lru_cache + +_google_maven_session = requests.Session() + +@lru_cache(maxsize=4096) def get_google_maven_url(mvnrepo_url, group_id, artifact_id, version): group_path = group_id.replace('.', '/') pom_url = (f"https://dl.google.com/dl/android/maven2" f"/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.pom") try: - resp = requests.head(pom_url, timeout=5, allow_redirects=True) + resp = _google_maven_session.head(pom_url, timeout=5, allow_redirects=True) if resp.status_code == 200: return f"https://maven.google.com/web/index.html?#{group_id}:{artifact_id}:{version}" - except Exception: + except requests.RequestException: logger.debug(f"Failed to check Google Maven URL: {pom_url}") return f"{mvnrepo_url}{group_id}/{artifact_id}/{version}"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fosslight_dependency/_package_manager.py` around lines 580 - 590, get_google_maven_url issues repeated plain requests.head calls and a broad except; fix by introducing a module-level requests.Session (e.g. _HTTP_SESSION) and use that session's head method inside get_google_maven_url, decorate get_google_maven_url with functools.lru_cache(maxsize=None) so results are cached on the (mvnrepo_url, group_id, artifact_id, version) tuple, and replace the bare except Exception with except requests.RequestException to only catch network-related errors; keep the same timeout and allow_redirects parameters when calling session.head and return the same URLs on success/failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/fosslight_dependency/_package_manager.py`:
- Around line 568-590: The helper get_google_maven_url can return a Maven
Central–style URL on HEAD failure which is incorrect for artifacts already
resolved from Google; update get_download_location and get_google_maven_url so
that when actual_url contains a Google host (checked in get_download_location),
we either (a) call get_google_maven_url with an explicit fallback equal to
actual_url and return that fallback on any non-200/exception, or (b) skip the
HEAD probe and directly return the Google web index URL (constructed from
group_id:artifact_id:version) — implement option (a) by adding an optional
fallback parameter to get_google_maven_url and using it whenever requests.head
fails, and ensure get_download_location passes actual_url as fallback when the
host is Google.
---
Nitpick comments:
In `@src/fosslight_dependency/_package_manager.py`:
- Around line 580-590: get_google_maven_url issues repeated plain requests.head
calls and a broad except; fix by introducing a module-level requests.Session
(e.g. _HTTP_SESSION) and use that session's head method inside
get_google_maven_url, decorate get_google_maven_url with
functools.lru_cache(maxsize=None) so results are cached on the (mvnrepo_url,
group_id, artifact_id, version) tuple, and replace the bare except Exception
with except requests.RequestException to only catch network-related errors; keep
the same timeout and allow_redirects parameters when calling session.head and
return the same URLs on success/failure.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2fb5168c-fde7-425f-9fcd-3cdbd45efc63
📒 Files selected for processing (1)
src/fosslight_dependency/_package_manager.py
Signed-off-by: woocheol <jayden6659@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/fosslight_dependency/_package_manager.py (1)
580-580: Trailing comma in function signature.
def get_google_maven_url(mvnrepo_url, group_id, artifact_id, version, ):is valid Python but the dangling, )looks like an editing leftover. Drop it for clarity.-def get_google_maven_url(mvnrepo_url, group_id, artifact_id, version, ): +def get_google_maven_url(mvnrepo_url, group_id, artifact_id, version):🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fosslight_dependency/_package_manager.py` at line 580, The function signature for get_google_maven_url has an extraneous trailing comma before the closing parenthesis; update the def get_google_maven_url(...) declaration to remove the dangling comma so it reads a normal parameter list (i.e., def get_google_maven_url(mvnrepo_url, group_id, artifact_id, version):), then run linter/formatting to ensure no other signature style issues remain.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/fosslight_dependency/_package_manager.py`:
- Around line 568-590: get_download_location/get_google_maven_url currently
issues a synchronous requests.head per artifact which can cause large cumulative
delays; add a simple in-memory memoization keyed by (group_id, artifact_id,
version) in the module and have get_google_maven_url check that cache before
issuing requests.head and store both positive (Google URL) and negative
(fallback) outcomes; additionally track a short-lived failure counter (or
timestamped negative cache) to short‑circuit repeated probes after several
failures and return the mvnrepo fallback immediately; update
get_google_maven_url to use the cache and failure threshold and keep logging via
logger when a probe is skipped.
- Line 14: The module contains an unconditional "import requests" which means
the project must declare requests as a runtime dependency; add "requests" to the
dependencies section of pyproject.toml (with an appropriate version spec, e.g.
"requests>=2.28,<3") and also add it to requirements.txt if your repo maintains
one so fresh installs won't raise ImportError; verify the package is listed
under the same environment (main/runtime) and run a local install to confirm the
import resolves.
---
Nitpick comments:
In `@src/fosslight_dependency/_package_manager.py`:
- Line 580: The function signature for get_google_maven_url has an extraneous
trailing comma before the closing parenthesis; update the def
get_google_maven_url(...) declaration to remove the dangling comma so it reads a
normal parameter list (i.e., def get_google_maven_url(mvnrepo_url, group_id,
artifact_id, version):), then run linter/formatting to ensure no other signature
style issues remain.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b5c95f52-6446-40ad-a6ca-fb00dcacd110
📒 Files selected for processing (1)
src/fosslight_dependency/_package_manager.py
| import subprocess | ||
| import shutil | ||
| import stat | ||
| import requests |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Locate dependency manifests and check for a `requests` declaration.
fd -t f -H -E .git '^(setup\.py|setup\.cfg|pyproject\.toml|requirements.*\.txt|Pipfile|Pipfile\.lock)$' \
| xargs -I{} sh -c 'echo "=== {} ==="; grep -nE "(^|[^a-zA-Z0-9_])requests([^a-zA-Z0-9_]|$)" "{}" || echo "(no match)"'Repository: fosslight/fosslight_dependency_scanner
Length of output: 390
🏁 Script executed:
#!/bin/bash
# 1. Check if setup.py exists at root
echo "=== Checking for setup.py ==="
find . -maxdepth 1 -name "setup.py" -type f
# 2. Read the main dependency files
echo -e "\n=== pyproject.toml ==="
cat pyproject.toml 2>/dev/null || echo "(file not found)"
echo -e "\n=== setup.py ==="
cat setup.py 2>/dev/null || echo "(file not found)"
echo -e "\n=== requirements.txt ==="
cat requirements.txt 2>/dev/null || echo "(file not found)"
# 3. Confirm the import statement at line 14 of _package_manager.py
echo -e "\n=== _package_manager.py lines 1-20 ==="
head -20 src/fosslight_dependency/_package_manager.pyRepository: fosslight/fosslight_dependency_scanner
Length of output: 2198
Add requests to the project's runtime dependencies.
requests is imported unconditionally at line 14 of this module. It is not currently declared in pyproject.toml or requirements.txt, which will cause fresh installs to fail with an ImportError. Add requests to the dependencies list in pyproject.toml (and requirements.txt if maintained).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/fosslight_dependency/_package_manager.py` at line 14, The module contains
an unconditional "import requests" which means the project must declare requests
as a runtime dependency; add "requests" to the dependencies section of
pyproject.toml (with an appropriate version spec, e.g. "requests>=2.28,<3") and
also add it to requirements.txt if your repo maintains one so fresh installs
won't raise ImportError; verify the package is listed under the same environment
(main/runtime) and run a local install to confirm the import resolves.
| def get_download_location(download_url_map, group_id, artifact_id, version, mvnrepo_url): | ||
| actual_key = f"{group_id}:{artifact_id}:{version}" | ||
| if download_url_map: | ||
| try: | ||
| actual_url = download_url_map.get(actual_key) | ||
|
|
||
| use_mvnrepo = True | ||
| if actual_url: | ||
| central_like = ("repo1.maven.org" in actual_url) or ("repo.maven.apache.org" in actual_url) | ||
| google_like = (("maven.google.com" in actual_url) or | ||
| ("dl.google.com/android/maven2" in actual_url) or | ||
| ("dl.google.com/dl/android/maven2" in actual_url)) | ||
| if central_like or google_like: | ||
| use_mvnrepo = True | ||
| else: | ||
| use_mvnrepo = False | ||
| except Exception as e: | ||
| logger.debug(f"Failed to get download location from download_url_map: {e}") | ||
| use_mvnrepo = True | ||
| else: | ||
| use_mvnrepo = True | ||
| if use_mvnrepo: | ||
| return f"{mvnrepo_url}{group_id}/{artifact_id}/{version}" | ||
| else: | ||
| return actual_url | ||
| actual_url = download_url_map.get(actual_key) if download_url_map else None | ||
| if actual_url: | ||
| if any(host in actual_url for host in ("repo1.maven.org", "repo.maven.apache.org")): | ||
| return f"{mvnrepo_url}{group_id}/{artifact_id}/{version}" | ||
| if not any(host in actual_url for host in ( | ||
| "maven.google.com", "dl.google.com/android/maven2", "dl.google.com/dl/android/maven2")): | ||
| return actual_url | ||
| return get_google_maven_url(mvnrepo_url, group_id, artifact_id, version) | ||
|
|
||
|
|
||
| def get_google_maven_url(mvnrepo_url, group_id, artifact_id, version, ): | ||
| group_path = group_id.replace('.', '/') | ||
| pom_url = (f"https://dl.google.com/dl/android/maven2" | ||
| f"/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.pom") | ||
| try: | ||
| resp = requests.head(pom_url, timeout=5, allow_redirects=True) | ||
| if resp.status_code == 200: | ||
| return f"https://maven.google.com/web/index.html#{group_id}:{artifact_id}:{version}" | ||
| except Exception: | ||
| logger.debug(f"Failed to check Google Maven URL: {pom_url}") | ||
| return f"{mvnrepo_url}{group_id}/{artifact_id}/{version}" |
There was a problem hiding this comment.
Per‑artifact HEAD probe can substantially slow large scans.
get_download_location is invoked once per oss_item by Android/Gradle/Maven callers. Whenever the mapped URL is None or Google‑hosted, get_google_maven_url issues a synchronous requests.head with a 5 s timeout. For projects with hundreds of dependencies, or when dl.google.com is unreachable (offline / corporate proxy), the cumulative wait time grows linearly and the same (group_id, artifact_id, version) may be probed repeatedly across duplicates. Consider memoizing the probe result and/or short‑circuiting when several consecutive probes have failed.
♻️ Suggested cache to avoid repeated network round‑trips
+_GOOGLE_MAVEN_CACHE: dict[tuple[str, str, str], str] = {}
+
def get_google_maven_url(mvnrepo_url, group_id, artifact_id, version, ):
+ cache_key = (group_id, artifact_id, version)
+ if cache_key in _GOOGLE_MAVEN_CACHE:
+ return _GOOGLE_MAVEN_CACHE[cache_key]
group_path = group_id.replace('.', '/')
pom_url = (f"https://dl.google.com/dl/android/maven2"
f"/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.pom")
+ result = f"{mvnrepo_url}{group_id}/{artifact_id}/{version}"
try:
resp = requests.head(pom_url, timeout=5, allow_redirects=True)
if resp.status_code == 200:
- return f"https://maven.google.com/web/index.html#{group_id}:{artifact_id}:{version}"
+ result = f"https://maven.google.com/web/index.html#{group_id}:{artifact_id}:{version}"
except Exception:
logger.debug(f"Failed to check Google Maven URL: {pom_url}")
- return f"{mvnrepo_url}{group_id}/{artifact_id}/{version}"
+ _GOOGLE_MAVEN_CACHE[cache_key] = result
+ return result🧰 Tools
🪛 Ruff (0.15.11)
[warning] 588-588: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/fosslight_dependency/_package_manager.py` around lines 568 - 590,
get_download_location/get_google_maven_url currently issues a synchronous
requests.head per artifact which can cause large cumulative delays; add a simple
in-memory memoization keyed by (group_id, artifact_id, version) in the module
and have get_google_maven_url check that cache before issuing requests.head and
store both positive (Google URL) and negative (fallback) outcomes; additionally
track a short-lived failure counter (or timestamped negative cache) to
short‑circuit repeated probes after several failures and return the mvnrepo
fallback immediately; update get_google_maven_url to use the cache and failure
threshold and keep logging via logger when a probe is skipped.
Description
Change package URL output to Google Maven format
Summary by CodeRabbit