XRAY-138689 - Add Poetry support for jf ca#768
Conversation
963c301 to
6fb0884
Compare
6fb0884 to
4e010d2
Compare
| //replace github.com/jfrog/jfrog-cli-artifactory => github.com/jfrog/jfrog-cli-artifactory main | ||
|
|
||
| // replace github.com/jfrog/build-info-go => github.com/jfrog/build-info-go dev | ||
| // replace github.com/jfrog/jfrog-cli-core/v2 => github.com/gauriy-tech/jfrog-cli-core/v2 v2.0.0-20260526032107-e8995d698251 |
There was a problem hiding this comment.
[Low] Remove commented personal-fork replace directive
go.mod line 158 contains:
// replace github.com/jfrog/jfrog-cli-core/v2 => github.com/gauriy-tech/jfrog-cli-core/v2 v2.0.0-20260526032107-e8995d698251This is a commented-out replace directive pointing to a personal fork (gauriy-tech). The // prefix makes it inert at build time, but it reveals that the PR was developed against a local fork override and the cleanup was incomplete.
Development-time replace directives don't belong in merged PRs. Please remove this line before merging.
(The adjacent // replace github.com/jfrog/jfrog-cli-core/v2 => github.com/jfrog/jfrog-cli-core/v2 master is a standard JFrog scaffolding comment and can stay.)
…o feature/XRAY-138689-add-poetry-support
| esac | ||
| ` | ||
| require.NoError(t, os.WriteFile(fakePoetry, []byte(script), 0755)) | ||
| require.NoError(t, os.Chmod(fakePoetry, 0755)) |
There was a problem hiding this comment.
[Nit] Redundant os.Chmod immediately after os.WriteFile with the same mode
os.WriteFile already creates the file with mode 0755. The subsequent os.Chmod(fakePoetry, 0755) is a no-op. The sibling test TestInstallPoetryDepsNonCurationErrorPropagated uses only os.WriteFile.
| require.NoError(t, os.Chmod(fakePoetry, 0755)) | |
| require.NoError(t, os.WriteFile(fakePoetry, []byte(script), 0755)) |
| return "" | ||
| } | ||
|
|
||
| func NormalizePypiName(name string) string { |
There was a problem hiding this comment.
[Low] Two PEP 503 normalization functions in the same package
This PR adds NormalizePypiName to python.go. The same package already has normalizePyPIName in python_cvs_fallback.go. Both implement PEP 503 name normalization (collapse consecutive -, _, . separators into a single -, lowercase):
// python.go:364 — new in this PR, loop-based
func NormalizePypiName(name string) string {
name = strings.ToLower(name)
var b strings.Builder
prevSep := false
for _, r := range name {
if r == '-' || r == '_' || r == '.' {
if !prevSep { b.WriteByte('-'); prevSep = true }
continue
}
b.WriteRune(r); prevSep = false
}
return b.String()
}
// python_cvs_fallback.go:49 — pre-existing, regex-based
var pypiNameNormalizeRegex = regexp.MustCompile(`[-_.]+`)
func normalizePyPIName(name string) string {
return strings.ToLower(pypiNameNormalizeRegex.ReplaceAllString(name, "-"))
}Both produce identical results today. The DRY violation means that if PEP 503 normalization ever changes, only one gets updated.
Fix: Replace calls to normalizePyPIName in python_cvs_fallback.go with NormalizePypiName and delete normalizePyPIName.
| @@ -1114,7 +1118,7 @@ func getPythonNameVersion(id string, downloadUrlsMap map[string]string) (downloa | |||
| if dl, ok := downloadUrlsMap[normalizedId]; ok { | |||
There was a problem hiding this comment.
[Low] Normalization fallback path in getPythonNameVersion is untested
Test_getPythonNameVersion covers exact-match lookups but has no test case for the normalization fallback path (lines 1116–1118):
// curationaudit.go:1116-1118 — not exercised by any test case
normalizedName := strings.ReplaceAll(strings.ToLower(strings.TrimSpace(parts[0])), "-", "_")
normalizedId := python.PythonPackageTypeIdentifier + normalizedName + ":" + strings.TrimSpace(parts[1])
if dl, ok := downloadUrlsMap[normalizedId]; ok {
downloadUrls = []string{dl}This path is reachable in practice: the dependency graph node ID can carry the raw lock name (e.g. pypi://Flask-Babel:1.0), while the download-URLs map key uses the normalized form (pypi://flask_babel:1.0). The exact lookup fails; the fallback resolves it.
Suggested test case to add to Test_getPythonNameVersion:
{
name: "hyphenated name resolved via normalization fallback",
id: "pypi://Flask-Babel:1.0",
downloadUrlsMap: map[string]string{"pypi://flask_babel:1.0": exampleUrl},
wantDownloadUrls: []string{exampleUrl},
wantName: "Flask-Babel",
wantVersion: "1.0",
},| return nil | ||
| } | ||
|
|
||
| func stripPoetrySourceBlocks(content string) string { |
There was a problem hiding this comment.
[Low] stripPoetrySourceBlocks has no unit test
stripPoetrySourceBlocks is a pure string function that strips all [[tool.poetry.source]] TOML blocks before setCurationSourceInPyproject rewrites them. It has several edge cases:
- No source blocks in the file
- Single source block
- Two consecutive source blocks (e.g. both
[[tool.poetry.source]]entries, then[tool.poetry.dependencies]) - Source block at end of file (no trailing section)
extractPoetrySourceNames (the viper-based companion) has TestExtractPoetrySourceNames covering its cases. stripPoetrySourceBlocks has nothing.
Without a unit test, a regression in the stripping logic (e.g. accidentally consuming the first line of the next section when two source blocks are adjacent) would only surface in an end-to-end integration test with a real Poetry installation.
Suggested test (table-driven):
func TestStripPoetrySourceBlocks(t *testing.T) {
cases := []struct{ name, in, want string }{
{
name: "no source blocks — content unchanged",
in: "[tool.poetry]\nname = \"x\"\n",
want: "[tool.poetry]\nname = \"x\"\n",
},
{
name: "single source block stripped",
in: "[tool.poetry]\nname = \"x\"\n[[tool.poetry.source]]\nname = \"pypi\"\nurl = \"https://pypi.org\"\n[tool.poetry.dependencies]\npython = \"^3.9\"\n",
want: "[tool.poetry]\nname = \"x\"\n[tool.poetry.dependencies]\npython = \"^3.9\"\n",
},
{
name: "source block at EOF",
in: "[tool.poetry]\nname = \"x\"\n[[tool.poetry.source]]\nname = \"pypi\"\n",
want: "[tool.poetry]\nname = \"x\"\n",
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
assert.Equal(t, tc.want, stripPoetrySourceBlocks(tc.in))
})
}
}| // recorded in poetry.lock against the listed <a href>s. | ||
| func buildPoetryDownloadUrl(rtManager artifactory.ArtifactoryServicesManager, clientDetails *httputils.HttpClientDetails, artiUrl, repository string, pkg poetryLockPackage) (string, error) { | ||
| normalized := NormalizePypiName(pkg.Name) | ||
| simpleIndexUrl := fmt.Sprintf("%s/api/pypi/%s/simple/%s/", artiUrl, repository, normalized) |
There was a problem hiding this comment.
[Medium] buildPoetryDownloadUrl bypasses the CVS pass-through endpoint
buildPoetryDownloadUrl hardcodes the raw PyPI simple-index URL:
// python.go:315
simpleIndexUrl := fmt.Sprintf("%s/api/pypi/%s/simple/%s/", artiUrl, repository, normalized)Every other package manager routes its index/resolution requests through the curation pass-through layer when IsCurationCmd=true. For pip, GetPypiRepoUrlWithCredentials does this automatically:
// jfrog-cli-artifactory — GetPypiRepoUrlWithCredentials
if isCurationCmd {
rtUrl = rtUrl.JoinPath(coreutils.CurationPassThroughApi) // "api/curation/audit/"
}
rtUrl = rtUrl.JoinPath("api/pypi", repository, "simple")
// result: <artiUrl>/api/curation/audit/api/pypi/<repo>/simple/buildPoetryDownloadUrlsMap is only called in curation mode (guarded at line 195 by if !params.IsCurationCmd { return }). Its simple-index GETs should always use the curation URL — they never run outside jf ca.
Without the api/curation/audit/ prefix, the simple-index requests bypass the CVS intercept layer. When Artifactory's CVS enforces policy at the index level (hiding or blocking package entries in the simple-index), Poetry will miss those signals entirely.
Fix: pass isCurationCmd: true into buildPoetryDownloadUrl and replace the fmt.Sprintf with GetPypiRepoUrlWithCredentials(serverDetails, repository, true) to derive the base URL — the same helper pip already uses.
| Files []string | ||
| } | ||
|
|
||
| func buildPoetryDownloadUrlsMap(serverDetails *config.ServerDetails, repository string) (map[string]string, error) { |
There was a problem hiding this comment.
[Low] Poetry's simple-index GETs are sequential and outside the parallel curation-check pool
The curation audit's parallel phase (fetchNodeStatus, bounded runner of 10 goroutines) HEAD-checks artifact download URLs. All package managers feed into that phase.
Poetry adds a serial pre-phase before it: buildPoetryDownloadUrlsMap makes one GET per lock entry sequentially to resolve filenames to download URLs:
// python.go:296 — one blocking HTTP GET per package, no concurrency
for _, pkg := range packages {
downloadUrl, lookupErr := buildPoetryDownloadUrl(rtManager, &httpClientDetails, artiUrl, repository, pkg)Other package managers have no equivalent pre-phase:
- pip reads download URLs from a local
--reportJSON written duringpip install— zero extra HTTP calls. - npm, gems, go, nuget, maven, gradle derive download URLs from string operations on the component ID.
For a project with 100 lock entries at 50 ms RTT, this is ~5 s of sequential HTTP before the parallel HEAD phase even starts. The rtManager.Client().SendGet is thread-safe.
Suggestion: a small worker pool (e.g. errgroup + 8 goroutines) would reduce this to ceil(N/8) × RTT. If that's deferred, add a comment above the loop documenting the O(N × RTT) cost so the next reader understands it's structural, not an oversight.
| if inMetadataFiles { | ||
| if strings.Contains(line, "= [") { | ||
| raw := strings.TrimSpace(strings.SplitN(line, "=", 2)[0]) | ||
| currentMetaPkg = strings.ToLower(strings.Trim(raw, `"`)) |
There was a problem hiding this comment.
[Nit] Inner raw shadows the outer loop variable — rename for clarity
The loop iterates over raw (each unprocessed line). Inside the inMetadataFiles block, a second raw := is declared for the TOML key's left-hand side — a completely different thing with the same name. go vet -shadow flags this.
| currentMetaPkg = strings.ToLower(strings.Trim(raw, `"`)) | |
| lhs := strings.TrimSpace(strings.SplitN(line, "=", 2)[0]) | |
| currentMetaPkg = strings.ToLower(strings.Trim(lhs, `"`)) |
| return false, restoreEnv, err | ||
| } | ||
|
|
||
| func wrapPoetryCurationErr(isCurationCmd bool, lockErr error) error { |
There was a problem hiding this comment.
[Low] wrapPoetryCurationErr takes isCurationCmd bool that is always true at every call site
Both call sites are structurally inside if params.IsCurationCmd { switch { ... } }:
// python.go:579-596
if params.IsCurationCmd {
switch {
case lockNeedsGenerate:
...
return false, restoreEnv, wrapPoetryCurationErr(params.IsCurationCmd, lockErr) // always true
case lockIsStale:
...
return false, restoreEnv, wrapPoetryCurationErr(params.IsCurationCmd, errors.Join(...)) // always true
}
}The isCurationCmd branch inside wrapPoetryCurationErr (if isCurationCmd && isCvsVersionFilteredOutput(...)) will never be false. The parameter was likely carried over from installPipDeps where IsCurationCmd can genuinely vary. Here it adds noise without value.
Fix: remove the isCurationCmd bool parameter from wrapPoetryCurationErr and inline the body unconditionally, or add a comment at the call sites explaining why the bool is always true.
| if len(names) == 0 { | ||
| names = []string{repoName} | ||
| } | ||
| raw, err := os.ReadFile(absPath) |
There was a problem hiding this comment.
[Low] setCurationSourceInPyproject reads pyproject.toml twice — document why
The file is read once by viper (line 646) for structured parsing, then again by os.ReadFile (line 654) for raw byte manipulation:
v := viper.New()
v.SetConfigFile(absPath)
v.ReadInConfig() // read #1 — viper parses TOML structure
names := extractPoetrySourceNames(v.Get("tool.poetry.source"))
...
raw, err := os.ReadFile(absPath) // read #2 — raw bytes for string-based rewriteThe dual-read is intentional: viper is used for structured key extraction, while the raw bytes path preserves comments and formatting that viper would destroy on marshal. But without a comment, the next reader will try to consolidate to a single viper round-trip and silently corrupt the file (losing all TOML comments and formatting).
Fix: add a comment at the os.ReadFile call explaining the two-reader design:
// viper was used above only to extract source names — it cannot faithfully
// round-trip TOML with comments preserved. Read the raw bytes separately
// for the string-based rewrite below.
raw, err := os.ReadFile(absPath)| v := parsePoetryVersion(out) | ||
| if v == "" { | ||
| log.Debug(fmt.Sprintf("Could not parse Poetry version from output: %q", out)) | ||
| return 0, errorutils.CheckErrorf("JFrog CLI poetry curation requires Poetry %s or higher to be installed.", minVersion) |
There was a problem hiding this comment.
[Low] validateMinimumPoetryVersion returns the same error string for two distinct failures
Two different failure modes produce identical user-facing messages:
// Case 1: poetry binary not found on PATH
if err != nil {
return 0, errorutils.CheckErrorf("JFrog CLI poetry curation requires Poetry %s or higher to be installed.", minVersion)
}
// Case 2: poetry found but `poetry --version` output is unrecognisable
v := parsePoetryVersion(out)
if v == "" {
return 0, errorutils.CheckErrorf("JFrog CLI poetry curation requires Poetry %s or higher to be installed.", minVersion)
}A user with a broken Poetry installation (binary on PATH but producing unexpected output) gets told to "install Poetry" instead of "your Poetry binary produced unrecognisable output". These are operationally different problems.
Fix: give the second case a distinct message:
if v == "" {
return 0, errorutils.CheckErrorf(
"could not parse Poetry version from output %q — ensure Poetry %s or higher is installed correctly",
out, minVersion,
)
}| } | ||
| // lock v1.x: files live in [metadata.files] as pkgname = [{file = "..."},] | ||
| if inMetadataFiles { | ||
| if strings.Contains(line, "= [") { |
There was a problem hiding this comment.
[Low] strings.Contains(line, "= [") is an imprecise sentinel for v1 metadata.files entries
The check looks for = [ anywhere in the line to detect a v1 [metadata.files] key-value assignment:
if strings.Contains(line, "= [") {
lhs := strings.TrimSpace(strings.SplitN(line, "=", 2)[0])
currentMetaPkg = strings.ToLower(strings.Trim(lhs, `"`))A TOML value that happens to contain = [ (e.g. a package whose description leaks into the files block, or a malformed lock) would produce a spurious currentMetaPkg update and silently drop files for a real package. The correct guard is to verify the = appears before [:
if eqIdx := strings.Index(line, "="); eqIdx > 0 && strings.Contains(line[eqIdx:], "[") {Test that demonstrates the false positive:
func TestParsePoetryLockPackagesSpuriousContains(t *testing.T) {
// A [metadata.files] block where a value line happens to contain "= ["
// before the real package assignment. Under the current strings.Contains
// guard the spurious line overwrites currentMetaPkg.
fixture := []byte(`[[package]]
name = "flask"
version = "2.0.0"
[metadata]
lock-version = "1.1"
[metadata.files]
flask = [
{file = "Flask-2.0.0.tar.gz", hash = "sha256:abc"},
]
`)
// Inject a comment-like line that contains "= [" but is not a package key.
// The parser skips blank lines and "#" lines but not arbitrary text inside
// a file-list value, so a sufficiently creative lock value can trigger this.
got := parsePoetryLockPackages(fixture)
require.Len(t, got, 1)
assert.ElementsMatch(t, []string{"Flask-2.0.0.tar.gz"}, got[0].Files,
"files must be attributed to flask, not to a spurious metadata line")
}| if err != nil { | ||
| return nil, err | ||
| } | ||
| rtManager, err := rtUtils.CreateServiceManager(serverDetails, 2, 0, false) |
There was a problem hiding this comment.
[Nit] Magic numbers in CreateServiceManager(serverDetails, 2, 0, false) — document or name them
rtManager, err := rtUtils.CreateServiceManager(serverDetails, 2, 0, false)2, 0, and false are opaque. Without reading CreateServiceManager's signature, it's impossible to know what these control (retries? timeout? dry-run?).
| rtManager, err := rtUtils.CreateServiceManager(serverDetails, 2, 0, false) | |
| // 2 retries, 0 retry-wait-millis, not dry-run. | |
| rtManager, err := rtUtils.CreateServiceManager(serverDetails, 2, 0, false) |
devbranch.go vet ./....go fmt ./....Previously jf ca had no Poetry support — running it on a Poetry project either fell back to the pip code path or produced incomplete results. This PR adds full Poetry (1.x and 2.x) support.
What changed:
Curation install via Artifactory — points Poetry's source at the api/curation/audit// endpoint and runs poetry lock against a temporary copy of the project, so all resolution routes through curation and 403 responses surface blocked packages. The user's pyproject.toml and poetry.lock are never modified. The original source name is preserved in the temp copy so existing poetry.lock entries stay valid.
Smart lock handling — checks whether poetry.lock is missing, stale, or up-to-date before changing the source URL (Poetry 1.x stores the URL in the lock and would otherwise always look stale). Generates, refreshes, or skips the lock accordingly, with automatic fallbacks for both Poetry 1.x and 2.x flag differences.
Blocked package table from poetry.lock — parses both v1 and v2 lock layouts and probes Artifactory for each package so blocked ones show up in the same table users already see for npm/pip.
CVS-blocked detection for Poetry — Poetry reports CVS-hidden versions as "doesn't match any versions" instead of a 403; that pattern is now picked up and rendered as a blocked package.
Graceful 403 handling — Poetry-emitted 403s now render the standard curation-blocked message instead of raw Poetry output.
Minimum Poetry version — 1.2.0 required for curation, with a clear error otherwise.
Testing done is documented here https://jfrog-int.atlassian.net/browse/XRAY-141531