Skip to content

XRAY-138689 - Add Poetry support for jf ca#768

Open
Phavya-jfrog wants to merge 8 commits into
jfrog:devfrom
Phavya-jfrog:feature/XRAY-138689-add-poetry-support
Open

XRAY-138689 - Add Poetry support for jf ca#768
Phavya-jfrog wants to merge 8 commits into
jfrog:devfrom
Phavya-jfrog:feature/XRAY-138689-add-poetry-support

Conversation

@Phavya-jfrog
Copy link
Copy Markdown

@Phavya-jfrog Phavya-jfrog commented May 27, 2026

  • The pull request is targeting the dev branch.
  • The code has been validated to compile successfully by running go vet ./....
  • The code has been formatted properly using go fmt ./....
  • All static analysis checks passed.
  • All tests have passed. If this feature is not already covered by the tests, new tests have been added.
  • Updated the Contributing page / ReadMe page / CI Workflow files if needed.
  • All changes are detailed at the description. if not already covered at JFrog Documentation, new documentation have been added.

Previously jf ca had no Poetry support — running it on a Poetry project either fell back to the pip code path or produced incomplete results. This PR adds full Poetry (1.x and 2.x) support.

What changed:

Curation install via Artifactory — points Poetry's source at the api/curation/audit// endpoint and runs poetry lock against a temporary copy of the project, so all resolution routes through curation and 403 responses surface blocked packages. The user's pyproject.toml and poetry.lock are never modified. The original source name is preserved in the temp copy so existing poetry.lock entries stay valid.

Smart lock handling — checks whether poetry.lock is missing, stale, or up-to-date before changing the source URL (Poetry 1.x stores the URL in the lock and would otherwise always look stale). Generates, refreshes, or skips the lock accordingly, with automatic fallbacks for both Poetry 1.x and 2.x flag differences.

Blocked package table from poetry.lock — parses both v1 and v2 lock layouts and probes Artifactory for each package so blocked ones show up in the same table users already see for npm/pip.

CVS-blocked detection for Poetry — Poetry reports CVS-hidden versions as "doesn't match any versions" instead of a 403; that pattern is now picked up and rendered as a blocked package.

Graceful 403 handling — Poetry-emitted 403s now render the standard curation-blocked message instead of raw Poetry output.

Minimum Poetry version — 1.2.0 required for curation, with a clear error otherwise.

Testing done is documented here https://jfrog-int.atlassian.net/browse/XRAY-141531

@Phavya-jfrog Phavya-jfrog force-pushed the feature/XRAY-138689-add-poetry-support branch from 963c301 to 6fb0884 Compare May 27, 2026 14:56
@Phavya-jfrog Phavya-jfrog changed the title Feature/xray 138689 add poetry support for jf ca XRAY-138689 - Add Poetry support for jf ca May 28, 2026
@Phavya-jfrog Phavya-jfrog force-pushed the feature/XRAY-138689-add-poetry-support branch from 6fb0884 to 4e010d2 Compare May 28, 2026 13:00
Comment thread sca/bom/buildinfo/technologies/python/python.go Outdated
Comment thread go.mod Outdated
//replace github.com/jfrog/jfrog-cli-artifactory => github.com/jfrog/jfrog-cli-artifactory main

// replace github.com/jfrog/build-info-go => github.com/jfrog/build-info-go dev
// replace github.com/jfrog/jfrog-cli-core/v2 => github.com/gauriy-tech/jfrog-cli-core/v2 v2.0.0-20260526032107-e8995d698251
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Low] Remove commented personal-fork replace directive

go.mod line 158 contains:

// replace github.com/jfrog/jfrog-cli-core/v2 => github.com/gauriy-tech/jfrog-cli-core/v2 v2.0.0-20260526032107-e8995d698251

This is a commented-out replace directive pointing to a personal fork (gauriy-tech). The // prefix makes it inert at build time, but it reveals that the PR was developed against a local fork override and the cleanup was incomplete.

Development-time replace directives don't belong in merged PRs. Please remove this line before merging.

(The adjacent // replace github.com/jfrog/jfrog-cli-core/v2 => github.com/jfrog/jfrog-cli-core/v2 master is a standard JFrog scaffolding comment and can stay.)

Comment thread sca/bom/buildinfo/technologies/python/python.go
Comment thread sca/bom/buildinfo/technologies/python/python.go
Comment thread sca/bom/buildinfo/technologies/python/python.go Outdated
Comment thread sca/bom/buildinfo/technologies/python/python.go
Comment thread sca/bom/buildinfo/technologies/python/python.go
Comment thread go.mod Outdated
…o feature/XRAY-138689-add-poetry-support
esac
`
require.NoError(t, os.WriteFile(fakePoetry, []byte(script), 0755))
require.NoError(t, os.Chmod(fakePoetry, 0755))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Nit] Redundant os.Chmod immediately after os.WriteFile with the same mode

os.WriteFile already creates the file with mode 0755. The subsequent os.Chmod(fakePoetry, 0755) is a no-op. The sibling test TestInstallPoetryDepsNonCurationErrorPropagated uses only os.WriteFile.

Suggested change
require.NoError(t, os.Chmod(fakePoetry, 0755))
require.NoError(t, os.WriteFile(fakePoetry, []byte(script), 0755))

return ""
}

func NormalizePypiName(name string) string {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Low] Two PEP 503 normalization functions in the same package

This PR adds NormalizePypiName to python.go. The same package already has normalizePyPIName in python_cvs_fallback.go. Both implement PEP 503 name normalization (collapse consecutive -, _, . separators into a single -, lowercase):

// python.go:364 — new in this PR, loop-based
func NormalizePypiName(name string) string {
    name = strings.ToLower(name)
    var b strings.Builder
    prevSep := false
    for _, r := range name {
        if r == '-' || r == '_' || r == '.' {
            if !prevSep { b.WriteByte('-'); prevSep = true }
            continue
        }
        b.WriteRune(r); prevSep = false
    }
    return b.String()
}

// python_cvs_fallback.go:49 — pre-existing, regex-based
var pypiNameNormalizeRegex = regexp.MustCompile(`[-_.]+`)
func normalizePyPIName(name string) string {
    return strings.ToLower(pypiNameNormalizeRegex.ReplaceAllString(name, "-"))
}

Both produce identical results today. The DRY violation means that if PEP 503 normalization ever changes, only one gets updated.

Fix: Replace calls to normalizePyPIName in python_cvs_fallback.go with NormalizePypiName and delete normalizePyPIName.

@@ -1114,7 +1118,7 @@ func getPythonNameVersion(id string, downloadUrlsMap map[string]string) (downloa
if dl, ok := downloadUrlsMap[normalizedId]; ok {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Low] Normalization fallback path in getPythonNameVersion is untested

Test_getPythonNameVersion covers exact-match lookups but has no test case for the normalization fallback path (lines 1116–1118):

// curationaudit.go:1116-1118 — not exercised by any test case
normalizedName := strings.ReplaceAll(strings.ToLower(strings.TrimSpace(parts[0])), "-", "_")
normalizedId := python.PythonPackageTypeIdentifier + normalizedName + ":" + strings.TrimSpace(parts[1])
if dl, ok := downloadUrlsMap[normalizedId]; ok {
    downloadUrls = []string{dl}

This path is reachable in practice: the dependency graph node ID can carry the raw lock name (e.g. pypi://Flask-Babel:1.0), while the download-URLs map key uses the normalized form (pypi://flask_babel:1.0). The exact lookup fails; the fallback resolves it.

Suggested test case to add to Test_getPythonNameVersion:

{
    name:             "hyphenated name resolved via normalization fallback",
    id:               "pypi://Flask-Babel:1.0",
    downloadUrlsMap:  map[string]string{"pypi://flask_babel:1.0": exampleUrl},
    wantDownloadUrls: []string{exampleUrl},
    wantName:         "Flask-Babel",
    wantVersion:      "1.0",
},

return nil
}

func stripPoetrySourceBlocks(content string) string {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Low] stripPoetrySourceBlocks has no unit test

stripPoetrySourceBlocks is a pure string function that strips all [[tool.poetry.source]] TOML blocks before setCurationSourceInPyproject rewrites them. It has several edge cases:

  • No source blocks in the file
  • Single source block
  • Two consecutive source blocks (e.g. both [[tool.poetry.source]] entries, then [tool.poetry.dependencies])
  • Source block at end of file (no trailing section)

extractPoetrySourceNames (the viper-based companion) has TestExtractPoetrySourceNames covering its cases. stripPoetrySourceBlocks has nothing.

Without a unit test, a regression in the stripping logic (e.g. accidentally consuming the first line of the next section when two source blocks are adjacent) would only surface in an end-to-end integration test with a real Poetry installation.

Suggested test (table-driven):

func TestStripPoetrySourceBlocks(t *testing.T) {
    cases := []struct{ name, in, want string }{
        {
            name: "no source blocks — content unchanged",
            in:   "[tool.poetry]\nname = \"x\"\n",
            want: "[tool.poetry]\nname = \"x\"\n",
        },
        {
            name: "single source block stripped",
            in:   "[tool.poetry]\nname = \"x\"\n[[tool.poetry.source]]\nname = \"pypi\"\nurl = \"https://pypi.org\"\n[tool.poetry.dependencies]\npython = \"^3.9\"\n",
            want: "[tool.poetry]\nname = \"x\"\n[tool.poetry.dependencies]\npython = \"^3.9\"\n",
        },
        {
            name: "source block at EOF",
            in:   "[tool.poetry]\nname = \"x\"\n[[tool.poetry.source]]\nname = \"pypi\"\n",
            want: "[tool.poetry]\nname = \"x\"\n",
        },
    }
    for _, tc := range cases {
        t.Run(tc.name, func(t *testing.T) {
            assert.Equal(t, tc.want, stripPoetrySourceBlocks(tc.in))
        })
    }
}

// recorded in poetry.lock against the listed <a href>s.
func buildPoetryDownloadUrl(rtManager artifactory.ArtifactoryServicesManager, clientDetails *httputils.HttpClientDetails, artiUrl, repository string, pkg poetryLockPackage) (string, error) {
normalized := NormalizePypiName(pkg.Name)
simpleIndexUrl := fmt.Sprintf("%s/api/pypi/%s/simple/%s/", artiUrl, repository, normalized)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Medium] buildPoetryDownloadUrl bypasses the CVS pass-through endpoint

buildPoetryDownloadUrl hardcodes the raw PyPI simple-index URL:

// python.go:315
simpleIndexUrl := fmt.Sprintf("%s/api/pypi/%s/simple/%s/", artiUrl, repository, normalized)

Every other package manager routes its index/resolution requests through the curation pass-through layer when IsCurationCmd=true. For pip, GetPypiRepoUrlWithCredentials does this automatically:

// jfrog-cli-artifactory — GetPypiRepoUrlWithCredentials
if isCurationCmd {
    rtUrl = rtUrl.JoinPath(coreutils.CurationPassThroughApi)  // "api/curation/audit/"
}
rtUrl = rtUrl.JoinPath("api/pypi", repository, "simple")
// result: <artiUrl>/api/curation/audit/api/pypi/<repo>/simple/

buildPoetryDownloadUrlsMap is only called in curation mode (guarded at line 195 by if !params.IsCurationCmd { return }). Its simple-index GETs should always use the curation URL — they never run outside jf ca.

Without the api/curation/audit/ prefix, the simple-index requests bypass the CVS intercept layer. When Artifactory's CVS enforces policy at the index level (hiding or blocking package entries in the simple-index), Poetry will miss those signals entirely.

Fix: pass isCurationCmd: true into buildPoetryDownloadUrl and replace the fmt.Sprintf with GetPypiRepoUrlWithCredentials(serverDetails, repository, true) to derive the base URL — the same helper pip already uses.

Files []string
}

func buildPoetryDownloadUrlsMap(serverDetails *config.ServerDetails, repository string) (map[string]string, error) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Low] Poetry's simple-index GETs are sequential and outside the parallel curation-check pool

The curation audit's parallel phase (fetchNodeStatus, bounded runner of 10 goroutines) HEAD-checks artifact download URLs. All package managers feed into that phase.

Poetry adds a serial pre-phase before it: buildPoetryDownloadUrlsMap makes one GET per lock entry sequentially to resolve filenames to download URLs:

// python.go:296 — one blocking HTTP GET per package, no concurrency
for _, pkg := range packages {
    downloadUrl, lookupErr := buildPoetryDownloadUrl(rtManager, &httpClientDetails, artiUrl, repository, pkg)

Other package managers have no equivalent pre-phase:

  • pip reads download URLs from a local --report JSON written during pip install — zero extra HTTP calls.
  • npm, gems, go, nuget, maven, gradle derive download URLs from string operations on the component ID.

For a project with 100 lock entries at 50 ms RTT, this is ~5 s of sequential HTTP before the parallel HEAD phase even starts. The rtManager.Client().SendGet is thread-safe.

Suggestion: a small worker pool (e.g. errgroup + 8 goroutines) would reduce this to ceil(N/8) × RTT. If that's deferred, add a comment above the loop documenting the O(N × RTT) cost so the next reader understands it's structural, not an oversight.

if inMetadataFiles {
if strings.Contains(line, "= [") {
raw := strings.TrimSpace(strings.SplitN(line, "=", 2)[0])
currentMetaPkg = strings.ToLower(strings.Trim(raw, `"`))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Nit] Inner raw shadows the outer loop variable — rename for clarity

The loop iterates over raw (each unprocessed line). Inside the inMetadataFiles block, a second raw := is declared for the TOML key's left-hand side — a completely different thing with the same name. go vet -shadow flags this.

Suggested change
currentMetaPkg = strings.ToLower(strings.Trim(raw, `"`))
lhs := strings.TrimSpace(strings.SplitN(line, "=", 2)[0])
currentMetaPkg = strings.ToLower(strings.Trim(lhs, `"`))

return false, restoreEnv, err
}

func wrapPoetryCurationErr(isCurationCmd bool, lockErr error) error {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Low] wrapPoetryCurationErr takes isCurationCmd bool that is always true at every call site

Both call sites are structurally inside if params.IsCurationCmd { switch { ... } }:

// python.go:579-596
if params.IsCurationCmd {
    switch {
    case lockNeedsGenerate:
        ...
        return false, restoreEnv, wrapPoetryCurationErr(params.IsCurationCmd, lockErr)   // always true
    case lockIsStale:
        ...
        return false, restoreEnv, wrapPoetryCurationErr(params.IsCurationCmd, errors.Join(...))  // always true
    }
}

The isCurationCmd branch inside wrapPoetryCurationErr (if isCurationCmd && isCvsVersionFilteredOutput(...)) will never be false. The parameter was likely carried over from installPipDeps where IsCurationCmd can genuinely vary. Here it adds noise without value.

Fix: remove the isCurationCmd bool parameter from wrapPoetryCurationErr and inline the body unconditionally, or add a comment at the call sites explaining why the bool is always true.

if len(names) == 0 {
names = []string{repoName}
}
raw, err := os.ReadFile(absPath)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Low] setCurationSourceInPyproject reads pyproject.toml twice — document why

The file is read once by viper (line 646) for structured parsing, then again by os.ReadFile (line 654) for raw byte manipulation:

v := viper.New()
v.SetConfigFile(absPath)
v.ReadInConfig()                    // read #1 — viper parses TOML structure

names := extractPoetrySourceNames(v.Get("tool.poetry.source"))
...
raw, err := os.ReadFile(absPath)    // read #2 — raw bytes for string-based rewrite

The dual-read is intentional: viper is used for structured key extraction, while the raw bytes path preserves comments and formatting that viper would destroy on marshal. But without a comment, the next reader will try to consolidate to a single viper round-trip and silently corrupt the file (losing all TOML comments and formatting).

Fix: add a comment at the os.ReadFile call explaining the two-reader design:

// viper was used above only to extract source names — it cannot faithfully
// round-trip TOML with comments preserved. Read the raw bytes separately
// for the string-based rewrite below.
raw, err := os.ReadFile(absPath)

v := parsePoetryVersion(out)
if v == "" {
log.Debug(fmt.Sprintf("Could not parse Poetry version from output: %q", out))
return 0, errorutils.CheckErrorf("JFrog CLI poetry curation requires Poetry %s or higher to be installed.", minVersion)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Low] validateMinimumPoetryVersion returns the same error string for two distinct failures

Two different failure modes produce identical user-facing messages:

// Case 1: poetry binary not found on PATH
if err != nil {
    return 0, errorutils.CheckErrorf("JFrog CLI poetry curation requires Poetry %s or higher to be installed.", minVersion)
}

// Case 2: poetry found but `poetry --version` output is unrecognisable
v := parsePoetryVersion(out)
if v == "" {
    return 0, errorutils.CheckErrorf("JFrog CLI poetry curation requires Poetry %s or higher to be installed.", minVersion)
}

A user with a broken Poetry installation (binary on PATH but producing unexpected output) gets told to "install Poetry" instead of "your Poetry binary produced unrecognisable output". These are operationally different problems.

Fix: give the second case a distinct message:

if v == "" {
    return 0, errorutils.CheckErrorf(
        "could not parse Poetry version from output %q — ensure Poetry %s or higher is installed correctly",
        out, minVersion,
    )
}

}
// lock v1.x: files live in [metadata.files] as pkgname = [{file = "..."},]
if inMetadataFiles {
if strings.Contains(line, "= [") {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Low] strings.Contains(line, "= [") is an imprecise sentinel for v1 metadata.files entries

The check looks for = [ anywhere in the line to detect a v1 [metadata.files] key-value assignment:

if strings.Contains(line, "= [") {
    lhs := strings.TrimSpace(strings.SplitN(line, "=", 2)[0])
    currentMetaPkg = strings.ToLower(strings.Trim(lhs, `"`))

A TOML value that happens to contain = [ (e.g. a package whose description leaks into the files block, or a malformed lock) would produce a spurious currentMetaPkg update and silently drop files for a real package. The correct guard is to verify the = appears before [:

if eqIdx := strings.Index(line, "="); eqIdx > 0 && strings.Contains(line[eqIdx:], "[") {

Test that demonstrates the false positive:

func TestParsePoetryLockPackagesSpuriousContains(t *testing.T) {
    // A [metadata.files] block where a value line happens to contain "= ["
    // before the real package assignment. Under the current strings.Contains
    // guard the spurious line overwrites currentMetaPkg.
    fixture := []byte(`[[package]]
name = "flask"
version = "2.0.0"

[metadata]
lock-version = "1.1"

[metadata.files]
flask = [
    {file = "Flask-2.0.0.tar.gz", hash = "sha256:abc"},
]
`)
    // Inject a comment-like line that contains "= [" but is not a package key.
    // The parser skips blank lines and "#" lines but not arbitrary text inside
    // a file-list value, so a sufficiently creative lock value can trigger this.
    got := parsePoetryLockPackages(fixture)
    require.Len(t, got, 1)
    assert.ElementsMatch(t, []string{"Flask-2.0.0.tar.gz"}, got[0].Files,
        "files must be attributed to flask, not to a spurious metadata line")
}

if err != nil {
return nil, err
}
rtManager, err := rtUtils.CreateServiceManager(serverDetails, 2, 0, false)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Nit] Magic numbers in CreateServiceManager(serverDetails, 2, 0, false) — document or name them

rtManager, err := rtUtils.CreateServiceManager(serverDetails, 2, 0, false)

2, 0, and false are opaque. Without reading CreateServiceManager's signature, it's impossible to know what these control (retries? timeout? dry-run?).

Suggested change
rtManager, err := rtUtils.CreateServiceManager(serverDetails, 2, 0, false)
// 2 retries, 0 retry-wait-millis, not dry-run.
rtManager, err := rtUtils.CreateServiceManager(serverDetails, 2, 0, false)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants