Skip to content

RHAIENG-5134: chore(ci) Fix RHOAI 3.3 CI test failures [rhoai-3.3]#2434

Open
mtchoum1 wants to merge 9 commits into
rhoai-3.3from
mtchoum1/green-3.3
Open

RHAIENG-5134: chore(ci) Fix RHOAI 3.3 CI test failures [rhoai-3.3]#2434
mtchoum1 wants to merge 9 commits into
rhoai-3.3from
mtchoum1/green-3.3

Conversation

@mtchoum1

@mtchoum1 mtchoum1 commented Jul 2, 2026

Copy link
Copy Markdown

Description

This PR addresses remaining CI failures on the RHOAI 3.3 branch across GHA build/test workflows, container payload validation, papermill e2e tests, and multi-arch (s390x/ppc64le) image builds.

CI / GHA infrastructure

  • Build template (.github/workflows/build-notebooks-TEMPLATE.yaml): improved matrix job orchestration, disk/prune tuning, and papermill test integration; removed obsolete gha_lvm_overlay.sh.
  • APT installation: centralized apt-get usage across workflows via the shared apt-install composite action (ISSUE#2668 backport).
  • K8s provisioning (.github/actions/provision-k8s): bumped CRI-O and Kubernetes to 1.36; added retry with exponential backoff and fallback CDN for transient pkgs.k8s.io signing-key download failures (RHOAIENG-67536).
  • Playwright tests: updated browser test action and added tests/browser/package.json5 for codeserver UI validation.
  • Disk space / FIPS: improved prune tuning in free-up-disk-space and make_test.py; fixed s390x FIPS scan via check-payload dependency update.

Container / Dockerfile fixes

  • Minimal PDF deps (RHAIENG-3347): aligned PDF export dependency installation between Dockerfile.cpu and Dockerfile.konflux.cpu; consolidated logic into install_pdf_deps.sh and removed redundant install_pandoc.sh.
  • Runtime datascience/minimal (s390x/ppc64le): updated Dockerfiles and build-args for correct multi-arch package installation on big-endian platforms.
  • TrustyAI: Dockerfile alignment between cpu/konflux variants; papermill test notebook now uses local test CSVs instead of obsolete 3.11 GitHub raw URLs that return 404.

Test / script fixes

  • Papermill runner (scripts/test_jupyter_with_papermill.sh): improved derived-image version resolution and test orchestration.
  • Datascience test notebook: refactored stack tests to reduce peak memory usage on resource-constrained CI runners.
  • Manifests: updated kustomize statefulset resources for pytorch+llmcompressor images.

How Has This Been Tested?

Note: The codeserver, s390x, and ppc64le test jobs take a long time to complete and occasionally hit workflow timeouts. Re-running the failed/timed-out jobs typically succeeds.

Self checklist (all need to be checked):

  • Ensure that you have run make test (gmake on macOS) before asking for review
  • Changes to everything except Dockerfile.konflux files should be done in odh/notebooks and automatically synced to rhds/notebooks. For Konflux-specific changes, modify Dockerfile.konflux files directly in rhds/notebooks as these require special attention in the downstream repository and flow to the upcoming RHOAI release.

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Made with Cursor

Summary by CodeRabbit

  • New Features
    • Added a Konflux build switch for notebook image variants.
    • Introduced/expanded Playwright-based browser test coverage with a dedicated setup manifest.
  • Bug Fixes
    • Improved CI reliability via stronger dependency installation flow, resilient key fetching, and proactive disk/temp cleanup.
    • Increased Jupyter runtime stability (probe timing and memory resources).
    • Updated PDF/Pandoc install behavior and made notebook tests more robust with local fixtures and safer version resolution.
  • Chores
    • Refreshed CPU image bases/labels and updated dependency/version pins for consistency.

mtchoum1 and others added 7 commits June 30, 2026 11:55
…able composite action (opendatahub-io#3096)

Consolidated various `apt-get` commands across GitHub workflows into a single `apt-install` composite action for improved maintainability. Refactored workflows to leverage the new action, reducing duplication and ensuring consistent behavior.

Streamlined APT configuration by introducing CI-specific optimizations (e.g., disabling docs/man pages) and breaking steps into smaller, reusable blocks. Added `eatmydata` usage to enhance package installation performance.
The provision-k8s CI action fails intermittently when downloading the
Kubernetes apt signing key from pkgs.k8s.io due to transient 403 errors
from CloudFront WAF blocking GitHub Actions runner IPs.

Add a retry_cmd helper with exponential backoff (3 attempts, 5/10/20s
delays) and wrap the three failure-prone commands:

- Kubernetes signing key fetch: retries primary URL (pkgs.k8s.io), then
  falls back to prod-cdn.packages.k8s.io if all primary attempts fail
- CRI-O signing key fetch: retries with backoff
- apt-get update: retries with backoff
…ing (opendatahub-io#3869)

Co-authored-by: Jiri Daněk <jdanek@redhat.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backport the TrustyAI papermill fix from odh/main: test data is already
copied into the pod via kubectl cp, so remote raw.githubusercontent.com
URLs are unnecessary and fail with 404.

Co-authored-by: Cursor <cursoragent@cursor.com>
@openshift-ci openshift-ci Bot requested review from dibryant and jiridanek July 2, 2026 14:27
@openshift-ci

openshift-ci Bot commented Jul 2, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign atheo89 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@mtchoum1, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 30 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a52be955-9969-46cf-a6f1-acd01cf584c1

📥 Commits

Reviewing files that changed from the base of the PR and between 1559073 and c4438ef.

📒 Files selected for processing (4)
  • jupyter/datascience/ubi9-python-3.12/Dockerfile.cpu
  • jupyter/datascience/ubi9-python-3.12/Dockerfile.konflux.cpu
  • runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu
  • runtimes/datascience/ubi9-python-3.12/Dockerfile.konflux.cpu
📝 Walkthrough

Walkthrough

This PR updates CI and build workflow storage, dependency-install, and test gating flows, while also changing base images, runtime Dockerfiles, PDF dependency handling, RStudio repository usage, TrustyAI runtime setup, and llmcompressor notebook/test support.

Changes

CI and build workflow updates

Layer / File(s) Summary
Storage path migration to /mnt/containers/storage
.github/actions/install-podman-action/action.yml, ci/cached-builds/crio.conf, ci/cached-builds/storage.conf, ci/cached-builds/kubeadm.yaml, .github/workflows/software-versions.yaml, .github/workflows/build-notebooks-TEMPLATE.yaml, .github/workflows/test-install-podman.yaml
Podman and CRI-O storage paths move to /mnt/containers/storage, TMPDIR bind-mount workarounds are removed, and related workflow setup steps create and chown the new storage directory.
Cleanup, apt-install reuse, and provisioning retries
.github/actions/free-up-disk-space/action.yml, .github/actions/provision-k8s/action.yml, .github/workflows/code-quality.yaml, .github/workflows/notebooks-digest-updater.yaml, .github/workflows/params-env.yaml, .github/workflows/pr-merge-image-delete.yml, .github/workflows/sec-scan.yml, .github/workflows/test-trivy-scan-action.yaml
Disk cleanup expands, several workflows switch to the local apt-install composite action, provisioning gains retry logic and version bumps, and trivy dependency installation is split into detection and install steps.
Build template gating and Playwright wiring
.github/workflows/build-notebooks-TEMPLATE.yaml, .github/actions/playwright-test/action.yml, tests/browser/package.json5
The build workflow adds the konflux input, environment preparation, prefetch dependency support, stricter step gating, a check-payload/go setup adjustment, and replaces inline Playwright execution with the composite action.
llmcompressor test harness updates
scripts/test_jupyter_with_papermill.sh, tests/manifests.py
The Papermill harness adds llmcompressor notebook mapping, splits execution into wrapper and runner paths, expands datascience-derived handling, and updates manifest target selection for the llmcompressor workload.

Estimated code review effort: 4 (Complex) | ~55 minutes

Image and runtime updates

Layer / File(s) Summary
Base image and build-args updates
jupyter/datascience/.../build-args/cpu.conf, runtimes/datascience/.../build-args/cpu.conf, runtimes/minimal/.../build-args/cpu.conf
Build-args files switch to new ODH base images and add lock flavor, label, product, and release metadata.
StatefulSet probe and memory changes
jupyter/datascience/.../statefulset.yaml, jupyter/minimal/.../statefulset.yaml
Notebook statefulsets increase probe delays and timing values, and the datascience statefulset raises memory limits and requests.
PDF dependency installation changes
jupyter/minimal/.../Dockerfile.cpu, jupyter/minimal/.../Dockerfile.konflux.cpu, jupyter/utils/install_pdf_deps.sh, jupyter/utils/install_pandoc.sh, jupyter/trustyai/.../Dockerfile.cpu
PDF build stages are removed, PDF dependency setup is consolidated, and pandoc installation gains architecture-specific handling in the shared script.
RStudio dnf repository restrictions
rstudio/c9s-python-3.12/*, rstudio/rhel9-python-3.12/*
RStudio Dockerfiles add repo-disabling flags for dependency installs, and the CUDA image also disables the cuda repo during nginx setup.
Runtime package arrays and labels
runtimes/datascience/.../Dockerfile*.cpu, runtimes/minimal/.../Dockerfile*.cpu
Runtime Dockerfiles switch OS package installation to Bash arrays, add a dnf cache mount for pyarrow, and replace runtime LABEL metadata blocks.
TrustyAI runtime setup and test data
jupyter/trustyai/.../Dockerfile*.cpu, jupyter/trustyai/.../test/test_notebook.ipynb
TrustyAI Dockerfiles add build constraints and move Jupyter runtime/configuration setup into the main install path, while notebook tests load local CSV fixtures instead of remote URLs.
llmcompressor dependency and notebook test updates
jupyter/pytorch+llmcompressor/.../pylock.toml, pyproject.toml, test/test_notebook.ipynb
chardet is downgraded and constrained, the notebook test helper tolerates missing expected_versions data and adds fallback lookup logic, and the dependency assertions use the new helper for llmcompressor-related packages.

Estimated code review effort: 4 (Complex) | ~50 minutes

Possibly related PRs

Suggested labels: lgtm, approved

Suggested reviewers: ysok, jiridanek, atheo89

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title is concise and accurately summarizes the main change: fixing RHOAI 3.3 CI test failures.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description follows the required template and includes a detailed summary, testing notes, checklist, and merge criteria.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch mtchoum1/green-3.3

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown

Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

Error details
{}

@mtchoum1 mtchoum1 changed the title chore(ci): Fix RHOAI 3.3 CI test failures [rhoai-3.3] RHAIENG-5134: chore(ci) Fix RHOAI 3.3 CI test failures [rhoai-3.3] Jul 2, 2026
@mtchoum1 mtchoum1 force-pushed the mtchoum1/green-3.3 branch from 70d9268 to 22f676e Compare July 2, 2026 14:42
@mtchoum1

mtchoum1 commented Jul 2, 2026

Copy link
Copy Markdown
Author

/build-konflux

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
jupyter/utils/install_pdf_deps.sh (1)

20-24: 🎯 Functional Correctness | 🔴 Critical | ⚡ Quick win

New ppc64le pandoc install path is unreachable dead code.

Lines 20-24 unconditionally exit 0 for both s390x and ppc64le before the script ever reaches the newly added pandoc installation logic at Lines 92-104. The EPEL-based ppc64le branch you just added (Lines 92-99) will never execute, since the script already exited earlier for that architecture. This defeats the purpose of the change (per the PR objectives, adding multi-arch pandoc support for ppc64le).

You likely need to remove ppc64le from the early skip condition (keeping the skip for s390x only, since the GitHub tarball has no s390x build and no alternate install path exists for it), or otherwise restructure the flow so the ppc64le EPEL branch is reachable.

🐛 Proposed fix
-# Skip PDF export installation for s390x and ppc64le architectures
-if [[ "$(uname -m)" == "s390x" || "$(uname -m)" == "ppc64le" ]]; then
-    echo "PDF export functionality is not supported on $(uname -m) architecture. Skipping installation."
+# Skip PDF export installation for s390x architecture (no pandoc/texlive path available)
+if [[ "$(uname -m)" == "s390x" ]]; then
+    echo "PDF export functionality is not supported on $(uname -m) architecture. Skipping installation."
     exit 0
 fi

Also applies to: 92-104

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@jupyter/utils/install_pdf_deps.sh` around lines 20 - 24, The early
architecture skip in install_pdf_deps.sh makes the new ppc64le pandoc install
branch unreachable. Update the uname -m guard so only s390x exits early, and
allow ppc64le to continue into the existing pandoc installation flow in
install_pdf_deps.sh; verify the ppc64le EPEL logic remains reachable while the
s390x fallback still skips as intended.
.github/workflows/build-notebooks-TEMPLATE.yaml (1)

78-90: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Multiple GitHub Actions references are pinned to mutable tags, not commit SHAs.

actions/checkout@v6 (Lines 78, 85), actions/setup-go@v6 (Lines 141, 481), docker/login-action@v3 (Line 149), and astral-sh/setup-uv@v7 (Line 327) are all referenced by tag. Elsewhere in this same PR's files, actions are consistently pinned by commit SHA with a trailing version comment (e.g. actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5, actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2). Mutable tags can be repointed (supply-chain risk), which is exactly the convention this repo's SHA-pinning flow is meant to prevent.

As per coding guidelines, "When editing GitHub Actions or action metadata, follow the SHA pinning flow described in .github/AGENTS.md."

Also applies to: 140-149, 326-329, 475-491

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/build-notebooks-TEMPLATE.yaml around lines 78 - 90, The
workflow is still using mutable action tags, so update each referenced action in
the affected steps to a full commit SHA and keep the trailing version comment
consistent with the repo’s pinning convention. Apply this to the checkout logic
in the current block, plus the other occurrences of actions/setup-go,
docker/login-action, and astral-sh/setup-uv in the workflow. Use the existing
SHA-pinning pattern already used elsewhere in the repository and follow the
`.github/AGENTS.md` guidance for action metadata edits.

Source: Coding guidelines

ci/cached-builds/make_test.py (1)

80-80: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

wait_for_stability never receives target, so the heavy-target timeout is dead code.

run_tests calls wait_for_stability(pod) at Line 80 without the target argument, so target always defaults to "", meaning any(h in target for h in _HEAVY_TARGETS) never matches and heavy images (jupyter-datascience, jupyter-trustyai) never get the intended 200s timeout — they'll keep using the 100s timeout that presumably caused flakiness in the first place.

🐛 Proposed fix
     check_call(f"make {deploy}-{deploy_target}", shell=True)
-    wait_for_stability(pod)
+    wait_for_stability(pod, target)

Also applies to: 142-152

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ci/cached-builds/make_test.py` at line 80, `wait_for_stability` is always
called without the `target` value, so its heavy-target timeout branch in
`wait_for_stability` never triggers. Update the `run_tests` call sites
(including the ones around the other `wait_for_stability` usages) to pass the
current `target` through, and make sure the `wait_for_stability(pod, target)`
signature is used consistently so `_HEAVY_TARGETS` can correctly select the
longer timeout for heavy images.
♻️ Duplicate comments (1)
jupyter/minimal/ubi9-python-3.12/Dockerfile.konflux.cpu (1)

83-88: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Same missing-retry concern as the non-konflux CPU variant.

Same issue as jupyter/minimal/ubi9-python-3.12/Dockerfile.cpu: direct ./utils/install_pdf_deps.sh call bypasses the retry wrapper used elsewhere (install_with_retry.sh texlive-install), reintroducing network-flakiness risk during builds.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@jupyter/minimal/ubi9-python-3.12/Dockerfile.konflux.cpu` around lines 83 -
88, The PDF export dependency install in the Dockerfile.konflux.cpu variant is
calling the script directly instead of using the retry wrapper, which
reintroduces flaky build failures. Update the `RUN ./utils/install_pdf_deps.sh`
step to use the same retry mechanism as other Dockerfiles by invoking
`install_with_retry.sh texlive-install`, and keep the surrounding `PDF export`
block unchanged.
🧹 Nitpick comments (5)
scripts/test_jupyter_with_papermill.sh (3)

304-320: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Docstring is now stale.

The comment still says "copies the relevant test_notebook.ipynb file," but _run_test (via _run_test_notebooks_only) now copies the entire test directory ("${repo_test_directory}/."), not just the single notebook file.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/test_jupyter_with_papermill.sh` around lines 304 - 320, The
`_run_test` function’s docstring is stale because it still says the runner
copies a single `test_notebook.ipynb`, but the current flow through
`_run_test_notebooks_only` copies the whole test directory. Update the comment
above `_run_test` to describe the actual behavior, keeping the reference to
`_run_test_notebooks_only` and the test directory copy so the documentation
matches the implementation.

274-283: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add the same shellcheck suppression used elsewhere for this intentional single-quote pattern.

Static analysis flags SC2016 here, but $f is meant to be expanded inside the remote /bin/sh -c script, not locally — the same intentional pattern already documented with # shellcheck disable=SC2016 a few lines above (around _create_test_versions_source_of_truth). Adding the same comment here keeps the lint output clean and consistent with the established convention in this file.

🧹 Suggested fix
+	# shellcheck disable=SC2016
 	"${kbin}" exec "${notebook_workload_name}" -- /bin/sh -c 'f="'"${output_file_prefix}"'_error.txt"; [ -f "$f" ] || : >"$f"'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/test_jupyter_with_papermill.sh` around lines 274 - 283, The remote
/bin/sh -c cleanup command in scripts/test_jupyter_with_papermill.sh is using an
intentional single-quote expansion pattern that triggers SC2016. Add the same
shellcheck suppression convention already used elsewhere in this script (around
_create_test_versions_source_of_truth) immediately before the "${kbin}" exec
invocation, so the $f expansion remains local to the remote shell script and the
lint stays consistent.

Source: Linters/SAST tools


146-149: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Hardcoded os/python-flavor string is brittle to future version bumps.

Unlike the neighboring cases which match against variables ($jupyter_minimal_notebook_id, $rocm_target_prefix), this new case hardcodes the full ubi9-python-3-12 suffix. If the Python/OS flavor for this image changes later, this exact-match case silently stops matching and falls through to the default branch, silently reverting to the wrong (non-shortened) workload name.

♻️ Suggested fix using dynamic os/python flavor
-        jupyter-pytorch-llmcompressor-ubi9-python-3-12)
+        jupyter-pytorch-llmcompressor-"${os_flavor}-${python_flavor//./-}")
             # Kustomize uses shortened namePrefix/label (llmc) for this notebook
             notebook_name="jupyter-pytorch-llmc-ubi9-python-3-12"
             ;;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/test_jupyter_with_papermill.sh` around lines 146 - 149, The case in
scripts/test_jupyter_with_papermill.sh is hardcoding the full
jupyter-pytorch-llmcompressor-ubi9-python-3-12 image name, which makes the match
brittle when the OS/Python flavor changes. Update the matching logic around the
notebook_name assignment to use the same dynamic pattern as neighboring cases
(for example, derive the suffix from existing flavor variables used in the
script) so the shortened jupyter-pytorch-llmc-ubi9-python-3-12 name is selected
without depending on an exact version string.
jupyter/utils/install_pdf_deps.sh (1)

100-102: 🔒 Security & Privacy | 🔵 Trivial | 💤 Low value

Predictable /tmp path for downloaded tarball.

Static analysis flags /tmp/pandoc.tar.gz as a predictable temp path (TOCTOU/symlink risk). In an isolated, single-process container build this risk is minimal, but mktemp would be a trivial hardening.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@jupyter/utils/install_pdf_deps.sh` around lines 100 - 102, The pandoc
download step uses a predictable /tmp/pandoc.tar.gz path, which is flagged as a
temp-file hardening issue. Update the install_pdf_deps.sh flow to use a unique
temp file created with mktemp for the curl download, then pass that generated
path into the existing tar extraction step and clean it up afterward; keep the
change localized around the pandoc download/extract commands.

Source: Linters/SAST tools

.github/actions/install-podman-action/action.yml (1)

66-74: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Storage-prep block duplicated across three files.

This mkdir /mnt/containers/storage(+tmp) / chown -R sequence is repeated near-identically here, in software-versions.yaml ("Configure Podman"/"Prepare build environment"), and in build-notebooks-TEMPLATE.yaml ("Prepare build environment"). Any future change to the storage path or permissions model needs to be applied in three places consistently.

Consider factoring this into a small composite action (similar to free-up-disk-space or install-podman-action itself) that all three call sites can reuse.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/actions/install-podman-action/action.yml around lines 66 - 74, The
Podman storage-prep sequence is duplicated in this action and in the other two
build workflows, so any path or permission change must be made in multiple
places. Extract the shared mkdir/chown setup into a small reusable composite
action or helper step, then have install-podman-action, the software-versions
workflow, and the build-notebooks template invoke that shared logic instead of
keeping separate copies.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/build-notebooks-TEMPLATE.yaml:
- Around line 356-359: The “Check if we have tests or not” step in the workflow
is still directly interpolating the target expression in the run command, so
update the have-tests step to pass inputs.target through an environment variable
instead of embedding it inline. Apply the same env-var pattern used elsewhere in
this workflow, and keep the change localized to the step with id have-tests /
the has_tests.py invocation.
- Around line 260-267: The workflow still interpolates inputs.target directly
into shell commands, which risks template injection in the Prefetch hermetic
build dependencies step and the later make invocation. Move the GitHub input
into a step-level env variable such as TARGET for the relevant steps in
build-notebooks-TEMPLATE.yaml, then reference that variable inside the run
scripts instead of using ${{ inputs.target }} directly. Update the COMPONENT_DIR
assignment in the prefetch step and the make call in the same job to use the
env-backed TARGET value, keeping the existing sed transformation and command
behavior intact.
- Around line 298-310: The background stats loop in the “Build: make ${{
inputs.target }}” step is started without lifecycle management, so it can
continue after make finishes. Update this step to capture the PID of the `(while
true; do ... sleep 30; done) &` process, run `make ${{ inputs.target }}`, and
then ensure the loop is terminated and waited on before the step exits. Use the
existing shell block in build-notebooks-TEMPLATE.yaml to add cleanup around the
make invocation.
- Around line 260-297: The prefetch step is using an incomplete target-to-path
conversion in the Prefetch hermetic build dependencies script, so targets with
multiple hyphens can’t resolve to the correct component directory. Update the
COMPONENT_DIR derivation in the workflow step to use the same target→path
mapping logic used elsewhere, and verify the check against prefetch-input and
the downstream calls to prefetch-all.sh and post-prefetch.sh all use the
corrected path.

In `@jupyter/minimal/ubi9-python-3.12/Dockerfile.cpu`:
- Around line 94-99: The PDF export dependency setup in the Dockerfile.cpu block
is missing the retry wrapper used by the ROCm variant, so it can fail on
transient network issues. Update the dependency install step to use the same
retry path as Dockerfile.rocm by invoking install_with_retry.sh with the
texlive-install target instead of calling install_pdf_deps.sh directly, while
keeping the PATH export unchanged.

In `@tests/manifests.py`:
- Around line 328-331: The runtime llmcompressor manifest mapping is pointing at
the base notebook imagestream instead of the llmcompressor imagestream. Update
the manifests mapping entry for
runtime-cuda-pytorch-llmcompressor-ubi9-python-3.12 in tests/manifests.py so it
references jupyter-pytorch-llmcompressor-imagestream.yaml, keeping it aligned
with the cuda-jupyter-pytorch-llmcompressor-ubi9-python-3.12 target.

---

Outside diff comments:
In @.github/workflows/build-notebooks-TEMPLATE.yaml:
- Around line 78-90: The workflow is still using mutable action tags, so update
each referenced action in the affected steps to a full commit SHA and keep the
trailing version comment consistent with the repo’s pinning convention. Apply
this to the checkout logic in the current block, plus the other occurrences of
actions/setup-go, docker/login-action, and astral-sh/setup-uv in the workflow.
Use the existing SHA-pinning pattern already used elsewhere in the repository
and follow the `.github/AGENTS.md` guidance for action metadata edits.

In `@ci/cached-builds/make_test.py`:
- Line 80: `wait_for_stability` is always called without the `target` value, so
its heavy-target timeout branch in `wait_for_stability` never triggers. Update
the `run_tests` call sites (including the ones around the other
`wait_for_stability` usages) to pass the current `target` through, and make sure
the `wait_for_stability(pod, target)` signature is used consistently so
`_HEAVY_TARGETS` can correctly select the longer timeout for heavy images.

In `@jupyter/utils/install_pdf_deps.sh`:
- Around line 20-24: The early architecture skip in install_pdf_deps.sh makes
the new ppc64le pandoc install branch unreachable. Update the uname -m guard so
only s390x exits early, and allow ppc64le to continue into the existing pandoc
installation flow in install_pdf_deps.sh; verify the ppc64le EPEL logic remains
reachable while the s390x fallback still skips as intended.

---

Duplicate comments:
In `@jupyter/minimal/ubi9-python-3.12/Dockerfile.konflux.cpu`:
- Around line 83-88: The PDF export dependency install in the
Dockerfile.konflux.cpu variant is calling the script directly instead of using
the retry wrapper, which reintroduces flaky build failures. Update the `RUN
./utils/install_pdf_deps.sh` step to use the same retry mechanism as other
Dockerfiles by invoking `install_with_retry.sh texlive-install`, and keep the
surrounding `PDF export` block unchanged.

---

Nitpick comments:
In @.github/actions/install-podman-action/action.yml:
- Around line 66-74: The Podman storage-prep sequence is duplicated in this
action and in the other two build workflows, so any path or permission change
must be made in multiple places. Extract the shared mkdir/chown setup into a
small reusable composite action or helper step, then have install-podman-action,
the software-versions workflow, and the build-notebooks template invoke that
shared logic instead of keeping separate copies.

In `@jupyter/utils/install_pdf_deps.sh`:
- Around line 100-102: The pandoc download step uses a predictable
/tmp/pandoc.tar.gz path, which is flagged as a temp-file hardening issue. Update
the install_pdf_deps.sh flow to use a unique temp file created with mktemp for
the curl download, then pass that generated path into the existing tar
extraction step and clean it up afterward; keep the change localized around the
pandoc download/extract commands.

In `@scripts/test_jupyter_with_papermill.sh`:
- Around line 304-320: The `_run_test` function’s docstring is stale because it
still says the runner copies a single `test_notebook.ipynb`, but the current
flow through `_run_test_notebooks_only` copies the whole test directory. Update
the comment above `_run_test` to describe the actual behavior, keeping the
reference to `_run_test_notebooks_only` and the test directory copy so the
documentation matches the implementation.
- Around line 274-283: The remote /bin/sh -c cleanup command in
scripts/test_jupyter_with_papermill.sh is using an intentional single-quote
expansion pattern that triggers SC2016. Add the same shellcheck suppression
convention already used elsewhere in this script (around
_create_test_versions_source_of_truth) immediately before the "${kbin}" exec
invocation, so the $f expansion remains local to the remote shell script and the
lint stays consistent.
- Around line 146-149: The case in scripts/test_jupyter_with_papermill.sh is
hardcoding the full jupyter-pytorch-llmcompressor-ubi9-python-3-12 image name,
which makes the match brittle when the OS/Python flavor changes. Update the
matching logic around the notebook_name assignment to use the same dynamic
pattern as neighboring cases (for example, derive the suffix from existing
flavor variables used in the script) so the shortened
jupyter-pytorch-llmc-ubi9-python-3-12 name is selected without depending on an
exact version string.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5bdaca58-914e-4039-9054-11fc54944a2e

📥 Commits

Reviewing files that changed from the base of the PR and between 601928d and 70d9268.

⛔ Files ignored due to path filters (1)
  • scripts/check-payload/go.sum is excluded by !**/*.sum
📒 Files selected for processing (47)
  • .github/actions/free-up-disk-space/action.yml
  • .github/actions/install-podman-action/action.yml
  • .github/actions/playwright-test/action.yml
  • .github/actions/provision-k8s/action.yml
  • .github/workflows/build-notebooks-TEMPLATE.yaml
  • .github/workflows/code-quality.yaml
  • .github/workflows/notebooks-digest-updater.yaml
  • .github/workflows/params-env.yaml
  • .github/workflows/pr-merge-image-delete.yml
  • .github/workflows/sec-scan.yml
  • .github/workflows/software-versions.yaml
  • .github/workflows/test-install-podman.yaml
  • .github/workflows/test-trivy-scan-action.yaml
  • ci/cached-builds/crio.conf
  • ci/cached-builds/gha_lvm_overlay.sh
  • ci/cached-builds/kubeadm.yaml
  • ci/cached-builds/make_test.py
  • ci/cached-builds/storage.conf
  • jupyter/datascience/ubi9-python-3.12/build-args/cpu.conf
  • jupyter/datascience/ubi9-python-3.12/kustomize/base/statefulset.yaml
  • jupyter/minimal/ubi9-python-3.12/Dockerfile.cpu
  • jupyter/minimal/ubi9-python-3.12/Dockerfile.konflux.cpu
  • jupyter/minimal/ubi9-python-3.12/kustomize/base/statefulset.yaml
  • jupyter/pytorch+llmcompressor/ubi9-python-3.12/pylock.toml
  • jupyter/pytorch+llmcompressor/ubi9-python-3.12/pyproject.toml
  • jupyter/pytorch+llmcompressor/ubi9-python-3.12/test/test_notebook.ipynb
  • jupyter/trustyai/ubi9-python-3.12/Dockerfile.cpu
  • jupyter/trustyai/ubi9-python-3.12/Dockerfile.konflux.cpu
  • jupyter/trustyai/ubi9-python-3.12/test/test_notebook.ipynb
  • jupyter/utils/install_pandoc.sh
  • jupyter/utils/install_pdf_deps.sh
  • rstudio/c9s-python-3.12/Dockerfile.cpu
  • rstudio/c9s-python-3.12/Dockerfile.cuda
  • rstudio/rhel9-python-3.12/Dockerfile.cpu
  • rstudio/rhel9-python-3.12/Dockerfile.cuda
  • rstudio/rhel9-python-3.12/Dockerfile.konflux.cpu
  • rstudio/rhel9-python-3.12/Dockerfile.konflux.cuda
  • runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu
  • runtimes/datascience/ubi9-python-3.12/Dockerfile.konflux.cpu
  • runtimes/datascience/ubi9-python-3.12/build-args/cpu.conf
  • runtimes/minimal/ubi9-python-3.12/Dockerfile.cpu
  • runtimes/minimal/ubi9-python-3.12/Dockerfile.konflux.cpu
  • runtimes/minimal/ubi9-python-3.12/build-args/cpu.conf
  • scripts/check-payload/go.mod
  • scripts/test_jupyter_with_papermill.sh
  • tests/browser/package.json5
  • tests/manifests.py
💤 Files with no reviewable changes (2)
  • jupyter/utils/install_pandoc.sh
  • ci/cached-builds/gha_lvm_overlay.sh

Comment thread .github/workflows/build-notebooks-TEMPLATE.yaml Outdated
Comment thread .github/workflows/build-notebooks-TEMPLATE.yaml
Comment thread .github/workflows/build-notebooks-TEMPLATE.yaml Outdated
Comment thread .github/workflows/build-notebooks-TEMPLATE.yaml
Comment thread jupyter/minimal/ubi9-python-3.12/Dockerfile.cpu
Comment thread tests/manifests.py Outdated
@mtchoum1 mtchoum1 force-pushed the mtchoum1/green-3.3 branch 2 times, most recently from fcaa6b1 to 1b0d5b2 Compare July 2, 2026 15:02
@mtchoum1

mtchoum1 commented Jul 2, 2026

Copy link
Copy Markdown
Author

@coderabbitai full review

@mtchoum1

mtchoum1 commented Jul 2, 2026

Copy link
Copy Markdown
Author

/build-konflux

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown
✅ Action performed

Full review finished.

@mtchoum1 mtchoum1 force-pushed the mtchoum1/green-3.3 branch from 1b0d5b2 to b3e96a4 Compare July 2, 2026 15:14
@mtchoum1

mtchoum1 commented Jul 2, 2026

Copy link
Copy Markdown
Author

/build-konflux

@mtchoum1 mtchoum1 force-pushed the mtchoum1/green-3.3 branch from b3e96a4 to 1559073 Compare July 2, 2026 16:40
@mtchoum1

mtchoum1 commented Jul 2, 2026

Copy link
Copy Markdown
Author

/build-konflux

2 similar comments
@mtchoum1

mtchoum1 commented Jul 2, 2026

Copy link
Copy Markdown
Author

/build-konflux

@mtchoum1

mtchoum1 commented Jul 2, 2026

Copy link
Copy Markdown
Author

/build-konflux

@mtchoum1 mtchoum1 force-pushed the mtchoum1/green-3.3 branch from 7c4a72f to 9a0ba82 Compare July 2, 2026 18:53
@mtchoum1 mtchoum1 force-pushed the mtchoum1/green-3.3 branch from 9a0ba82 to 1559073 Compare July 2, 2026 18:53
@mtchoum1

mtchoum1 commented Jul 2, 2026

Copy link
Copy Markdown
Author

/build-konflux

@mtchoum1 mtchoum1 force-pushed the mtchoum1/green-3.3 branch from 5786688 to c4438ef Compare July 2, 2026 19:23
@mtchoum1

mtchoum1 commented Jul 2, 2026

Copy link
Copy Markdown
Author

/build-konflux

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants