oetf: resolve spec paths from caller's runfiles repo; restore script bundle by jiaenren · Pull Request #1114 · NVIDIA/OSMO

jiaenren · 2026-06-16T23:38:25Z

Fixes two OETF migration breakages that surfaced in the gitlab-master staging Jenkins pipeline run after MR !6103 merged:

1. Spec-file resolution under bzlmod

runner_fixture._workspace_root() uses TEST_WORKSPACE, which bzlmod always sets to _main regardless of which module the test target lives in. Tests in @osmo_workspace//test/scenarios:* that call self.workflow("test/workflow/X.yaml") resolved against <TEST_SRCDIR>/_main/ — but their YAML data deps land under <TEST_SRCDIR>/osmo_workspace+/. Every external scenario in staging Jenkins was failing with:

FileNotFoundError: spec file not found: .../runfiles/_main/test/workflow/<X>.yaml

Fix: new _caller_runfiles_repo_root() helper walks from the caller's __file__ up to the immediate child of TEST_SRCDIR — that's the runfiles repo dir that actually owns the test. _resolve_spec_path() tries this candidate first for workspace-relative paths, falling back to _workspace_root() for BUILD_WORKSPACE_DIRECTORY / pre-bzlmod compat.

Unit test added: test_runner_fixture.py::CallerRunfilesRepoRootTest covers both _main and dep-module (<name>+) layouts.

2. Script + input bundles dropped during the OETF migration

Several yamls under test/workflow/ (app_cli.yaml, dataset_cli.yaml, parallel_workflow_mount_v2.yaml, etc.) reference localpath: scripts/X.sh or localpath: input/Y. The scripts/ and input/ subdirs existed under validation/workflow/ in the internal repo but weren't moved over with the YAMLs. The yaml-only migration left every script-referencing workflow broken end-to-end (even after fix #1 unblocks the YAML load).

Fix: restore the script bundle alongside the yamls (bit-identical copy from internal validation/workflow/{scripts,input}/) so the public yamls are self-sufficient. test/workflow/BUILD's filegroup already globs scripts/** + input/** with allow_empty=True; no BUILD edit needed.

3. Platform/bucket auto-injection in `WorkflowBuilder.submit()`

Public yamls use {{platform}} placeholders with KIND-oriented default-values: (cpu-x86); on the internal staging Jenkins they need x86-5090 / agx-orin-jp6 / etc. WorkflowBuilder.submit() now auto-injects platform={fixture.default_platform} and bucket={fixture.default_bucket} as workflow variables when the caller hasn't set them explicitly. The fixture defaults are env-aware (OETF_DEFAULT_PLATFORM, OETF_DEFAULT_BUCKET), so the same scenarios run unchanged across KIND (cpu), staging (x86-5090), and other envs.

The internal-repo follow-up exports OETF_DEFAULT_PLATFORM=x86-5090 in jenkins/vars/oetfTests.groovy so this kicks in for the staging job.

Verification

Local against https://staging.osmo.nvidia.com on isaac-nightly pool with OETF_DEFAULT_PLATFORM=x86-5090:

Test	Before	After
`app-cli`	`FileNotFoundError` at 3.4s	submitted, ran 106s on staging (`app-cli-1621`); task-level exit-1 is a separate pre-existing issue
`command-validation`	`FileNotFoundError` at 3.3s	PASS
`mount-validation`	`FileNotFoundError` at 3.3s	PASS

All existing framework unit tests + pylint stay green: bazel test //test/oetf/tests:test_runner_fixture //test/oetf:runner_fixture-pylint.

Pairs with

Internal gitlab-master OSMO follow-up to bump the external submodule + export OETF_DEFAULT_PLATFORM in Jenkins is queued separately.

Summary by CodeRabbit

Tests
- Improved test fixture path handling for Bazel bzlmod execution so workspace-relative spec files resolve correctly.
- Updated CLI submission testing for localpath: to run from the staging directory and improved failure diagnostics with richer, cleaned CLI output.
- Enhanced workflow test submissions to automatically apply default platform/bucket when not explicitly provided.
- Added new CLI end-to-end test scripts and strengthened query/input fixtures, plus a new test suite validating repository discovery.
Chores
- Added/adjusted workflow YAML templates and test input files to drive platform selection via {{platform}}.

…bundle Two migration-induced breakages on the staging Jenkins OETF pipeline: 1. Spec-file resolution under bzlmod. _workspace_root() uses TEST_WORKSPACE which bzlmod always sets to "_main", regardless of which module the test target lives in. So tests in @osmo_workspace//test/scenarios:* that call self.workflow("test/workflow/X.yaml") resolved against <TEST_SRCDIR>/_main/ even though their data deps land under <TEST_SRCDIR>/osmo_workspace+/. Every external scenario in the staging Jenkins run was failing with FileNotFoundError: spec file not found: ...runfiles/_main/test/workflow/X.yaml Fix: new _caller_runfiles_repo_root() helper walks from the caller's __file__ up to the immediate child of TEST_SRCDIR — that's the runfiles repo dir that actually owns the test (e.g. osmo_workspace+ or _main). _resolve_spec_path() tries this candidate first for workspace-relative paths, falling back to _workspace_root() for the BUILD_WORKSPACE_DIRECTORY path (bazel run) and pre-bzlmod compat. 2. Script + input bundles dropped during the OETF migration. Several yamls under test/workflow/ (app_cli.yaml, dataset_cli.yaml, parallel_workflow_ mount_v2.yaml, etc.) reference localpath: scripts/X.sh or localpath: input/Y — the scripts/ and input/ subdirs existed under validation/workflow/ in the internal repo but weren't moved over with the YAMLs. The yaml-only migration left every script-referencing workflow broken end-to-end (even after fix #1 unblocks the YAML load). Fix: restore the script bundle alongside the yamls (bit-identical copy from the internal repo's validation/workflow/{scripts,input}/) so the public yamls are self-sufficient. test/workflow/BUILD's filegroup already globs scripts/** + input/** with allow_empty=True; no BUILD edit needed. Bonus: WorkflowBuilder.submit() now auto-injects platform={default_platform} and bucket={default_bucket} as workflow variables when the caller hasn't set them explicitly. Public yamls use {{platform}} placeholders with KIND- oriented default-values (cpu-x86); on the internal staging Jenkins they need x86-5090 / agx-orin-jp6 / etc. Auto-injection from the fixture's env- aware defaults (OETF_DEFAULT_PLATFORM, OETF_DEFAULT_BUCKET) lets the same scenarios run unchanged across KIND, staging, and any other env. Verification (local, against staging.osmo.nvidia.com on isaac-nightly pool with OETF_DEFAULT_PLATFORM=x86-5090): @osmo_workspace//test/scenarios:app-cli submitted → ran 106s on staging (workflow app-cli-1621) @osmo_workspace//test/scenarios:command-validation PASS @osmo_workspace//test/scenarios:mount-validation PASS (Some scenarios still fail for unrelated reasons — actual task runtime, expected-outcome drift — but the migration breakage that took out the whole suite is gone.) Unit test added: test_runner_fixture.py::CallerRunfilesRepoRootTest covers both _main and dep-module (<name>+) layouts. All existing test_runner_ fixture tests + the runner_fixture pylint target still green: bazel test //test/oetf/tests:test_runner_fixture //test/oetf:runner_fixture-pylint

coderabbitai · 2026-06-16T23:38:38Z

📝 Walkthrough

Walkthrough

Adds _caller_runfiles_repo_root() to the OETF runner fixture for Bazel bzlmod-aware spec path resolution, enhances CLI submission and error handling, auto-injects platform and bucket variables into WorkflowBuilder.submit(), and introduces a CallerRunfilesRepoRootTest suite. Also adds a collection of new end-to-end CLI integration test scripts covering app, dataset/collection, workflow, and server/client scenarios, plus parameterizes 14 workflow test fixtures to use platform template variables.

Changes

OETF Runner Fixture Enhancements

Layer / File(s)	Summary
Caller runfiles repo root helper and spec path resolution `test/oetf/runner_fixture.py`	Adds `_caller_runfiles_repo_root()` that scans the Python call stack and walks the filesystem to locate the caller's runfiles repo root under `TEST_SRCDIR`. Updates `_resolve_spec_path()` to try this root first, falling back to `_workspace_root()` when the candidate does not exist.
CLI submission and error handling enhancements `test/oetf/runner_fixture.py`	Updates `_submit_via_cli()` to run `osmo workflow submit` with `cwd=temp_dir` for correct localpath resolution. Enhances `_raise_cli_submission_error()` to filter repeated "New client available" warnings from stderr and combine cleaned stderr with truncated stdout for richer error context.
WorkflowBuilder.submit() variable auto-injection `test/oetf/runner_fixture.py`	Builds a derived `args` list in `submit()` and conditionally appends `platform=` and `bucket=` from fixture defaults when not already present; the enriched args are forwarded to both CLI and API submission paths.
Unit tests for _caller_runfiles_repo_root `test/oetf/tests/test_runner_fixture.py`	Adds `CallerRunfilesRepoRootTest` covering `_main`, bzlmod dependency (`<name>+`), nested directory traversal, and `None` when `TEST_SRCDIR` is unset, using `exec`-based synthetic caller paths.

Workflow Integration Test Scripts and Fixtures

Layer / File(s)	Summary
Test input file fixtures `test/workflow/input/folder_input_0/test_file.txt`, `test/workflow/input/folder_input_1/test_file.txt`	Populates two test input files with `test_file_content` for use in dataset upload/diff tests.
App lifecycle CLI integration test `test/workflow/scripts/app_spec.yaml`, `test/workflow/scripts/app_cli.sh`	Adds a single-task `app-cli-test` workflow spec and a shell script that creates or updates an app, validates it via `osmo app info/show/spec`, deletes it, and asserts non-existence.
Dataset and collection lifecycle CLI integration test `test/workflow/scripts/dataset_cli.sh`, `test/workflow/scripts/dataset_query.txt`, `test/workflow/scripts/collection_query.txt`	Adds query templates and a 270-line script exercising upload, tag/label/metadata, download-diff, update-add/remove, collection create/query, and deletion with existence verification for both datasets and collections.
Workflow CLI smoke-test `test/workflow/scripts/workflow_cli.sh`	Exercises `osmo` commands for profiles, workflow list/query/tag, spec/template retrieval, and pool/resource/task listing with multiple flag variants; exits on first failure.
Server/client TCP echo and hostname scripts `test/workflow/scripts/server.sh`, `test/workflow/scripts/client.sh`, `test/workflow/scripts/echo_hostname.sh`	Adds a `nc`-based TCP server on port 24831, a retry-looping client that saves the echo output, and a hostname-printing utility for multi-task workflow tests.
Workflow test YAML platform parameterization `test/workflow/bad_command.yaml`, `test/workflow/bad_image.yaml`, `test/workflow/empty_command.yaml`, `test/workflow/exec_timeout.yaml`, `test/workflow/host_mount.yaml`, `test/workflow/invalid_mount.yaml`, `test/workflow/invalid_src_dest_mount.yaml`, `test/workflow/multi_arch_containers.yaml`, `test/workflow/osmo_client_test.yaml`, `test/workflow/per_group_exec_timeout.yaml`, `test/workflow/regex_workflow.yaml`, `test/workflow/resource_type.yaml`, `test/workflow/serial_workflow_multi_arch.yaml`, `test/workflow/serial_workflow_update_dataset.yaml`	Updates 14 workflow test YAML files to replace hard-coded `platform: cpu-x86` with templated `platform: {{platform}}`, enabling tests to parameterize the execution platform at runtime.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

fernandol-nvidia
RyaliNvidia

Poem

🐰 Hoppity-hop through the runfiles tree,
TEST_SRCDIR now points straight to me!
Platforms and buckets auto-inject with grace,
Shell scripts galore run the test race.
From dataset diffs to TCP pings—
This rabbit delights in all the new things! 🌟

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title directly describes the main changes: fixing spec path resolution from the caller's runfiles repo under bzlmod, and restoring the script bundle that was missing after migration.
Docstring Coverage	✅ Passed	Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch jiaenr/fix-oetf-cross-repo-spec-resolution

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (4)

test/oetf/runner_fixture.py (1)

602-606: 💤 Low value

Consider logging the enriched args for debugging clarity.

Line 605 logs self._args (user-specified args) rather than args (which includes auto-injected platform/bucket). When debugging submission failures, seeing the complete set of variables actually submitted may be more helpful.

💡 Optional diff

             print(
                 f"OETF submitted: {workflow_id} ({workflow_url}) "
                 f"[spec={os.path.basename(self._spec_path)} pool={self._pool} "
-                f"client={self._client} args={list(self._args)}]",
+                f"client={self._client} args={args}]",
                 file=sys.stderr, flush=True,
             )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/oetf/runner_fixture.py` around lines 602 - 606, In the print statement
that logs the OETF workflow submission details around line 605, change the
logged args variable from self._args to args. This will ensure the log includes
the complete set of arguments actually submitted, including any auto-injected
platform and bucket values, rather than just the user-specified arguments. This
provides more complete debugging information when investigating submission
issues.

test/workflow/scripts/app_cli.sh (1)

21-21: 💤 Low value

Consider checking exit codes directly.

ShellCheck suggests checking exit codes directly (e.g., if ! osmo app list; then) instead of testing $? for improved readability.

Also applies to: 38-38, 46-46, 54-54, 62-62, 70-70, 80-80, 88-88, 98-98
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/workflow/scripts/app_cli.sh` at line 21, Replace the indirect exit code
checks using `$?` with direct command checks for improved readability and
idiomaticity. Instead of executing a command followed by `if [ $? -ne 0 ];
then`, place the command directly in the conditional statement using the
negation operator `!` (e.g., `if ! osmo app list; then`). Apply this change at
all eight locations noted in the comment: lines 21, 38, 46, 54, 62, 70, 80, 88,
and 98, ensuring each command is executed within its respective conditional
statement.
Source: Linters/SAST tools

test/workflow/scripts/client.sh (2)

31-31: 💤 Low value

Optional: Simplify arithmetic expression.

Same as Line 18, the $ prefix on retries is unnecessary in arithmetic context.

♻️ Optional simplification

-  retries=$(($retries - 1))
+  retries=$((retries - 1))

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/workflow/scripts/client.sh` at line 31, In the arithmetic expansion
expression for the retries variable decrement operation, remove the unnecessary
dollar sign prefix before the variable name inside the parentheses. The variable
reference inside the arithmetic context $((...)) does not require the $ prefix,
so simplify the expression by removing it from the first occurrence of retries
while keeping the structure of the subtraction operation intact.

18-18: 💤 Low value

Optional: Simplify arithmetic expression.

Shellcheck suggests that the $ prefix on retries is unnecessary in arithmetic context. While the current syntax works correctly, the simpler form is more idiomatic.

♻️ Optional simplification

-  retries=$(($retries - 1))
+  retries=$((retries - 1))

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/workflow/scripts/client.sh` at line 18, In the arithmetic expression for
the retries variable assignment, remove the unnecessary `$` prefix from the
variable name inside the arithmetic expansion. Change the expression from
`retries=$(($retries - 1))` to `retries=$((retries - 1))` to use the more
idiomatic shellcheck-compliant syntax where variable expansion with `$` is
optional within arithmetic context.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/workflow/scripts/app_cli.sh`:
- Around line 26-30: The elif branch checking for "created successfully" is
unreachable because it exists inside an if ! block that only executes when the
command fails. Remove the unreachable elif branch (the one checking for "created
successfully" with grep) from inside the failure block. Restructure the logic so
that the success case is handled in the main if block and the failure case is in
the else block, ensuring that when the create command exits with code 0
(success), it properly acknowledges the successful creation before continuing.
- Around line 78-84: The logic in the osmo app info check is inverted. After the
app is deleted, the osmo app info command should fail because the app no longer
exists. Currently the code treats this expected failure as an error with if [ $?
-ne 0 ]. Invert the condition to if [ $? -eq 0 ] so that success (the command
finding the deleted app) is treated as the error condition, matching the correct
pattern used in the app show and app spec checks that follow.

In `@test/workflow/scripts/app_spec.yaml`:
- Around line 1-9: The copyright header in the app_spec.yaml file uses invalid
YAML syntax with ".." delimiters instead of proper YAML comments. Replace the
".." at the beginning (before "Copyright") and the ".." at the end (after the
closing line about strict prohibition) with the "#" prefix to make them valid
YAML comments. Each line in the copyright header block should start with "#"
followed by a space, consistent with proper YAML comment formatting.

In `@test/workflow/scripts/client.sh`:
- Around line 35-36: The client.sh script is missing a line that writes the TCP
echo output to the workflow output directory. Between the existing `cat
tmp/tcp_echo.txt` command and the `sleep 1000` command, add a new line that
redirects the TCP echo file content to the workflow output directory using the
{{output}} placeholder syntax. This ensures the workflow produces the expected
tcp_echo.txt output artifact as required by the parallel_workflow_v2.yaml
configuration.

In `@test/workflow/scripts/dataset_cli.sh`:
- Around line 248-257: The osmo dataset delete command on line 248 does not
validate its exit status before proceeding to check if the dataset was deleted.
Add an exit status check immediately after the osmo dataset delete
${BUCKET_NAME}/${DATASET_NAME}_{{workflow_id}} command using the same pattern as
the osmo dataset info check below it, so that if the delete operation itself
fails, the script can detect and report this failure rather than letting it
proceed to the post-delete validation checks which may give misleading results.

In `@test/workflow/scripts/server.sh`:
- Around line 9-12: The server.sh script is missing the output artifact creation
step that is required by the workflow definition. After the netcat command (nc
-w 50 -l -p 24831), add a line that copies the /tmp/hello.txt file to the output
artifact directory using the syntax cat /tmp/hello.txt > {{output}}/hello.txt to
ensure the workflow produces the expected hello.txt output artifact.

---

Nitpick comments:
In `@test/oetf/runner_fixture.py`:
- Around line 602-606: In the print statement that logs the OETF workflow
submission details around line 605, change the logged args variable from
self._args to args. This will ensure the log includes the complete set of
arguments actually submitted, including any auto-injected platform and bucket
values, rather than just the user-specified arguments. This provides more
complete debugging information when investigating submission issues.

In `@test/workflow/scripts/app_cli.sh`:
- Line 21: Replace the indirect exit code checks using `$?` with direct command
checks for improved readability and idiomaticity. Instead of executing a command
followed by `if [ $? -ne 0 ]; then`, place the command directly in the
conditional statement using the negation operator `!` (e.g., `if ! osmo app
list; then`). Apply this change at all eight locations noted in the comment:
lines 21, 38, 46, 54, 62, 70, 80, 88, and 98, ensuring each command is executed
within its respective conditional statement.

In `@test/workflow/scripts/client.sh`:
- Line 31: In the arithmetic expansion expression for the retries variable
decrement operation, remove the unnecessary dollar sign prefix before the
variable name inside the parentheses. The variable reference inside the
arithmetic context $((...)) does not require the $ prefix, so simplify the
expression by removing it from the first occurrence of retries while keeping the
structure of the subtraction operation intact.
- Line 18: In the arithmetic expression for the retries variable assignment,
remove the unnecessary `$` prefix from the variable name inside the arithmetic
expansion. Change the expression from `retries=$(($retries - 1))` to
`retries=$((retries - 1))` to use the more idiomatic shellcheck-compliant syntax
where variable expansion with `$` is optional within arithmetic context.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d3170a21-fe64-4892-9380-ab9fb5f54efd

📥 Commits

Reviewing files that changed from the base of the PR and between cc7e0bf and 3fce3f1.

📒 Files selected for processing (13)

test/oetf/runner_fixture.py
test/oetf/tests/test_runner_fixture.py
test/workflow/input/folder_input_0/test_file.txt
test/workflow/input/folder_input_1/test_file.txt
test/workflow/scripts/app_cli.sh
test/workflow/scripts/app_spec.yaml
test/workflow/scripts/client.sh
test/workflow/scripts/collection_query.txt
test/workflow/scripts/dataset_cli.sh
test/workflow/scripts/dataset_query.txt
test/workflow/scripts/echo_hostname.sh
test/workflow/scripts/server.sh
test/workflow/scripts/workflow_cli.sh

coderabbitai · 2026-06-16T23:44:59Z

+    elif echo "$output" | grep -q "created successfully"; then
+        echo "[App Create Done]"
+    else
+        echo "[App Create Failed] Failed to create app"
+        exit 1


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Unreachable success case inside failure block.

Lines 26-28 check for "created successfully" inside the if ! block that only executes when the create command fails. A successful create would exit with code 0, skip this entire block, and continue to line 34. This branch is unreachable.

♻️ Proposed fix to remove unreachable branch

echo "[App Update Done]" - elif echo "$output" | grep -q "created successfully"; then - echo "[App Create Done]" else echo "[App Create Failed] Failed to create app" exit 1

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

elif echo "$output" | grep -q "created successfully"; then

echo "[App Create Done]"

else

echo "[App Create Failed] Failed to create app"

exit 1

elif echo "$output" | grep -q "already exists"; then

echo "[App Update Done]"

else

echo "[App Create Failed] Failed to create app"

exit 1

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/workflow/scripts/app_cli.sh` around lines 26 - 30, The elif branch checking for "created successfully" is unreachable because it exists inside an if ! block that only executes when the command fails. Remove the unreachable elif branch (the one checking for "created successfully" with grep) from inside the failure block. Restructure the logic so that the success case is handled in the main if block and the failure case is in the else block, ensuring that when the create command exits with code 0 (success), it properly acknowledges the successful creation before continuing.

coderabbitai · 2026-06-16T23:45:00Z

+echo "[App Info Start] ------------------------------------------------"
+osmo app info ${APP_NAME}
+if [ $? -ne 0 ]; then
+    echo "[App Info Failed] Failed to get app info"
+    exit 1
+fi
+echo "[App Info Done]"


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Inverted post-deletion check logic.

After deleting the app, osmo app info should fail because the app no longer exists. The current code treats this expected failure as an error. Lines 87-93 and 97-103 correctly expect failure for app show and app spec.

🐛 Proposed fix to expect failure after deletion

echo "[App Info Start] ------------------------------------------------" osmo app info ${APP_NAME} -if [ $? -ne 0 ]; then - echo "[App Info Failed] Failed to get app info" - exit 1 +if [ $? -eq 0 ]; then + echo "[Info] Deleted App Info is still available" + exit 1 fi echo "[App Info Done]"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

echo "[App Info Start] ------------------------------------------------"

osmo app info ${APP_NAME}

if [ $? -ne 0 ]; then

echo "[App Info Failed] Failed to get app info"

exit 1

fi

echo "[App Info Done]"

echo "[App Info Start] ------------------------------------------------"

osmo app info ${APP_NAME}

if [ $? -eq 0 ]; then

echo "[Info] Deleted App Info is still available"

exit 1

fi

echo "[App Info Done]"

🧰 Tools

🪛 Shellcheck (0.11.0)

[info] 79-79: Double quote to prevent globbing and word splitting.

(SC2086)

[style] 80-80: Check exit code directly with e.g. 'if ! mycmd;', not indirectly with $?.

(SC2181)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/workflow/scripts/app_cli.sh` around lines 78 - 84, The logic in the osmo app info check is inverted. After the app is deleted, the osmo app info command should fail because the app no longer exists. Currently the code treats this expected failure as an error with if [ $? -ne 0 ]. Invert the condition to if [ $? -eq 0 ] so that success (the command finding the deleted app) is treated as the error condition, matching the correct pattern used in the app show and app spec checks that follow.

coderabbitai · 2026-06-16T23:45:00Z

+..
+  Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+
+  NVIDIA CORPORATION and its licensors retain all intellectual property
+  and proprietary rights in and to this software, related documentation
+  and any modifications thereto. Any use, reproduction, disclosure or
+  distribution of this software and related documentation without an express
+  license agreement from NVIDIA CORPORATION is strictly prohibited.
+..


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix YAML syntax error in copyright header.

The ".." delimiters are not valid YAML syntax and cause a parsing error. YAML comments must use # prefix.

🐛 Proposed fix for copyright header syntax

-.. - Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved. - - NVIDIA CORPORATION and its licensors retain all intellectual property - and proprietary rights in and to this software, related documentation - and any modifications thereto. Any use, reproduction, disclosure or - distribution of this software and related documentation without an express - license agreement from NVIDIA CORPORATION is strictly prohibited. -.. +# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved. +# +# NVIDIA CORPORATION and its licensors retain all intellectual property +# and proprietary rights in and to this software, related documentation +# and any modifications thereto. Any use, reproduction, disclosure or +# distribution of this software and related documentation without an express +# license agreement from NVIDIA CORPORATION is strictly prohibited.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

..

Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.

NVIDIA CORPORATION and its licensors retain all intellectual property

and proprietary rights in and to this software, related documentation

and any modifications thereto. Any use, reproduction, disclosure or

distribution of this software and related documentation without an express

license agreement from NVIDIA CORPORATION is strictly prohibited.

..

# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.

#

# NVIDIA CORPORATION and its licensors retain all intellectual property

# and proprietary rights in and to this software, related documentation

# and any modifications thereto. Any use, reproduction, disclosure or

# distribution of this software and related documentation without an express

# license agreement from NVIDIA CORPORATION is strictly prohibited.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/workflow/scripts/app_spec.yaml` around lines 1 - 9, The copyright header in the app_spec.yaml file uses invalid YAML syntax with ".." delimiters instead of proper YAML comments. Replace the ".." at the beginning (before "Copyright") and the ".." at the end (after the closing line about strict prohibition) with the "#" prefix to make them valid YAML comments. Each line in the copyright header block should start with "#" followed by a space, consistent with proper YAML comment formatting.

coderabbitai · 2026-06-16T23:45:00Z

+cat tmp/tcp_echo.txt
+sleep 1000


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing output artifact creation.

Cross-referencing with the workflow definition in parallel_workflow_v2.yaml:34-73, the client script should write the TCP echo output to the workflow output directory. The embedded script includes:

cat tmp/tcp_echo.txt > {{output}}/tcp_echo.txt

between lines 35 and 36.

Without this line, the workflow will not produce the expected tcp_echo.txt output artifact.

📦 Proposed fix to add missing output artifact

cat tmp/tcp_echo.txt +cat tmp/tcp_echo.txt > {{output}}/tcp_echo.txt sleep 1000

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

cat tmp/tcp_echo.txt

sleep 1000

cat tmp/tcp_echo.txt

cat tmp/tcp_echo.txt > {{output}}/tcp_echo.txt

sleep 1000

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/workflow/scripts/client.sh` around lines 35 - 36, The client.sh script is missing a line that writes the TCP echo output to the workflow output directory. Between the existing `cat tmp/tcp_echo.txt` command and the `sleep 1000` command, add a new line that redirects the TCP echo file content to the workflow output directory using the {{output}} placeholder syntax. This ensures the workflow produces the expected tcp_echo.txt output artifact as required by the parallel_workflow_v2.yaml configuration.

coderabbitai · 2026-06-16T23:45:00Z

+osmo dataset delete ${BUCKET_NAME}/${DATASET_NAME}_{{workflow_id}} --force --all
+
+echo "[Info] Get Deleted Dataset Info"
+osmo dataset info ${BUCKET_NAME}/${DATASET_NAME}_{{workflow_id}}
+if [ $? -ne 0 ]; then
+    echo "[Info] Deleted Dataset doesn't exist"
+else
+    fail=1
+    echo "[Info] Deleted Dataset still Exists"
+fi


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate dataset delete exit status before post-delete checks.

Line 248 runs a destructive delete but never checks its return code. This can mask a failed deletion and make later assertions ambiguous.

Suggested fix

osmo dataset delete ${BUCKET_NAME}/${DATASET_NAME}_{{workflow_id}} --force --all +if [ $? -ne 0 ]; then + echo "[Delete Failed] Failed to delete dataset" + fail=1 +fi echo "[Info] Get Deleted Dataset Info"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

osmo dataset delete ${BUCKET_NAME}/${DATASET_NAME}_{{workflow_id}} --force --all

echo "[Info] Get Deleted Dataset Info"

osmo dataset info ${BUCKET_NAME}/${DATASET_NAME}_{{workflow_id}}

if [ $? -ne 0 ]; then

echo "[Info] Deleted Dataset doesn't exist"

else

fail=1

echo "[Info] Deleted Dataset still Exists"

fi

osmo dataset delete ${BUCKET_NAME}/${DATASET_NAME}_{{workflow_id}} --force --all

if [ $? -ne 0 ]; then

echo "[Delete Failed] Failed to delete dataset"

fail=1

fi

echo "[Info] Get Deleted Dataset Info"

osmo dataset info ${BUCKET_NAME}/${DATASET_NAME}_{{workflow_id}}

if [ $? -ne 0 ]; then

echo "[Info] Deleted Dataset doesn't exist"

else

fail=1

echo "[Info] Deleted Dataset still Exists"

fi

🧰 Tools

🪛 Shellcheck (0.11.0)

[info] 248-248: Double quote to prevent globbing and word splitting.

(SC2086)

[warning] 248-248: This { is literal. Check expression (missing ;/\n?) or quote it.

(SC1083)

[warning] 248-248: This { is literal. Check expression (missing ;/\n?) or quote it.

(SC1083)

[warning] 248-248: This } is literal. Check expression (missing ;/\n?) or quote it.

(SC1083)

[warning] 248-248: This } is literal. Check expression (missing ;/\n?) or quote it.

(SC1083)

[info] 251-251: Double quote to prevent globbing and word splitting.

(SC2086)

[warning] 251-251: This { is literal. Check expression (missing ;/\n?) or quote it.

(SC1083)

[warning] 251-251: This { is literal. Check expression (missing ;/\n?) or quote it.

(SC1083)

[warning] 251-251: This } is literal. Check expression (missing ;/\n?) or quote it.

(SC1083)

[warning] 251-251: This } is literal. Check expression (missing ;/\n?) or quote it.

(SC1083)

[style] 252-252: Check exit code directly with e.g. 'if ! mycmd;', not indirectly with $?.

(SC2181)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/workflow/scripts/dataset_cli.sh` around lines 248 - 257, The osmo dataset delete command on line 248 does not validate its exit status before proceeding to check if the dataset was deleted. Add an exit status check immediately after the osmo dataset delete ${BUCKET_NAME}/${DATASET_NAME}_{{workflow_id}} command using the same pattern as the osmo dataset info check below it, so that if the delete operation itself fails, the script can detect and report this failure rather than letting it proceed to the post-delete validation checks which may give misleading results.

coderabbitai · 2026-06-16T23:45:00Z

+set -ex
+# Open a TCP server in listening mode using port 24831
+# Wait for 50 seconds before closing the connection
+nc -w 50 -l -p 24831 < /tmp/hello.txt


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing output artifact creation.

Cross-referencing with the workflow definition in parallel_workflow_v2.yaml:12-33, the server script should also write the output artifact after the netcat command. The embedded script includes:

cat /tmp/hello.txt > {{output}}/hello.txt

Without this line, the workflow will not produce the expected hello.txt output artifact.

📦 Proposed fix to add missing output artifact

nc -w 50 -l -p 24831 < /tmp/hello.txt +cat /tmp/hello.txt > {{output}}/hello.txt

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

set -ex

# Open a TCP server in listening mode using port 24831

# Wait for 50 seconds before closing the connection

nc -w 50 -l -p 24831 < /tmp/hello.txt

set -ex

# Open a TCP server in listening mode using port 24831

# Wait for 50 seconds before closing the connection

nc -w 50 -l -p 24831 < /tmp/hello.txt

cat /tmp/hello.txt > {{output}}/hello.txt

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/workflow/scripts/server.sh` around lines 9 - 12, The server.sh script is missing the output artifact creation step that is required by the workflow definition. After the netcat command (nc -w 50 -l -p 24831), add a line that copies the /tmp/hello.txt file to the output artifact directory using the syntax cat /tmp/hello.txt > {{output}}/hello.txt to ensure the workflow produces the expected hello.txt output artifact.

…porting Three more migration-induced staging failures found by sweeping all 26 originally-failing scenarios: 1. **14 workflow yamls hardcoded ``platform: cpu-x86``** in their ``resources:`` blocks. The previous commit's WorkflowBuilder platform/bucket auto-injection only helps yamls that templated the field as ``{{platform}}`` — the hardcoded ones still tried to schedule on cpu-x86 and failed on isaac-nightly. Replace ``platform: cpu-x86`` → ``platform: {{platform}}`` only in the deeply-indented ``resources:`` block (>=4 spaces). Leaves the top-level ``default-values: platform: cpu-x86`` literal default untouched, so KIND deploys + tests that don't set OETF_DEFAULT_PLATFORM continue to default to cpu-x86 as before. 2. **CLI submit cwd was the bazel sandbox**, not the staged temp_dir where ``_copy_localpath_files_to_dir`` writes the rewritten spec + copied localpath files. The CLI dutifully resolved ``localpath: folder_input_0`` against the wrong dir and failed: Error message: The localpath input does not exist! Pass ``cwd=temp_dir`` to ``subprocess.run`` for the submit invocation so the rewritten basename-relative localpaths resolve correctly. 3. **CLI submit error reporting buried real errors under version warnings.** The osmo CLI emits a 3-line ``WARNING: New client X.Y.Z available...`` block to stderr on every invocation, the previous error path captured only ``stderr[:500]`` and stopped, so the actual error (``Error message: ...`` on stdout, or 4xx body on stderr) never reached the test log. ``_raise_cli_submission_error`` now strips the version-warning lines from stderr and includes both stdout and stderr (up to 1000 chars each) in the error body — every CLI failure now points at the real cause without manual log diving. Verification (sweep of all 26 originally-failing scenarios on staging.osmo.nvidia.com / isaac-nightly / OETF_DEFAULT_PLATFORM= x86-5090): 24/26 PASS (was 0/26 before the migration fixes) The 2 remaining failures are NOT migration breakage: - ``app-cli`` task-level workflow failure (the workflow submits + runs on staging for 105s, then the user task exits 1 — pre-existing test/ workflow logic issue unrelated to !6103). - ``folder-input`` requires the ``team-osmo`` Swift data credential registered on the OSMO instance for localpath uploads; Jenkins provides this via ``dataStorageCreds``, local sandbox doesn't.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

test/oetf/runner_fixture.py (1)

601-606: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Debug output shows original args, not the enriched args.

The debug print on line 636 uses self._args (original) instead of the derived args (with auto-injected platform/bucket). This could confuse developers when debugging: the stderr message might show args=[] while the workflow actually received platform=cpu.

🔧 Suggested fix

         print(
             f"OETF submitted: {workflow_id} ({workflow_url}) "
             f"[spec={os.path.basename(self._spec_path)} pool={self._pool} "
-            f"client={self._client} args={list(self._args)}]",
+            f"client={self._client} args={args}]",
             file=sys.stderr, flush=True,
         )

Also applies to: 636-636

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/oetf/runner_fixture.py` around lines 601 - 606, The debug print
statement that outputs the arguments is using the original self._args variable
instead of the enriched args variable that has been populated with auto-injected
platform and bucket values in the preceding lines. Locate the debug print
statement that references self._args on or near line 636 and change it to use
the local args variable instead, so that the debug output accurately reflects
the arguments that will actually be passed to the workflow, including any
platform or bucket values that were automatically appended.

🧹 Nitpick comments (1)

test/oetf/runner_fixture.py (1)

401-411: 💤 Low value

Minor edge case: incomplete warning block could suppress subsequent errors.

If the CLI emits a WARNING: New client line but never outputs the expected curl ... install.sh terminator (e.g., format change or truncated output), skip remains True and all subsequent stderr lines are lost. This could hide actual error messages.

Consider resetting skip after N lines or when encountering an obviously non-warning line:

♻️ Defensive fix (optional)

     cleaned_stderr_lines = []
     skip = False
+    skip_line_count = 0
     for line in raw_stderr.splitlines():
         if line.startswith("WARNING: New client"):
             skip = True
+            skip_line_count = 0
             continue
         if skip:
+            skip_line_count += 1
+            # Safety: reset after 10 lines to avoid losing real errors
+            if skip_line_count > 10:
+                skip = False
             if line.startswith("curl ") and "install.sh" in line:
                 skip = False
             continue
         cleaned_stderr_lines.append(line)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/oetf/runner_fixture.py` around lines 401 - 411, The stderr filtering
logic in the loop that processes raw_stderr has a vulnerability where the skip
flag can remain True indefinitely if the expected curl install.sh terminator
line is never encountered, causing all subsequent error messages to be
suppressed. Add defensive logic to reset the skip flag after a reasonable number
of lines have been skipped (such as a maximum line count threshold) or when an
obviously non-warning line is encountered, ensuring that temporary skip states
do not permanently suppress stderr output from being captured.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@test/oetf/runner_fixture.py`:
- Around line 601-606: The debug print statement that outputs the arguments is
using the original self._args variable instead of the enriched args variable
that has been populated with auto-injected platform and bucket values in the
preceding lines. Locate the debug print statement that references self._args on
or near line 636 and change it to use the local args variable instead, so that
the debug output accurately reflects the arguments that will actually be passed
to the workflow, including any platform or bucket values that were automatically
appended.

---

Nitpick comments:
In `@test/oetf/runner_fixture.py`:
- Around line 401-411: The stderr filtering logic in the loop that processes
raw_stderr has a vulnerability where the skip flag can remain True indefinitely
if the expected curl install.sh terminator line is never encountered, causing
all subsequent error messages to be suppressed. Add defensive logic to reset the
skip flag after a reasonable number of lines have been skipped (such as a
maximum line count threshold) or when an obviously non-warning line is
encountered, ensuring that temporary skip states do not permanently suppress
stderr output from being captured.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 12674653-3fa5-4785-bcee-86659824a8ab

📥 Commits

Reviewing files that changed from the base of the PR and between 3fce3f1 and 15f8d29.

📒 Files selected for processing (15)

test/oetf/runner_fixture.py
test/workflow/bad_command.yaml
test/workflow/bad_image.yaml
test/workflow/empty_command.yaml
test/workflow/exec_timeout.yaml
test/workflow/host_mount.yaml
test/workflow/invalid_mount.yaml
test/workflow/invalid_src_dest_mount.yaml
test/workflow/multi_arch_containers.yaml
test/workflow/osmo_client_test.yaml
test/workflow/per_group_exec_timeout.yaml
test/workflow/regex_workflow.yaml
test/workflow/resource_type.yaml
test/workflow/serial_workflow_multi_arch.yaml
test/workflow/serial_workflow_update_dataset.yaml

✅ Files skipped from review due to trivial changes (6)

test/workflow/exec_timeout.yaml
test/workflow/multi_arch_containers.yaml
test/workflow/serial_workflow_update_dataset.yaml
test/workflow/per_group_exec_timeout.yaml
test/workflow/bad_image.yaml
test/workflow/bad_command.yaml

jiaenren requested a review from a team as a code owner June 16, 2026 23:38

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

oetf: resolve spec paths from caller's runfiles repo; restore script bundle#1114

oetf: resolve spec paths from caller's runfiles repo; restore script bundle#1114
jiaenren wants to merge 2 commits into
mainfrom
jiaenr/fix-oetf-cross-repo-spec-resolution

jiaenren commented Jun 16, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Uh oh!

coderabbitai Bot Jun 16, 2026

Uh oh!

coderabbitai Bot Jun 16, 2026

Uh oh!

coderabbitai Bot Jun 16, 2026

Uh oh!

coderabbitai Bot Jun 16, 2026

Uh oh!

coderabbitai Bot Jun 16, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jiaenren commented Jun 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

1. Spec-file resolution under bzlmod

2. Script + input bundles dropped during the OETF migration

3. Platform/bucket auto-injection in WorkflowBuilder.submit()

Verification

Pairs with

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jiaenren commented Jun 16, 2026 •

edited by coderabbitai Bot

Loading

3. Platform/bucket auto-injection in `WorkflowBuilder.submit()`

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading