oetf: resolve spec paths from caller's runfiles repo; restore script bundle#1114
oetf: resolve spec paths from caller's runfiles repo; restore script bundle#1114jiaenren wants to merge 2 commits into
Conversation
…bundle
Two migration-induced breakages on the staging Jenkins OETF pipeline:
1. Spec-file resolution under bzlmod. _workspace_root() uses TEST_WORKSPACE
which bzlmod always sets to "_main", regardless of which module the test
target lives in. So tests in @osmo_workspace//test/scenarios:* that call
self.workflow("test/workflow/X.yaml") resolved against <TEST_SRCDIR>/_main/
even though their data deps land under <TEST_SRCDIR>/osmo_workspace+/.
Every external scenario in the staging Jenkins run was failing with
FileNotFoundError: spec file not found: ...runfiles/_main/test/workflow/X.yaml
Fix: new _caller_runfiles_repo_root() helper walks from the caller's
__file__ up to the immediate child of TEST_SRCDIR — that's the runfiles
repo dir that actually owns the test (e.g. osmo_workspace+ or _main).
_resolve_spec_path() tries this candidate first for workspace-relative
paths, falling back to _workspace_root() for the BUILD_WORKSPACE_DIRECTORY
path (bazel run) and pre-bzlmod compat.
2. Script + input bundles dropped during the OETF migration. Several yamls
under test/workflow/ (app_cli.yaml, dataset_cli.yaml, parallel_workflow_
mount_v2.yaml, etc.) reference localpath: scripts/X.sh or localpath:
input/Y — the scripts/ and input/ subdirs existed under
validation/workflow/ in the internal repo but weren't moved over with
the YAMLs. The yaml-only migration left every script-referencing
workflow broken end-to-end (even after fix #1 unblocks the YAML load).
Fix: restore the script bundle alongside the yamls (bit-identical copy
from the internal repo's validation/workflow/{scripts,input}/) so the
public yamls are self-sufficient. test/workflow/BUILD's filegroup
already globs scripts/** + input/** with allow_empty=True; no BUILD edit
needed.
Bonus: WorkflowBuilder.submit() now auto-injects platform={default_platform}
and bucket={default_bucket} as workflow variables when the caller hasn't
set them explicitly. Public yamls use {{platform}} placeholders with KIND-
oriented default-values (cpu-x86); on the internal staging Jenkins they
need x86-5090 / agx-orin-jp6 / etc. Auto-injection from the fixture's env-
aware defaults (OETF_DEFAULT_PLATFORM, OETF_DEFAULT_BUCKET) lets the same
scenarios run unchanged across KIND, staging, and any other env.
Verification (local, against staging.osmo.nvidia.com on isaac-nightly
pool with OETF_DEFAULT_PLATFORM=x86-5090):
@osmo_workspace//test/scenarios:app-cli submitted → ran 106s on
staging (workflow
app-cli-1621)
@osmo_workspace//test/scenarios:command-validation PASS
@osmo_workspace//test/scenarios:mount-validation PASS
(Some scenarios still fail for unrelated reasons — actual task runtime,
expected-outcome drift — but the migration breakage that took out the
whole suite is gone.)
Unit test added: test_runner_fixture.py::CallerRunfilesRepoRootTest covers
both _main and dep-module (<name>+) layouts. All existing test_runner_
fixture tests + the runner_fixture pylint target still green:
bazel test //test/oetf/tests:test_runner_fixture //test/oetf:runner_fixture-pylint
📝 WalkthroughWalkthroughAdds ChangesOETF Runner Fixture Enhancements
Workflow Integration Test Scripts and Fixtures
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (4)
test/oetf/runner_fixture.py (1)
602-606: 💤 Low valueConsider logging the enriched
argsfor debugging clarity.Line 605 logs
self._args(user-specified args) rather thanargs(which includes auto-injectedplatform/bucket). When debugging submission failures, seeing the complete set of variables actually submitted may be more helpful.💡 Optional diff
print( f"OETF submitted: {workflow_id} ({workflow_url}) " f"[spec={os.path.basename(self._spec_path)} pool={self._pool} " - f"client={self._client} args={list(self._args)}]", + f"client={self._client} args={args}]", file=sys.stderr, flush=True, )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/oetf/runner_fixture.py` around lines 602 - 606, In the print statement that logs the OETF workflow submission details around line 605, change the logged args variable from self._args to args. This will ensure the log includes the complete set of arguments actually submitted, including any auto-injected platform and bucket values, rather than just the user-specified arguments. This provides more complete debugging information when investigating submission issues.test/workflow/scripts/app_cli.sh (1)
21-21: 💤 Low valueConsider checking exit codes directly.
ShellCheck suggests checking exit codes directly (e.g.,
if ! osmo app list; then) instead of testing$?for improved readability.Also applies to: 38-38, 46-46, 54-54, 62-62, 70-70, 80-80, 88-88, 98-98
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/workflow/scripts/app_cli.sh` at line 21, Replace the indirect exit code checks using `$?` with direct command checks for improved readability and idiomaticity. Instead of executing a command followed by `if [ $? -ne 0 ]; then`, place the command directly in the conditional statement using the negation operator `!` (e.g., `if ! osmo app list; then`). Apply this change at all eight locations noted in the comment: lines 21, 38, 46, 54, 62, 70, 80, 88, and 98, ensuring each command is executed within its respective conditional statement.Source: Linters/SAST tools
test/workflow/scripts/client.sh (2)
31-31: 💤 Low valueOptional: Simplify arithmetic expression.
Same as Line 18, the
$prefix onretriesis unnecessary in arithmetic context.♻️ Optional simplification
- retries=$(($retries - 1)) + retries=$((retries - 1))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/workflow/scripts/client.sh` at line 31, In the arithmetic expansion expression for the retries variable decrement operation, remove the unnecessary dollar sign prefix before the variable name inside the parentheses. The variable reference inside the arithmetic context $((...)) does not require the $ prefix, so simplify the expression by removing it from the first occurrence of retries while keeping the structure of the subtraction operation intact.
18-18: 💤 Low valueOptional: Simplify arithmetic expression.
Shellcheck suggests that the
$prefix onretriesis unnecessary in arithmetic context. While the current syntax works correctly, the simpler form is more idiomatic.♻️ Optional simplification
- retries=$(($retries - 1)) + retries=$((retries - 1))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/workflow/scripts/client.sh` at line 18, In the arithmetic expression for the retries variable assignment, remove the unnecessary `$` prefix from the variable name inside the arithmetic expansion. Change the expression from `retries=$(($retries - 1))` to `retries=$((retries - 1))` to use the more idiomatic shellcheck-compliant syntax where variable expansion with `$` is optional within arithmetic context.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/workflow/scripts/app_cli.sh`:
- Around line 26-30: The elif branch checking for "created successfully" is
unreachable because it exists inside an if ! block that only executes when the
command fails. Remove the unreachable elif branch (the one checking for "created
successfully" with grep) from inside the failure block. Restructure the logic so
that the success case is handled in the main if block and the failure case is in
the else block, ensuring that when the create command exits with code 0
(success), it properly acknowledges the successful creation before continuing.
- Around line 78-84: The logic in the osmo app info check is inverted. After the
app is deleted, the osmo app info command should fail because the app no longer
exists. Currently the code treats this expected failure as an error with if [ $?
-ne 0 ]. Invert the condition to if [ $? -eq 0 ] so that success (the command
finding the deleted app) is treated as the error condition, matching the correct
pattern used in the app show and app spec checks that follow.
In `@test/workflow/scripts/app_spec.yaml`:
- Around line 1-9: The copyright header in the app_spec.yaml file uses invalid
YAML syntax with ".." delimiters instead of proper YAML comments. Replace the
".." at the beginning (before "Copyright") and the ".." at the end (after the
closing line about strict prohibition) with the "#" prefix to make them valid
YAML comments. Each line in the copyright header block should start with "#"
followed by a space, consistent with proper YAML comment formatting.
In `@test/workflow/scripts/client.sh`:
- Around line 35-36: The client.sh script is missing a line that writes the TCP
echo output to the workflow output directory. Between the existing `cat
tmp/tcp_echo.txt` command and the `sleep 1000` command, add a new line that
redirects the TCP echo file content to the workflow output directory using the
{{output}} placeholder syntax. This ensures the workflow produces the expected
tcp_echo.txt output artifact as required by the parallel_workflow_v2.yaml
configuration.
In `@test/workflow/scripts/dataset_cli.sh`:
- Around line 248-257: The osmo dataset delete command on line 248 does not
validate its exit status before proceeding to check if the dataset was deleted.
Add an exit status check immediately after the osmo dataset delete
${BUCKET_NAME}/${DATASET_NAME}_{{workflow_id}} command using the same pattern as
the osmo dataset info check below it, so that if the delete operation itself
fails, the script can detect and report this failure rather than letting it
proceed to the post-delete validation checks which may give misleading results.
In `@test/workflow/scripts/server.sh`:
- Around line 9-12: The server.sh script is missing the output artifact creation
step that is required by the workflow definition. After the netcat command (nc
-w 50 -l -p 24831), add a line that copies the /tmp/hello.txt file to the output
artifact directory using the syntax cat /tmp/hello.txt > {{output}}/hello.txt to
ensure the workflow produces the expected hello.txt output artifact.
---
Nitpick comments:
In `@test/oetf/runner_fixture.py`:
- Around line 602-606: In the print statement that logs the OETF workflow
submission details around line 605, change the logged args variable from
self._args to args. This will ensure the log includes the complete set of
arguments actually submitted, including any auto-injected platform and bucket
values, rather than just the user-specified arguments. This provides more
complete debugging information when investigating submission issues.
In `@test/workflow/scripts/app_cli.sh`:
- Line 21: Replace the indirect exit code checks using `$?` with direct command
checks for improved readability and idiomaticity. Instead of executing a command
followed by `if [ $? -ne 0 ]; then`, place the command directly in the
conditional statement using the negation operator `!` (e.g., `if ! osmo app
list; then`). Apply this change at all eight locations noted in the comment:
lines 21, 38, 46, 54, 62, 70, 80, 88, and 98, ensuring each command is executed
within its respective conditional statement.
In `@test/workflow/scripts/client.sh`:
- Line 31: In the arithmetic expansion expression for the retries variable
decrement operation, remove the unnecessary dollar sign prefix before the
variable name inside the parentheses. The variable reference inside the
arithmetic context $((...)) does not require the $ prefix, so simplify the
expression by removing it from the first occurrence of retries while keeping the
structure of the subtraction operation intact.
- Line 18: In the arithmetic expression for the retries variable assignment,
remove the unnecessary `$` prefix from the variable name inside the arithmetic
expansion. Change the expression from `retries=$(($retries - 1))` to
`retries=$((retries - 1))` to use the more idiomatic shellcheck-compliant syntax
where variable expansion with `$` is optional within arithmetic context.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d3170a21-fe64-4892-9380-ab9fb5f54efd
📒 Files selected for processing (13)
test/oetf/runner_fixture.pytest/oetf/tests/test_runner_fixture.pytest/workflow/input/folder_input_0/test_file.txttest/workflow/input/folder_input_1/test_file.txttest/workflow/scripts/app_cli.shtest/workflow/scripts/app_spec.yamltest/workflow/scripts/client.shtest/workflow/scripts/collection_query.txttest/workflow/scripts/dataset_cli.shtest/workflow/scripts/dataset_query.txttest/workflow/scripts/echo_hostname.shtest/workflow/scripts/server.shtest/workflow/scripts/workflow_cli.sh
| elif echo "$output" | grep -q "created successfully"; then | ||
| echo "[App Create Done]" | ||
| else | ||
| echo "[App Create Failed] Failed to create app" | ||
| exit 1 |
There was a problem hiding this comment.
Unreachable success case inside failure block.
Lines 26-28 check for "created successfully" inside the if ! block that only executes when the create command fails. A successful create would exit with code 0, skip this entire block, and continue to line 34. This branch is unreachable.
♻️ Proposed fix to remove unreachable branch
echo "[App Update Done]"
- elif echo "$output" | grep -q "created successfully"; then
- echo "[App Create Done]"
else
echo "[App Create Failed] Failed to create app"
exit 1📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| elif echo "$output" | grep -q "created successfully"; then | |
| echo "[App Create Done]" | |
| else | |
| echo "[App Create Failed] Failed to create app" | |
| exit 1 | |
| elif echo "$output" | grep -q "already exists"; then | |
| echo "[App Update Done]" | |
| else | |
| echo "[App Create Failed] Failed to create app" | |
| exit 1 |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/workflow/scripts/app_cli.sh` around lines 26 - 30, The elif branch
checking for "created successfully" is unreachable because it exists inside an
if ! block that only executes when the command fails. Remove the unreachable
elif branch (the one checking for "created successfully" with grep) from inside
the failure block. Restructure the logic so that the success case is handled in
the main if block and the failure case is in the else block, ensuring that when
the create command exits with code 0 (success), it properly acknowledges the
successful creation before continuing.
| echo "[App Info Start] ------------------------------------------------" | ||
| osmo app info ${APP_NAME} | ||
| if [ $? -ne 0 ]; then | ||
| echo "[App Info Failed] Failed to get app info" | ||
| exit 1 | ||
| fi | ||
| echo "[App Info Done]" |
There was a problem hiding this comment.
Inverted post-deletion check logic.
After deleting the app, osmo app info should fail because the app no longer exists. The current code treats this expected failure as an error. Lines 87-93 and 97-103 correctly expect failure for app show and app spec.
🐛 Proposed fix to expect failure after deletion
echo "[App Info Start] ------------------------------------------------"
osmo app info ${APP_NAME}
-if [ $? -ne 0 ]; then
- echo "[App Info Failed] Failed to get app info"
- exit 1
+if [ $? -eq 0 ]; then
+ echo "[Info] Deleted App Info is still available"
+ exit 1
fi
echo "[App Info Done]"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| echo "[App Info Start] ------------------------------------------------" | |
| osmo app info ${APP_NAME} | |
| if [ $? -ne 0 ]; then | |
| echo "[App Info Failed] Failed to get app info" | |
| exit 1 | |
| fi | |
| echo "[App Info Done]" | |
| echo "[App Info Start] ------------------------------------------------" | |
| osmo app info ${APP_NAME} | |
| if [ $? -eq 0 ]; then | |
| echo "[Info] Deleted App Info is still available" | |
| exit 1 | |
| fi | |
| echo "[App Info Done]" |
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 79-79: Double quote to prevent globbing and word splitting.
(SC2086)
[style] 80-80: Check exit code directly with e.g. 'if ! mycmd;', not indirectly with $?.
(SC2181)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/workflow/scripts/app_cli.sh` around lines 78 - 84, The logic in the osmo
app info check is inverted. After the app is deleted, the osmo app info command
should fail because the app no longer exists. Currently the code treats this
expected failure as an error with if [ $? -ne 0 ]. Invert the condition to if [
$? -eq 0 ] so that success (the command finding the deleted app) is treated as
the error condition, matching the correct pattern used in the app show and app
spec checks that follow.
| .. | ||
| Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved. | ||
|
|
||
| NVIDIA CORPORATION and its licensors retain all intellectual property | ||
| and proprietary rights in and to this software, related documentation | ||
| and any modifications thereto. Any use, reproduction, disclosure or | ||
| distribution of this software and related documentation without an express | ||
| license agreement from NVIDIA CORPORATION is strictly prohibited. | ||
| .. |
There was a problem hiding this comment.
Fix YAML syntax error in copyright header.
The ".." delimiters are not valid YAML syntax and cause a parsing error. YAML comments must use # prefix.
🐛 Proposed fix for copyright header syntax
-..
- Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
-
- NVIDIA CORPORATION and its licensors retain all intellectual property
- and proprietary rights in and to this software, related documentation
- and any modifications thereto. Any use, reproduction, disclosure or
- distribution of this software and related documentation without an express
- license agreement from NVIDIA CORPORATION is strictly prohibited.
-..
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# NVIDIA CORPORATION and its licensors retain all intellectual property
+# and proprietary rights in and to this software, related documentation
+# and any modifications thereto. Any use, reproduction, disclosure or
+# distribution of this software and related documentation without an express
+# license agreement from NVIDIA CORPORATION is strictly prohibited.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| .. | |
| Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved. | |
| NVIDIA CORPORATION and its licensors retain all intellectual property | |
| and proprietary rights in and to this software, related documentation | |
| and any modifications thereto. Any use, reproduction, disclosure or | |
| distribution of this software and related documentation without an express | |
| license agreement from NVIDIA CORPORATION is strictly prohibited. | |
| .. | |
| # Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved. | |
| # | |
| # NVIDIA CORPORATION and its licensors retain all intellectual property | |
| # and proprietary rights in and to this software, related documentation | |
| # and any modifications thereto. Any use, reproduction, disclosure or | |
| # distribution of this software and related documentation without an express | |
| # license agreement from NVIDIA CORPORATION is strictly prohibited. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/workflow/scripts/app_spec.yaml` around lines 1 - 9, The copyright header
in the app_spec.yaml file uses invalid YAML syntax with ".." delimiters instead
of proper YAML comments. Replace the ".." at the beginning (before "Copyright")
and the ".." at the end (after the closing line about strict prohibition) with
the "#" prefix to make them valid YAML comments. Each line in the copyright
header block should start with "#" followed by a space, consistent with proper
YAML comment formatting.
| cat tmp/tcp_echo.txt | ||
| sleep 1000 |
There was a problem hiding this comment.
Missing output artifact creation.
Cross-referencing with the workflow definition in parallel_workflow_v2.yaml:34-73, the client script should write the TCP echo output to the workflow output directory. The embedded script includes:
cat tmp/tcp_echo.txt > {{output}}/tcp_echo.txt
between lines 35 and 36.
Without this line, the workflow will not produce the expected tcp_echo.txt output artifact.
📦 Proposed fix to add missing output artifact
cat tmp/tcp_echo.txt
+cat tmp/tcp_echo.txt > {{output}}/tcp_echo.txt
sleep 1000📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| cat tmp/tcp_echo.txt | |
| sleep 1000 | |
| cat tmp/tcp_echo.txt | |
| cat tmp/tcp_echo.txt > {{output}}/tcp_echo.txt | |
| sleep 1000 |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/workflow/scripts/client.sh` around lines 35 - 36, The client.sh script
is missing a line that writes the TCP echo output to the workflow output
directory. Between the existing `cat tmp/tcp_echo.txt` command and the `sleep
1000` command, add a new line that redirects the TCP echo file content to the
workflow output directory using the {{output}} placeholder syntax. This ensures
the workflow produces the expected tcp_echo.txt output artifact as required by
the parallel_workflow_v2.yaml configuration.
| osmo dataset delete ${BUCKET_NAME}/${DATASET_NAME}_{{workflow_id}} --force --all | ||
|
|
||
| echo "[Info] Get Deleted Dataset Info" | ||
| osmo dataset info ${BUCKET_NAME}/${DATASET_NAME}_{{workflow_id}} | ||
| if [ $? -ne 0 ]; then | ||
| echo "[Info] Deleted Dataset doesn't exist" | ||
| else | ||
| fail=1 | ||
| echo "[Info] Deleted Dataset still Exists" | ||
| fi |
There was a problem hiding this comment.
Validate dataset delete exit status before post-delete checks.
Line 248 runs a destructive delete but never checks its return code. This can mask a failed deletion and make later assertions ambiguous.
Suggested fix
osmo dataset delete ${BUCKET_NAME}/${DATASET_NAME}_{{workflow_id}} --force --all
+if [ $? -ne 0 ]; then
+ echo "[Delete Failed] Failed to delete dataset"
+ fail=1
+fi
echo "[Info] Get Deleted Dataset Info"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| osmo dataset delete ${BUCKET_NAME}/${DATASET_NAME}_{{workflow_id}} --force --all | |
| echo "[Info] Get Deleted Dataset Info" | |
| osmo dataset info ${BUCKET_NAME}/${DATASET_NAME}_{{workflow_id}} | |
| if [ $? -ne 0 ]; then | |
| echo "[Info] Deleted Dataset doesn't exist" | |
| else | |
| fail=1 | |
| echo "[Info] Deleted Dataset still Exists" | |
| fi | |
| osmo dataset delete ${BUCKET_NAME}/${DATASET_NAME}_{{workflow_id}} --force --all | |
| if [ $? -ne 0 ]; then | |
| echo "[Delete Failed] Failed to delete dataset" | |
| fail=1 | |
| fi | |
| echo "[Info] Get Deleted Dataset Info" | |
| osmo dataset info ${BUCKET_NAME}/${DATASET_NAME}_{{workflow_id}} | |
| if [ $? -ne 0 ]; then | |
| echo "[Info] Deleted Dataset doesn't exist" | |
| else | |
| fail=1 | |
| echo "[Info] Deleted Dataset still Exists" | |
| fi |
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 248-248: Double quote to prevent globbing and word splitting.
(SC2086)
[warning] 248-248: This { is literal. Check expression (missing ;/\n?) or quote it.
(SC1083)
[warning] 248-248: This { is literal. Check expression (missing ;/\n?) or quote it.
(SC1083)
[warning] 248-248: This } is literal. Check expression (missing ;/\n?) or quote it.
(SC1083)
[warning] 248-248: This } is literal. Check expression (missing ;/\n?) or quote it.
(SC1083)
[info] 251-251: Double quote to prevent globbing and word splitting.
(SC2086)
[warning] 251-251: This { is literal. Check expression (missing ;/\n?) or quote it.
(SC1083)
[warning] 251-251: This { is literal. Check expression (missing ;/\n?) or quote it.
(SC1083)
[warning] 251-251: This } is literal. Check expression (missing ;/\n?) or quote it.
(SC1083)
[warning] 251-251: This } is literal. Check expression (missing ;/\n?) or quote it.
(SC1083)
[style] 252-252: Check exit code directly with e.g. 'if ! mycmd;', not indirectly with $?.
(SC2181)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/workflow/scripts/dataset_cli.sh` around lines 248 - 257, The osmo
dataset delete command on line 248 does not validate its exit status before
proceeding to check if the dataset was deleted. Add an exit status check
immediately after the osmo dataset delete
${BUCKET_NAME}/${DATASET_NAME}_{{workflow_id}} command using the same pattern as
the osmo dataset info check below it, so that if the delete operation itself
fails, the script can detect and report this failure rather than letting it
proceed to the post-delete validation checks which may give misleading results.
| set -ex | ||
| # Open a TCP server in listening mode using port 24831 | ||
| # Wait for 50 seconds before closing the connection | ||
| nc -w 50 -l -p 24831 < /tmp/hello.txt |
There was a problem hiding this comment.
Missing output artifact creation.
Cross-referencing with the workflow definition in parallel_workflow_v2.yaml:12-33, the server script should also write the output artifact after the netcat command. The embedded script includes:
cat /tmp/hello.txt > {{output}}/hello.txt
Without this line, the workflow will not produce the expected hello.txt output artifact.
📦 Proposed fix to add missing output artifact
nc -w 50 -l -p 24831 < /tmp/hello.txt
+cat /tmp/hello.txt > {{output}}/hello.txt📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| set -ex | |
| # Open a TCP server in listening mode using port 24831 | |
| # Wait for 50 seconds before closing the connection | |
| nc -w 50 -l -p 24831 < /tmp/hello.txt | |
| set -ex | |
| # Open a TCP server in listening mode using port 24831 | |
| # Wait for 50 seconds before closing the connection | |
| nc -w 50 -l -p 24831 < /tmp/hello.txt | |
| cat /tmp/hello.txt > {{output}}/hello.txt |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/workflow/scripts/server.sh` around lines 9 - 12, The server.sh script is
missing the output artifact creation step that is required by the workflow
definition. After the netcat command (nc -w 50 -l -p 24831), add a line that
copies the /tmp/hello.txt file to the output artifact directory using the syntax
cat /tmp/hello.txt > {{output}}/hello.txt to ensure the workflow produces the
expected hello.txt output artifact.
…porting
Three more migration-induced staging failures found by sweeping all 26
originally-failing scenarios:
1. **14 workflow yamls hardcoded ``platform: cpu-x86``** in their
``resources:`` blocks. The previous commit's WorkflowBuilder
platform/bucket auto-injection only helps yamls that templated the
field as ``{{platform}}`` — the hardcoded ones still tried to
schedule on cpu-x86 and failed on isaac-nightly. Replace
``platform: cpu-x86`` → ``platform: {{platform}}`` only in the
deeply-indented ``resources:`` block (>=4 spaces). Leaves the
top-level ``default-values: platform: cpu-x86`` literal default
untouched, so KIND deploys + tests that don't set
OETF_DEFAULT_PLATFORM continue to default to cpu-x86 as before.
2. **CLI submit cwd was the bazel sandbox**, not the staged temp_dir
where ``_copy_localpath_files_to_dir`` writes the rewritten spec +
copied localpath files. The CLI dutifully resolved ``localpath:
folder_input_0`` against the wrong dir and failed:
Error message: The localpath input does not exist!
Pass ``cwd=temp_dir`` to ``subprocess.run`` for the submit invocation
so the rewritten basename-relative localpaths resolve correctly.
3. **CLI submit error reporting buried real errors under version
warnings.** The osmo CLI emits a 3-line ``WARNING: New client X.Y.Z
available...`` block to stderr on every invocation, the previous
error path captured only ``stderr[:500]`` and stopped, so the actual
error (``Error message: ...`` on stdout, or 4xx body on stderr)
never reached the test log. ``_raise_cli_submission_error`` now
strips the version-warning lines from stderr and includes both
stdout and stderr (up to 1000 chars each) in the error body — every
CLI failure now points at the real cause without manual log diving.
Verification (sweep of all 26 originally-failing scenarios on
staging.osmo.nvidia.com / isaac-nightly / OETF_DEFAULT_PLATFORM=
x86-5090):
24/26 PASS (was 0/26 before the migration fixes)
The 2 remaining failures are NOT migration breakage:
- ``app-cli`` task-level workflow failure (the workflow submits + runs
on staging for 105s, then the user task exits 1 — pre-existing test/
workflow logic issue unrelated to !6103).
- ``folder-input`` requires the ``team-osmo`` Swift data credential
registered on the OSMO instance for localpath uploads; Jenkins
provides this via ``dataStorageCreds``, local sandbox doesn't.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/oetf/runner_fixture.py (1)
601-606:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDebug output shows original args, not the enriched args.
The debug print on line 636 uses
self._args(original) instead of the derivedargs(with auto-injected platform/bucket). This could confuse developers when debugging: the stderr message might showargs=[]while the workflow actually receivedplatform=cpu.🔧 Suggested fix
print( f"OETF submitted: {workflow_id} ({workflow_url}) " f"[spec={os.path.basename(self._spec_path)} pool={self._pool} " - f"client={self._client} args={list(self._args)}]", + f"client={self._client} args={args}]", file=sys.stderr, flush=True, )Also applies to: 636-636
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/oetf/runner_fixture.py` around lines 601 - 606, The debug print statement that outputs the arguments is using the original self._args variable instead of the enriched args variable that has been populated with auto-injected platform and bucket values in the preceding lines. Locate the debug print statement that references self._args on or near line 636 and change it to use the local args variable instead, so that the debug output accurately reflects the arguments that will actually be passed to the workflow, including any platform or bucket values that were automatically appended.
🧹 Nitpick comments (1)
test/oetf/runner_fixture.py (1)
401-411: 💤 Low valueMinor edge case: incomplete warning block could suppress subsequent errors.
If the CLI emits a
WARNING: New clientline but never outputs the expectedcurl ... install.shterminator (e.g., format change or truncated output),skipremainsTrueand all subsequent stderr lines are lost. This could hide actual error messages.Consider resetting
skipafter N lines or when encountering an obviously non-warning line:♻️ Defensive fix (optional)
cleaned_stderr_lines = [] skip = False + skip_line_count = 0 for line in raw_stderr.splitlines(): if line.startswith("WARNING: New client"): skip = True + skip_line_count = 0 continue if skip: + skip_line_count += 1 + # Safety: reset after 10 lines to avoid losing real errors + if skip_line_count > 10: + skip = False if line.startswith("curl ") and "install.sh" in line: skip = False continue cleaned_stderr_lines.append(line)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/oetf/runner_fixture.py` around lines 401 - 411, The stderr filtering logic in the loop that processes raw_stderr has a vulnerability where the skip flag can remain True indefinitely if the expected curl install.sh terminator line is never encountered, causing all subsequent error messages to be suppressed. Add defensive logic to reset the skip flag after a reasonable number of lines have been skipped (such as a maximum line count threshold) or when an obviously non-warning line is encountered, ensuring that temporary skip states do not permanently suppress stderr output from being captured.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@test/oetf/runner_fixture.py`:
- Around line 601-606: The debug print statement that outputs the arguments is
using the original self._args variable instead of the enriched args variable
that has been populated with auto-injected platform and bucket values in the
preceding lines. Locate the debug print statement that references self._args on
or near line 636 and change it to use the local args variable instead, so that
the debug output accurately reflects the arguments that will actually be passed
to the workflow, including any platform or bucket values that were automatically
appended.
---
Nitpick comments:
In `@test/oetf/runner_fixture.py`:
- Around line 401-411: The stderr filtering logic in the loop that processes
raw_stderr has a vulnerability where the skip flag can remain True indefinitely
if the expected curl install.sh terminator line is never encountered, causing
all subsequent error messages to be suppressed. Add defensive logic to reset the
skip flag after a reasonable number of lines have been skipped (such as a
maximum line count threshold) or when an obviously non-warning line is
encountered, ensuring that temporary skip states do not permanently suppress
stderr output from being captured.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 12674653-3fa5-4785-bcee-86659824a8ab
📒 Files selected for processing (15)
test/oetf/runner_fixture.pytest/workflow/bad_command.yamltest/workflow/bad_image.yamltest/workflow/empty_command.yamltest/workflow/exec_timeout.yamltest/workflow/host_mount.yamltest/workflow/invalid_mount.yamltest/workflow/invalid_src_dest_mount.yamltest/workflow/multi_arch_containers.yamltest/workflow/osmo_client_test.yamltest/workflow/per_group_exec_timeout.yamltest/workflow/regex_workflow.yamltest/workflow/resource_type.yamltest/workflow/serial_workflow_multi_arch.yamltest/workflow/serial_workflow_update_dataset.yaml
✅ Files skipped from review due to trivial changes (6)
- test/workflow/exec_timeout.yaml
- test/workflow/multi_arch_containers.yaml
- test/workflow/serial_workflow_update_dataset.yaml
- test/workflow/per_group_exec_timeout.yaml
- test/workflow/bad_image.yaml
- test/workflow/bad_command.yaml
Fixes two OETF migration breakages that surfaced in the gitlab-master staging Jenkins pipeline run after MR !6103 merged:
1. Spec-file resolution under bzlmod
runner_fixture._workspace_root()usesTEST_WORKSPACE, which bzlmod always sets to_mainregardless of which module the test target lives in. Tests in@osmo_workspace//test/scenarios:*that callself.workflow("test/workflow/X.yaml")resolved against<TEST_SRCDIR>/_main/— but their YAML data deps land under<TEST_SRCDIR>/osmo_workspace+/. Every external scenario in staging Jenkins was failing with:Fix: new
_caller_runfiles_repo_root()helper walks from the caller's__file__up to the immediate child ofTEST_SRCDIR— that's the runfiles repo dir that actually owns the test._resolve_spec_path()tries this candidate first for workspace-relative paths, falling back to_workspace_root()forBUILD_WORKSPACE_DIRECTORY/ pre-bzlmod compat.Unit test added:
test_runner_fixture.py::CallerRunfilesRepoRootTestcovers both_mainand dep-module (<name>+) layouts.2. Script + input bundles dropped during the OETF migration
Several yamls under
test/workflow/(app_cli.yaml,dataset_cli.yaml,parallel_workflow_mount_v2.yaml, etc.) referencelocalpath: scripts/X.shorlocalpath: input/Y. Thescripts/andinput/subdirs existed undervalidation/workflow/in the internal repo but weren't moved over with the YAMLs. The yaml-only migration left every script-referencing workflow broken end-to-end (even after fix #1 unblocks the YAML load).Fix: restore the script bundle alongside the yamls (bit-identical copy from internal
validation/workflow/{scripts,input}/) so the public yamls are self-sufficient.test/workflow/BUILD's filegroup already globsscripts/**+input/**withallow_empty=True; no BUILD edit needed.3. Platform/bucket auto-injection in
WorkflowBuilder.submit()Public yamls use
{{platform}}placeholders with KIND-orienteddefault-values:(cpu-x86); on the internal staging Jenkins they needx86-5090/agx-orin-jp6/ etc.WorkflowBuilder.submit()now auto-injectsplatform={fixture.default_platform}andbucket={fixture.default_bucket}as workflow variables when the caller hasn't set them explicitly. The fixture defaults are env-aware (OETF_DEFAULT_PLATFORM,OETF_DEFAULT_BUCKET), so the same scenarios run unchanged across KIND (cpu), staging (x86-5090), and other envs.The internal-repo follow-up exports
OETF_DEFAULT_PLATFORM=x86-5090injenkins/vars/oetfTests.groovyso this kicks in for the staging job.Verification
Local against
https://staging.osmo.nvidia.comonisaac-nightlypool withOETF_DEFAULT_PLATFORM=x86-5090:app-cliFileNotFoundErrorat 3.4sapp-cli-1621); task-level exit-1 is a separate pre-existing issuecommand-validationFileNotFoundErrorat 3.3smount-validationFileNotFoundErrorat 3.3sAll existing framework unit tests + pylint stay green:
bazel test //test/oetf/tests:test_runner_fixture //test/oetf:runner_fixture-pylint.Pairs with
Internal gitlab-master OSMO follow-up to bump the external submodule + export
OETF_DEFAULT_PLATFORMin Jenkins is queued separately.Summary by CodeRabbit
Tests
localpath:to run from the staging directory and improved failure diagnostics with richer, cleaned CLI output.platform/bucketwhen not explicitly provided.Chores
{{platform}}.