[CI] pytorch-finetuning / quick-train-full-finetuning failed on stx (linux)

This issue was opened automatically by the **Test Playbooks** workflow after the test `quick-train-full-finetuning` failed on the `main` branch.

## Failure scope

- **Playbook:** `pytorch-finetuning`
- **Test id:** `quick-train-full-finetuning`
- **Device:** `stx`
- **Operating system:** `linux`
- **Runner labels:** `self-hosted`, `Linux`, `stx`
- **Runner name:** `xsj-aimlab-stxp-01`
- **Commit:** `0b670a0916a72ede16f803aaad15cd1673ec0516`
- **Workflow run:** https://github.com/amd/playbooks/actions/runs/27048931391

## Hardware / OS to use to reproduce

Run the failing test on a machine that matches the runner labels above (OS = `linux`, device = `stx`). The repo's self-hosted runners already advertise these labels; if you reproduce locally, use the same OS family and the same AMD device class.

## How to dispatch the same test from CI

Re-run only the failing playbook on the same matrix entry by triggering the workflow with the playbook id:

```bash
gh workflow run test-playbooks.yml --repo amd/playbooks -f playbook_id=pytorch-finetuning
```

The workflow's matrix narrows down to this `(device, platform)` combination automatically based on the playbook's `tested_platforms`.

## How to run just this test locally

```bash
python .github/scripts/run_playbook_tests.py --playbook pytorch-finetuning --platform linux --device stx
```

The runner extracts test blocks from `playbooks/*/pytorch-finetuning/README.md` (the failing block starts around line 222).

## Failing test (verbatim from the README)

- **Setup:** `source finetune-venv/bin/activate`
- **Timeout:** `1200s`

```python
import os
import subprocess
import sys

os.environ["QUICK_TRAIN"] = "1"
os.environ["QUICK_TRAIN_MODEL"] = "unsloth/gemma-3-4b-it"
r = subprocess.run([sys.executable, "train_full_finetuning.py"], timeout=600)
sys.exit(r.returncode)
```

## Result

- **Exit code:** `1`

### stderr (last lines)

```

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:01<00:01,  1.10s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.13it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.09it/s]

Tokenizing train dataset:   0%|          | 0/6 [00:00<?, ? examples/s]
Tokenizing train dataset: 100%|██████████| 6/6 [00:00<00:00, 508.46 examples/s]

Tokenizing eval dataset:   0%|          | 0/2 [00:00<?, ? examples/s]
Tokenizing eval dataset: 100%|██████████| 2/2 [00:00<00:00, 672.60 examples/s]
The model is already on multiple devices. Skipping the move to device specified in `args`.

  0%|          | 0/1 [00:00<?, ?it/s]/home/user/actions-runner/_work/playbooks/playbooks/playbooks/supplemental/pytorch-finetuning/assets/finetune-venv/lib/python3.13/site-packages/transformers/integrations/sdpa_attention.py:96: UserWarning: Mem Efficient attention on Current AMD GPU is still experimental. Enable it with TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1. (Triggered internally at /__w/rockrel/rockrel/external-builds/pytorch/pytorch/aten/src/ATen/native/transformers/hip/sdp_utils.cpp:383.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
Traceback (most recent call last):
  File "/home/user/actions-runner/_work/playbooks/playbooks/playbooks/supplemental/pytorch-finetuning/assets/train_full_finetuning.py", line 209, in <module>
    trainer.train()
    ~~~~~~~~~~~~~^^
  File "/home/user/actions-runner/_work/playbooks/playbooks/playbooks/supplemental/pytorch-finetuning/assets/finetune-venv/lib/python3.13/site-packages/transformers/trainer.py", line 2325, in train
    return inner_training_loop(
        args=args,
    ...<2 lines>...
        ignore_keys_for_eval=ignore_keys_for_eval,
    )
  File "/home/user/actions-runner/_work/playbooks/playbooks/playbooks/supplemental/pytorch-finetuning/assets/finetune-venv/lib/python3.13/site-packages/transformers/trainer.py", line 2740, in _inner_training_loop
    self.optimizer.step()
    ~~~~~~~~~~~~~~~~~~~^^
  File "/home/user/actions-runner/_work/playbooks/playbooks/playbooks/supplemental/pytorch-finetuning/assets/finetune-venv/lib/python3.13/site-packages/accelerate/optimizer.py", line 179, in step
    self.optimizer.step(closure)
    ~~~~~~~~~~~~~~~~~~~^^^^^^^^^
  File "/home/user/actions-runner/_work/playbooks/playbooks/playbooks/supplemental/pytorch-finetuning/assets/finetune-venv/lib/python3.13/site-packages/torch/optim/lr_scheduler.py", line 166, in wrapper
    return func.__get__(opt, opt.__class__)(*args, **kwargs)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/user/actions-runner/_work/playbooks/playbooks/playbooks/supplemental/pytorch-finetuning/assets/finetune-venv/lib/python3.13/site-packages/torch/optim/optimizer.py", line 533, in wrapper
    out = func(*args, **kwargs)
  File "/home/user/actions-runner/_work/playbooks/playbooks/playbooks/supplemental/pytorch-finetuning/assets/finetune-venv/lib/python3.13/site-packages/torch/optim/optimizer.py", line 81, in _use_grad
    ret = func(*args, **kwargs)
  File "/home/user/actions-runner/_work/playbooks/playbooks/playbooks/supplemental/pytorch-finetuning/assets/finetune-venv/lib/python3.13/site-packages/torch/optim/adam.py", line 238, in step
    has_complex = self._init_group(
        group,
    ...<5 lines>...
        state_steps,
    )
  File "/home/user/actions-runner/_work/playbooks/playbooks/playbooks/supplemental/pytorch-finetuning/assets/finetune-venv/lib/python3.13/site-packages/torch/optim/adam.py", line 178, in _init_group
    state["exp_avg"] = torch.zeros_like(
                       ~~~~~~~~~~~~~~~~^
        p, memory_format=torch.preserve_format
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB. GPU 0 has a total capacity of 29.24 GiB of which 26.81 MiB is free. Of the allocated memory 28.53 GiB is allocated by PyTorch, and 233.29 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://docs.pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf)

  0%|          | 0/1 [00:10<?, ?it/s]

```
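The OOM message suggests `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`, which targets fragmentation. As a quick check of whether fragmentation is plausible here, the sketch below parses the figures out of the message. The helper names `to_mib` and `parse_oom` are hypothetical and not part of this repository; the message text is copied from the stderr above.

```python
import re

# Excerpt of the OOM message from the stderr above.
oom_message = (
    "CUDA out of memory. Tried to allocate 50.00 MiB. "
    "GPU 0 has a total capacity of 29.24 GiB of which 26.81 MiB is free. "
    "Of the allocated memory 28.53 GiB is allocated by PyTorch, "
    "and 233.29 MiB is reserved by PyTorch but unallocated."
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def to_mib(value: str, unit: str) -> float:
    """Convert a '<number> <unit>' pair from the message to MiB."""
    return float(value) * UNIT_TO_MIB[unit]


def parse_oom(message: str) -> dict[str, float]:
    """Extract the memory figures (in MiB) from a PyTorch OOM message."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "capacity": r"total capacity of ([\d.]+) (MiB|GiB)",
        "free": r"of which ([\d.]+) (MiB|GiB) is free",
        "allocated": r"([\d.]+) (MiB|GiB) is allocated by PyTorch",
        "reserved_unallocated": r"([\d.]+) (MiB|GiB) is reserved by PyTorch",
    }
    return {
        name: to_mib(*re.search(pattern, message).groups())
        for name, pattern in patterns.items()
    }


stats = parse_oom(oom_message)
# Share of the device held by the allocator but not backing any tensor.
fragmentation_share = stats["reserved_unallocated"] / stats["capacity"]

print(f"allocated: {stats['allocated'] / 1024:.2f} GiB of {stats['capacity'] / 1024:.2f} GiB")
print(f"reserved but unallocated: {fragmentation_share:.1%} of capacity")
```

If the reserved-but-unallocated share comes out small relative to capacity, as these figures suggest, the allocator hint alone seems unlikely to resolve the failure; the device is mostly full of live tensors.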

### stdout (last lines)

```
Loading dataset...
QUICK_TRAIN=1: using non-gated model for smoke test: unsloth/gemma-3-4b-it
QUICK_TRAIN=1: using 1 step and a tiny dataset (smoke test).
Train samples: 6, Test samples: 2
Total selected samples: 8

Loading unsloth/gemma-3-4b-it...
Note: Model is stored as MXFP4 on Hugging Face but will be loaded as BF16 for training
(This is expected - the warning about MXFP4 is informational)

Model loaded. Weights footprint: 8.60 GB
Gradient checkpointing enabled (saves memory during backprop)
Using bf16 mixed precision.
Starting Full Fine-tuning
Model: unsloth/gemma-3-4b-it
Trainable parameters: 4,300,079,472
Effective batch size: 16
Learning rate: 2e-05
Quick smoke mode enabled: tiny dataset + max_steps=1


```
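The figures in the two logs above allow a rough memory budget. The sketch below is an estimate under stated assumptions, not a measurement: it assumes bf16 weights (2 bytes per parameter, consistent with the reported 8.60 GB footprint for 4,300,079,472 parameters), one bf16 gradient per weight, and two Adam state tensors (`exp_avg`, `exp_avg_sq`) created with `zeros_like(p)` and therefore in the parameter dtype. Activations and allocator overhead are ignored. Note that the stdout footprint appears to use decimal GB, while the OOM message reports GiB.

```python
# Rough memory budget for full fine-tuning with Adam, from the logged figures.
GIB = 1024 ** 3

trainable_params = 4_300_079_472  # "Trainable parameters" from stdout
bytes_per_param = 2               # assumption: bf16 weights
gpu_capacity_gib = 29.24          # "total capacity" from the OOM message

weights_gib = trainable_params * bytes_per_param / GIB
grads_gib = weights_gib           # assumption: one gradient per weight, same dtype
adam_state_gib = 2 * weights_gib  # exp_avg + exp_avg_sq, same dtype as the weights

total_gib = weights_gib + grads_gib + adam_state_gib
shortfall_gib = total_gib - gpu_capacity_gib

print(f"weights:    {weights_gib:6.2f} GiB")
print(f"gradients:  {grads_gib:6.2f} GiB")
print(f"Adam state: {adam_state_gib:6.2f} GiB")
print(f"total:      {total_gib:6.2f} GiB vs capacity {gpu_capacity_gib:.2f} GiB")
print(f"shortfall:  {shortfall_gib:6.2f} GiB (before activations)")
```

Under these assumptions the steady-state total exceeds the reported device capacity even before activations are counted, which is consistent with the failure occurring while the optimizer allocates `exp_avg` during the first `step()`.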

---
_This issue is opened and deduplicated by `.github/scripts/create_failure_issues.py`. Close it once the failure is fixed; later failures with the same scope will open a fresh issue._
