SOS reports are empty due to cross-device hard link extraction failure #131

@m4d3bug

Bug Description

SOS reports collected by gather_sos and gather_edpm_sos are empty (0 bytes) after must-gather completes. The sosreport tar.xz is downloaded successfully from the nodes, but the extraction step fails with Invalid cross-device link errors on every file. The scripts then delete the original tar.xz regardless of the tar exit status, so the data is lost completely.

Root Cause

GNU tar 1.34 (shipped with RHEL 9) has a bug where --one-top-level combined with --strip-components triggers Invalid cross-device link errors on all file types, even when source and destination are on the same filesystem.

# FAILS - 17273 errors, 0 files extracted
tar --one-top-level=/tmp/test --strip-components=1 -Jxf sosreport.tar.xz

# WORKS - 0 errors, 15052 files extracted
tar -C /tmp/test --strip-components=1 -Jxf sosreport.tar.xz
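
To compare the two invocations on the same archive, a small counting helper can be sketched as follows. The function name count_extract and the /tmp paths are hypothetical, not part of the collection scripts; the helper only counts the error string quoted in this report and the regular files that end up in the destination.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of openstack-must-gather).
# Runs tar with the given arguments, then reports how many
# "Invalid cross-device link" errors it printed and how many
# regular files exist under the destination directory.
count_extract() {
    local dest="$1"; shift
    local errors files
    # Send tar's stderr into the pipe and discard its stdout.
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    errors=$(tar "$@" 2>&1 >/dev/null | grep -c 'Invalid cross-device link' || true)
    files=$(find "${dest}" -type f 2>/dev/null | wc -l)
    echo "errors=${errors} files=$((files))"
}

# Example usage (paths are placeholders):
#   mkdir -p /tmp/test-c    # -C needs an existing directory
#   count_extract /tmp/test-c   -C /tmp/test-c --strip-components=1 -Jxf sosreport.tar.xz
#   count_extract /tmp/test-otl --one-top-level=/tmp/test-otl --strip-components=1 -Jxf sosreport.tar.xz
```

On an affected system the second call would be expected to report a large error count and zero files, as in the numbers above; on other tar versions or filesystems the result may differ.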

Affected Files

collection-scripts/gather_sos:

tar -i --one-top-level="${SOS_PATH_NODES}/sosreport-$node" --strip-components=1 --exclude='*/dev/null' -Jxf "${sos_file}"
rm "${sos_file}"    # deletes the original even if tar failed

collection-scripts/gather_edpm_sos:

tar --one-top-level="${SOS_PATH_NODES}/sosreport-$node" --strip-components=1 --exclude='*/dev/null' -Jxf ${SOS_PATH_NODES}/sosreport-$node.tar.xz
rm "${SOS_PATH_NODES}/sosreport-$node.tar.xz"    # deletes the original even if tar failed

Error Messages (from must-gather pod logs)

tar: /must-gather/sos-reports/_all_nodes/sosreport-edpm-compute-01/version.txt: Cannot open: Invalid cross-device link
tar: /must-gather/sos-reports/_all_nodes/sosreport-edpm-compute-01/sos_reports: Cannot mkdir: Invalid cross-device link
tar: /must-gather/sos-reports/_all_nodes/sosreport-edpm-compute-01/chkconfig: Cannot create symlink to 'sos_commands/services/chkconfig_--list': Invalid cross-device link
...

Over 17,000 files failed to extract in our test.

Impact

  • All SOS reports are affected — both control plane nodes (gather_sos) and EDPM nodes (gather_edpm_sos)
  • The sosreport directories exist but are completely empty (0 bytes)
  • The original tar.xz archives are deleted after failed extraction, so there is no way to recover the data
  • Users are unaware of the failure unless they check the pod logs

Suggested Fix

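  1. Replace --one-top-level=DIR with -C DIR, which achieves the same result without the bug. Unlike --one-top-level, -C does not create DIR, so the scripts must create it first (for example with mkdir -p)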
  2. Check the tar exit code before deleting the original archive
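
A minimal sketch of both points combined, assuming a bash script; the function name extract_sos is hypothetical, and this is not necessarily what the referenced PR implements. The tar options are the ones already used in the collection scripts, with --one-top-level replaced by -C.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from the collection scripts).
# Extracts an sosreport archive into dest_dir and removes the
# archive only when tar reports success.
extract_sos() {
    local archive="$1" dest_dir="$2"

    # -C does not create the target directory, so create it first.
    mkdir -p "${dest_dir}" || return 1

    if tar -C "${dest_dir}" --strip-components=1 \
           --exclude='*/dev/null' -Jxf "${archive}"; then
        # Extraction succeeded: the compressed archive is no longer needed.
        rm -f "${archive}"
    else
        # Extraction failed: keep the archive so the data is still recoverable.
        echo "WARNING: failed to extract ${archive}; keeping the archive" >&2
        return 1
    fi
}

# Example usage with the variables from gather_sos:
#   extract_sos "${sos_file}" "${SOS_PATH_NODES}/sosreport-$node"
```

Keeping the archive on failure means the worst case is an unextracted tar.xz in the must-gather output rather than an empty directory.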

See PR #132.

Environment

  • OpenShift 4.18.30
  • openstack-must-gather image: registry.redhat.io/rhoso-operators/openstack-must-gather-rhel9:1.0 (sha256:ab86b53a49adf8ad2b9658076cba5c55b3a19552ff17ea178dfe346fc1ac9979)
  • openstack-operator v1.0.20
  • GNU tar 1.34 (RHEL 9)

Steps to Reproduce

  1. Run the OpenStack must-gather:
    oc adm must-gather --image=registry.redhat.io/rhoso-operators/openstack-must-gather-rhel9:1.0
  2. Check the sos-reports/_all_nodes/ directories — they will be empty
  3. Check the must-gather pod logs for Invalid cross-device link errors
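
Step 2 can be checked with a short helper along these lines. The function name find_empty_sos_dirs is hypothetical, and the location of the local must-gather output is an assumption that depends on where oc adm must-gather wrote its results.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every sosreport-* directory directly
# under base_dir that contains no entries at all.
find_empty_sos_dirs() {
    local base_dir="$1"
    find "${base_dir}" -mindepth 1 -maxdepth 1 -type d -name 'sosreport-*' -empty
}

# Example usage (the must-gather output path is a placeholder):
#   find_empty_sos_dirs ./must-gather.local.*/*/sos-reports/_all_nodes
```

Any path printed is a node whose report was lost during extraction.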
