Skip to content

Backport XSK queue teardown fixes from torvalds/linux mainline (tx_timeout + NULL deref during AF_XDP reload) #61

@Shivam279Chaudhary

Description

@Shivam279Chaudhary

We're hitting a reproducible NIC crash (host becomes permanently unreachable) on ice 2.2.8 during AF_XDP/XSK socket setup/teardown under TX load. The crash sequence is:

  1. False TX watchdog timeout (transmit queue N timed out) during ice_qp_dis
  2. PF reset triggers NULL pointer dereference in ice_qp_dis+0x6c (ring pointer freed during reset)
  3. Workqueue deadlock between ice_service_task and xp_release_deferred — NIC never recovers

The upstream Linux kernel already has 4 commits (merged Jul-Aug 2024) that fix this. They all modify ice_xsk.c (ice_qp_dis / ice_qp_ena):

All fix the original 2d4238f55697 ( torvalds/linux@2d4238f55697) ("ice: Add support for AF_XDP").

Questions:

  1. Are these fixes planned for any upcoming out-of-tree release (2.5.x or later)?
  2. If so, what's the expected timeline?

We've validated that backporting these 4 commits onto ice 2.5.4 resolves the issue in our environment.

Environment: Intel E810, ice 2.2.8 (DKMS), kernel 5.10.253 (Amazon Linux 2), AF_XDP with 62 queues.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions