KVM/x86 + vfio/pci: don't crash the VM on guest access to a disabled passthrough BAR #249

Merged
rene merged 2 commits into lf-edge:eve-kernel-amd64-v6.12.49-generic from rucoder:mikem/kvm-vfio-disabled-bar-fix on Jun 22, 2026

rucoder commented Jun 22, 2026

Collaborator

Summary

Fix a guest-triggerable host-side VM crash on EVE nodes that pass a PCI device
(e.g. an Intel iGPU) through to a guest. When the guest clears PCI_COMMAND.MEM
on the assigned device while another vCPU accesses the device BAR, KVM_RUN
returns -EFAULT and the VM is killed.

Two commits:

  1. vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable()
    Clean cherry-pick of upstream 05f2a68b407a (Matt Evans). Makes BAR setup
    eager (done once in vfio_pci_core_enable()), removing the racy,
    unsynchronised on-demand path. (Replaces an earlier EVE-local backport that
    carried a custom setup_bars_at_enable debug module-param; that knob is
    intentionally dropped here.)
  2. KVM: x86/mmu: emulate (not -EFAULT) guest access to a disabled passthrough
    BAR
    The actual crash fix.

Root cause / mechanism

A passed-through BAR is mapped via a VM_IO/VM_PFNMAP VMA whose fault handler
(vfio_pci_mmap_fault) declines to install a PTE while device memory is
disabled. A concurrent vCPU fault reaches kvm_faultin_pfn() with a valid
memslot but a failed GUP → KVM_PFN_ERR_FAULT → kvm_handle_error_pfn() →
-EFAULT. On bare metal the same access is an Unsupported Request (reads
return all-ones, writes are dropped), not fatal.

The fix adds a distinct KVM_PFN_ERR_PFNMAP and routes it to
kvm_handle_noslot_fault() (MMIO emulation / UR semantics) instead of
-EFAULT. Normal-memory faults still return -EFAULT, so real errors aren't
masked.
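
The routing described above can be sketched as a small stand-alone model. This is a simplified illustration, not the patch itself: the enum names, `handle_pfn_result()`, and the action codes are hypothetical stand-ins for the kernel's error-pfn values and fault return codes.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the outcomes of faulting in a guest pfn. */
enum pfn_result {
    PFN_OK,         /* host page resolved normally */
    PFN_NOSLOT,     /* no memslot covers the gfn: ordinary MMIO */
    PFN_ERR_FAULT,  /* generic failure on normal memory */
    PFN_ERR_PFNMAP, /* VM_IO/VM_PFNMAP handler declined to install a PTE */
};

/* Hypothetical non-negative actions, so they never collide with -errno. */
enum fault_action {
    ACTION_MAP = 0,     /* install the mapping and resume the guest */
    ACTION_EMULATE = 1, /* treat the access as MMIO and emulate it */
};

static_assert(ACTION_MAP >= 0 && ACTION_EMULATE >= 0,
              "actions must stay distinct from negative errno values");

/* Returns a fault_action, or -EFAULT when the fault is fatal to the VM. */
static int handle_pfn_result(enum pfn_result res)
{
    switch (res) {
    case PFN_OK:
        return ACTION_MAP;
    case PFN_NOSLOT:
    case PFN_ERR_PFNMAP:
        /* Disabled passthrough BAR shares the noslot (emulation) path. */
        return ACTION_EMULATE;
    case PFN_ERR_FAULT:
        break;
    }
    /* Genuine failures on normal memory remain fatal. */
    return -EFAULT;
}
```

In this model only the pfnmap case changes behaviour: `handle_pfn_result(PFN_ERR_PFNMAP)` yields `ACTION_EMULATE`, while `handle_pfn_result(PFN_ERR_FAULT)` still yields `-EFAULT`.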

Validation

  • checkpatch --strict: 0 errors, 0 warnings, 0 checks on both commits.
  • Builds clean (drivers/vfio/pci/, arch/x86/kvm/, virt/kvm/).
  • Deterministic reproducer (guest BAR0 MMIO spinner + PCI_COMMAND.MEM toggle):
    pre-fix crashes at cycle 2 (~1 s); post-fix 200k cycles, no crash, and
    48 h across 17 nodes with no recurrence.
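
The reproducer source is not part of this PR. As a rough sketch of its PCI_COMMAND.MEM toggle half, the following models the command register in a plain byte buffer. `PCI_COMMAND` (offset 0x04) and `PCI_COMMAND_MEMORY` (0x2) match the kernel's `pci_regs.h`; `struct cfg_space` and the helper functions are hypothetical, and a real guest would access config space through its PCI subsystem rather than a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_COMMAND        0x04 /* 16-bit command register offset */
#define PCI_COMMAND_MEMORY 0x2  /* memory space decode enable bit */

/* Hypothetical model: the first 64 bytes of a device's config space. */
struct cfg_space {
    uint8_t bytes[64];
};

static uint16_t cfg_read16(const struct cfg_space *cfg, size_t off)
{
    assert(off + 1 < sizeof(cfg->bytes));
    /* PCI config space is little-endian. */
    return (uint16_t)(cfg->bytes[off] | (cfg->bytes[off + 1] << 8));
}

static void cfg_write16(struct cfg_space *cfg, size_t off, uint16_t val)
{
    assert(off + 1 < sizeof(cfg->bytes));
    cfg->bytes[off] = (uint8_t)(val & 0xff);
    cfg->bytes[off + 1] = (uint8_t)(val >> 8);
}

/* Clear memory decode, leaving every other command bit untouched. */
static void mem_decode_disable(struct cfg_space *cfg)
{
    uint16_t cmd = cfg_read16(cfg, PCI_COMMAND);

    cfg_write16(cfg, PCI_COMMAND, cmd & (uint16_t)~PCI_COMMAND_MEMORY);
}

/* Set memory decode again; BAR accesses are valid after this point. */
static void mem_decode_enable(struct cfg_space *cfg)
{
    uint16_t cmd = cfg_read16(cfg, PCI_COMMAND);

    cfg_write16(cfg, PCI_COMMAND, cmd | PCI_COMMAND_MEMORY);
}

static int mem_decode_enabled(const struct cfg_space *cfg)
{
    return (cfg_read16(cfg, PCI_COMMAND) & PCI_COMMAND_MEMORY) != 0;
}
```

One reproducer cycle would call `mem_decode_disable()` followed by `mem_decode_enable()` while a second vCPU keeps reading BAR0; the crash window is the interval between the two calls.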

Upstream status

The KVM fix has been submitted to LKML/KVM (based on kvm-x86/next); under
review. This is the eve-kernel (v6.12.49) backport. The vfio commit is a clean
cherry-pick of mainline 05f2a68b407a (not yet in 6.12.y stable).

Backport / other branches

Lead branch: eve-kernel-amd64-v6.12.49-generic. Other active eve-kernel-*
branches that pass through PCI BARs likely need the same two commits (follow-up
backports).

🤖 Generated with Claude Code

metamev and others added 2 commits June 22, 2026 09:51
Previously BAR resource requests and the corresponding pci_iomap()
were performed on-demand and without synchronisation, which was racy.
Rather than add synchronisation, it's simplest to address this by
doing both activities from vfio_pci_core_enable().

The resource allocation and/or pci_iomap() can still fail; their
status is tracked and existing calls to vfio_pci_core_setup_barmap()
will fail in a similar way to before.  This keeps the point of failure
as observed by userspace the same, i.e. failures to request/map unused
BARs are benign.
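
The pattern described above (attempt every BAR once at enable time, record the outcome, and let later callers read only the recorded status) might look roughly like the model below. It is a sketch under assumptions, not the driver code: `struct dev_model`, `dev_enable()`, `dev_setup_barmap()`, `demo_map()` and the choice of `-ENOMEM` are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NUM_BARS 6 /* standard BARs of a type 0 PCI function */

/* Hypothetical mapper: returns a mapping, or NULL on failure. */
typedef void *(*bar_map_fn)(int bar);

struct dev_model {
    void *barmap[NUM_BARS];   /* mapping, meaningful only if status is 0 */
    int bar_status[NUM_BARS]; /* 0 or a negative errno, fixed at enable */
};

/* Eager path: try each BAR once and record the result. Enable itself
 * does not fail, so a BAR nobody uses can fail to map harmlessly. */
static void dev_enable(struct dev_model *dev, bar_map_fn map)
{
    for (int bar = 0; bar < NUM_BARS; bar++) {
        dev->barmap[bar] = map(bar);
        dev->bar_status[bar] = dev->barmap[bar] ? 0 : -ENOMEM;
    }
}

/* Later users only read the recorded status; nothing is set up on
 * demand, so there is no unsynchronised setup left to race on. */
static int dev_setup_barmap(const struct dev_model *dev, int bar)
{
    assert(bar >= 0 && bar < NUM_BARS);
    return dev->bar_status[bar];
}

/* Demo mapper for illustration only: even BARs map, odd BARs fail. */
static void *demo_map(int bar)
{
    static char backing[NUM_BARS];

    assert(bar >= 0 && bar < NUM_BARS);
    return (bar % 2 == 0) ? (void *)&backing[bar] : NULL;
}
```

With `demo_map`, `dev_enable()` completes for the whole device, and only a later `dev_setup_barmap()` call on an odd BAR reports the failure, which mirrors the point of failure staying where userspace already observes it.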

Fixes: 89e1f7d ("vfio: Add PCI device driver")
Signed-off-by: Matt Evans <mattev@meta.com>
Link: https://lore.kernel.org/r/20260511145829.2993601-2-mattev@meta.com
[ERR_PTR -> IOMEM_ERR_PTR per lkp report]
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 05f2a68b407a6817fe141dd64972c6ab8725312d)
…rough BAR

A passed-through PCI device's BAR is mapped into the guest via a VM_IO/
VM_PFNMAP VMA whose fault handler (e.g. vfio_pci_mmap_fault) declines to
install a PTE while the device's memory space is disabled, such as right
after the guest clears PCI_COMMAND.MEM. If another vCPU accesses that BAR
during the window, the gup in the page-fault path fails with an error pfn
even though the memslot is still valid, and KVM_RUN returns -EFAULT to
userspace, crashing the VM. A guest can trigger this at will, so it is a
guest-triggerable host-side VM kill.

On real hardware an access to a BAR with memory decoding disabled completes
as an Unsupported Request (reads return all-ones, writes are dropped). KVM
can present the same behaviour by treating the access as MMIO and emulating
it, which is exactly what the noslot path already does.
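
As an illustration of those Unsupported Request semantics, here is a minimal model of a BAR whose behaviour depends on the memory-decode state. `struct bar_model` and its helpers are hypothetical and exist only for this sketch; they are not KVM's emulation code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BAR_REGS 4

/* Hypothetical model of a small BAR made of 32-bit registers. */
struct bar_model {
    bool mem_enabled;        /* mirrors the memory decode enable bit */
    uint32_t regs[BAR_REGS]; /* device state behind the BAR */
};

/* Read: real data while decoding is on, all-ones while it is off. */
static uint32_t bar_read32(const struct bar_model *bar, size_t idx)
{
    assert(idx < BAR_REGS);
    if (!bar->mem_enabled)
        return UINT32_MAX;
    return bar->regs[idx];
}

/* Write: stored while decoding is on, silently dropped while it is off. */
static void bar_write32(struct bar_model *bar, size_t idx, uint32_t val)
{
    assert(idx < BAR_REGS);
    if (!bar->mem_enabled)
        return;
    bar->regs[idx] = val;
}
```

A write issued while `mem_enabled` is false leaves `regs` untouched, so once decoding is re-enabled the guest reads the value from before the window, not the dropped one.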

Distinguish the VM_IO/VM_PFNMAP fault-handler failure from other error pfns
with a new KVM_PFN_ERR_PFNMAP value (in-range, so existing error-pfn range
checks are unaffected) and route it to kvm_handle_noslot_fault() in the x86
TDP fault path. Genuine, non-pfnmap faults (e.g. a vanished anonymous
backing) still take the fatal -EFAULT path, so real errors are not masked.
The MMIO mapping self-heals when the device memory is re-enabled and the
memslot is updated, bumping the MMIO generation.

Fixes: abafbc5 ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Signed-off-by: Mikhail Malyshev <mike.malyshev@gmail.com>
@rucoder rucoder requested a review from rene June 22, 2026 10:05
@rene rene merged commit dcdba3d into lf-edge:eve-kernel-amd64-v6.12.49-generic Jun 22, 2026
4 checks passed