KVM/x86 + vfio/pci: don't crash the VM on guest access to a disabled passthrough BAR #249
Merged
rene merged 2 commits into lf-edge:eve-kernel-amd64-v6.12.49-generic on Jun 22, 2026
Previously BAR resource requests and the corresponding pci_iomap() were performed on-demand and without synchronisation, which was racy. Rather than add synchronisation, it's simplest to address this by doing both activities from vfio_pci_core_enable().

The resource allocation and/or pci_iomap() can still fail; their status is tracked and existing calls to vfio_pci_core_setup_barmap() will fail in a similar way to before. This keeps the point of failure as observed by userspace the same, i.e. failures to request/map unused BARs are benign.

Fixes: 89e1f7d ("vfio: Add PCI device driver")
Signed-off-by: Matt Evans <mattev@meta.com>
Link: https://lore.kernel.org/r/20260511145829.2993601-2-mattev@meta.com
[ERR_PTR -> IOMEM_ERR_PTR per lkp report]
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 05f2a68b407a6817fe141dd64972c6ab8725312d)
…rough BAR

A passed-through PCI device's BAR is mapped into the guest via a VM_IO/VM_PFNMAP VMA whose fault handler (e.g. vfio_pci_mmap_fault) declines to install a PTE while the device's memory space is disabled, such as right after the guest clears PCI_COMMAND.MEM. If another vCPU accesses that BAR during the window, the gup in the page-fault path fails with an error pfn even though the memslot is still valid, and KVM_RUN returns -EFAULT to userspace, crashing the VM. A guest can trigger this at will, so it is a guest-triggerable host-side VM kill.

On real hardware an access to a BAR with memory decoding disabled completes as an Unsupported Request (reads return all-ones, writes are dropped). KVM can present the same behaviour by treating the access as MMIO and emulating it, which is exactly what the noslot path already does.

Distinguish the VM_IO/VM_PFNMAP fault-handler failure from other error pfns with a new KVM_PFN_ERR_PFNMAP value (in-range, so existing error-pfn range checks are unaffected) and route it to kvm_handle_noslot_fault() in the x86 TDP fault path. Genuine, non-pfnmap faults (e.g. a vanished anonymous backing) still take the fatal -EFAULT path, so real errors are not masked. The MMIO mapping self-heals when the device memory is re-enabled and the memslot is updated, bumping the MMIO generation.

Fixes: abafbc5 ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Signed-off-by: Mikhail Malyshev <mike.malyshev@gmail.com>
rene approved these changes on Jun 22, 2026
dcdba3d into lf-edge:eve-kernel-amd64-v6.12.49-generic
4 checks passed
Summary
Fix a guest-triggerable host-side VM crash on EVE nodes that pass a PCI device
(e.g. an Intel iGPU) through to a guest. When the guest clears
PCI_COMMAND.MEM on the assigned device while another vCPU accesses the device
BAR, KVM_RUN returns -EFAULT and the VM is killed.

Two commits:

- cherry-pick of upstream 05f2a68b407a (Matt Evans). Makes BAR setup eager and
  synchronised, removing the racy on-demand path. (Replaces an earlier EVE-local
  backport that carried a custom setup_bars_at_enable debug module-param; that
  knob is intentionally dropped here.)
- … BAR: the actual crash fix.
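
The eager-setup idea in the first commit can be pictured with a small userspace model. This is only a sketch under stated assumptions: `model_dev`, `model_enable`, `model_setup_barmap` and the mapper callback are hypothetical names invented here, not the vfio-pci structures or functions. It shows every BAR being attempted once at enable time, with the outcome recorded so that later callers only read a stored status.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_NUM_BARS 6 /* a PCI type-0 header has six BARs */

/* Hypothetical stand-in for per-device state; not the real vfio struct. */
struct model_dev {
    void *barmap[MODEL_NUM_BARS];  /* mapping, or NULL if setup failed */
    int   bar_err[MODEL_NUM_BARS]; /* 0, or a negative errno recorded at enable */
};

/* Hypothetical mapper: returns a mapping, or NULL after setting *err. */
typedef void *(*model_map_fn)(int bar, int *err);

/* Eager setup: try every BAR once, at enable time, and record the outcome. */
static void model_enable(struct model_dev *dev, model_map_fn map)
{
    for (int bar = 0; bar < MODEL_NUM_BARS; bar++) {
        int err = -ENOMEM; /* default if the mapper fails without a code */

        dev->barmap[bar] = map(bar, &err);
        dev->bar_err[bar] = dev->barmap[bar] ? 0 : err;
    }
}

/* Later users only read the recorded status; nothing is mapped on demand. */
static int model_setup_barmap(const struct model_dev *dev, int bar)
{
    assert(bar >= 0 && bar < MODEL_NUM_BARS);
    return dev->bar_err[bar];
}

/* Example mapper for illustration: BAR 2 fails, the others succeed. */
static void *model_example_map(int bar, int *err)
{
    static char backing[MODEL_NUM_BARS];

    if (bar == 2) {
        *err = -ENOMEM;
        return NULL;
    }
    return &backing[bar];
}
```

In this model a failure on an unused BAR stays harmless: it is recorded at enable time and only surfaces if something later asks for that particular BAR, which mirrors the "point of failure stays the same" property described in the commit message.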
Root cause / mechanism
A passed-through BAR is mapped via a VM_IO/VM_PFNMAP VMA whose fault handler
(vfio_pci_mmap_fault) declines to install a PTE while device memory is
disabled. A concurrent vCPU fault reaches kvm_faultin_pfn() with a valid
memslot but a failed GUP → KVM_PFN_ERR_FAULT → kvm_handle_error_pfn() →
-EFAULT. On bare metal the same access is an Unsupported Request (reads
all-ones, writes dropped), not fatal.

The fix adds a distinct KVM_PFN_ERR_PFNMAP and routes it to
kvm_handle_noslot_fault() (MMIO emulation / UR semantics) instead of -EFAULT.
Normal-memory faults still return -EFAULT, so real errors aren't masked.
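
The routing decision described above can be sketched as a standalone userspace model. All `model_*` names are hypothetical and exist only for illustration; this is not the KVM code, and the value a guest actually reads back depends on how the access is emulated. The sketch assumes three outcomes for a pfn lookup and shows the pfnmap failure being steered to emulation while other failures stay fatal, plus the all-ones value an Unsupported Request read would produce.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of a pfn lookup result (illustrative names). */
enum model_pfn_err {
    MODEL_PFN_OK,
    MODEL_PFN_ERR_FAULT,  /* genuine failure, e.g. backing memory vanished */
    MODEL_PFN_ERR_PFNMAP, /* VM_IO/VM_PFNMAP handler declined to install a PTE */
};

enum model_action {
    MODEL_MAP_PAGE,     /* install the translation and resume the guest */
    MODEL_EMULATE_MMIO, /* treat the access as MMIO and emulate it */
    MODEL_FAIL_EFAULT,  /* report -EFAULT to userspace */
};

/* Only the pfnmap failure is redirected; every other error stays fatal. */
static enum model_action model_route_fault(enum model_pfn_err err)
{
    switch (err) {
    case MODEL_PFN_OK:
        return MODEL_MAP_PAGE;
    case MODEL_PFN_ERR_PFNMAP:
        return MODEL_EMULATE_MMIO;
    case MODEL_PFN_ERR_FAULT:
    default:
        return MODEL_FAIL_EFAULT;
    }
}

/* Unsupported Request read semantics: all-ones for an access of len bytes. */
static uint64_t model_ur_read(unsigned int len)
{
    assert(len >= 1 && len <= 8);
    return len == 8 ? UINT64_MAX : ((UINT64_C(1) << (8 * len)) - 1);
}
```

Keeping the default branch fatal is the point of the design: an unknown or genuine error must never be silently turned into emulated MMIO.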
Validation
- … --strict: 0/0/0 on both commits.
- … drivers/vfio/pci/, arch/x86/kvm/, virt/kvm/).
- … PCI_COMMAND.MEM toggle): pre-fix crashes at cycle 2 (~1 s); post-fix 200k
  cycles, no crash, and 48 h across 17 nodes with no recurrence.
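
For readers unfamiliar with the toggle, the bit manipulation behind one disable/re-enable cycle can be sketched as below. This is not the reproducer used for the numbers above; the `model_*` helpers are hypothetical and operate on a plain 16-bit value rather than real config space. The constants follow the standard PCI layout: the command register sits at config offset 0x04 and memory space enable is bit 1.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PCI_COMMAND        0x04   /* config-space offset of the command register */
#define MODEL_PCI_COMMAND_MEMORY 0x0002 /* memory space enable bit */

/* Clear the memory-decode bit, leaving all other command bits untouched. */
static uint16_t model_cmd_mem_disable(uint16_t cmd)
{
    return cmd & (uint16_t)~MODEL_PCI_COMMAND_MEMORY;
}

/* Set the memory-decode bit again. */
static uint16_t model_cmd_mem_enable(uint16_t cmd)
{
    return cmd | MODEL_PCI_COMMAND_MEMORY;
}

/*
 * One toggle cycle on a command value: disable decoding, then re-enable it.
 * Returns the final value; *disabled_out (if non-null) receives the value
 * that would be written during the window in which BAR accesses fault.
 */
static uint16_t model_toggle_cycle(uint16_t cmd, uint16_t *disabled_out)
{
    uint16_t off = model_cmd_mem_disable(cmd);

    assert(!(off & MODEL_PCI_COMMAND_MEMORY));
    if (disabled_out)
        *disabled_out = off;
    return model_cmd_mem_enable(off);
}
```

A real test would write these values to the device's command register from one vCPU while another vCPU reads the BAR; the crash window is the time between the two writes.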
Upstream status
The KVM fix has been submitted to LKML/KVM (based on kvm-x86/next); under
review. This is the eve-kernel (v6.12.49) backport. The vfio commit is a clean
cherry-pick of mainline 05f2a68b407a (not yet in 6.12.y stable).
Backport / other branches
Lead branch eve-kernel-amd64-v6.12.49-generic. Other active eve-kernel-*
branches that pass through PCI BARs likely need the same two commits (follow-up
backports).
🤖 Generated with Claude Code