[VFIO] add basic implementation#5870

Draft
ShadowCurse wants to merge 22 commits into firecracker-microvm:main from ShadowCurse:vfio_with_dependencies

@ShadowCurse ShadowCurse commented May 8, 2026

Changes

Add a basic implementation of VFIO device pass-through.
The current version only allows devices to be added before VM boot.
Other limitations:

  • Only devices with MSI-X interrupts are supported.
  • No INTx interrupt support
  • No ROM BAR/IO BAR support
  • No BAR relocation/resizing

Reason

Provide a way to pass physical PCI devices into a VM.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@ShadowCurse ShadowCurse self-assigned this May 8, 2026

codecov Bot commented May 8, 2026

Codecov Report

❌ Patch coverage is 25.51799% with 683 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.18%. Comparing base (bd656b9) to head (3b8ed67).
⚠️ Report is 17 commits behind head on main.

⚠️ Current head 3b8ed67 differs from pull request most recent head 50e789e

Please upload reports for the commit 50e789e to get more accurate results.

Files with missing lines                  Patch %   Lines
src/vmm/src/vfio.rs                       18.92%    557 Missing ⚠️
src/vmm/src/device_manager/pci_mngr.rs    0.00%     32 Missing ⚠️
src/vmm/src/rpc_interface.rs              11.42%    31 Missing ⚠️
src/vmm/src/pci/configuration.rs          45.94%    20 Missing ⚠️
src/vmm/src/resources.rs                  35.00%    13 Missing ⚠️
src/vmm/src/device_manager/mod.rs         10.00%    9 Missing ⚠️
src/vmm/src/lib.rs                        10.00%    9 Missing ⚠️
src/vmm/src/builder.rs                    33.33%    6 Missing ⚠️
src/vmm/src/pci/msix.rs                   79.31%    6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5870      +/-   ##
==========================================
- Coverage   82.84%   81.18%   -1.66%     
==========================================
  Files         277      280       +3     
  Lines       29912    30814     +902     
==========================================
+ Hits        24781    25017     +236     
- Misses       5131     5797     +666     
Flag Coverage Δ
5.10-m5n.metal 81.30% <25.51%> (-1.84%) ⬇️
5.10-m6a.metal 80.59% <25.51%> (-1.88%) ⬇️
5.10-m6g.metal 77.99% <25.51%> (-1.79%) ⬇️
5.10-m6i.metal 81.30% <25.51%> (-1.85%) ⬇️
5.10-m7a.metal-48xl 80.58% <25.51%> (-1.88%) ⬇️
5.10-m7g.metal 77.99% <25.51%> (-1.79%) ⬇️
5.10-m7i.metal-24xl 81.28% <25.51%> (-1.84%) ⬇️
5.10-m7i.metal-48xl 81.27% <25.51%> (-1.85%) ⬇️
5.10-m8g.metal-24xl 77.99% <25.51%> (-1.79%) ⬇️
5.10-m8g.metal-48xl 77.99% <25.51%> (-1.79%) ⬇️
5.10-m8i.metal-48xl 81.27% <25.51%> (-1.85%) ⬇️
5.10-m8i.metal-96xl 81.28% <25.51%> (-1.84%) ⬇️
6.1-m5n.metal 81.32% <25.51%> (-1.85%) ⬇️
6.1-m6a.metal 80.62% <25.51%> (-1.88%) ⬇️
6.1-m6g.metal 77.98% <25.51%> (-1.80%) ⬇️
6.1-m6i.metal 81.33% <25.51%> (-1.84%) ⬇️
6.1-m7a.metal-48xl 80.61% <25.51%> (-1.89%) ⬇️
6.1-m7g.metal 77.99% <25.51%> (-1.79%) ⬇️
6.1-m7i.metal-24xl 81.33% <25.51%> (-1.85%) ⬇️
6.1-m7i.metal-48xl 81.33% <25.51%> (-1.85%) ⬇️
6.1-m8g.metal-24xl 77.98% <25.51%> (-1.79%) ⬇️
6.1-m8g.metal-48xl 77.99% <25.51%> (-1.79%) ⬇️
6.1-m8i.metal-48xl 81.34% <25.51%> (-1.84%) ⬇️
6.1-m8i.metal-96xl 81.34% <25.51%> (-1.84%) ⬇️

@ShadowCurse ShadowCurse force-pushed the vfio_with_dependencies branch 7 times, most recently from ab237e9 to 8949587 Compare May 12, 2026 14:58
Add the VfioConfig and VfioConfigs types for describing VFIO device
configuration. Wire them into VmResources and VmmConfig so that VFIO
devices can be specified before boot. Actual device setup will be added
in later commits.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add PUT /vfio/{id} API endpoint for configuring VFIO passthrough
devices.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Future VFIO commits will use ArrayVec for BAR mappings and
MSI-X hole tracking, so make it required.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
VFIO devices can expose both 32-bit and 64-bit BARs. The existing
Bars type only handled 64-bit BARs. Add set_bar_32() for 32-bit
BARs and get_bar_addr() that works with both widths.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
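
As a rough illustration of the two BAR widths mentioned above, the sketch below decodes raw BAR dwords using the standard PCI layout: bit 0 selects I/O space, bits 2:1 give the memory type (00 for 32-bit, 10 for 64-bit), and bit 3 is the prefetchable flag. `MemBar` and `decode_mem_bar` are hypothetical names for this example only; they are not the `set_bar_32()` and `get_bar_addr()` methods added by the commit, whose signatures are not shown here.

```rust
/// Decoded view of a PCI memory BAR (illustrative type, not from the PR).
#[derive(Debug, PartialEq, Eq)]
pub enum MemBar {
    Bar32 { addr: u32, prefetchable: bool },
    Bar64 { addr: u64, prefetchable: bool },
}

/// Decode the BAR that starts at `regs[idx]`, given the six raw BAR dwords.
/// Returns `None` for I/O BARs, reserved type encodings, or a 64-bit BAR
/// whose upper half would fall outside the array.
pub fn decode_mem_bar(regs: &[u32; 6], idx: usize) -> Option<MemBar> {
    let lo = *regs.get(idx)?;
    if lo & 0x1 != 0 {
        return None; // bit 0 set: I/O space BAR
    }
    let prefetchable = lo & 0x8 != 0;
    match (lo >> 1) & 0x3 {
        // 32-bit memory BAR: the address is the dword with the flag bits cleared.
        0b00 => Some(MemBar::Bar32 { addr: lo & 0xFFFF_FFF0, prefetchable }),
        // 64-bit memory BAR: the next dword holds the upper 32 address bits.
        0b10 => {
            let hi = *regs.get(idx + 1)?;
            let addr = (u64::from(hi) << 32) | u64::from(lo & 0xFFFF_FFF0);
            Some(MemBar::Bar64 { addr, prefetchable })
        }
        _ => None,
    }
}
```

A 64-bit BAR consumes two consecutive slots, so a caller walking all six registers would skip the following index after decoding one.
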
VFIO passthrough needs to emulate the MSI-X table and PBA regions
within device BARs. Add accessor methods to MsixCap for extracting
table/PBA BIR, offset, size, and the enabled/masked status bits.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
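
To make the fields named in this commit concrete, here is a minimal sketch of how table/PBA BIR, offset, size, and the enable/mask bits can be extracted from the MSI-X capability registers as laid out in the PCI specification. `MsixCapSketch` and its method names are assumptions for illustration and may differ from the accessors actually added to `MsixCap`.

```rust
/// Raw MSI-X capability registers (illustrative stand-in for `MsixCap`).
#[derive(Debug, Clone, Copy)]
pub struct MsixCapSketch {
    /// Message Control register (capability offset 0x2).
    pub msg_ctl: u16,
    /// Table Offset/BIR register (capability offset 0x4).
    pub table: u32,
    /// PBA Offset/BIR register (capability offset 0x8).
    pub pba: u32,
}

impl MsixCapSketch {
    /// Number of table entries; the register encodes N - 1 in bits 10:0.
    pub fn table_size(&self) -> u16 {
        (self.msg_ctl & 0x07FF) + 1
    }
    /// MSI-X Enable, bit 15 of Message Control.
    pub fn enabled(&self) -> bool {
        self.msg_ctl & (1 << 15) != 0
    }
    /// Function Mask, bit 14 of Message Control.
    pub fn masked(&self) -> bool {
        self.msg_ctl & (1 << 14) != 0
    }
    /// BAR index holding the table (low three bits).
    pub fn table_bir(&self) -> u8 {
        (self.table & 0x7) as u8
    }
    /// Byte offset of the table inside that BAR (8-byte aligned).
    pub fn table_offset(&self) -> u32 {
        self.table & !0x7
    }
    /// BAR index holding the Pending Bit Array.
    pub fn pba_bir(&self) -> u8 {
        (self.pba & 0x7) as u8
    }
    /// Byte offset of the PBA inside that BAR.
    pub fn pba_offset(&self) -> u32 {
        self.pba & !0x7
    }
    /// Table length in bytes: 16 bytes per entry.
    pub fn table_bytes(&self) -> u32 {
        u32::from(self.table_size()) * 16
    }
    /// PBA length in bytes: one bit per vector, rounded up to whole QWORDs.
    pub fn pba_bytes(&self) -> u32 {
        ((u32::from(self.table_size()) + 63) / 64) * 8
    }
}
```
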
VFIO BAR regions containing MSI-X table/PBA must be split into
mmappable and emulated parts. KVM memory slots require host-page
alignment, but MSI-X structures can sit at arbitrary offsets
within a BAR. Add align_up_host_page, align_down_host_page, and
offset_from_lower_host_page helpers to expand emulated regions
to page boundaries.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
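
The three helpers named above could look roughly like the following. The function names come from the commit message, but the signatures are an assumption: this sketch takes the page size as an explicit parameter (the real helpers presumably query the host page size themselves) and requires it to be a power of two.

```rust
/// Round `addr` down to the start of its host page.
/// Assumes `page_size` is a power of two.
pub fn align_down_host_page(addr: u64, page_size: u64) -> u64 {
    debug_assert!(page_size.is_power_of_two());
    addr & !(page_size - 1)
}

/// Round `addr` up to the next host page boundary.
/// Returns `None` if the rounded value would overflow `u64`.
pub fn align_up_host_page(addr: u64, page_size: u64) -> Option<u64> {
    debug_assert!(page_size.is_power_of_two());
    addr.checked_add(page_size - 1)
        .map(|a| a & !(page_size - 1))
}

/// Distance in bytes from the start of the containing host page to `addr`.
pub fn offset_from_lower_host_page(addr: u64, page_size: u64) -> u64 {
    debug_assert!(page_size.is_power_of_two());
    addr & (page_size - 1)
}
```

For example, an MSI-X table at BAR offset 0x2abc with 4 KiB pages sits 0xabc bytes into the page starting at 0x2000, so the emulated region would be widened to cover 0x2000 through 0x3000.
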
Add the rust-vmm vfio-bindings (0.6.2) and vfio-ioctls (0.6.0)
crates that provide wrappers around VFIO kernel interfaces.
These are needed by the VFIO code.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add the core VFIO passthrough implementation. This allows physical PCI
devices bound to vfio-pci on the host to be presented to the guest with
minimal overhead.

The implementation covers:
- PCI config space: most reads/writes are proxied to the physical
  device. BARs, MSI-X capability, and select extended capabilities are
  emulated or masked by Firecracker.
- BAR regions: device MMIO regions are mmap'd from the VFIO device fd
  and mapped into guest address space as KVM memory slots. BARs
  containing MSI-X table/PBA are split around the emulated regions using
  either sparse-mmap caps or manual hole calculation.
- MSI-X interrupts: the table and PBA are emulated in Firecracker.
  Physical device interrupts are delivered via eventfds wired through
  KVM irqfd.
- DMA: guest RAM regions are mapped into the VFIO container's IOMMU so
  the device can DMA directly to guest memory.

Only MSI-X interrupts are supported. IO BARs, ROM BARs, legacy INTx, and
MSI (non-X) are not handled.

There is also no support for hot-plug/unplug of VFIO devices at this
point, so created VFIO devices are never torn down. The only code
concerned with cleanup is the device setup path, which ensures that all
resources are released if an error occurs during device setup.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
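
The "manual hole calculation" mentioned in the BAR regions item can be sketched as follows: each emulated range (MSI-X table or PBA) is widened to host-page boundaries, and whatever remains of the BAR is what may be mmap'd. This is an illustrative reconstruction under stated assumptions, not the PR's code; `Range`, `mmappable_ranges`, and the explicit `page_size` parameter are hypothetical.

```rust
/// Half-open byte range `[start, end)` inside a BAR (illustrative alias).
pub type Range = (u64, u64);

/// Return the parts of a BAR of `bar_size` bytes that remain mmappable
/// once every range in `emulated` has been expanded to page boundaries.
/// Assumes `page_size` is a power of two.
pub fn mmappable_ranges(bar_size: u64, emulated: &[Range], page_size: u64) -> Vec<Range> {
    assert!(page_size.is_power_of_two());
    let mask = page_size - 1;

    // Expand each emulated range to page boundaries and clamp it to the BAR.
    let mut holes: Vec<Range> = emulated
        .iter()
        .copied()
        .filter(|&(start, end)| start < end && start < bar_size)
        .map(|(start, end)| {
            let lo = start & !mask;
            let hi = (end.saturating_add(mask) & !mask).min(bar_size);
            (lo, hi)
        })
        .collect();
    holes.sort_unstable();

    // Walk the sorted holes and emit the gaps between them.
    let mut out = Vec::new();
    let mut cursor = 0u64;
    for (start, end) in holes {
        if start > cursor {
            out.push((cursor, start));
        }
        cursor = cursor.max(end);
    }
    if cursor < bar_size {
        out.push((cursor, bar_size));
    }
    out
}
```

With a 16 KiB BAR, a 128-byte table at offset 0x2000, and 4 KiB pages, this yields two mappable pieces, `[0, 0x2000)` and `[0x3000, 0x4000)`. With 64 KiB host pages the same BAR has no mappable part at all, so every access to it would have to be emulated.
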
Add devtool options for preparing a PCI device for VFIO passthrough
testing. --vfio-device accepts a block device path (e.g. /dev/nvme1n1)
or a PCI SBDF, resolves it to a PCI device, binds it to vfio-pci, and
passes the SBDF and sysfs path to the test container via environment
variables. --first-vfio-pci-device is a fallback that searches for the
first NVMe device already bound to vfio-pci if the primary device is not
found.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
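
For readers unfamiliar with the SBDF notation used above, the sketch below parses a string of the form `SSSS:BB:DD.F` (segment, bus, device, function, all hexadecimal) and builds the matching Linux sysfs path under `/sys/bus/pci/devices`. It is written in Rust purely for illustration; the devtool itself is a shell script, and `Sbdf`, `parse_sbdf`, and `sysfs_path` are hypothetical names.

```rust
/// PCI segment/bus/device/function address (illustrative type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct Sbdf {
    pub segment: u16,
    pub bus: u8,
    pub device: u8,
    pub function: u8,
}

/// Parse a full SBDF such as "0000:00:1f.0". Returns `None` on any
/// malformed input, including device numbers above 31 or functions above 7.
pub fn parse_sbdf(s: &str) -> Option<Sbdf> {
    // Reject anything other than hex digits and the two separators up front.
    if !s.bytes().all(|b| b.is_ascii_hexdigit() || b == b':' || b == b'.') {
        return None;
    }
    let mut parts = s.split(':');
    let seg = parts.next()?;
    let bus = parts.next()?;
    let dev_fn = parts.next()?;
    if parts.next().is_some() {
        return None;
    }
    let (dev, func) = dev_fn.split_once('.')?;
    if seg.len() != 4 || bus.len() != 2 || dev.len() != 2 || func.len() != 1 {
        return None;
    }
    let sbdf = Sbdf {
        segment: u16::from_str_radix(seg, 16).ok()?,
        bus: u8::from_str_radix(bus, 16).ok()?,
        device: u8::from_str_radix(dev, 16).ok()?,
        function: u8::from_str_radix(func, 16).ok()?,
    };
    // The device number is 5 bits wide and the function number 3 bits wide.
    (sbdf.device < 32 && sbdf.function < 8).then_some(sbdf)
}

/// Sysfs directory for the device on Linux.
pub fn sysfs_path(sbdf: &Sbdf) -> String {
    format!(
        "/sys/bus/pci/devices/{:04x}:{:02x}:{:02x}.{:x}",
        sbdf.segment, sbdf.bus, sbdf.device, sbdf.function
    )
}
```
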
Add integration tests that verify VFIO passthrough with a physical
NVMe device. Tests are gated behind the `vfio` pytest mark and
FC_VFIO_PCI_SBDF environment variable so they only run when a suitable
device is available.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Current VFIO implementation has some restrictions:
- Does not work without PCI since VFIO devices are PCI devices
- Does not work with virtio-mem device since we don't update DMA
  mappings on hot-plug/unplug
- Does not work with virtio-balloon since it can `fadvise` on memory

To prevent VMs from being launched with invalid configurations,
implement checks at two levels:
- At the API level, reject incompatible combinations (VFIO after
  balloon/mem, or the reverse).
- At VM creation and snapshot restoration, since these paths get
  VmResources from other sources.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
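
The three restrictions listed in this commit amount to a small validation function. The sketch below shows one possible shape for such a check; `ResourcesSketch`, `VfioConfigError`, and `validate_vfio` are hypothetical stand-ins and do not reflect the actual `VmResources` fields or error types in Firecracker.

```rust
/// Minimal stand-in for the parts of the VM configuration that matter
/// here (hypothetical; the real `VmResources` has a different shape).
#[derive(Debug, Default)]
pub struct ResourcesSketch {
    pub pci_enabled: bool,
    pub vfio_devices: usize,
    pub has_balloon: bool,
    pub has_virtio_mem: bool,
}

/// Reasons a configuration with VFIO devices is rejected (illustrative).
#[derive(Debug, PartialEq, Eq)]
pub enum VfioConfigError {
    PciDisabled,
    BalloonPresent,
    VirtioMemPresent,
}

/// Check the restrictions listed in the commit message. Calling the same
/// function from both the API handlers and the VM creation/restore paths
/// would cover either ordering of device additions.
pub fn validate_vfio(res: &ResourcesSketch) -> Result<(), VfioConfigError> {
    if res.vfio_devices == 0 {
        return Ok(()); // nothing to validate without VFIO devices
    }
    if !res.pci_enabled {
        return Err(VfioConfigError::PciDisabled);
    }
    if res.has_balloon {
        return Err(VfioConfigError::BalloonPresent);
    }
    if res.has_virtio_mem {
        return Err(VfioConfigError::VirtioMemPresent);
    }
    Ok(())
}
```
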
VFIO device state is opaque to the VMM and cannot be serialized
or restored. Add VFIO devices to the list of snapshot-incompatible
devices so that snapshot requests are rejected with a clear error
instead of producing a corrupt snapshot.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
The vfio-ioctls crate uses syscalls that were not previously in
the seccomp allowlists. Add them for both architectures:

- dup: used by vfio-ioctls to duplicate file descriptors during
  VFIO device setup.
- ioctl(VFIO_GROUP_UNSET_CONTAINER): used during VfioDevice drop
  to detach the group from the container.
- ioctl(VFIO_IOMMU_UNMAP_DMA): used during cleanup to unmap DMA
  regions.
- pread64/pwrite64: used for reading/writing device PCI config
  and BAR regions, both on the API thread during setup and on
  vCPU threads during MMIO/PIO exits.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
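
A seccomp rule that matches `ioctl` by request number needs the numeric value of each request. As a sketch, the two VFIO requests named above can be derived from the definitions in `linux/vfio.h` (`VFIO_TYPE` is `';'`, `VFIO_BASE` is 100, and both requests use the argument-less `_IO` encoding). The helper `io` below is a local reimplementation of that macro for x86_64 and aarch64, not a function from vfio-bindings.

```rust
/// ioctl "type" byte used by VFIO, from linux/vfio.h.
pub const VFIO_TYPE: u32 = b';' as u32; // 0x3B
/// First ioctl number used by VFIO, from linux/vfio.h.
pub const VFIO_BASE: u32 = 100;

/// Equivalent of the kernel's `_IO(type, nr)` macro on x86_64 and aarch64:
/// no direction bits and no size, so only the type and number fields are set.
pub const fn io(ty: u32, nr: u32) -> u32 {
    (ty << 8) | nr
}

/// `_IO(VFIO_TYPE, VFIO_BASE + 5)`
pub const VFIO_GROUP_UNSET_CONTAINER: u32 = io(VFIO_TYPE, VFIO_BASE + 5);
/// `_IO(VFIO_TYPE, VFIO_BASE + 14)`
pub const VFIO_IOMMU_UNMAP_DMA: u32 = io(VFIO_TYPE, VFIO_BASE + 14);
```

These evaluate to 0x3B69 and 0x3B72 respectively, which is the form in which such requests would appear as the second argument in an `ioctl` filter rule.
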
The VFIO integration tests use an NVMe device to verify passthrough
functionality. Update a kernel config to enable the NVMe core and block
device drivers so the guest can detect and use the passthrough device.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
VFIO tests need exclusive access to the passthrough device, so they
cannot run in parallel with other tests. Add a separate Buildkite step
in the PR pipeline that runs only the vfio-marked tests, similar to the
existing performance step. CI instances will have an additional 1GB NVMe
device at /dev/nvme1n1 for this purpose.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
@ShadowCurse ShadowCurse force-pushed the vfio_with_dependencies branch 2 times, most recently from 11aee55 to 63c3bd5 Compare May 14, 2026 15:40
Add docs/vfio.md covering how VFIO passthrough works in Firecracker,
prerequisites (IOMMU, vfio-pci binding), configuration via API and
config file, security considerations, snapshot incompatibility, and
current limitations.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add a changelog entry for the new VFIO PCI device passthrough feature.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
do not merge: point to vfio artifacts

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
@ShadowCurse ShadowCurse force-pushed the vfio_with_dependencies branch from 63c3bd5 to f6d6fea Compare May 14, 2026 15:45
Wire up the code to allow hot-plugging of VFIO devices
after VM boot.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
With hot-plug support, the list of allowed syscalls needs to be expanded
with new VFIO-specific ones.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Code that cleans up BAR allocations will need to know both the address and
the size of each BAR. Add utility functions to get these values.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
@ShadowCurse ShadowCurse force-pushed the vfio_with_dependencies branch from f6d6fea to 50e789e Compare May 14, 2026 16:29