Skip to content

feat(snapshot): allow block device path override on restore#5774

Open
meAmitPatil wants to merge 1 commit into
firecracker-microvm:mainfrom
superserve-ai:feat/snapshot-drive-overrides
Open

feat(snapshot): allow block device path override on restore#5774
meAmitPatil wants to merge 1 commit into
firecracker-microvm:mainfrom
superserve-ai:feat/snapshot-drive-overrides

Conversation

@meAmitPatil
Copy link
Copy Markdown

@meAmitPatil meAmitPatil commented Mar 19, 2026

Allow overriding block device host paths when restoring from a snapshot. This is useful when disk paths are non-deterministic (e.g. container runtimes using devmapper) or when restoring on a different host where the backing file is at a different location.

Adds a drive_overrides parameter to the snapshot load API, following the same pattern as the existing network_overrides (#4731) and vsock_override (#5323).

Closes #4014

License Acceptance

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@JamesC1305
Copy link
Copy Markdown
Contributor

Hi @meAmitPatil,

Thanks for the contribution. This is something we'd be interested in supporting, I'll take a look at it in the coming days. In the meantime, would you be able to add python integration tests to this? Something similar to those in #4731 and #5323 should work. Thank you!

@meAmitPatil
Copy link
Copy Markdown
Author

Hi @meAmitPatil,

Thanks for the contribution. This is something we'd be interested in supporting, I'll take a look at it in the coming days. In the meantime, would you be able to add python integration tests to this? Something similar to those in #4731 and #5323 should work. Thank you!

@JamesC1305 Thanks for the reply I've updated the PR with Python integration tests.

Copy link
Copy Markdown
Contributor

@JamesC1305 JamesC1305 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall changes LGTM. A few very small nits, and one suggestion for improving the integration test coverage. The documentation is in a non-obvious place, but we'll plan to refactor it later and merge all the 'overrides' into their own docfile, so no need to worry about it for this PR.

Once you've responded to my feedback RE integ tests, I'll approve a run and we can go from there. Thanks!

BlockState::Virtio(virtio_block_state) => virtio_block_state.virtio_state.activated,
BlockState::VhostUser(vhost_user_block_state) => false,
BlockState::Virtio(state) => state.virtio_state.activated,
BlockState::VhostUser(_state) => false,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be state rather than _state?

Comment thread CHANGELOG.md Outdated

### Added

- [#4014](https://github.com/firecracker-microvm/firecracker/issues/4014): Add
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this should probably link to the resolving PR rather than the issue.

Comment on lines +625 to +634
# The rootfs disk already exists in the snapshot; override its path
# to the same file (just proving the override mechanism works).
rootfs_path = str(snapshot.disks["rootfs"])
restored_vm.restore_from_snapshot(
snapshot,
drive_overrides=[
{"drive_id": "rootfs", "path_on_host": rootfs_path},
],
resume=True,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we expand on this? Maybe it would be good to use cp to make a copy of the rootfs with a similar name. That way we can verify that the override behaviour works with a different file (+ inode).

@meAmitPatil
Copy link
Copy Markdown
Author

Overall changes LGTM. A few very small nits, and one suggestion for improving the integration test coverage. The documentation is in a non-obvious place, but we'll plan to refactor it later and merge all the 'overrides' into their own docfile, so no need to worry about it for this PR.

Once you've responded to my feedback RE integ tests, I'll approve a run and we can go from there. Thanks!

@JamesC1305 Thanks for the review!

  • Changed _state to _ (wildcard, since the value is unused in that arm)
  • CHANGELOG now links to this PR instead of the issue
  • Test now copies the rootfs to a separate file to verify the override with a different inode

@JamesC1305
Copy link
Copy Markdown
Contributor

Hi @meAmitPatil,

I triggered a run of the integration tests on your PR, here. It seems like there are some issues with style, build and functional tests. Would you be able to address the errors please?

You can run the integration tests locally using tools/devtool test -- integration_tests/{style,build,functional}

Thanks,
James

@meAmitPatil meAmitPatil force-pushed the feat/snapshot-drive-overrides branch 3 times, most recently from 83a278b to 29d2c6c Compare March 31, 2026 16:57
@meAmitPatil
Copy link
Copy Markdown
Author

@JamesC1305 I've squashed the commits, fixed the style issues (gitlint, markdown formatting), and fixed the integration_tests.rs:375 compile error and a jailer path issue in the integration test. All verified passing on a bare metal instance. Could you trigger a new CI run?
Thanks!

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 46.66667% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.84%. Comparing base (2440dbb) to head (ca0db63).
⚠️ Report is 66 commits behind head on main.

⚠️ Current head ca0db63 differs from pull request most recent head a23333b

Please upload reports for the commit a23333b to get more accurate results.

Files with missing lines Patch % Lines
src/vmm/src/persist.rs 5.88% 16 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5774      +/-   ##
==========================================
- Coverage   82.87%   82.84%   -0.04%     
==========================================
  Files         276      276              
  Lines       29729    29757      +28     
==========================================
+ Hits        24638    24651      +13     
- Misses       5091     5106      +15     
Flag Coverage Δ
5.10-m5n.metal 83.13% <46.66%> (-0.04%) ⬇️
5.10-m6a.metal 82.46% <46.66%> (-0.04%) ⬇️
5.10-m6g.metal 79.74% <46.66%> (-0.05%) ⬇️
5.10-m6i.metal 83.14% <46.66%> (-0.04%) ⬇️
5.10-m7a.metal-48xl 82.45% <46.66%> (-0.04%) ⬇️
5.10-m7g.metal 79.75% <46.66%> (-0.04%) ⬇️
5.10-m7i.metal-24xl 83.10% <46.66%> (-0.04%) ⬇️
5.10-m7i.metal-48xl 83.11% <46.66%> (-0.04%) ⬇️
5.10-m8g.metal-24xl 79.74% <46.66%> (-0.04%) ⬇️
5.10-m8g.metal-48xl 79.74% <46.66%> (-0.04%) ⬇️
5.10-m8i.metal-48xl 83.11% <46.66%> (-0.04%) ⬇️
5.10-m8i.metal-96xl 83.11% <46.66%> (-0.04%) ⬇️
6.1-m5n.metal 83.16% <46.66%> (-0.04%) ⬇️
6.1-m6a.metal 82.49% <46.66%> (-0.04%) ⬇️
6.1-m6g.metal 79.74% <46.66%> (-0.05%) ⬇️
6.1-m6i.metal 83.16% <46.66%> (-0.04%) ⬇️
6.1-m7a.metal-48xl 82.48% <46.66%> (-0.04%) ⬇️
6.1-m7g.metal 79.74% <46.66%> (-0.05%) ⬇️
6.1-m7i.metal-24xl 83.17% <46.66%> (-0.05%) ⬇️
6.1-m7i.metal-48xl 83.17% <46.66%> (-0.05%) ⬇️
6.1-m8g.metal-24xl 79.74% <46.66%> (-0.04%) ⬇️
6.1-m8g.metal-48xl 79.74% <46.66%> (-0.04%) ⬇️
6.1-m8i.metal-48xl 83.18% <46.66%> (-0.04%) ⬇️
6.1-m8i.metal-96xl 83.18% <46.66%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JamesC1305
Copy link
Copy Markdown
Contributor

Changes all LGTM. Only thing is the patch coverage is slightly low. Would you be able to extend the unit tests a little bit to increase the coverage and make codecov happy? Seems like the biggest gap is in the newly introduced BlockState functions.

@meAmitPatil meAmitPatil force-pushed the feat/snapshot-drive-overrides branch from 29d2c6c to aa336b4 Compare April 12, 2026 05:45
@meAmitPatil meAmitPatil requested a review from micz010 as a code owner April 12, 2026 05:45
@meAmitPatil meAmitPatil force-pushed the feat/snapshot-drive-overrides branch from aa336b4 to 16c5261 Compare April 12, 2026 05:51
@meAmitPatil
Copy link
Copy Markdown
Author

@JamesC1305 Added unit tests for the id(), is_activated(), and set_host_path() methods on BlockState to improve patch coverage.

@JamesC1305
Copy link
Copy Markdown
Contributor

@meAmitPatil,

Sincere apologies for the delay in getting back to you. Everything looks good to me now. Could you please rebase and fix the merge conflicts? We can then get this merged for you

@meAmitPatil meAmitPatil force-pushed the feat/snapshot-drive-overrides branch from 16c5261 to ca0db63 Compare April 23, 2026 21:30
@meAmitPatil
Copy link
Copy Markdown
Author

@JamesC1305 Done, rebased on latest main and resolved all conflicts. Ready for merge!

JamesC1305
JamesC1305 previously approved these changes Apr 24, 2026
@JamesC1305 JamesC1305 added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Apr 24, 2026
Copy link
Copy Markdown
Contributor

@Manciukic Manciukic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly lgtm, just one extra verification in the integ test, and more clear handling of vhost-user-blk in API and documentation

{"drive_id": "rootfs", "path_on_host": "/rootfs_override.ext4"},
],
resume=True,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should test the device still works after restore (and override)

Comment on lines +1663 to +1666
path_on_host:
type: string
description:
The new path on the host for the block device's backing file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for consistency with the PUT /drive API we should have either path_on_host (virtio) or socket (vhost).

}
]
}'
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should mention how this works with vhost-user-blk as well.

@meAmitPatil
Copy link
Copy Markdown
Author

@Manciukic Thanks for the review! All three points have been addressed.

  1. Test now verifies the block device works after restore via an SSH blockdev check
  2. DriveOverride now accepts either path_on_host (virtio-block) or socket (vhost-user-block), matching the PUT /drives API convention
  3. Docs updated with examples for both device types

jliu-tubi added a commit to jliu-tubi/firecracker that referenced this pull request Apr 28, 2026
vsock_override was added in PR firecracker-microvm#5323 which merged after v1.15.1.
The cherry-pick of drive_overrides (firecracker-microvm#5774) brought references to
vsock_override that don't compile against v1.15.1 base.

Removed: persist.rs vsock_override processing block,
snapshot.rs test struct field.
Copy link
Copy Markdown
Contributor

@JamesC1305 JamesC1305 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in getting back to you on this! We have quite a lot of things going on right now.

I just had one quick point about handling block file paths vs socket paths. AFAICT they're mutually exclusive, so an enum would probably be a better way to represent them.

Comment on lines +69 to +78
pub struct DriveOverride {
/// The ID of the drive to modify
pub drive_id: String,
/// The new host path for a virtio-block device's backing file
#[serde(default, skip_serializing_if = "Option::is_none")]
pub path_on_host: Option<String>,
/// The new socket path for a vhost-user-block device's backend
#[serde(default, skip_serializing_if = "Option::is_none")]
pub socket: Option<String>,
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we maybe use an enum here instead of allowing both options? They seem mutually exclusive to me.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JamesC1305 Done refactored to an untagged enum.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified further:

 #[derive(Debug, PartialEq, Eq, Deserialize)]
  #[serde(rename_all = "snake_case")]
  pub enum DriveOverrideBacking {
      PathOnHost(String),
      Socket(String),
  }

Allow overriding block device host paths when restoring from a snapshot.
This is useful when disk paths are non-deterministic (e.g. container
runtimes using devmapper) or when restoring on a different host where
the backing file is at a different location.

Adds a drive_overrides parameter to the snapshot load API, following the
same pattern as the existing network_overrides and vsock_override.
Includes Python integration tests that verify the override works with a
different file and that an unknown drive_id is rejected.

Closes firecracker-microvm#4014

Signed-off-by: Amit Patil <iamitpatil2001@gmail.com>
@meAmitPatil meAmitPatil force-pushed the feat/snapshot-drive-overrides branch from a6164d5 to a23333b Compare May 12, 2026 23:43
Copy link
Copy Markdown
Contributor

@JamesC1305 JamesC1305 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small points that I think will clean the code up a little bit more. Almost there!

Comment on lines +69 to +78
pub struct DriveOverride {
/// The ID of the drive to modify
pub drive_id: String,
/// The new host path for a virtio-block device's backing file
#[serde(default, skip_serializing_if = "Option::is_none")]
pub path_on_host: Option<String>,
/// The new socket path for a vhost-user-block device's backend
#[serde(default, skip_serializing_if = "Option::is_none")]
pub socket: Option<String>,
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified further:

 #[derive(Debug, PartialEq, Eq, Deserialize)]
  #[serde(rename_all = "snake_case")]
  pub enum DriveOverrideBacking {
      PathOnHost(String),
      Socket(String),
  }

Comment thread src/vmm/src/persist.rs
Comment on lines +431 to +438
let applied = match &entry.backing {
DriveOverrideBacking::PathOnHost { path_on_host } => {
device_state.set_disk_path(path_on_host)
}
DriveOverrideBacking::Socket { socket } => device_state.set_socket_path(socket),
};
if !applied {
Err(SnapshotStateFromFileError::DriveOverrideMismatch)?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the enum, we can collapse the two set_*_path functions into a singular one with match. E.g.,

  pub fn apply_override(&mut self, backing: &DriveOverrideBacking)
      -> Result<(), DriveOverrideMismatch>
  {
      match (self, backing) {
          (BlockState::Virtio(s), DriveOverrideBacking::PathOnHost(p)) => {
              s.disk_path = p.clone(); Ok(())
          }
          (BlockState::VhostUser(s), DriveOverrideBacking::Socket(p)) => {
              s.socket_path = p.clone(); Ok(())
          }
          _ => Err(DriveOverrideMismatch),
      }
  }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have small tests for:

  • Testing that we can't supply path_on_host and socket at the same time
  • Testing that an error is returned when path_on_host is provided for vhost-user-blk and socket for virtio-blk

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Status: Awaiting review Indicates that a pull request is ready to be reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Snaps] Drive mounts for block device disk paths during snapshot loading

3 participants