
task(vmop): add migration progress status and reasons#2182

Draft
danilrwx wants to merge 27 commits into main from feat/vmop/migrate-percent

Conversation

Contributor

@danilrwx danilrwx commented Apr 2, 2026

Description

Implement VMOP migration progress reporting and make sync-phase progress more informative and stable.

Changes in this PR:

  • add status.progress support for migration VirtualMachineOperation objects;
  • add detailed Completed reasons for migration lifecycle stages;
  • centralize sync progress range in migrationprogress.SyncRangeMin/SyncRangeMax;
  • add a dedicated migration progress package with mapper/strategy logic;
  • consume KubeVirt migration transfer details from status.migrationState.transferStatus when they are available;
  • keep degraded time/phase-based progress as a fallback when transfer details are unavailable;
  • reduce repeated artificial stall bumps in degraded mode.
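
To illustrate the sync range and the fallback described in the list above, here is a minimal sketch in Go. It is not the code from this PR: the bounds `10` and `90`, the `transferStatus` struct, its field names, and `syncProgress` are all hypothetical stand-ins. The real bounds live in `migrationprogress.SyncRangeMin`/`SyncRangeMax`, and the real counters come from `status.migrationState.transferStatus`.

```go
package main

import "fmt"

// Hypothetical bounds. The PR centralizes the real values in
// migrationprogress.SyncRangeMin/SyncRangeMax, which are not shown here.
const (
	syncRangeMin int32 = 10
	syncRangeMax int32 = 90
)

// transferStatus is a hypothetical stand-in for the counters exposed under
// status.migrationState.transferStatus. The field names are assumptions.
type transferStatus struct {
	ProcessedBytes uint64
	TotalBytes     uint64
}

// clamp keeps a progress value inside the sync range.
func clamp(p int32) int32 {
	if p < syncRangeMin {
		return syncRangeMin
	}
	if p > syncRangeMax {
		return syncRangeMax
	}
	return p
}

// syncProgress maps transfer counters into [syncRangeMin, syncRangeMax].
// When counters are unavailable it returns the degraded estimate instead,
// clamped to the same range.
func syncProgress(ts *transferStatus, degraded int32) int32 {
	if ts == nil || ts.TotalBytes == 0 {
		return clamp(degraded)
	}
	processed := ts.ProcessedBytes
	if processed > ts.TotalBytes {
		// Re-sent dirty pages can push the processed counter past the total.
		processed = ts.TotalBytes
	}
	ratio := float64(processed) / float64(ts.TotalBytes)
	span := float64(syncRangeMax - syncRangeMin)
	return clamp(syncRangeMin + int32(ratio*span))
}

func main() {
	// Counters available: half of the data transferred.
	fmt.Println(syncProgress(&transferStatus{ProcessedBytes: 512, TotalBytes: 1024}, 0))
	// Counters unavailable: the degraded estimate is used.
	fmt.Println(syncProgress(nil, 42))
}
```

With these assumed bounds, the first call lands in the middle of the sync range and the second returns the degraded estimate unchanged, because it already falls inside the range.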

Why do we need it, and what problem does it solve?

Migration VMOP progress was previously coarse and bursty. In practice, users could observe sequences like 1 -> 2 -> 3 -> 16 -> 41 -> 42..51 -> 100, where the visible progress was driven by reconcile cadence and fallback estimation rather than by the actual migration state.

This PR improves the user-facing progress model by:

  • exposing progress directly in VMOP status;
  • providing stage-specific reasons such as disk preparation, target scheduling, target preparation, syncing, source suspended, and target resumed;
  • improving sync-phase progress calculation so it can consume KubeVirt migration transfer counters and runtime signals from transferStatus when they are available;
  • keeping the old degraded estimator as a compatibility fallback;
  • preventing the near +1-per-reconcile behavior that the stall bump logic previously caused.

As a result, migration progress becomes more actionable for users and easier to debug from VMOP conditions.

This PR depends on the matching 3p-kubevirt API change that moves migration transfer-related fields under VirtualMachineInstanceMigrationState.transferStatus.

What is the expected result?

  1. Start a VM migration through VirtualMachineOperation.
  2. Observe status.progress on the VMOP object.
  3. During early lifecycle phases, progress should move through the coarse stage milestones.
  4. During sync, progress should stay within the configured sync range and advance more naturally.
  5. When KubeVirt exposes status.migrationState.transferStatus, sync progress should use those runtime values.
  6. Completed condition reasons should reflect the current migration stage.
  7. In environments where migration transfer details are not available yet, degraded fallback should still work without repeated artificial increments on every reconcile.

Checklist

  • The code is covered by unit tests.
  • e2e tests passed.
  • Documentation updated according to the changes.
  • Changes were tested in the Kubernetes cluster manually.

Changelog entries

section: core
type: feature
summary: "Add VMOP migration progress reporting with detailed lifecycle reasons and smoother sync-phase progress calculation."
impact_level: low

@danilrwx danilrwx force-pushed the feat/vmop/migrate-percent branch 12 times, most recently from c478cf2 to b884608 April 3, 2026 14:48
danilrwx added 17 commits April 3, 2026 17:22
Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
Handle nil migrations as generic failures and keep terminal VMOP metrics detection explicit.

Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
Read migration byte counters from VirtualMachineInstanceMigration status and feed them into VMOP migration progress calculation.

Keep degraded mode as a fallback when counters are absent and use a local kubevirt.io/api replace for the patched 3p-kubevirt module during development.

Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
Keep degraded-mode stall bump monotonic, but avoid issuing another artificial increment when the computed base progress only trails the previous value by the prior bump.

This removes the near +1-per-reconcile behavior while preserving the fallback path for migrations without byte counters.

Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
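
The stall bump rule from the commit message above can be sketched roughly as follows. This is a simplified illustration, not the PR's code: `nextDegraded`, its parameters, and the cap of `90` are hypothetical, and the real logic likely also takes stall timing into account.

```go
package main

import "fmt"

// syncRangeMax is a hypothetical cap. The real value is defined in
// migrationprogress.SyncRangeMax.
const syncRangeMax int32 = 90

// nextDegraded returns the progress to report and the bump that is still
// outstanding, given the freshly computed base estimate, the previously
// reported value, and the bump applied on an earlier reconcile.
func nextDegraded(base, prev, prevBump int32) (progress, bump int32) {
	switch {
	case base > prev:
		// Real progress: report it and forget any earlier bump.
		return base, 0
	case prevBump > 0 && prev-base <= prevBump:
		// The base trails only by the earlier bump: hold, do not bump again.
		return prev, prevBump
	case prev >= syncRangeMax:
		// Never bump past the end of the sync range.
		return prev, prevBump
	default:
		// First stall for this base value: one artificial increment.
		return prev + 1, 1
	}
}

func main() {
	prev, bump := int32(40), int32(0)
	// The base estimate stalls at 40 for three reconciles, then jumps to 45.
	for _, base := range []int32{40, 40, 40, 45} {
		prev, bump = nextDegraded(base, prev, bump)
		fmt.Println(prev, bump)
	}
}
```

In this sketch the reported value moves to 41 on the first stalled reconcile, holds at 41 while the base stays at 40, and follows the base once it reaches 45. The value never decreases, and repeated reconciles no longer add +1 each.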
Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
…yncs

Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
danilrwx added 6 commits April 3, 2026 17:22
…ss store

Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
…and adaptive stall

Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
- BuildRecord now reads AutoConverge from MigrationConfiguration.AllowAutoConverge
  instead of vmop.Spec.Force; remove resolveAutoConverge helper
- isAtMaxThrottle: !AutoConverge => always at max (safe mode),
  AutoConverge => throttle >= 0.99
- Add live TargetDiskError detection via target pod events (FailedAttachVolume/FailedMount)
- Preserve NotConverging terminal reason when migration fails with generic reason
- Add unit tests for IsNotConverging, BuildRecord AutoConverge, and integration
  tests for TargetPreparing, TargetResumed, SourceSuspended, NotConverging persistence,
  TargetDiskError live detection

Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
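
As a small illustration of the `isAtMaxThrottle` rule listed in the commit above, consider the sketch below. Only the function name, the `AutoConverge` flag, and the `0.99` threshold come from the commit message. The `record` struct, its `Throttle` field, and the constant name are hypothetical.

```go
package main

import "fmt"

// maxThrottleThreshold mirrors the ">= 0.99" rule from the commit message.
// The constant name is an assumption.
const maxThrottleThreshold = 0.99

// record is a hypothetical stand-in for the progress record built by
// BuildRecord. Only the fields needed for this check are modelled.
type record struct {
	AutoConverge bool    // taken from MigrationConfiguration.AllowAutoConverge
	Throttle     float64 // assumed to be a fraction in [0, 1]
}

// isAtMaxThrottle reports whether the migration has no throttling headroom left.
// Without auto-converge the guest is never throttled, so it is treated as
// always at max (the "safe mode" from the commit message).
func isAtMaxThrottle(r record) bool {
	if !r.AutoConverge {
		return true
	}
	return r.Throttle >= maxThrottleThreshold
}

func main() {
	fmt.Println(isAtMaxThrottle(record{AutoConverge: false, Throttle: 0.10}))
	fmt.Println(isAtMaxThrottle(record{AutoConverge: true, Throttle: 0.50}))
	fmt.Println(isAtMaxThrottle(record{AutoConverge: true, Throttle: 0.99}))
}
```

Treating the non-auto-converge case as "at max" means a stalled migration can be classified as not converging without waiting for a throttle value that will never rise.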
kubelet emits FailedAttachVolume/FailedMount events while pod is in
Pending phase — before ContainerCreating state is ever reached.
The previous ContainerCreating guard prevented detection entirely.
Now check events for any Pending pod that is not being deleted.

Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
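
The detection rule from the commit above could look roughly like this. The sketch uses small local types so that it stays self-contained. `pod`, `event`, and `hasTargetDiskError` are hypothetical names, and real code would presumably inspect a `corev1.Pod` (its `Status.Phase` and `DeletionTimestamp`) and the events recorded for that pod.

```go
package main

import "fmt"

// pod and event are simplified, hypothetical stand-ins for the Kubernetes
// Pod and Event objects. Only the fields used by the check are modelled.
type pod struct {
	Phase    string // for example "Pending" or "Running"
	Deleting bool   // true when a deletion timestamp is set
}

type event struct {
	Reason string
}

// hasTargetDiskError reports whether a target pod shows volume attach or
// mount failures. Any Pending pod that is not being deleted is checked,
// without requiring the ContainerCreating state.
func hasTargetDiskError(p pod, events []event) bool {
	if p.Phase != "Pending" || p.Deleting {
		return false
	}
	for _, e := range events {
		if e.Reason == "FailedAttachVolume" || e.Reason == "FailedMount" {
			return true
		}
	}
	return false
}

func main() {
	events := []event{{Reason: "Scheduled"}, {Reason: "FailedMount"}}
	fmt.Println(hasTargetDiskError(pod{Phase: "Pending"}, events))
	fmt.Println(hasTargetDiskError(pod{Phase: "Pending", Deleting: true}, events))
	fmt.Println(hasTargetDiskError(pod{Phase: "Running"}, events))
}
```

Only the first call reports an error here: the second pod is being deleted and the third has already left the Pending phase.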
This reverts commit 56de1b8.

Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
@danilrwx danilrwx force-pushed the feat/vmop/migrate-percent branch from a410aa5 to 5619cf3 April 3, 2026 15:22
danilrwx added 4 commits April 3, 2026 17:51
Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>