Commit 19e1704

CA-411766: Detach VBDs right after VM Halted
Fix race condition when destroying VBD after VM power_state change.

A client of XenServer attempted to destroy a VBD immediately after receiving an event triggered by a VM power_state change, and the operation failed.

The root causes are:
1. The update to the VM's power_state and the updates to its VBDs are not performed atomically, so the client may receive the event for the power_state update and attempt to operate on the VBDs before their state is updated.
2. If the VM is running on a supporter, database operations require sending RPCs to the coordinator, introducing additional latency.
3. Between the updates to the VM's power_state and the VBDs, xapi also updates the pending_guidances fields, which requires at least eight database operations and further delays the VBD update.

It is not straightforward to add transactions for these DB operations. The workaround is to move the update to pending_guidances to the end of the relevant database operations (VBDs, VIFs, GPUs, etc.), ensuring that VBDs are updated immediately after the VM's power_state change.

This is related to XSI-1915, where the Citrix deployment tool MCS triggered the issue.

Signed-off-by: Bengang Yuan <[email protected]>
1 parent c3761f9 commit 19e1704

1 file changed, +3 -3 lines changed

ocaml/xapi/xapi_vm_lifecycle.ml

Lines changed: 3 additions & 3 deletions
@@ -856,8 +856,6 @@ let force_state_reset_keep_current_operations ~__context ~self ~value:state =
   if state = `Suspended then
     remove_pending_guidance ~__context ~self ~value:`restart_device_model ;
   if state = `Halted then (
-    remove_pending_guidance ~__context ~self ~value:`restart_device_model ;
-    remove_pending_guidance ~__context ~self ~value:`restart_vm ;
     (* mark all devices as disconnected *)
     List.iter
       (fun vbd ->
@@ -899,7 +897,9 @@ let force_state_reset_keep_current_operations ~__context ~self ~value:state =
       )
       (Db.VM.get_VUSBs ~__context ~self) ;
     (* Blank the requires_reboot flag *)
-    Db.VM.set_requires_reboot ~__context ~self ~value:false
+    Db.VM.set_requires_reboot ~__context ~self ~value:false ;
+    remove_pending_guidance ~__context ~self ~value:`restart_device_model ;
+    remove_pending_guidance ~__context ~self ~value:`restart_vm
   ) ;
   (* Do not clear resident_on for VM and VGPU in a checkpoint operation *)
   if
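
The effect of the reordering in the diff above can be illustrated with a small standalone model. This is only a sketch and not xapi code: the `write` type, `old_order`, `new_order` and `race_window` are hypothetical names invented for the illustration, and each guidance removal is modelled as a single write, whereas the commit message says these updates amount to at least eight database operations. The model counts how many writes sit between the power_state update and the VBD update, as a rough proxy for the window in which a client reacting to the power_state event could still see stale VBD state.

```ocaml
(* Toy model, not xapi code: the sequence of database writes is a plain list,
   and a client is assumed to act as soon as it sees the power_state write. *)

type write =
  | Set_power_state_halted
  | Remove_guidance of string (* stands in for remove_pending_guidance *)
  | Mark_vbd_detached (* stands in for the VBD update *)

(* Order before the commit: guidance removal sits between the two updates. *)
let old_order =
  [ Set_power_state_halted
  ; Remove_guidance "restart_device_model"
  ; Remove_guidance "restart_vm"
  ; Mark_vbd_detached ]

(* Order after the commit: guidance removal is moved to the end. *)
let new_order =
  [ Set_power_state_halted
  ; Mark_vbd_detached
  ; Remove_guidance "restart_device_model"
  ; Remove_guidance "restart_vm" ]

(* Number of writes issued after the power_state change and before the VBD
   update. *)
let race_window writes =
  (* Drop everything up to and including the power_state write. *)
  let rec after_halt = function
    | [] -> []
    | Set_power_state_halted :: rest -> rest
    | _ :: rest -> after_halt rest
  in
  (* Count writes until the VBD update (or the end of the list). *)
  let rec count n = function
    | [] | Mark_vbd_detached :: _ -> n
    | _ :: rest -> count (n + 1) rest
  in
  count 0 (after_halt writes)

let () =
  Printf.printf "old window: %d writes\n" (race_window old_order) ;
  Printf.printf "new window: %d writes\n" (race_window new_order)
```

In this model the window shrinks from two intervening writes to none. The updates are still not atomic, so the change narrows the race rather than eliminating it, which matches the commit message's description of it as a workaround.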
