virsh_domjobinfo: fix pipe file read stuck when background virsh cmd failed #6564

liang-cong-red-hat · 2025-09-15T04:36:03Z

Since libvirt 11.5.0, the virsh dump --live option is disabled and returns an error message.
As a result, the test hangs while trying to read from the pipe file that virsh dump was supposed to write to.

The fix calls subprocess communicate() with a timeout to check on the background command before reading the pipe:
- Command failure: the test aborts immediately with the error message.
- Command success: the original logic remains unchanged.

Note: If the background command fails only after the timeout period, this likely indicates a libvirt issue, so that case is left unchanged for now.
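
The bounded-wait behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `wait_briefly_for_failure` and `start` are hypothetical helpers, `RuntimeError` stands in for the test's `test.error` call, and the 6-second default is taken from the review summary on this page.

```python
import subprocess
import sys


def wait_briefly_for_failure(process, timeout=6):
    """Wait up to `timeout` seconds for a background command.

    Returns True if the command is still running after the wait (the caller
    may go on to read the pipe) and False if it exited successfully.
    Raises RuntimeError (a stand-in for test.error) if it failed early.
    """
    try:
        _, stderr = process.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Still running after the bounded wait: keep the original flow.
        return True
    if process.returncode:
        raise RuntimeError("Background cmd failed: %s"
                           % stderr.decode(errors="replace"))
    return False


def start(code):
    """Hypothetical helper: run a Python snippet as the background command."""
    return subprocess.Popen([sys.executable, "-c", code],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
```

In the real test the background command is a virsh invocation; the Python snippet launched by `start` only stands in for it so the sketch can run without libvirt.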

Note: this PR follows from the discussion in the closed PR #6478.

Test result with libvirt >= 11.5.0:
(01/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.dump_action.live_dump.running_state.vm_id: STARTED
(01/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.dump_action.live_dump.running_state.vm_id: ERROR: Background cmd met unexpected failure of b"error: Failed to core dump domain 'avocado-vt-vm1' to /var/tmp/avocado_m85vaczv/domjobinfo.fifo\nerror: unsupported flags (0x2) in function qemuDomainCoreDumpWithFormat\n", abort the test. (21.61 s)
(02/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.dump_action.live_dump.paused_state.vm_name: STARTED
(02/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.dump_action.live_dump.paused_state.vm_name: ERROR: Background cmd met unexpected failure of b"error: Failed to core dump domain 'avocado-vt-vm1' to /var/tmp/avocado_4zt69ar6/domjobinfo.fifo\nerror: unsupported flags (0x2) in function qemuDomainCoreDumpWithFormat\n", abort the test. (18.84 s)
(03/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.dump_action.crash_dump.running_state.vm_id: STARTED
(03/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.dump_action.crash_dump.running_state.vm_id: PASS (30.01 s)
(04/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.dump_action.crash_dump.paused_state.vm_uuid: STARTED
(04/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.dump_action.crash_dump.paused_state.vm_uuid: PASS (29.90 s)
(05/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.dump_action.keep_complete_test.running_state.vm_name: STARTED
(05/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.dump_action.keep_complete_test.running_state.vm_name: ERROR: Background cmd met unexpected failure of b"error: Failed to core dump domain 'avocado-vt-vm1' to /var/tmp/avocado_62liqx51/domjobinfo.fifo\nerror: unsupported flags (0x2) in function qemuDomainCoreDumpWithFormat\n", abort the test. (22.09 s)
(06/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.save_action.running_state.vm_id: STARTED
(06/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.save_action.running_state.vm_id: PASS (37.78 s)
(07/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.save_action.paused_state.vm_name: STARTED
(07/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.save_action.paused_state.vm_name: PASS (37.94 s)
(08/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.managedsave_action.running_state.vm_id: STARTED
(08/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.managedsave_action.running_state.vm_id: PASS (37.37 s)
(09/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.managedsave_action.paused_state.vm_name: STARTED
(09/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.managedsave_action.paused_state.vm_name: PASS (37.46 s)
(10/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.error_test.no_name: STARTED
(10/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.error_test.no_name: PASS (26.84 s)
(11/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.error_test.shutoff_state: STARTED
(11/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.error_test.shutoff_state: PASS (5.50 s)
(12/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.error_test.with_libvirtd_stop: STARTED
(12/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.error_test.with_libvirtd_stop: PASS (26.66 s)

Summary by CodeRabbit

New Features
- None
Bug Fixes
- Prevents indefinite waits by adding a bounded wait for background commands, ensures clearer error reporting on failures, and guarantees cleanup of temporary resources.
Tests
- Strengthens tests by enforcing a 6-second wait limit for background operations, capturing failure output, and adding debug logging when background commands remain running.

coderabbitai · 2025-09-15T04:36:11Z

Walkthrough

Adds a bounded 6 s wait and explicit error handling for the two background virsh subprocess invocations in libvirt/tests/src/virsh_cmd/domain/virsh_domjobinfo.py: capture stderr; on a non‑zero exit, unlink the FIFO and call test.error; on timeout, log that the subprocess is still running; then continue with the existing FIFO read.

Changes

Cohort / File(s)	Summary
Bounded wait & error handling for background virsh subprocess `libvirt/tests/src/virsh_cmd/domain/virsh_domjobinfo.py`	For both background `virsh` invocations: call `process.communicate(timeout=6)` to capture stderr; on non‑zero `returncode` unlink the temp FIFO and call `test.error` with stderr; on `TimeoutExpired` log a debug that the background command is still running; then proceed to the existing FIFO read.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Test as Test harness
  participant Proc as virsh subprocess
  participant FIFO as Temp FIFO

  Test->>Proc: Start background virsh action
  note right of Test: New: bounded wait (timeout=6s)

  rect rgba(230,245,255,0.5)
    Test->>Proc: communicate(timeout=6)
    alt Completes within 6s
      Proc-->>Test: stdout, stderr, returncode
      alt returncode != 0
        Test->>Test: unlink FIFO
        Test->>Test: test.error(stderr)
      else returncode == 0
        Test->>FIFO: Read FIFO (existing flow)
      end
    else TimeoutExpired
      Test->>Test: Log debug "background command still running"
      Test->>FIFO: Read FIFO (existing flow)
    end
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I waited six ticks, nose twitching with care,
I caught the soft whispers of stderr in the air.
If the child should stumble, I tidy the pipe tight,
If it lingers awhile, I log and read through the night.
A rabbit that tests, keeping errors in sight. 🐇

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 575f781 and 8b89623.

📒 Files selected for processing (1)

libvirt/tests/src/virsh_cmd/domain/virsh_domjobinfo.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

libvirt/tests/src/virsh_cmd/domain/virsh_domjobinfo.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Python 3.11
GitHub Check: Python 3.12
GitHub Check: Python 3.8
GitHub Check: Python 3.9

Pre-merge checks

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title clearly identifies the affected test (virsh_domjobinfo) and the primary fix—preventing the pipe-file read from getting stuck when a background virsh command fails; this directly matches the described changes that add bounded subprocess.communicate waits and error handling. It is concise, specific, and understandable for teammates scanning PR history. The wording accurately reflects the main intent of the changeset without unnecessary detail.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

libvirt/tests/src/virsh_cmd/domain/virsh_domjobinfo.py (2)

196-203: Process cleanup bug: kill runs only when process already exited non‑zero

`if process.poll():` is truthy only for non‑zero return codes; it never kills a still‑running process, risking a leak. Terminate when `poll()` is `None` and fall back to `kill()` on timeout.

-    if process:
-        if process.poll():
-            try:
-                process.kill()
-            except OSError:
-                pass
+    if process and process.poll() is None:
+        try:
+            process.terminate()
+            process.wait(timeout=5)
+        except subprocess.TimeoutExpired:
+            try:
+                process.kill()
+            except OSError:
+                pass
+        except OSError:
+            pass
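
The suggestion above amounts to a terminate-then-kill cleanup. A self-contained sketch, assuming a hypothetical helper name `stop_process` and a 5-second grace period (neither comes from the test file):

```python
import subprocess
import sys


def stop_process(process, grace=5):
    """Stop a child process if it is still running.

    poll() returns None while the child runs, 0 on success and a non-zero
    code on failure, so `if process.poll():` is only true for a child that
    has already exited with an error.
    """
    if process is None or process.poll() is not None:
        return  # nothing to stop
    process.terminate()
    try:
        process.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        # The child ignored the polite request: force it and reap it.
        process.kill()
        process.wait()
```

Calling `stop_process` on a child that has already exited, or on `None`, is a no-op, so it can sit in a cleanup block unconditionally.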

41-45: Avoid shell=True; build argv with shlex to remove injection risk

Inputs ultimately come from `params`; using `shell=True` is unnecessary and risky. Build argv and run with `shell=False`. Also log the joined command for parity.

+import shlex
@@
-        command = "virsh %s %s %s" % (action, vm_name, file)
-        logging.debug("Action: %s", command)
-        p = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE,
-                             stderr=subprocess.PIPE)
+        cmd = ["virsh"] + shlex.split(action) + [vm_name]
+        if file:
+            cmd.append(file)
+        logging.debug("Action: %s", " ".join(cmd))
+        p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

Also applies to: 1-6
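
To illustrate the argv approach from the suggestion above, here is a small sketch. The function name `build_virsh_argv` and the sample VM name and paths are hypothetical; only the `"virsh <action> <vm_name> <file>"` shape comes from the diff.

```python
import shlex


def build_virsh_argv(action, vm_name, file=""):
    """Build an argument list suitable for Popen(..., shell=False).

    `action` may carry options (for example "dump --live"), so it is split
    with shlex; vm_name and file are passed through as single arguments.
    """
    argv = ["virsh"] + shlex.split(action) + [vm_name]
    if file:
        argv.append(file)
    return argv
```

Because no shell parses the list, a value such as `"vm1; rm -rf /"` stays one literal argument. For logging, `shlex.join(argv)` (Python 3.8+) gives a quoted, copy-pasteable command line.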

🧹 Nitpick comments (4)

libvirt/tests/src/virsh_cmd/domain/virsh_domjobinfo.py (4)

153-160: Good fix; add readable stderr and cleanup ordering

Decode stderr for human-readable logs and unlink the FIFO before raising: `test.error` raises, so the `os.unlink` placed after it never runs. Minor, but makes failures clearer and safer.

+import locale
@@
-            _, stderr = process.communicate(timeout=6)
-            if process.returncode:
-                test.error('Background cmd met unexpected failure of %s, abort the test.' % stderr)
-                os.unlink(tmp_pipe)
+            _, stderr_b = process.communicate(timeout=6)
+            if process.returncode:
+                # Decode for readable logs; ensure FIFO is removed before aborting.
+                stderr_s = (stderr_b.decode(locale.getpreferredencoding(), 'ignore')
+                            if isinstance(stderr_b, (bytes, bytearray)) else str(stderr_b))
+                try:
+                    os.unlink(tmp_pipe)
+                except FileNotFoundError:
+                    pass
+                test.error(f"Background virsh command failed: {stderr_s}")
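
The test log earlier on this page shows the failure message wrapped in `b"..."` because stderr arrives as bytes. A minimal sketch of the decoding step, assuming a hypothetical helper name `readable_stderr`:

```python
import locale


def readable_stderr(raw):
    """Return stderr as text whether communicate() produced bytes or str."""
    if isinstance(raw, (bytes, bytearray)):
        # "replace" keeps undecodable bytes visible instead of raising.
        return raw.decode(locale.getpreferredencoding(False), "replace").strip()
    return str(raw).strip()
```

Using `"replace"` rather than `"ignore"` is a choice made for this sketch; the suggestion above uses `"ignore"`.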

140-141: Guard against unbound f and make the check explicit

If an exception occurs before `f` is assigned, `if process and f:` can raise `UnboundLocalError` when `process` is truthy. Initialize `f` to `None` and check explicitly.

-    process = None
+    process = None
+    f = None
@@
-    if process and f:
+    if process and f is not None:
         dummy = f.read()
         f.close()

Also applies to: 187-190

106-110: Make managedsave detection robust; trim action to avoid trailing‑space traps

Rely on the `actions` value and strip the composed `action`. This avoids a brittle string comparison that depends on a trailing space.

-    action = ' '.join([actions, act_opt])
+    action = ' '.join([actions, act_opt]).strip()
@@
-        if action == "managedsave ":
+        if actions == "managedsave":
             process = get_subprocess(action, vm_name, "", None)
         else:
             process = get_subprocess(action, vm_name, tmp_pipe, None)

Also applies to: 147-151
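
The trailing space in `"managedsave "` comes from joining the action name with an empty option string. A short sketch of the behaviour, with `compose_action` as a hypothetical helper name:

```python
def compose_action(actions, act_opt):
    """Join the action name and its options, dropping any stray whitespace."""
    return " ".join([actions, act_opt]).strip()


# Joining with an empty option leaves a trailing space behind.
unstripped = " ".join(["managedsave", ""])
```

Here `unstripped` is `"managedsave "`, which is why the existing comparison only matches with the space included.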

58-58: Split once to avoid accidental multi‑colon key/value breakage

Safer parsing if any value contains a colon.

-        out_dict = dict([x.split(':') for x in out_list])
+        out_dict = dict([x.split(':', 1) for x in out_list])
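
A quick sketch of why the single split matters. The sample lines below are made up for illustration and are not real `virsh domjobinfo` output; `parse_key_values` is a hypothetical name, and the whitespace stripping is an addition the original line does not perform.

```python
def parse_key_values(lines):
    """Parse "key: value" lines, splitting only on the first colon."""
    pairs = (line.split(":", 1) for line in lines if ":" in line)
    return {key.strip(): value.strip() for key, value in pairs}


# Hypothetical input: the second value itself contains colons.
sample = ["Job type: Unbounded", "Started at: 12:30:05"]
parsed = parse_key_values(sample)
```

With a plain `split(':')`, the second line yields four fields and `dict()` raises `ValueError`, because every item must be a two-element pair.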

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e924cd0 and 7aa6817.

📒 Files selected for processing (1)

libvirt/tests/src/virsh_cmd/domain/virsh_domjobinfo.py (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Python 3.12
GitHub Check: Python 3.8
GitHub Check: Python 3.9
GitHub Check: Python 3.11

liang-cong-red-hat · 2025-09-15T04:50:41Z

@smitterl please help review.

…failed Since libvirt 11.5.0, the virsh dump --live option has been disabled and now returns an error message. This causes the test to hang while attempting to read the virsh dump pipe file's input. The fix implements subprocess.communicate() to handle the background command interaction: Command failure: the test aborts immediately with the error message. Command success: original logic remains unchanged. Note: If the background command fails after the timeout period, this likely indicates a libvirt issue, so no changes are made for now. Signed-off-by: Liang Cong <[email protected]>

smitterl

LGTM, thank you

coderabbitai bot reviewed Sep 15, 2025

liang-cong-red-hat force-pushed the domjobinfo_fix_job_stuck_pipe_error_3 branch from 7aa6817 to 48b0277 Compare September 15, 2025 04:45

liang-cong-red-hat changed the title ~~Fix pipe file read stuck when background virsh cmd failed~~ virsh_domjobinfo.py: Fix pipe file read stuck when background virsh cmd failed Sep 16, 2025

liang-cong-red-hat changed the title ~~virsh_domjobinfo.py: Fix pipe file read stuck when background virsh cmd failed~~ virsh_domjobinfo: fix pipe file read stuck when background virsh cmd failed Sep 16, 2025

liang-cong-red-hat force-pushed the domjobinfo_fix_job_stuck_pipe_error_3 branch 2 times, most recently from 575f781 to 8b89623 Compare September 16, 2025 06:52

smitterl approved these changes Sep 16, 2025
