Skip to content

Conversation

liang-cong-red-hat
Copy link
Contributor

@liang-cong-red-hat liang-cong-red-hat commented Sep 15, 2025

Since libvirt 11.5.0, the virsh dump --live option has been disabled and now returns an error message.
This causes the test to hang while attempting to read the virsh dump pipe file's input.

The fix implements subprocess.communicate() to handle the background command interaction:
Command failure: the test aborts immediately with the error message.
Command success: original logic remains unchanged.
Note: If the background command fails after the timeout period, this likely indicates a libvirt issue, so no changes are made for now.

Note: this PR comes from the discussion of closed PR: #6478

Test result with libvirt >= 11.5.0:
(01/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.dump_action.live_dump.running_state.vm_id: STARTED
(01/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.dump_action.live_dump.running_state.vm_id: ERROR: Background cmd met unexpected failure of b"error: Failed to core dump domain 'avocado-vt-vm1' to /var/tmp/avocado_m85vaczv/domjobinfo.fifo\nerror: unsupported flags (0x2) in function qemuDomainCoreDumpWithFormat\n", abort the test. (21.61 s)
(02/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.dump_action.live_dump.paused_state.vm_name: STARTED
(02/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.dump_action.live_dump.paused_state.vm_name: ERROR: Background cmd met unexpected failure of b"error: Failed to core dump domain 'avocado-vt-vm1' to /var/tmp/avocado_4zt69ar6/domjobinfo.fifo\nerror: unsupported flags (0x2) in function qemuDomainCoreDumpWithFormat\n", abort the test. (18.84 s)
(03/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.dump_action.crash_dump.running_state.vm_id: STARTED
(03/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.dump_action.crash_dump.running_state.vm_id: PASS (30.01 s)
(04/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.dump_action.crash_dump.paused_state.vm_uuid: STARTED
(04/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.dump_action.crash_dump.paused_state.vm_uuid: PASS (29.90 s)
(05/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.dump_action.keep_complete_test.running_state.vm_name: STARTED
(05/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.dump_action.keep_complete_test.running_state.vm_name: ERROR: Background cmd met unexpected failure of b"error: Failed to core dump domain 'avocado-vt-vm1' to /var/tmp/avocado_62liqx51/domjobinfo.fifo\nerror: unsupported flags (0x2) in function qemuDomainCoreDumpWithFormat\n", abort the test. (22.09 s)
(06/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.save_action.running_state.vm_id: STARTED
(06/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.save_action.running_state.vm_id: PASS (37.78 s)
(07/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.save_action.paused_state.vm_name: STARTED
(07/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.save_action.paused_state.vm_name: PASS (37.94 s)
(08/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.managedsave_action.running_state.vm_id: STARTED
(08/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.managedsave_action.running_state.vm_id: PASS (37.37 s)
(09/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.managedsave_action.paused_state.vm_name: STARTED
(09/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.normal_test.managedsave_action.paused_state.vm_name: PASS (37.46 s)
(10/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.error_test.no_name: STARTED
(10/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.error_test.no_name: PASS (26.84 s)
(11/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.error_test.shutoff_state: STARTED
(11/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.error_test.shutoff_state: PASS (5.50 s)
(12/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.error_test.with_libvirtd_stop: STARTED
(12/12) type_specific.io-github-autotest-libvirt.virsh.domjobinfo.error_test.with_libvirtd_stop: PASS (26.66 s)

Summary by CodeRabbit

  • New Features

    • None
  • Bug Fixes

    • Prevents indefinite waits by adding a bounded wait for background commands, ensures clearer error reporting on failures, and guarantees cleanup of temporary resources.
  • Tests

    • Strengthens tests by enforcing a 6-second wait limit for background operations, capturing failure output, and adding debug logging when background commands remain running.

Copy link

coderabbitai bot commented Sep 15, 2025

Walkthrough

Adds a bounded 6s wait and explicit error handling for two background virsh domjobinfo subprocess invocations in libvirt/tests/src/virsh_cmd/domain/virsh_domjobinfo.py: capture stderr, unlink FIFO and call test.error on non‑zero exit, log when subprocess times out, then continue existing FIFO read.

Changes

Cohort / File(s) Summary
Bounded wait & error handling for background virsh subprocess
libvirt/tests/src/virsh_cmd/domain/virsh_domjobinfo.py
For both background virsh invocations: call process.communicate(timeout=6) to capture stderr; on non‑zero returncode unlink the temp FIFO and call test.error with stderr; on TimeoutExpired log a debug that the background command is still running; then proceed to the existing FIFO read.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Test as Test harness
  participant Proc as virsh subprocess
  participant FIFO as Temp FIFO

  Test->>Proc: Start background virsh action
  note right of Test: New: bounded wait (timeout=6s)

  rect rgba(230,245,255,0.5)
    Test->>Proc: communicate(timeout=6)
    alt Completes within 6s
      Proc-->>Test: stdout, stderr, returncode
      alt returncode != 0
        Test->>Test: unlink FIFO
        Test->>Test: test.error(stderr)
      else returncode == 0
        Test->>FIFO: Read FIFO (existing flow)
      end
    else TimeoutExpired
      Test->>Test: Log debug "background command still running"
      Test->>FIFO: Read FIFO (existing flow)
    end
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I waited six ticks, nose twitching with care,
I caught the soft whispers of stderr in the air.
If the child should stumble, I tidy the pipe tight,
If it lingers awhile, I log and read through the night.
A rabbit that tests, keeping errors in sight. 🐇

✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 575f781 and 8b89623.

📒 Files selected for processing (1)
  • libvirt/tests/src/virsh_cmd/domain/virsh_domjobinfo.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • libvirt/tests/src/virsh_cmd/domain/virsh_domjobinfo.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Python 3.11
  • GitHub Check: Python 3.12
  • GitHub Check: Python 3.8
  • GitHub Check: Python 3.9

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Comment @coderabbitai help to get the list of available commands and usage tips.

Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title clearly identifies the affected test (virsh_domjobinfo) and the primary fix—preventing the pipe-file read from getting stuck when a background virsh command fails; this directly matches the described changes that add bounded subprocess.communicate waits and error handling. It is concise, specific, and understandable for teammates scanning PR history. The wording accurately reflects the main intent of the changeset without unnecessary detail.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
libvirt/tests/src/virsh_cmd/domain/virsh_domjobinfo.py (2)

196-203: Process cleanup bug: kill runs only when process already exited non‑zero

if process.poll(): is truthy only for non‑zero return codes; it never kills a still‑running process, risking a leak. Terminate when poll() is None and fall back to kill on timeout.

-    if process:
-        if process.poll():
-            try:
-                process.kill()
-            except OSError:
-                pass
+    if process and process.poll() is None:
+        try:
+            process.terminate()
+            process.wait(timeout=5)
+        except subprocess.TimeoutExpired:
+            try:
+                process.kill()
+            except OSError:
+                pass
+        except OSError:
+            pass

41-45: Avoid shell=True; build argv with shlex to remove injection risk

Inputs ultimately come from params; using shell=True is unnecessary and risky. Build argv and run with shell=False. Also log the joined command for parity.

+import shlex
@@
-        command = "virsh %s %s %s" % (action, vm_name, file)
-        logging.debug("Action: %s", command)
-        p = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE,
-                             stderr=subprocess.PIPE)
+        cmd = ["virsh"] + shlex.split(action) + [vm_name]
+        if file:
+            cmd.append(file)
+        logging.debug("Action: %s", " ".join(cmd))
+        p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

Also applies to: 1-6

🧹 Nitpick comments (4)
libvirt/tests/src/virsh_cmd/domain/virsh_domjobinfo.py (4)

153-160: Good fix; add readable stderr and cleanup ordering

Decode stderr for human-readable logs and unlink the FIFO before raising to avoid accidentally proceeding to open it. Minor, but makes failures clearer and safer.

-            _, stderr = process.communicate(timeout=6)
-            if process.returncode:
-                test.error('Background cmd met unexpected failure of %s, abort the test.' % stderr)
-                os.unlink(tmp_pipe)
+            _, stderr_b = process.communicate(timeout=6)
+            if process.returncode:
+                # Decode for readable logs; ensure FIFO is removed before aborting.
+                stderr_s = (stderr_b.decode(locale.getpreferredencoding(), 'ignore')
+                            if isinstance(stderr_b, (bytes, bytearray)) else str(stderr_b))
+                try:
+                    os.unlink(tmp_pipe)
+                except FileNotFoundError:
+                    pass
+                test.error(f"Background virsh command failed: {stderr_s}")

140-141: Guard against unbound f and make the check explicit

If an exception occurs before assigning f, if process and f: can raise UnboundLocalError when process is truthy. Initialize f to None and check explicitly.

-    process = None
+    process = None
+    f = None
@@
-    if process and f:
+    if process and f is not None:
         dummy = f.read()
         f.close()

Also applies to: 187-190


106-110: Make managedsave detection robust; trim action to avoid trailing‑space traps

Rely on actions value and strip the composed action. Prevents brittle string equality with trailing spaces.

-    action = ' '.join([actions, act_opt])
+    action = ' '.join([actions, act_opt]).strip()
@@
-        if action == "managedsave ":
+        if actions == "managedsave":
             process = get_subprocess(action, vm_name, "", None)
         else:
             process = get_subprocess(action, vm_name, tmp_pipe, None)

Also applies to: 147-151


58-58: Split once to avoid accidental multi‑colon key/value breakage

Safer parsing if any value contains a colon.

-        out_dict = dict([x.split(':') for x in out_list])
+        out_dict = dict([x.split(':', 1) for x in out_list])
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e924cd0 and 7aa6817.

📒 Files selected for processing (1)
  • libvirt/tests/src/virsh_cmd/domain/virsh_domjobinfo.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Python 3.12
  • GitHub Check: Python 3.8
  • GitHub Check: Python 3.9
  • GitHub Check: Python 3.11

@liang-cong-red-hat liang-cong-red-hat force-pushed the domjobinfo_fix_job_stuck_pipe_error_3 branch from 7aa6817 to 48b0277 Compare September 15, 2025 04:45
@liang-cong-red-hat
Copy link
Contributor Author

@smitterl please help review.

@liang-cong-red-hat liang-cong-red-hat changed the title Fix pipe file read stuck when background virsh cmd failed virsh_domjobinfo.py: Fix pipe file read stuck when background virsh cmd failed Sep 16, 2025
…failed

Since libvirt 11.5.0, the virsh dump --live option has been disabled and now returns an error message.
This causes the test to hang while attempting to read the virsh dump pipe file's input.

The fix implements subprocess.communicate() to handle the background command interaction:
Command failure: the test aborts immediately with the error message.
Command success: original logic remains unchanged.
Note: If the background command fails after the timeout period, this likely indicates a libvirt issue, so no changes are made for now.

Signed-off-by: Liang Cong <[email protected]>
@liang-cong-red-hat liang-cong-red-hat changed the title virsh_domjobinfo.py: Fix pipe file read stuck when background virsh cmd failed virsh_domjobinfo: fix pipe file read stuck when background virsh cmd failed Sep 16, 2025
@liang-cong-red-hat liang-cong-red-hat force-pushed the domjobinfo_fix_job_stuck_pipe_error_3 branch 2 times, most recently from 575f781 to 8b89623 Compare September 16, 2025 06:52
Copy link
Contributor

@smitterl smitterl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants