Skip to content

Conversation

@bryancall
Copy link
Contributor

Problem

The test_cache_Populated_Cache unit test was failing with "Subprocess aborted" showing thousands of repeated reenable calls on the same CacheVC object. The Jenkins CI logs showed an infinite loop of reenable messages:

https://ci.trafficserver.apache.org/job/master/job/os_build/47228/console

Root Cause

In src/iocore/cache/unit_tests/main.cc, the CacheReadTest::read_event handler was calling process_event() inside the while loop that consumes data blocks:

case VC_EVENT_READ_READY: {
  while (this->_reader->block_read_avail()) {
    // ... consume block ...
    this->process_event(event);  // ❌ Called for EVERY block
  }
  break;
}

This meant reenable() was being called for every block of data, which is incorrect. When reading from a populated cache with many blocks, this resulted in excessive reenables and triggered the test abort.

The Fix

Move the process_event() call outside the while loop so reenable() is only called once per event after all available data has been consumed:

case VC_EVENT_READ_READY: {
  while (this->_reader->block_read_avail()) {
    // ... consume block ...
  }
  this->process_event(event);  // ✅ Called once after consuming all data
  break;
}

This matches the pattern used in CacheTestSM (the production regression test code) and follows standard async I/O patterns: consume all available data, then signal readiness for more.

Testing

  • test_cache_Cache - Passed (fresh cache)
  • test_cache_Populated_Cache - Passed (previously failing)

The fix reduces reenable calls from thousands to just a handful per test run.

@bryancall bryancall self-assigned this Dec 1, 2025
@bryancall bryancall added this to the 10.2.0 milestone Dec 1, 2025
@bryancall bryancall marked this pull request as draft December 1, 2025 22:29
@bryancall bryancall requested a review from cmcfarlen December 1, 2025 22:34
@bryancall bryancall force-pushed the fix-cache-test-reenable-loop branch from 60a76e2 to 771f40a Compare December 2, 2025 18:53
@bryancall bryancall marked this pull request as ready for review December 2, 2025 18:54
@bryancall bryancall requested a review from Copilot December 2, 2025 18:54
Copilot finished reviewing on behalf of bryancall December 2, 2025 18:58
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes an infinite loop bug in the test_cache_Populated_Cache unit test caused by excessive reenable() calls. The fix moves the process_event() call from inside the data consumption loop to after the loop, ensuring proper async I/O handling that aligns with the pattern used in production code (CacheTestSM).

Key Changes:

  • Relocated process_event(event) call to execute once per VC_EVENT_READ_READY event (after consuming all available data) instead of once per data block
  • This prevents thousands of redundant reenable calls when reading from a populated cache with multiple blocks

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@bryancall bryancall force-pushed the fix-cache-test-reenable-loop branch from 771f40a to 58b49cf Compare December 2, 2025 19:30
Move process_event() call outside the while loop that consumes data blocks.
The original code was calling reenable() for every block of data, which is
incorrect. The correct pattern (as shown in CacheTestSM) is to consume all
available data, then call reenable() once to signal readiness for more.

This fixes the flaky test_cache_Populated_Cache test that was aborting due
to excessive reenables when reading from a populated cache with many blocks.
@bryancall bryancall force-pushed the fix-cache-test-reenable-loop branch from 58b49cf to ee1bcfa Compare December 2, 2025 19:35
@bryancall
Copy link
Contributor Author

[approve ci rocky]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant