Description
Under load test scenarios, I observe that our client app enters a state where outgoing traffic stops flowing. Here is a PR with a test that reproduces the behavior: #852
The test is timing-dependent but reproduces the problem 100% reliably on my PC; it just gets stuck at a different point each time I run it.
Typical output:
Requests completed: 2 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
thread 'logical_deadlock' panicked at tests\h2-tests\tests\deadlock.rs:117:17:
No requests completed in the last 2s, deadlock likely occurred
Increasing `MAX_CONCURRENT_STREAMS` beyond `CONCURRENCY` in the test file makes the test tend to pass, though I cannot say for sure that it always does (insufficient data: the real app that was locking up did not exceed `max_concurrent_streams` as far as I know).
My best-effort investigation of the logs leading up to this suggests that, when the timing aligns just right, flow control logic has allocated all connection capacity to requests that are in `pending_open` state and never leave it.
The leading theory is that this leads to a logical deadlock where:
- None of the open streams have any capacity allocated to them, so their data is not being sent. They cannot get more capacity because the connection's send window is exhausted (`conn=0`).
- All the streams that have assigned capacity are in `pending_open` state and never leave it (perhaps because `max_concurrent_streams` has been reached).
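To make the theory concrete, here is a toy model of the suspected state. It is only a sketch of the reasoning above, not h2's actual data structures: the `Connection`, `Stream`, and `StreamState` types, their fields, and the `stuck_example` helper are all hypothetical names invented for illustration, and the model assumes that streams which have finished sending are removed from the list.

```rust
/// Toy model of the suspected state. These types are hypothetical and
/// do not mirror h2's internal structures.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StreamState {
    /// Waiting for a free slot under the peer's max_concurrent_streams.
    PendingOpen,
    /// Holds a concurrency slot and may send DATA if it has capacity.
    Open,
}

#[derive(Debug, Clone)]
struct Stream {
    state: StreamState,
    /// Connection-level send capacity already assigned to this stream.
    assigned: u32,
    /// Bytes this stream still has to send.
    pending_bytes: u32,
}

#[derive(Debug, Clone)]
struct Connection {
    /// Unassigned connection send window (the `conn=0` seen in the logs).
    conn_window: u32,
    max_concurrent_streams: usize,
    /// Assumption: streams that finished sending are removed from this list.
    streams: Vec<Stream>,
}

impl Connection {
    /// True when there is data left to send but no stream can make progress:
    /// open streams have no capacity and cannot obtain any, while the
    /// streams holding capacity cannot leave PendingOpen.
    fn is_logically_deadlocked(&self) -> bool {
        let open = self
            .streams
            .iter()
            .filter(|s| s.state == StreamState::Open)
            .count();
        let slot_free = open < self.max_concurrent_streams;

        let mut work_left = false;
        for s in &self.streams {
            if s.pending_bytes == 0 {
                continue;
            }
            work_left = true;
            let can_progress = match s.state {
                // An open stream needs capacity, either its own or from
                // the unassigned connection window.
                StreamState::Open => s.assigned > 0 || self.conn_window > 0,
                // A pending stream needs a free concurrency slot first.
                StreamState::PendingOpen => slot_free,
            };
            if can_progress {
                return false;
            }
        }
        work_left
    }
}

/// Builds the state described in the two bullets above: two open streams
/// with no capacity, and one pending_open stream holding the whole window.
fn stuck_example() -> Connection {
    Connection {
        conn_window: 0,
        max_concurrent_streams: 2,
        streams: vec![
            Stream { state: StreamState::Open, assigned: 0, pending_bytes: 100 },
            Stream { state: StreamState::Open, assigned: 0, pending_bytes: 100 },
            Stream { state: StreamState::PendingOpen, assigned: 65_535, pending_bytes: 100 },
        ],
    }
}
```

In this model, `stuck_example().is_logically_deadlocked()` holds, and raising `max_concurrent_streams` to 3 clears it because the pending stream can then open. That matches the observation that a higher `MAX_CONCURRENT_STREAMS` makes the test tend to pass, but it does not prove that this is what happens inside h2.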