
Logical connection deadlock occurs under high concurrency - outgoing data transmission stops #853

@sandersaares

Description

Under load-test scenarios, I observe that our client app enters a state where outgoing traffic stops flowing. PR #852 adds a test that reproduces the behavior.

The test is timing-dependent but reproduces the problem 100% reliably on my PC, though it gets stuck at a different point on each run.
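The test itself lives in #852; purely for context, the load it drives is shaped roughly like the sketch below, written against the h2 client API. The constants (`CONCURRENCY`, `REQUESTS`, the payload size) and the address are illustrative placeholders, not the test's actual values.

```rust
use bytes::Bytes;
use h2::client;
use http::Request;
use tokio::net::TcpStream;

const CONCURRENCY: usize = 64; // placeholder; the real test picks its own value
const REQUESTS: usize = 10_000;

#[tokio::main]
async fn main() -> Result<(), h2::Error> {
    let tcp = TcpStream::connect("127.0.0.1:8080").await.unwrap();
    let (send_request, connection) = client::handshake(tcp).await?;
    // Drive the connection on its own task.
    tokio::spawn(async move {
        let _ = connection.await;
    });

    let mut workers = Vec::new();
    for _ in 0..CONCURRENCY {
        let mut sender = send_request.clone();
        workers.push(tokio::spawn(async move {
            for _ in 0..REQUESTS / CONCURRENCY {
                // Wait until the connection will accept a new stream ...
                sender = sender.ready().await?;
                let request = Request::post("http://example.com/").body(()).unwrap();
                let (response, mut body) = sender.send_request(request, false)?;

                // ... then push a small request body through flow control:
                // reserve capacity, wait for grants, send chunks as granted.
                let mut payload = Bytes::from_static(&[0u8; 1024]);
                body.reserve_capacity(payload.len());
                while !payload.is_empty() {
                    let granted = std::future::poll_fn(|cx| body.poll_capacity(cx))
                        .await
                        .expect("stream closed while waiting for capacity")?;
                    let chunk = payload.split_to(granted.min(payload.len()));
                    body.send_data(chunk, payload.is_empty())?;
                }

                // The response body is dropped unread; fine for this sketch.
                let _response = response.await?;
            }
            Ok::<_, h2::Error>(())
        }));
    }
    for w in workers {
        w.await.unwrap()?;
    }
    Ok(())
}
```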

Typical output:

Requests completed: 2 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000
Requests completed: 32 of 10000

thread 'logical_deadlock' panicked at tests\h2-tests\tests\deadlock.rs:117:17:
No requests completed in the last 2s, deadlock likely occurred

Increasing MAX_CONCURRENT_STREAMS beyond CONCURRENCY in the test file tends to make the test pass, though I cannot say for certain that it always does (insufficient data; as far as I know, the real app that was locking up did not exceed max_concurrent_streams).
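For reference, the knob in question is the stream limit the server advertises; on the client side, h2 parks streams above that limit in the pending_open state. A hedged sketch of the server side (the constant value is illustrative, not what the test uses):

```rust
use bytes::Bytes;
use h2::server;
use tokio::net::TcpStream;

// Illustrative value, not the one the test in #852 uses.
const MAX_CONCURRENT_STREAMS: u32 = 16;

async fn serve(tcp: TcpStream) -> Result<(), h2::Error> {
    // The server advertises SETTINGS_MAX_CONCURRENT_STREAMS. On the client,
    // streams beyond this limit wait in pending_open until a slot frees up.
    let mut connection = server::Builder::new()
        .max_concurrent_streams(MAX_CONCURRENT_STREAMS)
        .handshake::<_, Bytes>(tcp)
        .await?;

    while let Some(result) = connection.accept().await {
        let (_request, mut respond) = result?;
        // Answer immediately; request bodies are ignored in this sketch.
        respond.send_response(http::Response::new(()), true)?;
    }
    Ok(())
}
```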

My best-effort reading of the logs leading up to the stall suggests that when the timing aligns just right, the flow-control logic ends up having allocated all of the connection-level send capacity to requests that are in the pending_open state and never leave it.

The leading theory is that this produces a logical deadlock (sketched below) in which:

  • None of the open streams has any capacity allocated to it, so their data is never sent; they cannot acquire more capacity because the connection's send window is exhausted (conn=0).
  • All of the streams that do hold assigned capacity are in the pending_open state and never leave it (perhaps because max_concurrent_streams has been reached), so the capacity they hold is never used and never released.
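To make the accounting concrete, here is a toy model of the suspected state (my own illustration, not h2's internal types): the connection window is fully assigned, but every byte of it belongs to a stream that is not allowed to send.

```rust
// Toy model of the suspected deadlock state; not h2 internals.
#[derive(Debug, PartialEq)]
enum State {
    Open,        // counted against max_concurrent_streams, may send DATA
    PendingOpen, // waiting for a concurrency slot, may NOT send DATA
}

struct Stream {
    state: State,
    assigned: u32, // connection-level send capacity held by this stream
}

fn is_deadlocked(conn_window: u32, streams: &[Stream]) -> bool {
    let unassigned = conn_window - streams.iter().map(|s| s.assigned).sum::<u32>();
    // Deadlock: no free window left to hand to open streams (conn=0), every
    // open stream holds zero capacity, and all held capacity sits on streams
    // that cannot send until a concurrency slot frees up -- which requires an
    // open stream to finish, which requires it to send, which requires
    // capacity. Nothing can make progress.
    unassigned == 0
        && streams
            .iter()
            .all(|s| s.state == State::PendingOpen || s.assigned == 0)
}

fn main() {
    let streams = vec![
        Stream { state: State::Open, assigned: 0 },
        Stream { state: State::Open, assigned: 0 },
        Stream { state: State::PendingOpen, assigned: 32_768 },
        Stream { state: State::PendingOpen, assigned: 32_768 },
    ];
    // The HTTP/2 initial connection window is 65_535; rounded to 65_536 here
    // purely to keep the toy arithmetic simple.
    assert!(is_deadlocked(65_536, &streams));
    println!("state satisfies the deadlock condition");
}
```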
