
[Bug] Webhook Service Down Causes Async /predictions to Fail & Blocks Cancellation with exceptions ConnectionRefusedError and MaxRetryError #2229

@raccoon70

Webhook Service Down Causes Async /predictions to Fail & Blocks Cancellation

Issue Summary

When the webhook service is down, an async /predictions request that includes a "webhook" input fails with a ConnectionRefusedError and a MaxRetryError when Cog tries to send the webhook. In addition, the health check then reports "BUSY", and the request cannot be canceled via the cancel API.

Steps to Reproduce

  1. Ensure the webhook service is unavailable or down.
  2. Submit an async /predictions request that includes a webhook as an input.
  3. Observe the logs, which show a ConnectionRefusedError and MaxRetryError.
  4. Check the health check API, which reports "BUSY".
  5. Attempt to cancel the request using the cancel API; the cancellation fails.
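The reproduction steps can be scripted roughly as follows. This is a sketch assuming Cog's HTTP API: `PUT /predictions/{id}` with a `Prefer: respond-async` header (PUT rather than POST so the prediction ID is known for the cancel call), a top-level `webhook` field in the request body, `GET /health-check`, and `POST /predictions/{id}/cancel`. The server URL `http://localhost:5000`, the prediction ID and the `prompt` input are hypothetical placeholders; the dead webhook URL matches the one in the log below.

```python
import json
import urllib.request

# Assumed base URL of a locally running Cog server; adjust to your setup.
COG_URL = "http://localhost:5000"
# Webhook target with nothing listening, matching host/port/path from the log.
DEAD_WEBHOOK = "http://127.0.0.1:9000/test"


def create_prediction_request(prediction_id: str, inputs: dict, webhook: str) -> urllib.request.Request:
    """Step 2: async prediction with a webhook (PUT so the ID is known for cancel)."""
    body = json.dumps({"input": inputs, "webhook": webhook}).encode("utf-8")
    return urllib.request.Request(
        f"{COG_URL}/predictions/{prediction_id}",
        data=body,
        headers={"Content-Type": "application/json", "Prefer": "respond-async"},
        method="PUT",
    )


def health_check_request() -> urllib.request.Request:
    """Step 4: query the health check; with this bug the status stays BUSY."""
    return urllib.request.Request(f"{COG_URL}/health-check", method="GET")


def cancel_request(prediction_id: str) -> urllib.request.Request:
    """Step 5: ask the server to cancel the in-flight prediction."""
    return urllib.request.Request(
        f"{COG_URL}/predictions/{prediction_id}/cancel", data=b"", method="POST"
    )


if __name__ == "__main__":
    pid = "repro-1"  # hypothetical prediction ID
    for req in (
        create_prediction_request(pid, {"prompt": "hello"}, DEAD_WEBHOOK),  # hypothetical input
        health_check_request(),
        cancel_request(pid),
    ):
        # urlopen raises HTTPError/URLError (OSError subclasses) on failure.
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                print(req.get_method(), req.full_url, resp.status, resp.read().decode())
        except OSError as err:
            print(req.get_method(), req.full_url, "failed:", err)
```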

Expected Behavior

  • The request should fail gracefully if the webhook service is down.
  • The health check should not get stuck on "BUSY".
  • The cancel API should allow the request to be canceled successfully.
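For comparison, "fail gracefully" could look like best-effort delivery: a short timeout, any delivery error turned into a return value instead of propagating, and the prediction allowed to reach a terminal state either way. This is a standard-library sketch of that idea, not Cog's actual webhook code (which, per the log, posts through a `requests` session); `send_webhook` is a hypothetical helper.

```python
import json
import urllib.request


def send_webhook(url: str, payload: dict, timeout: float = 5.0) -> bool:
    """Best-effort webhook delivery (hypothetical helper, not Cog's implementation).

    Returns True on a 2xx response and False on any delivery failure, so the
    caller can log the problem and still finish the prediction instead of
    leaving the server BUSY.
    """
    try:
        # Building the Request inside the try: malformed URLs raise ValueError here.
        req = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses: they cover refused
        # connections, DNS failures, timeouts and non-2xx responses.
        return False
```

The design choice is that the webhook outcome is reported, never raised, so a dead receiver cannot leave the prediction's state machine half-finished.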

Actual Behavior

  • The request fails with a ConnectionRefusedError and MaxRetryError.
  • Health check status remains "BUSY", preventing new requests.
  • The cancel API does not cancel the stuck request.
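To tell the stuck state apart from a merely slow prediction, one could poll the health check with a deadline. A minimal sketch, assuming Cog's `GET /health-check` returns JSON with a `status` field and the server runs at a hypothetical `http://localhost:5000`; `cog_health_status` and `wait_until_not_busy` are hypothetical helpers, and the clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import json
import time
import urllib.request
from typing import Callable


def cog_health_status(base_url: str = "http://localhost:5000") -> str:
    """Fetch the status string from the (assumed) /health-check endpoint."""
    with urllib.request.urlopen(f"{base_url}/health-check", timeout=5) as resp:
        return json.load(resp)["status"]


def wait_until_not_busy(
    fetch_status: Callable[[], str],
    timeout: float = 30.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the status leaves "BUSY" or the timeout expires; return the last status."""
    deadline = clock() + timeout
    status = fetch_status()
    while status == "BUSY" and clock() < deadline:
        sleep(interval)
        status = fetch_status()
    return status
```

Usage: `wait_until_not_busy(cog_health_status, timeout=60)` returning `"BUSY"` after the prediction should long have finished matches the behavior described above.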

Relevant Logs

{"logger": "cog.server.webhook", 
"timestamp": "2025-03-26T20:28:12.501900Z", 
"exception": "Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/urllib3/connection.py", line 196, in _new_conn
    sock = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/usr/local/lib/python3.12/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py", line 789, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py", line 495, in _make_request
    conn.request(
  File "/usr/local/lib/python3.12/site-packages/urllib3/connection.py", line 398, in request
    self.endheaders()
  File "/usr/local/lib/python3.12/http/client.py", line 1331, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.12/http/client.py", line 1091, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.12/http/client.py", line 1035, in send
    self.connect()
  File "/usr/local/lib/python3.12/site-packages/urllib3/connection.py", line 236, in connect
    self.sock = self._new_conn()
                ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/connection.py", line 211, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f0ce2f16090>: 
Failed to establish a new connection: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py", line 843, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/util/retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=9000): Max retries exceeded with url: /test 
(Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0ce2f16090>: 
Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/cog/server/webhook.py", line 61, in caller
    default_session.post(webhook, json=dict_response)
  File "/usr/local/lib/python3.12/site-packages/requests/sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/requests/adapters.py", line 700, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=9000): Max retries exceeded with url: /test 
(Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0ce2f16090>: 
Failed to establish a new connection: [Errno 111] Connection refused'))",
"severity": "WARNING", 
"message": "caught exception while sending webhook"
}
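Reading the chained traceback above from the bottom up: the socket-level ConnectionRefusedError is the direct cause of urllib3's NewConnectionError, which is the direct cause of MaxRetryError; requests then raised its own ConnectionError while handling that, and cog.server.webhook caught it and logged it at WARNING level ("caught exception while sending webhook"). The sketch below rebuilds the same chain with local stand-in classes (not the real urllib3/requests exceptions) and walks it the way Python's traceback printer does, following `__cause__` first and then `__context__`.

```python
class NewConnectionError(OSError):
    """Stand-in for urllib3.exceptions.NewConnectionError."""


class MaxRetryError(Exception):
    """Stand-in for urllib3.exceptions.MaxRetryError."""


class RequestsConnectionError(OSError):
    """Stand-in for requests.exceptions.ConnectionError."""


def exception_chain(exc: BaseException) -> list[str]:
    """Return exception type names from the outermost exception to the root cause."""
    names: list[str] = []
    seen: set[int] = set()
    current: BaseException | None = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        names.append(type(current).__name__)
        if current.__cause__ is not None:
            current = current.__cause__  # "direct cause" (raise ... from ...)
        elif not current.__suppress_context__:
            current = current.__context__  # "during handling of the above exception"
        else:
            current = None
    return names


def simulate_webhook_failure() -> BaseException:
    """Raise the same four-level chain seen in the log and return the outermost error."""
    try:
        try:
            try:
                try:
                    raise ConnectionRefusedError(111, "Connection refused")
                except ConnectionRefusedError as e:
                    raise NewConnectionError("Failed to establish a new connection") from e
            except NewConnectionError as e:
                raise MaxRetryError("Max retries exceeded with url: /test") from e
        except MaxRetryError as e:
            raise RequestsConnectionError(e)  # no "from": chained as context
    except RequestsConnectionError as final:
        return final
```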

Environment

  • Python Version: 3.12
  • Cog Version: 0.14.4
  • Libraries Involved: urllib3, requests
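The versions above can be collected with the standard library. A small sketch (the helper name is hypothetical) that reports the Python version and returns `None` for any package that isn't installed:

```python
import platform
from importlib import metadata


def environment_report(packages: tuple[str, ...] = ("cog", "urllib3", "requests")) -> dict:
    """Collect Python and package versions for a bug report; None if not installed."""
    report: dict = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report
```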

Let me know if more information is needed. Thanks in advance for your help!
