Skip to content

Conversation

@toillium
Copy link

@toillium toillium commented Nov 16, 2025

What does this PR do?

Makes the clickhouse client thread safe by disabling autogenerate_session_id.

Relevant docs: https://clickhouse.com/docs/integrations/language-clients/python/driver-api#multi-threaded-applications

Motivation

#21885

@toillium
Copy link
Author

cc @sarah-witt @steveny91

@codecov
Copy link

codecov bot commented Nov 16, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.02%. Comparing base (b40aa6f) to head (9087f7b).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@toillium toillium force-pushed the joe/disable-client-session-ids branch 2 times, most recently from 624bfd2 to f15e379 Compare November 16, 2025 18:34
@toillium toillium force-pushed the joe/disable-client-session-ids branch from f15e379 to 68d1ce7 Compare November 16, 2025 18:35
Copy link
Contributor

@AAraKKe AAraKKe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @toillium ! I am not 100% sure why this would be an issue since as far as I can tell the check runs synchronously. Would you mind sharing where do you think the multi-threaded behavior is?

The error you report seeing is

HTTPDriver for <URL> received ClickHouse error code 373\n Code: 373. DB::Exception: Session 7de98854-bff0-481b-a959-aa18f3ec9e4d is locked by a concurrent client. (SESSION_IS_LOCKED)

This means that the error is coming from the server returning 373 right? Is it possible that somehow the query launched against clickhouse is timing out somehow but we reuse the client and launch a new query against the server, causing the session already used error?

I am bit worried that by not generating the session ID we might be hiding an issue with timeouts for instance that we should be resurfacing to customers.

My Feedback Legend

Here's a quick guide to the prefixes I use in my comments:

praise: no action needed, just celebrate!
note: just a comment/information, no need to take any action.
question: I need clarification or I'm seeking to understand your approach.
nit: A minor, non-blocking issue (e.g., style, typo). Feel free to ignore.
suggestion: I'm proposing an improvement. This is optional but recommended.
request: A change I believe is necessary before this can be merged.

The only blocking comments are request, any other type of comment can be applied at discretion of the developer.

@toillium
Copy link
Author

Would you mind sharing where do you think the multi-threaded behavior is?

I think the multithreaded behavior is coming from datadog agent's query scheduling.

This means that the error is coming from the server returning 373 right? Is it possible that somehow the query launched against clickhouse is timing out somehow but we reuse the client and launch a new query against the server, causing the session already used error?

I think the issue is query A launches with session id foo, then shortly after query B launches with session id foo while query A is still running (implies multithreading is being used). Query B times out while waiting for the lock on the session id, and then query A completes sometime later.

I am bit worried that by not generating the session ID we might be hiding an issue with timeouts for instance that we should be resurfacing to customers.

I don't think session id impacts how timeouts are surfaced. My understanding is that it's only used to determine the lifespan of temporary resource on the clickhouse server (e.g. temporary tables).

@toillium toillium force-pushed the joe/disable-client-session-ids branch from 9fe1a22 to 9cc9c01 Compare November 24, 2025 15:44
@toillium toillium force-pushed the joe/disable-client-session-ids branch from 37e509e to 3b6a98e Compare November 24, 2025 16:03
@joe-clickhouse
Copy link

Hi all, Joe from clickhouse here (maintainer of clickhouse-connect). Setting autogenerate_session_id=False should indeed resolve the SESSION_IS_LOCKED errors. This makes the client "safe enough" for typical concurrent query usage. Just note that it's not FULLY thread-safe. e.g. avoid calling methods like set_client_setting() or similar concurrently with queries. But for the use case of concurrent queries without modifying client state, you should be good to go.

Hope that helps! Happy to help provide more context or answer any other clickhouse-connect questions as well.

@toillium toillium requested a review from AAraKKe November 25, 2025 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants