Skip to content

fix: LangfuseTracer is not thread-safe, causing mixed traces in async environments #2188

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

vblagoje
Copy link
Member

@vblagoje vblagoje commented Aug 13, 2025

Why:

Purpose: Fixes a race condition where spans from concurrent pipeline runs become intertwined, corrupting tracing data and making debugging unreliable in production web server environments.

What:

  • Replaced instance-level self._context list with ContextVar for thread-safe span stack management
  • Updated LangfuseTracer.trace() method to use context-local span storage
  • Modified current_span() method to read from context-local state
  • Added module-level span_context_var ContextVar for isolated span contexts per execution thread
  • Fixed support for session IDs as well

How can it be used:

Nothing changes in client code except that we don't corrupt traces in concurrent async pipelines. All existing usage patterns continue to work exactly as before, but now with proper thread safety.

How did you test it:

  • Added unit tests verifying context isolation between concurrent tracer instances
  • Verified that spans are properly isolated per execution context
  • Tested examples in examples directory and itinerary agent
  • Ran existing test suite to ensure no regressions in single-threaded scenarios

Notes for the reviewer:

@LastRemote would you please test this branch on your use case

@github-actions github-actions bot added integration:langfuse type:documentation Improvements or additions to documentation labels Aug 13, 2025
@vblagoje vblagoje marked this pull request as ready for review August 17, 2025 21:00
@vblagoje vblagoje requested a review from a team as a code owner August 17, 2025 21:00
@vblagoje vblagoje requested review from mpangrazzi and removed request for a team August 17, 2025 21:00
@LastRemote
Copy link
Contributor

@vblagoje Sure thing! I had some trouble updating Haystack versions in my use cases last week, but I will try this today.

@vblagoje
Copy link
Member Author

vblagoje commented Aug 18, 2025

@vblagoje Sure thing! I had some trouble updating Haystack versions in my use cases last week, but I will try this today.

Nice, thanks. Please do. Note that this fix breaks down single long agent run into individual traces and we agreed internally not to do that but to try to keep the hierarchy as we do now in a single trace. I'll be working on this and making sure that traces from relatively regular pipeline runs, as found in examples dir for examples, and complex agent runs work ok. If you can confirm you are not experiencing error of mixed async traces as you did before that would be the missing piece of confirmation we need to integrate this one soon. Let's keep a tight loop on this issue for the next few days @LastRemote

@LastRemote
Copy link
Contributor

# Using session_id to group related traces
response = pipe.run(
    data={"prompt_builder": {"template_variables": {"location": "Paris"}, "template": messages}},
    tracer={"session_id": "user_123_conversation_456"}
)

@vblagoje Is this supported in the current main branch of haystack? I found this quote in README but I don't see the tracer parameter being mentioned elsewhere.

@vblagoje
Copy link
Member Author

# Using session_id to group related traces
response = pipe.run(
    data={"prompt_builder": {"template_variables": {"location": "Paris"}, "template": messages}},
    tracer={"session_id": "user_123_conversation_456"}
)

@vblagoje Is this supported in the current main branch of haystack? I found this quote in README but I don't see the tracer parameter being mentioned elsewhere.

Mistake, it should be tracer inside data! No not supported, I'm testing it now on this branch only

@LastRemote
Copy link
Contributor

@vblagoje No problem. I am still encountering troubles with the haystack version upgrade, given how many Agent changes we've had from 2.12 to 2.16. Luckily these should be most likely resolved today so I can kick off the test tomorrow.

For the testing part, it is a bit tricky since I would need actual traffic for this. So for the sake of simplicity I decided to create a DummyWait component that echoes the input after an hour. I will try to stress-run against this to see how it goes.

Copy link
Contributor

@mpangrazzi mpangrazzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

I have one question (not strictly related to this PR) about _session_root_traces: it seems that it will grow indefinitely without cleanup. Would you see this as a problem? I am thinking e.g. at long running applications.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integration:langfuse type:documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Concurrency: LangfuseTracer is not thread-safe, causing mixed traces in async environments
3 participants