@penelopeysm
Member

@penelopeysm penelopeysm commented Nov 1, 2025

Now that Libtask works on 1.12, I think we can and should run CI on it.

Mooncake can be disabled for the time being (it will still be tested on 1.10).


  # Skip Mooncake on 1.12 as it is not compatible yet
- const INCLUDE_MOONCAKE = VERSION >= v"1.12"
+ const INCLUDE_MOONCAKE = VERSION < v"1.12"
Member Author

Silly mistake I made previously.
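
For illustration, a minimal sketch of how a flag like this might gate an optional backend in a test setup. The `backends` list and its symbols are hypothetical and are not taken from the Turing.jl test suite.

```julia
# Hypothetical sketch: enable an optional test backend only on Julia < 1.12.
const INCLUDE_MOONCAKE = VERSION < v"1.12"

# Illustrative list of backends to test; these names are assumptions.
backends = [:ForwardDiff, :ReverseDiff]
if INCLUDE_MOONCAKE
    push!(backends, :Mooncake)
end
```

Note that `v"1.12"` means `1.12.0`, so a 1.12 pre-release still compares as lower; comparing against `v"1.12-"` would exclude pre-releases as well.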

@github-actions
Contributor

github-actions bot commented Nov 1, 2025

Turing.jl documentation for PR #2707 is available at:
https://TuringLang.github.io/Turing.jl/previews/PR2707/

@codecov

codecov bot commented Nov 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.45%. Comparing base (bfa61d8) to head (a93fb35).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2707   +/-   ##
=======================================
  Coverage   86.45%   86.45%           
=======================================
  Files          21       21           
  Lines        1418     1418           
=======================================
  Hits         1226     1226           
  Misses        192      192           

@penelopeysm
Member Author

@mhauru Do you know if the test failure is expected?

@mhauru
Member

mhauru commented Nov 3, 2025

Nope, haven't seen that before. I hope this isn't non-deterministic and hard to reproduce. Looks like MCMCThreads gets Libtask tasks running on multiple threads, and two happened to write to the global MistyClosures cache within Libtask at the same time. We may need to add locks or atomics around that global cache.
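
A minimal sketch of the locking idea, assuming a plain global `Dict` as the cache. `CACHE`, `CACHE_LOCK`, and `cached` are hypothetical names for illustration; this is not Libtask's actual cache or its eventual fix.

```julia
# Hypothetical stand-in for a global cache; not Libtask's real data structure.
const CACHE = Dict{Any,Any}()
const CACHE_LOCK = ReentrantLock()

# Return the cached value for `key`, calling `build()` to create and store it
# if it is missing. The lock is held for the whole lookup-or-insert step.
function cached(build, key)
    return lock(CACHE_LOCK) do
        get!(build, CACHE, key)
    end
end

# Many tasks hit the cache at once; the lock serialises every read and write.
tasks = [Threads.@spawn cached(() -> i^2, i) for i in 1:1000]
results = fetch.(tasks)
```

Holding the lock while `build()` runs keeps the sketch simple but means slow builds block other tasks; whether that trade-off is acceptable depends on how expensive cache misses are.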

@mhauru
Member

mhauru commented Nov 3, 2025

Wanna try to make a little MWE by creating tasks on multiple threads, or shall I? My (sad) prediction is that the CI failure will go away if you rerun it and get a bit lucky.

@penelopeysm
Member Author

penelopeysm commented Nov 3, 2025

I'm struggling to make a Libtask-only MWE. This doesn't reproduce the error:

using Libtask
function task(i)
    # define a new method, ensuring that Libtask has to construct
    # a new MistyClosure and add it to the cache
    function f(::Val{j}) where {j}
        produce(j)
        return nothing
    end
    # construct a TapedTask for it
    return TapedTask(nothing, f, Val(i))
end
fetch.([Threads.@spawn task(i) for i in 1:200])

This is doing concurrent writes to Libtask.mc_cache, but it's not picked up by Julia's base library because only certain code paths trigger the detection of concurrent writes (https://github.com/JuliaLang/julia/blob/ba1e628ee49351af0b704afd2b2903d253bd3564/base/dict.jl#L182).

In fact even this doesn't trigger it:

d = Dict{Int,Int}()
empty!(d)
function seti(i)
    d[i] = i
end
fetch.([Threads.@spawn seti(i) for i in 1:1000000])

Will continue trying.
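
For contrast, a sketch of a variant of the `Dict` experiment that has no shared mutation at all: each task returns its result and the `Dict` is built afterwards on a single task. `compute` and `pairs_out` are hypothetical names, and this only illustrates one way to avoid the race, not a way to reproduce it.

```julia
# Each task returns its own key => value pair instead of writing to a shared Dict.
compute(i) = i => i

# Run the work in parallel, then collect the results on the calling task.
pairs_out = fetch.([Threads.@spawn compute(i) for i in 1:10_000])

# Building the Dict happens on one task only, so there are no concurrent writes.
d = Dict{Int,Int}(pairs_out)
```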

@penelopeysm
Member Author

Patching that Base.rehash! method by adding a 0.1 second sleep inside allows me to replicate the error with just Libtask. Obviously, that doesn't constitute an MWE. But I think the important thing is to have some kind of function with a complicated enough cache key, such that the middle bit of rehash! takes long enough to cause a race condition.

@penelopeysm
Member Author

So the real underlying problem (concurrent writes to Dicts) can occur on both 1.11 and 1.12, which of course leads to the question of why does this error on 1.12 but not 1.11? I assume that there's some performance regression on hash in 1.12 that causes this error to manifest, and we were lucky enough that 1.11 was fast enough to not error, but 1.12 wasn't. Obviously this should still be fixed in Libtask, but just interesting to note.

@mhauru
Member

mhauru commented Nov 4, 2025

New Libtask version should fix this, rerunning the failing CI.

@penelopeysm penelopeysm requested a review from mhauru November 4, 2025 13:58
@penelopeysm
Member Author

Gibbs on Windows takes a full hour, meh.
