Skip to content

Add SmallTag type for more compact Dual types #748

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

topolarity
Copy link

This is largely to improve printing and reduce visual noise for packages with large function types (e.g. SciML)

Compare tag = :small:

ForwardDiff.Dual{ForwardDiff.SmallTag{0x39f35d61d979c3d1}, Float64, 3}

to tag = :default:

ForwardDiff.Dual{ForwardDiff.Tag{var"#f#1"{DAEProblem{Vector{Float64}, Vector{Float64}, Tuple{Float64, Float64}, false, Vector{Float64}, DAEFunction{false, SciMLBase.FullSpecialize, typeof(f), Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing, Nothing}, @Kwargs{}, Vector{Bool}}, BrownFullBasicInit{Float64, Nothing}, DFBDF{5, 0, ADTypes.AutoForwardDiff{nothing, Nothing}, Nothing, OrdinaryDiffEqNonlinearSolve.NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}, Rational{Int64}}, typeof(OrdinaryDiffEqCore.DEFAULT_PRECS), Val{:forward}(), true, nothing, Nothing, Nothing}}, Float64}, Float64, 3}

@topolarity topolarity changed the title Add SmallTag type for more compute Dual types Add SmallTag type for more compact Dual types Apr 23, 2025
Copy link

codecov bot commented Apr 23, 2025

Codecov Report

Attention: Patch coverage is 86.36364% with 9 lines in your changes missing coverage. Please review.

Project coverage is 86.92%. Comparing base (c310fb5) to head (d88bf58).
Report is 21 commits behind head on master.

Files with missing lines Patch % Lines
src/config.jl 86.36% 9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #748      +/-   ##
==========================================
- Coverage   89.57%   86.92%   -2.66%     
==========================================
  Files          11       10       -1     
  Lines         969     1025      +56     
==========================================
+ Hits          868      891      +23     
- Misses        101      134      +33     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mcabbott
Copy link
Member

We could also revive #572, closed it was thought that SciML printing problems were solved in Base.

Can you comment on the comparison?

@topolarity
Copy link
Author

topolarity commented Apr 23, 2025

We could also revive #572, closed it was thought that SciML printing problems were solved in Base.

Base has type-folding now that helps many stack traces, but you still sometimes need a repr(...) of a type that you can copy-and-paste, which disables folding

Can you comment on the comparison?

The main difference is that debugging perturbation issues without the actual function / array types can be very confusing, so it seems useful to keep the original functionality around. Other than that, the implementation should be mostly equivalent (the hash is based in many cases on the objectid)

Also it's worth mentioning that tag = short does not allow the Dual / Config type to be inferred on 1.10 sadly, since hash(::Type) is only const-eval-able on 1.11+ (see JuliaLang/julia#52427 (comment))

This is an alternative to `Tag` that provides largely the same
functionality, but carries around only the hash of the function
/ array types instead of the full types themselves.

This can make these types much less bulky to print and easier to
visually scan for.
This provides a convenient interface to ask for a SmallTag.
Latest patch release seems to have tightened this up.
This old version of Julia doesn't have call-site `@inline` / `@noinline`
so version-guard against this trick for now.
@ChrisRackauckas
Copy link
Member

Chatted on this with @oscardssmith and @topolarity . This is better than using the objectid because it is consistent with respect to precompilation, i.e. in a new session the objectid can change depending on the order that you evaluate functions, while this is consistent. That plus on v1.11 this is const eval-able, and thus using this form of a short tag is type-stable and makes it so you will precompile the autodiff calls if you autodiff w.r.t. the same function.

With those two properties though, I would be highly in favor of defaulting to this on v1.11+, since it won't hurt performance or precompilation, and I cannot see a scenario where a normal user would favor the tags based on function types. It would be odd to have huge multi-screen types on v1.10 and then good stack traces on v1.11, but IMO that's not a reason to make it wait longer.

I'd like to hear @KristofferC's opinion here. But either way, I think we'd want to set this up as well in ADTypes and DifferentiationInterface @gdalle

Copy link
Member

@gdalle gdalle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the general idea! I think #724 and #744 should be finished and merged first though

src/config.jl Outdated
Comment on lines 27 to 31
# SmallTag is similar to a Tag, but carries just a small UInt64 hash, instead
# of the full type, which makes stacktraces / types easier to read while still
# providing good resilience to perturbation confusion.
struct SmallTag{H}
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an actual docstring

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it should be marked public or exported, but so should other symbols (#744)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure this needs a new type at all? Could we just extend Dual's first type parameter from Union{Nothing, Tag} to perhaps Union{Nothing, Tag, Integer}?

julia> ForwardDiff.Dual(pi/2, 1)
Dual{Nothing}(1.5707963267948966,1.0)

julia> ForwardDiff.derivative(x -> @show(x), pi/2)
x = Dual{ForwardDiff.Tag{var"#37#38", Float64}}(1.5707963267948966,1.0)
1.0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it does need a new type, the obvious name is HashTag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it should be a type. Otherwise people might start passing 1 by accident. A UInt might be ok though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's safer as a type, so it should be a type.

@topolarity
Copy link
Author

I think #724 and #744 should be finished and merged first though

I'm not really convinced that #724 is a proper fix for the issue it claims to solve - it is still possible for the counter to overlap, e.g., between multiple precompiled packages.

I am happy to update the type exports while we're at it though

@gdalle
Copy link
Member

gdalle commented May 12, 2025

I'm not really convinced that #724 is a proper fix for the issue it claims to solve - it is still possible for the counter to overlap, e.g., between multiple precompiled packages.

I don't feel confident enough to evaluate that one, but I thought it looked related to what you did with SmallTag

Copy link
Member

@mcabbott mcabbott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a full review but some comments:

  • If we want to keep two methods of tagging around, is this the right way to access them? It seems quite an obscure internal detail, which should never change any results. So perhaps need not be a new keyword on every user-facing function.

  • I wonder if the implementation can be much simpler, like just storing the integer as the tag, instead of adding more structs.

src/config.jl Outdated
Comment on lines 27 to 31
# SmallTag is similar to a Tag, but carries just a small UInt64 hash, instead
# of the full type, which makes stacktraces / types easier to read while still
# providing good resilience to perturbation confusion.
struct SmallTag{H}
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure this needs a new type at all? Could we just extend Dual's first type parameter from Union{Nothing, Tag} to perhaps Union{Nothing, Tag, Integer}?

julia> ForwardDiff.Dual(pi/2, 1)
Dual{Nothing}(1.5707963267948966,1.0)

julia> ForwardDiff.derivative(x -> @show(x), pi/2)
x = Dual{ForwardDiff.Tag{var"#37#38", Float64}}(1.5707963267948966,1.0)
1.0

src/config.jl Outdated
Comment on lines 59 to 60
@inline function ≺(::Type{SmallTag{H1}}, ::Type{SmallTag{H2}}) where {H1,H2}
tagcount(SmallTag{H1}) < tagcount(SmallTag{H2})
Copy link
Member

@mcabbott mcabbott May 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be obvious, but can you explain a bit why this is designed this way? Why calculate hash H and store that in the type, but call the generated tagcount(SmallTag{H1}) to get a different integer whenever you use it. Why not store the integer from TAGCOUNT as a type parameter directly?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just to preserve the current ordering behavior for tags as closely as possible - the tag hashes are not sorted chronologically, so they would make a very different order

@gdalle
Copy link
Member

gdalle commented May 13, 2025

Also, am I correct that this doesn't make perturbation confusion impossible, only vanishingly unlikely?

@oscardssmith
Copy link
Member

@gdalle that is correct. With a 64 bit hash, we have a 50% collision chance once you get to 4 billion different tags (that are used in the same operation), but as long as your number of tags is below millions (which should always be the case), the chance of confusion is incredibly unlikely

@oscardssmith
Copy link
Member

IMO we should merge this.

@mcabbott
Copy link
Member

Would someone care to answer this, and related comments above?

It seems quite an obscure internal detail, which should never change any results. So perhaps need not be a new keyword on every user-facing function.

@ChrisRackauckas
Copy link
Member

Also, am I correct that this doesn't make perturbation confusion impossible, only vanishingly unlikely?

If someone happens to be doing ForwardDiff of the same function against a few million different arguments at the same time, then there would be an extremely small probability of a perturbation confusion.

@ChrisRackauckas
Copy link
Member

I don't feel confident enough to evaluate that one, but I thought it looked related to what you did with SmallTag

No I don't think it's related. This PR just uses the current way for the tag incrementer, which could potentially have an issue with some odd multithreading cases, but that PR also isn't a proper fix. But it's pretty unrelated since the current code already has this issue, and so there's no difference in this respect between default and smalltag, just whomever does end up wanting to do something about the multithreading issue will need to hit these two functions. I don't think that's a major maintenance burden to stop a PR like this.

@ChrisRackauckas
Copy link
Member

This was heavily discussed with about 30 people at JuliaCon. Basically I made a "SmallTag triage" JuliaCon Hackathon discussion. Everyone was really excited that there was a fix that was already made and the question was "why is this not merged"? The general consensus about certain topics in here was:

  1. It should just default to the shorter version. The issue at hand is that the shorter tag type is not inferred and relocatable (?) on v1.10, as it uses some v1.11 optimizations in order to be fully efficient. But there was a universal agreement to just turn it into the default for v1.11+ and use the longer tag on v1.10, Lillith response "why would you wait for a new LTS before anyone can get a better stacktrace?" and cited a bunch of Base examples that are similar printing improvements. The table concurred.
  2. We agreed with @mcabbott's review that using tag conflicts with the other way that setting tag is used in the repo, so using @topolarity @oscardssmith and I agreed that tagstyle is a reasonably unambiguous way to do this.
  3. Somewhat with part (1), there was pretty universal disagreement with having this be an opt-in Preferences thing, since pretty much everyone wanted shorter ForwardDiff stack traces and nobody thought the Preferences would be discoverable since these stack traces are typically not seen by users of ForwardDiff, but by downstream users of packages that use ForwardDiff. Thus there was an inversion of concerns: people who would be the most confused by long stack traces, have no idea AD is even doing, and least likely to know how to switch them off would be who we would be asking to set Preferences. It violates the principle of "Make the simple things simple and the complex things possible". So if there was a Preferences thing, it would just be to opt-out, which would be a developer tool, though it's not quite clear who is the developer this is helping. So if a specific developer of ForwardDiff wants this, they can follow up with that PR for their workflows but since it does not actually effect downstream users it was considered to be an unnecessary hurdle for now.

So with this, I think we have answered @gdalle's questions (mostly seemed to just be questions rather than concerns) and @topolarity is making some of these final changes based on @mcabbott's requests, and then two JuliaDiff maintainers (@oscardssmith and I) are sitting together with @topolarity and the community and when it all seems good plan to merge.

@ChrisRackauckas
Copy link
Member

I wonder if the implementation can be much simpler, like just storing the integer as the tag, instead of adding more structs.

The three of us agreed that using a tag type is more clear and less likely to run into a potential issue. One potential issue is someone putting a type parameter in the wrong spot, so trying to specify a chunksize=1 and instead accidentally setting a tag to 1, and then opening themselves up to perturbation confusion. So using a type and UInt just is a nice safety measure.

We note that the exact tag was considered by everyone to just be an implementation detail and not public API anyone should be relying on, so it wouldn't be a breaking change to change this to just an Int in the future. That could be a discussion we can continue, but it doesn't seem like there was much support for that idea.

@ChrisRackauckas
Copy link
Member

If we want to keep two methods of tagging around, is this the right way to access them? It seems quite an obscure internal detail, which should never change any results. So perhaps need not be a new keyword on every user-facing function.

@topolarity noted that v1.10 cannot compute a stable hash at compile time, and this means that these hashes cannot be inferred. Because of this, it would not make a good default on v1.10. It should be considered to be removed after the next LTS bump, but as mentioned in the previous comment, the general Julia user did not believe that we should wait for another LTS in order to make this the default and so we would need to live with the two separate defaults for at least the short term future.

@ChrisRackauckas
Copy link
Member

The reason for the keyword argument is due to the potential of hash collision, this gives a method to flip to the version for which it's debuggable without that potential issue. Though as mentioned, it's vanishingly unlikely, but we should at least move with some care to start and give people an easy way to diagnose if it does cause an issue.

@mcabbott
Copy link
Member

person doing the autodiff should have control

Can you explain what you mean, who are these multiple people? You are envisaging that libraries which call ForwardDiff may want to choose not to use the hash one on their internal calls, even if their end users do? But I don't see why.

If this thing is safe enough to turn on for everyone, and I buy that it is, then why must we inflict clutter on them to make it easy to turn off? If it's not so safe and is an experimental feature, that's different.

@ChrisRackauckas
Copy link
Member

Defaulting to this thing (which really still ought to be called HashTag not SmallTag) seems fine.

Changed to HashTag.

If there really has to be a user-facing keyword, it can be hashtag::Bool not tagstyle::Union{Symbol,Nothing} = :default, which is weird & nonstandard. But...

So you want it a bool? Let's just figure out what you want. This is going in circles. Should we just get on a call? I can now make it a bool if that makes you happy.

If this thing is safe enough to turn on for everyone, and I buy that it is, then why must we inflict clutter on them to make it easy to turn off? If it's not so safe and is an experimental feature, that's different.

Its safe for everyone. I don't know what you're getting at.

@mcabbott
Copy link
Member

mcabbott commented Jul 26, 2025

It's not going in circles. I'm against clutter, especially user-facing clutter.

If this is safe for everyone, then I don't understand the desire for every user-facing function to gain a new keyword to control it, and every user-facing docstring to gain a description of this new keyword's states. Leaving an obscure way to change the internal behaviour back in place seems preferable to that.

@ChrisRackauckas
Copy link
Member

Okay so no keyword argument now? I can do that.

@mcabbott
Copy link
Member

I am keen not to drive the code in circles. Switching from the keyword to Preferences is some work, and if there are solid arguments against it, better to figure them out before coding it. I just haven't seen them made here.

@ChrisRackauckas
Copy link
Member

Okay, is this form what you were looking for? I'm trying triangulate the 3 difference responses of (1) make it a bool, (2) make it as simple as possible because it's safe so just do it and (3) use Preferences, each of which are competing. I think the simplest version is all that's needed. If it's safe and what everyone should use, does there need to be a Preferences?

By removing the option, which is effectively reverting the test changes because now the option is gone and what the tests added was tests that the option worked. Because the tests already have inference testing, even the inference test is unnecessary to keep so we're good.
@ChrisRackauckas
Copy link
Member

So now the option is gone, the tests which use the option are gone (so basically a revert of that), and the docstring changes that reference the new option are gone. Are we good?

@ChrisRackauckas
Copy link
Member

Ahh actually that extra inline stuff can go...

@mcabbott
Copy link
Member

mcabbott commented Jul 26, 2025

We can be civil and discuss pros and cons of all options. Ideally before coding each of them in sequence. I'm against unnecessary user-facing clutter, but open to persuasion about what's necessary.

Maybe a bool keyword isn't in fact a great idea, as it used to accept 3 options (nothing, :default, :small) but then 4 (nothing, :default meaning 1.10 vs 1.11 auto, :type and :hash).

So that leaves

  • delete old Tag completely
  • keep both but switch based on Julia version
  • keep both and switch with Preferences
  • keep both and switch via keywords everywhere

I still think that the last option is the worst. Although I'm not sure I understand yet what argument Oscar was making for it above.

Paging @devmotion and @KristofferC for their thoughts? This isn't a tiny change.

@ChrisRackauckas
Copy link
Member

delete old Tag completely

This was the other PR, which never merged because of relocatability issues and inference on v1.10, so that's a no.

keep both and switch via keywords everywhere

This one is a no and was already removed.

keep both but switch based on Julia version

This is what the PR doing. Is that fine?

@ChrisRackauckas
Copy link
Member

And I have also added a Preferences version. So from the list:

  • delete old Tag completely
  • keep both but switch based on Julia version
  • keep both and switch with Preferences
  • keep both and switch via keywords everywhere

I am still confused which one you want, but we now have an implementation of all 4 versions. Please let me know which version you prefer and we can just take that commit.

@KristofferC
Copy link
Collaborator

KristofferC commented Jul 26, 2025

Firstly, stuff like

This was heavily discussed with about 30 people at JuliaCon. Everyone was really excited that there was a fix that was already made and the question was "why is this not merged"?

is not a great way to go about things. It puts unfair pressure on the maintainers and it is basically emotional manipulation to merge something because you don't want to be the party pooper against all those "30 people" in a room cherring in champagne about how great this PR is.

Secondly, this PR is really hard to review, it has like a gazillion of Update commits etc. If you want to work on this on the hackathon I suggest first cleaning up the PR, make the commits nice etc and then we can take another shot at looking at this objectively.

clean new dispatches out

Update src/config.jl
Update src/config.jl
Update src/config.jl
Update src/config.jl
Update src/config.jl
Update src/config.jl
Update src/config.jl
Add a preferences version

Update src/config.jl
Update src/config.jl
@ChrisRackauckas
Copy link
Member

Secondly, this PR is really hard to review, it has like a gazillion of Update commits etc. If you want to work on this on the hackathon I suggest first cleaning up the PR, make the commits nice etc and then we can take another shot at looking at this objectively.

It was a moving target. First I was told that using the keyword argument tag is not the right name, so it became tagstyle. Then I was told that there shouldn't be a keyword argument. Then I was told it should be a Preference. I am happy to do any form of this that anyone wants, but can you please pick one? There are 4 forms:

  • delete old Tag completely
  • keep both but switch based on Julia version
  • keep both and switch with Preferences
  • keep both and switch via keywords everywhere

Which do you want? We will just do that and then squash. But if every single time the PR is updated it's changed to another then it's going to be a mess. So please let me know which you would be happy with and we will squash to that version.

@ChrisRackauckas
Copy link
Member

Bump @mcabbott is this the form you think is optimal?

@topolarity
Copy link
Author

topolarity commented Jul 27, 2025

I do think it's worth discussing the pros and cons as @mcabbott suggested

There's no need to rush this PR - IIUC we're on the same page that we probably want a change like this, so we just have to discuss and convince ourselves of the details (which seems fair to me)

Tackling that:

So that leaves

  • delete old Tag completely
  • keep both but switch based on Julia version
  • keep both and switch with Preferences
  • keep both and switch via keywords everywhere

My personal preference is a combination of (4) and (2), but (3) seems OK too

I expect the majority of end users (>90%) will not change this setting, via Preferences or otherwise, so I agree we need good defaults. The hash is probably not a good default on 1.10, because it unfortunately makes the tag constructor type-unstable. However, it does seem like a good choice on 1.11+

My feelings are less strong about the user config piece (happy to defer to your judgment @mcabbott ), but I'm generally reluctant to add preferences for things that are likely to be "hidden" concerns for an end-user. In this case, a lot of indirect users of ForwardDiff exist that have no preference / understanding for the perturbation system / issue, so it seems strange to expect them to know that this preference exists and that they will set it.

Moving it to preferences also makes it impossible for a package doing its own AD calculations to have an opinion about perturbation tracking (even if, e.g., that package hits a perturbation bug of some kind and would like to be able to code in a fix for its users)

All of that said, it's true that we mostly expect these two options to have similar perturbation tracking capability. I can definitely see the argument that the preference really only affects display / diagnostics, which is arguably in the end user's "domain"

Anyway I'm OK with either direction for config

Hopefully that at least clarifies the original intent before this PR was hackathon'd

@topolarity
Copy link
Author

topolarity commented Jul 27, 2025

Maybe the way to see it is that the kwarg was trying to simplify the existing "tag control" in the API. That way, a direct user of ForwardDiff only has to choose from a set of options, instead of constructing a tag themselves like they'd do now if they want to tweak tag behavior.

If we'd prefer the original interface though that's fine - packages can still pass in the other tag type if they really want to control this aspect of tag behavior

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants