Skip to content

Conversation

@jdenny-ornl
Copy link
Collaborator

@jdenny-ornl jdenny-ornl commented Nov 12, 2025

Before PR #152775, llvm::getLoopEstimatedTripCount never returned 0. If llvm::setLoopEstimatedTripCount were called with 0, it would zero branch weights, causing llvm::getLoopEstimatedTripCount to return std::nullopt.

PR #152775 changed that behavior: if llvm::setLoopEstimatedTripCount is called with 0, it sets llvm.loop.estimated_trip_count to 0, causing llvm::getLoopEstimatedTripCount to return 0. However, it kept documentation saying llvm::getLoopEstimatedTripCount returns a positive count.

Some passes continue to assume llvm::getLoopEstimatedTripCount never returns 0 and crash if it does, as reported in issue #164254. To restore the behavior they expect, this patch changes llvm::getLoopEstimatedTripCount to return std::nullopt when llvm.loop.estimated_trip_count is 0.

The premise of this patch is that an estimated trip count of 0 is
always invalid.  Before PR #152775, `llvm::getLoopEstimatedTripCount`
never returned 0.  PR #152775 changed that behavior but kept
documentation saying it returns a positive count.  Some passes
continue to rely on the previous behavior, as reported in issue
 #164254.  And yet some passes call `llvm::setLoopEstimatedTripCount`
with a value of 0.

To understand why it seems like an estimated trip count can be 0 but
cannot, consider the example of LoopPeel.  Given a loop with an
estimated trip count of 10, if LoopPeel peels 2 iterations, it seems
reasonable that the remaining loop will have an estimated trip count
of 8.  However, what should the remaining loop's estimated trip count
be when peeling 10 iterations?  What about 50?

Naively, it seems like the answers are 0 and -40, respectively.  But
neither is valid.  Recall that we are talking about estimates.  That
means, the probability is likely *low* but not 0 that execution will
reach iteration 11, iteration 51, or the remaining loop.  In the
unlikely case that it does reach them, it executes them.  In other
words, if execution reaches the loop header, at least one iteration of
the remaining loop executes, and the probability is likely low that
more will execute.  Thus, a pass like LoopPeel might naively calculate
that the remaining loop's estimated trip count is 0, but it must be at
least 1.

We could try to ensure that all passes never set the estimated trip
count as 0.  For now, this patch instead:

- Asserts that `llvm.loop.estimated_trip_count` never ends up as 0.
- If `EstimatedloopInvocationWeight` is not specified, adjusts
  `llvm::setLoopEstimatedTripCount` to convert 0 to 1.
- If `EstimatedloopInvocationWeight` is specified, adjusts
  `llvm::setLoopEstimatedTripCount` to set zeroed branch weights and
  remove any `llvm.loop.estimated_trip_count`.  The effect is that
  `llvm::getLoopEstimatedTripCount` will return `std::nullopt`.  For
  passes that still use `EstimatedloopInvocationWeight`, this patch
  thus restores the behavior from before PR #152775.  Eventually, no
  passes should use `EstimatedloopInvocationWeight`.
@jdenny-ornl
Copy link
Collaborator Author

I have put this in a draft state until I receive confirmation that it fixes #164254.

@jdenny-ornl
Copy link
Collaborator Author

If EstimatedloopInvocationWeight is not specified, adjusts llvm::setLoopEstimatedTripCount to convert 0 to 1.

After thinking about it for another night, I believe that approach is wrong.

If EstimatedloopInvocationWeight is specified, [...] The effect is that llvm::getLoopEstimatedTripCount will return std::nullopt. For passes that still use EstimatedloopInvocationWeight, this patch thus restores the behavior from before PR [PGO] Add llvm.loop.estimated_trip_count metadata #152775. Eventually, no passes should use EstimatedloopInvocationWeight.

I think we should instead continue that approach, even when EstimatedloopInvocationWeight is not specified.

For example, LoopPeel sometimes calls llvm::getLoopEstimatedTripCount to decide how many iterations to peel. If LoopPeel is run repeatedly, it should not keep peeling 1 iteration from the remaining loop. It should stop when it has peeled off the original estimated trip count.

Besides, it is not my goal right now to tweak estimated trip counts and their impact on passes. My goal is to move them out of the way without disturbing them so I can fix BFI.

So I am thinking of changing this patch to:

  • Let llvm::setLoopEstimatedTripCount continue to store 0 when told to.
  • Adjust llvm::getLoopEstimatedTripCount to return std::nullopt for that case.

Thoughts?

@mtrofin
Copy link
Member

mtrofin commented Nov 13, 2025

If EstimatedloopInvocationWeight is not specified, adjusts llvm::setLoopEstimatedTripCount to convert 0 to 1.

After thinking about it for another night, I believe that approach is wrong.

If EstimatedloopInvocationWeight is specified, [...] The effect is that llvm::getLoopEstimatedTripCount will return std::nullopt. For passes that still use EstimatedloopInvocationWeight, this patch thus restores the behavior from before PR [PGO] Add llvm.loop.estimated_trip_count metadata #152775. Eventually, no passes should use EstimatedloopInvocationWeight.

I think we should instead continue that approach, even when EstimatedloopInvocationWeight is not specified.

For example, LoopPeel sometimes calls llvm::getLoopEstimatedTripCount to decide how many iterations to peel. If LoopPeel is run repeatedly, it should not keep peeling 1 iteration from the remaining loop. It should stop when it has peeled off the original estimated trip count.

Besides, it is not my goal right now to tweak estimated trip counts and their impact on passes. My goal is to move them out of the way without disturbing them so I can fix BFI.

I thought the loop count metadata doesn't impact BFI, just branch weights do?

So I am thinking of changing this patch to:

  • Let llvm::setLoopEstimatedTripCount continue to store 0 when told to.
  • Adjust llvm::getLoopEstimatedTripCount to return std::nullopt for that case.

Thoughts?

@mtrofin
Copy link
Member

mtrofin commented Nov 13, 2025

Can we use "unknown" instead of zeroed branch weights? I'm pondering if making all-zero branch weights invalid. The reason is that "0" is a value that can come out of unchecked math. "unknown" is not.

@jdenny-ornl
Copy link
Collaborator Author

jdenny-ornl commented Nov 13, 2025

Besides, it is not my goal right now to tweak estimated trip counts and their impact on passes. My goal is to move them out of the way without disturbing them so I can fix BFI.

I thought the loop count metadata doesn't impact BFI, just branch weights do?

Right. The purpose of that metadata, introduced by PR #152775, is to move estimated trip counts out of branch weights so I can fix BFI issues.

But PR #152775 broke a guarantee of getLoopEstimatedTripCount, and that's what the current PR is trying to fix. But I realized this morning the current PR doesn't fully restore the old behavior because it converts 0 to 1. I am working on a new version of the current PR.

Can we use "unknown" instead of zeroed branch weights? I'm pondering if making all-zero branch weights invalid. The reason is that "0" is a value that can come out of unchecked math. "unknown" is not.

That makes sense to me, but not in the current PR.

Branch weights are zeroed by setLoopEstimatedTripCount [edit: branch weight zeroing was happening before my patches] only if EstimatedloopInvocationWeight is specified. A goal of my BFI patch series is to stop using EstimatedloopInvocationWeight because it's what conflates branch weights and estimated trip counts and thus breaks BFI. Let's fix that there not in the current PR, which should stay focused on not breaking estimated trip count behavior.

Does that make sense?

@mtrofin
Copy link
Member

mtrofin commented Nov 13, 2025

Besides, it is not my goal right now to tweak estimated trip counts and their impact on passes. My goal is to move them out of the way without disturbing them so I can fix BFI.

I thought the loop count metadata doesn't impact BFI, just branch weights do?

Right. The purpose of that metadata, introduced by PR #152775, is to move estimated trip counts out of branch weights so I can fix BFI issues.

But PR #152775 broke a guarantee of getLoopEstimatedTripCount, and that's what the current PR is trying to fix. But I realized this morning the current PR doesn't fully restore the old behavior because it converts 0 to 1. I am working on a new version of the current PR.

Can we use "unknown" instead of zeroed branch weights? I'm pondering if making all-zero branch weights invalid. The reason is that "0" is a value that can come out of unchecked math. "unknown" is not.

That makes sense to me, but not in the current PR.

Branch weights are zeroed by setLoopEstimatedTripCount [edit: branch weight zeroing was happening before my patches] only if EstimatedloopInvocationWeight is specified. A goal of my BFI patch series is to stop using EstimatedloopInvocationWeight because it's what conflates branch weights and estimated trip counts and thus breaks BFI. Let's fix that there not in the current PR, which should stay focused on not breaking estimated trip count behavior.

Does that make sense?

Ah, existing behavior - got it, and makes sense - thanks!

@jdenny-ornl
Copy link
Collaborator Author

So I am thinking of changing this patch to:

* Let `llvm::setLoopEstimatedTripCount` continue to store 0 when told to.

* Adjust `llvm::getLoopEstimatedTripCount` to return `std::nullopt` for that case.

I've rewritten the PR to do that, and I've rewritten the PR initial comment to reflect the change. The new version is significantly simpler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants