
Conversation

@marc-chevalier
Member

@marc-chevalier marc-chevalier commented Nov 4, 2025

Analysis

Observationally

IGVN

During IGVN, in PhiNode::Value, a PhiNode has 2 inputs. Their types are:

in(1): java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact * (inline_depth=4))
in(2): java/lang/Object * (speculative=null)

We compute the join (HS' meet):

const Type *t = Type::TOP; // Merged type starting value
for (uint i = 1; i < req(); ++i) { // For all paths in
  // Reachable control path?
  if (r->in(i) && phase->type(r->in(i)) == Type::CONTROL) {
    const Type* ti = phase->type(in(i));
    t = t->meet_speculative(ti);
  }
}

t=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *)

But the current _type (of the PhiNode as a TypeNode) is

_type=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue3 (compiler/valhalla/inlinetypes/MyInterface):exact *)

We filter t by _type

const Type* ft = t->filter_speculative(_type); // Worst case type

and we get

ft=java/lang/Object *

which is what we return. After Value returns, the returned type becomes the PhiNode's new _type.

const Type* t = k->Value(this);
assert(t != nullptr, "value sanity");
// Since I just called 'Value' to compute the set of run-time values
// for this Node, and 'Value' is non-local (and therefore expensive) I'll
// cache Value. Later requests for the local phase->type of this Node can
// use the cached Value instead of suffering with 'bottom_type'.
if (type_or_null(k) != t) {
#ifndef PRODUCT
  inc_new_values();
  set_progress();
#endif
  set_type(k, t);
  // If k is a TypeNode, capture any more-precise type permanently into Node
  k->raise_bottom_type(t);

and

void Node::raise_bottom_type(const Type* new_type) {
  if (is_Type()) {
    TypeNode *n = this->as_Type();
    if (VerifyAliases) {
      assert(new_type->higher_equal_speculative(n->type()), "new type must refine old type");
    }
    n->set_type(new_type);

Verification

On verification, in(1), in(2) have the same value, so does t. But this time

_type=java/lang/Object *

and so, after filtering t by the (new) _type, we get

ft=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *)

which is returned. Verification gets angry because the new ft is not the same as the previous one.

But why?!

Details on type computation

In short, we are doing

t = typeof(in(1)) \/ typeof(in(2))
ft  = t /\ _type (* IGVN *)
ft' = t /\ ft    (* Verification *)

and observing that ft != ft'. It seems our lattice doesn't ensure (a /\ b) /\ b = a /\ b, which is problematic for this kind of verification, which just "tries again and sees if something changes".

To me, the surprising fact was that the intersection

java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *)
/\
_type=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue3 (compiler/valhalla/inlinetypes/MyInterface):exact *)
~>
java/lang/Object *

What happened to the speculative type? Both MyValue2 and MyValue3 inherit from MyAbstract (and implement MyInterface). So the code correctly finds that the intersection of these speculative types is

compiler/valhalla/inlinetypes/MyAbstract (compiler/valhalla/inlinetypes/MyInterface):AnyNull * (flat in array),iid=top

The interesting part is that it's AnyNull: indeed, what else is a MyValue2 and a MyValue3 at the same time? And then, above_centerline decides it's not useful enough (too precise, too close to HS' top/normal bottom) and removes the speculative type.

if (above_centerline(speculative()->ptr())) {
  return no_spec;
}

But on the verification run, compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact * is intersected with the speculative type of java/lang/Object *, which is unknown (HS' bottom/normal top), so we are simply getting MyValue2. If we did not discard AnyNull using above_centerline, we would have the intersection of MyValue2 and AnyNull, giving AnyNull, which is indeed stable.
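The effect described above can be reproduced in a toy model. The following is only a sketch, not HotSpot code: `Spec`, `join_keep_empty`, `join_drop_empty` and `run_example` are hypothetical names, and the model assumes a speculative type is either unknown, one exact class, or empty (standing in for AnyNull).

```cpp
#include <cassert>
#include <string>

// Toy model of the speculative part of a type (hypothetical, not HotSpot).
struct Spec {
    enum Kind { Unknown, Exact, Empty } kind;  // Unknown = no speculative type
    std::string klass;                         // only meaningful for Exact
    bool operator==(const Spec& o) const {
        return kind == o.kind && (kind != Exact || klass == o.klass);
    }
};

// Intersection that keeps an empty result.
Spec join_keep_empty(const Spec& a, const Spec& b) {
    if (a.kind == Spec::Unknown) return b;
    if (b.kind == Spec::Unknown) return a;
    if (a.kind == Spec::Empty || b.kind == Spec::Empty) return Spec{Spec::Empty, ""};
    return a.klass == b.klass ? a : Spec{Spec::Empty, ""};
}

// Same intersection, but an empty result is discarded, mimicking the
// above_centerline check quoted above.
Spec join_drop_empty(const Spec& a, const Spec& b) {
    Spec r = join_keep_empty(a, b);
    return r.kind == Spec::Empty ? Spec{Spec::Unknown, ""} : r;
}

// Replays the IGVN run and the verification run in the toy model.
void run_example() {
    Spec t{Spec::Exact, "MyValue2"};      // speculative part of the meet
    Spec type{Spec::Exact, "MyValue3"};   // speculative part of the stale _type

    Spec ft = join_drop_empty(t, type);   // IGVN: speculative part is dropped
    Spec ft2 = join_drop_empty(t, ft);    // verification: MyValue2 comes back
    assert(ft.kind == Spec::Unknown);
    assert(ft2 == t);

    Spec kept = join_keep_empty(t, type); // keeping the empty result instead
    assert(join_keep_empty(t, kept) == kept);  // a second join changes nothing
}
```

In this model, only the variant that discards the empty result gives a different answer on the second application.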

Ok, but the types are weird?

Indeed, they are! How can we get a speculative type MyValue3 on the PhiNode when inputs are both Object, and one is speculated to be a MyValue2? This comes from incremental inlining. It seems that we have some profiling information on the returned type of a callee, that happens to be MyValue3, which propagates to the PhiNode. Later, the callee is inlined, and we get new type information (MyValue2) from its body (from the returned type of a callee of our callee, if I remember correctly), that reaches the input of our PhiNode.

Reproducing

In Valhalla

This crash is quite rare because:

  1. it needs a specific speculative type setup, which depends heavily on timing
  2. if PhiNode::Value is called a second time, it will stabilize the _type field before verification.

To limit the influence of 2., I've tested with an additional assert that would immediately do

const Type* ft_ = t->filter_speculative(ft);

in PhiNode::Value and compare ft and ft_. Indeed, we can never be sure that a given run of Value is not the last one: it should always be legal to stop anywhere, even if, in a particular case, IGVN was going to run further.
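The shape of that extra check can be sketched generically. This is an illustration under assumptions, not the assert used in the experiment: `filter_is_stable` and `drop_conflicts` are hypothetical names, and integers stand in for types (0 meaning "no speculative type").

```cpp
#include <cassert>

// Hypothetical helper: true when filtering by the result of a first filter
// gives that same result, i.e. filter(t, filter(t, type)) == filter(t, type).
template <typename T, typename Filter>
bool filter_is_stable(Filter filter, const T& t, const T& type) {
    const T ft  = filter(t, type);  // what a run of Value would return
    const T ft2 = filter(t, ft);    // what the next run would compute
    return ft == ft2;
}

// Toy filter on ints: 0 is "unknown", equal values are kept, and two
// different non-zero values conflict, so the result is dropped to 0.
int drop_conflicts(int a, int b) {
    if (a == 0) return b;
    if (b == 0) return a;
    return a == b ? a : 0;
}

// Example use of the helper on the toy filter.
void run_stability_example() {
    assert(filter_is_stable(drop_conflicts, 2, 2));   // no conflict: stable
    assert(!filter_is_stable(drop_conflicts, 2, 3));  // conflict: 0, then 2
}
```

Running the check right after the first filter removes the dependency on whether Value happens to be called again before verification.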

With this extra check, the crash is a bit more common, but still pretty rare. Tests that have been seen to crash this way at least once:

  • compiler/valhalla/inlinetypes/TestCallingConvention.java
  • compiler/valhalla/inlinetypes/TestIntrinsics.java
  • compiler/valhalla/inlinetypes/TestArrays.java
  • compiler/valhalla/inlinetypes/TestBasicFunctionality.java

All are in compiler/valhalla/inlinetypes, even though I was also testing with mainline tests. Suspicious, huh.

In mainline

With the aforementioned extra check, I've tried to see if it could happen on mainline, since the code involved does not seem to be valhalla-specific. As we could expect given that only valhalla tests have been seen to crash, no such crash occurred.

Crafting an example

I've tried to craft an example that would create a similar situation, but without luck. I never managed to reach a correct setup of incremental inlining, conflicting type profiling...

Fixing

I think changing the type system would be quite risky: it is used all over the place. Also, fixing it would require not dropping the speculative type when it is above_centerline. This is not something missing, or a corner case that nobody thought of; it would mean removing a feature that is explicitly here, so probably on purpose.

As a first approach, one could simply run filter_speculative twice. That should be enough: the second filter will simply select the non-empty speculative type if there is only one, and this one won't be above_centerline, or it would not exist as a speculative type already.

To be a bit less aggressive, we can instead do that only in cases where we know it might be useful. If ft obtained from filter_speculative has no speculative type, but t has one, we know that it might be because it has been dropped, and computing t->filter_speculative(ft) could pick up the speculative type of t. The speculative type can still be removed, for instance if the non-speculative type of ft is exact and non-null, but then we have still reached a fixpoint: the result is correct, at the cost of a little extra work.
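A minimal sketch of this conditional second filter, in a toy model rather than in HotSpot: `Spec`, `toy_filter`, `filter_with_retry` and `run_retry_example` are hypothetical names, `std::nullopt` stands for "no speculative type", and the non-speculative part of the type is not modelled at all.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Toy speculative type: nullopt = none, otherwise one exact class name.
using Spec = std::optional<std::string>;

// Toy filter: conflicting exact classes give an empty intersection, which
// is discarded (as the above_centerline check does).
Spec toy_filter(const Spec& a, const Spec& b) {
    if (!a) return b;
    if (!b) return a;
    if (*a == *b) return a;
    return std::nullopt;
}

// Filter once; if the speculative part vanished although t had one, filter
// again so that a later identical call cannot produce something different.
Spec filter_with_retry(const Spec& t, const Spec& type) {
    Spec ft = toy_filter(t, type);
    if (!ft && t) {
        ft = toy_filter(t, ft);
    }
    return ft;
}

// Replays the scenario from the analysis with the retry in place.
void run_retry_example() {
    Spec t = std::string("MyValue2");
    Spec type = std::string("MyValue3");
    Spec ft = filter_with_retry(t, type);   // IGVN run
    Spec ft2 = filter_with_retry(t, ft);    // verification run
    assert(ft == t);
    assert(ft2 == ft);                      // same answer: fixpoint reached
}
```

In this sketch the retry only triggers when the first result lost a speculative type that t carried, so the common case still performs a single filter.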

That being said, I'm not claiming it's a great solution: it seems that many parts do their job as expected, but the result is unfortunate. Ignoring speculative types in verification (or maybe for PhiNodes only) would also work. In any case, the problem is an imprecise type, but one that is still sound: working with it shouldn't be an issue.

I was told that maybe @rwestrel would have an opinion, or an idea to do differently.

Thanks,
Marc


Progress

  • Change must not contain extraneous whitespace

Issue

  • JDK-8367245: [lworld] C2 compilation fails with "Missed optimization opportunity in PhaseIterGVN" (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/valhalla.git pull/1717/head:pull/1717
$ git checkout pull/1717

Update a local copy of the PR:
$ git checkout pull/1717
$ git pull https://git.openjdk.org/valhalla.git pull/1717/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 1717

View PR using the GUI difftool:
$ git pr show -t 1717

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/valhalla/pull/1717.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Nov 4, 2025

👋 Welcome back mchevalier! A progress list of the required criteria for merging this PR into lworld will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Nov 4, 2025

@marc-chevalier This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8367245: [lworld] C2 compilation fails with "Missed optimization opportunity in PhaseIterGVN"

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the lworld branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the lworld branch, type /integrate in a new comment.

@marc-chevalier marc-chevalier marked this pull request as ready for review November 4, 2025 14:22
@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 4, 2025
@mlbridge

mlbridge bot commented Nov 4, 2025

Webrevs

@rwestrel
Collaborator

rwestrel commented Nov 4, 2025

Do I understand correctly that 1) Value is first run when the Phi has 2 different nodes as inputs, 2) then the Phi's inputs change and become the same, and 3) something breaks at verification time?

If that's correct why don't we run Value again after 2)?

With:

in(1): java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact * (inline_depth=4))
in(2): java/lang/Object * (speculative=null)

How can the meet be t=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *)? Is one of the region's input top?

@marc-chevalier
Member Author

marc-chevalier commented Nov 4, 2025

I don't think this is quite correct. At first, we have

in(1): java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue3 [...]
in(2): java/lang/Object * (speculative=null)

which makes _type

java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue3 (compiler/valhalla/inlinetypes/MyInterface):exact *)

(which I'm not arguing with: it seems correct at that point).

Then, some incremental inlining happens, and the types of the inputs change to

in(1): java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact * (inline_depth=4))
in(2): java/lang/Object * (speculative=null)

(MyValue3 is replaced by MyValue2 in the type of in(1)) and from this point, we run PhiNode::Value (since indeed, the input changed) with the previously computed _type involving MyValue3.

In this invocation of Value (the last non-verification invocation), we compute (typeof(in(1)) \/ typeof(in(2))) /\ _type which gives java/lang/Object * (dropping the speculative type). This type becomes the _type of the PhiNode.

After that, we do verification (so no change to the inputs anymore), and we re-call PhiNode::Value. We recompute (typeof(in(1)) \/ typeof(in(2))) /\ _type with this new _type and here we get

java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *)

It's not the same as before, so it's a verification failure.


As for why the meet of

in(1): java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact * (inline_depth=4))
in(2): java/lang/Object * (speculative=null)

is

java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *)

maybe there is a misunderstanding (that I had myself): null here means Java's null, not C++ nullptr. So, basically, the Phi is either null or a MyValue2 (possibly null). It seems natural that the (Hotspot's) meet is the same as in(1), since typeof(in(1)) > typeof(in(2)). And I've double checked, this meet really happens, both Region's inputs are non-top. The lack of speculative type is displayed by simply omitting the "speculative=" part.

@rwestrel
Collaborator

rwestrel commented Nov 5, 2025

TestSpeculativeTypes.java

Attached test case fails with "Missed optimization opportunity in PhaseIterGVN" as well (with mainline, nothing valhalla specific here). I run it with:

$ java -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation -XX:+PrintCompilation -XX:CompileOnly=TestSpeculativeTypes::test1 -XX:CompileCommand=quiet -XX:TypeProfileLevel=222 -XX:+AlwaysIncrementalInline -XX:VerifyIterativeGVN=10 -XX:CompileCommand=dontinline,TestSpeculativeTypes::notInlined1 TestSpeculativeTypes

Can you confirm it's indeed the same issue?

@marc-chevalier
Member Author

I agree it's the same issue. Very nice! It reproduces both with mainline and valhalla. I had the suspicion it wasn't valhalla-specific because none of the concepts and code involved were, but I couldn't find an example. I'll add the test.

For the record, we have:

in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3))
in(2): null
t: java/lang/Object * (speculative=TestSpeculativeTypes$C2:exact *)
_type: java/lang/Object * (speculative=TestSpeculativeTypes$C1:exact *)
Filter once: java/lang/Object *
Filter twice: java/lang/Object * (speculative=TestSpeculativeTypes$C2:exact *)

It now needs to be fixed in mainline (and maybe only in mainline, relying on jdk -> valhalla merges). But there is still the question of the solution, and it is still not clear to me what the best way is.

@merykitty
Member

I think it is better to fix the type system. Having a join that does not satisfy X join Y < X is really confusing and opens the chance for many kinds of issues.

@marc-chevalier
Member Author

I agree in principle, but I find it odd to have exactly this test with above_centerline: it looks like it was made on purpose, and not just an overlooked detail. I'm somewhat reluctant to remove such a "feature"; there might be a good reason it's here.

@rwestrel
Collaborator

rwestrel commented Nov 6, 2025

The type system is quite complicated and it's hard to get right. This seems like a corner case: only for Phis and speculative types and mostly harmless unless caught by verification code. So I would not bother with the type system and go with a point fix similar to what Marc proposes.

@merykitty
Member

I disagree. That the issue only surfaces on this particular occasion does not mean it will not appear in other circumstances, possibly in the future. Even if the code is there on purpose, it seems that the purpose is executed incorrectly.

Additionally, an empty speculative type is supposed to mean that the path is speculatively unreachable, which is not the case here. So, another approach may be to fix the injection of speculative and assert that we should not obtain an empty speculative type?

@rwestrel
Collaborator

rwestrel commented Nov 6, 2025

Additionally, an empty speculative type is supposed to mean that the path is speculatively unreachable,

Or that profile data for 2 different points in the execution is inconsistent which given how profile data is collected seems as likely to me.

Speculative types were added as a mechanism to fight profile pollution for scripting languages running on the JVM (specifically nashorn). They work really well in some cases. But they also have limited applicability. To me, it doesn't seem like a good use of developer time or our complexity budget to go with a complicated fix.

@marc-chevalier
Member Author

I'm trying to understand why cleanup_speculative is doing so. It seems to come from JDK-8031755: Type speculation should be used to optimize explicit null checks. Maybe @rwestrel would remember something about it, but it has been quite a while since!

@merykitty
Member

An important point is that:

The interesting part is that it's AnyNull: indeed, what else is a MyValue2 and MyValue3 at the same time?

There is one, though, and it is null. The join is incorrect here.

@merykitty
Member

I have looked at the example provided by @rwestrel , and it seems true that when the speculative type is empty, the node is speculatively unreachable (test1 is always called with flag1 being false, so the return type of inline2(flag2) inside the compilation of test1 is unreachable). Now what can we do with this?

@rwestrel
Collaborator

rwestrel commented Nov 6, 2025

I have looked at the example provided by @rwestrel , and it seems true that when the speculative type is empty, the node is speculatively unreachable (test1 is always called with flag1 being false, so the return type of inline2(flag2) inside the compilation of test1 is unreachable). Now what can we do with this?

That's how I crafted the test to get conflicting profiles.

I also had to use TypeProfileLevel=222 to enable some profile collection that's disabled by default. Most compilation wouldn't even see that many speculative types.

Another way to get conflicting profiles would be to make sure profile collection only happens when some particular value is returned and not when some other is returned. Profile collection doesn't start on first execution and stops as soon as a method is compiled by c2. So if a method is called very often, 1) initially returns some type and 2) then only later some other type, and if compilation with c2 happens between 1) and 2), then profile data only reports the type collected in 1).

@rwestrel
Collaborator

rwestrel commented Nov 6, 2025

I'm trying to understand why cleanup_speculative is doing so. It seems to come from JDK-8031755: Type speculation should be used to optimize explicit null checks. Maybe @rwestrel would remember something about it, but it has been quite a while since!

I don't remember the details but I think the rationale is that if we're seeing conflicting profile, there's a good chance profile data is inaccurate and we can as well ignore it.

@TobiHartmann
Member

Given that this is not a Valhalla specific issue, I think this should be fixed in mainline. We can disable the verification in Valhalla or disable the command line in the test until the fix is merged.

@marc-chevalier
Member Author

I think the rationale is that if we're seeing conflicting profile, there's a good chance profile data is inaccurate and we can as well ignore it.

That makes sense. It's more practical than ignoring it everywhere we use it when it's oddly too specific. Indeed, speculative types don't need to be sound (or complete), and they start with a (probably) unsound premise: profiling data. I wonder if we are implicitly expecting too much soundness from the speculative type in the verification process.


Labels

ready Pull request is ready to be integrated rfr Pull request is ready for review


4 participants