-
Notifications
You must be signed in to change notification settings - Fork 127
8367245: [lworld] C2 compilation fails with "Missed optimization opportunity in PhaseIterGVN" #1717
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Welcome back mchevalier! A progress list of the required criteria for merging this PR into |
|
@marc-chevalier This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been no new commits pushed to the ➡️ To integrate this PR with the above commit message to the |
|
Do I understand right that 1) If that's correct why don't we run With: How can the meet be |
|
I don't think this is quite correct. At first, we have which make (which I'm not arguing with: it seems correct at that point). Then, some incremental inlining happen, the types of inputs change to ( In this invocation of After that, we do verification (so no change to the inputs anymore), and we re-call It's not the same as before, so it's a verification failure. As, for why the meet of is maybe there is a misunderstanding (that I had myself): null here means Java's null, not C++ nullptr. So, basically, the Phi is either null or a MyValue2 (possibly null). It seems natural that the (Hotspot's) meet is the same as |
|
Attached test case fails with "Missed optimization opportunity in PhaseIterGVN" as well (with mainline, nothing valhalla specific here). I run it with: Can you confirm it's indeed the same issue? |
|
I agree it's the same issue. Very nice! Both with mainline and valhalla. I had the suspicion it wasn't valhalla-specific because none of the concept and code involved was, but I couldn't find an example. I'll add the test. For the record, we have: It's now to fix in mainline (and maybe only in mainline, and just count on jdk -> valhalla merges). But there is still the question of the solution, and it is still not clear what is the best way to me. |
|
I think it is better to fix the type system. Having a join that does not satisfy |
|
I agree on principle, but I find that weird to have exactly this test with the |
|
The type system is quite complicated and it's hard to get right. This seems like a corner case: only for |
|
I disagree. The issue only surfaces in this particular occasion does not mean it will not appear in other circumstances, possibly in the future. Even if the code is there on purpose, it seems that the purpose is executed incorrectly. Additionally, an empty speculative type is supposed to mean that the path is speculatively unreachable, which is not the case here. So, another approach may be to fix the injection of speculative and assert that we should not obtain an empty speculative type? |
Or that profile data for 2 different points in the execution is inconsistent which given how profile data is collected seems as likely to me. Speculative types were added as a mechanism to fight profile pollution for scripting languages running on the JVM (specifically nashorn). They work really well in some cases. But they also have limited applicability. To me, it doesn't seem like a good use of developer time or our complexity budget to go with a complicated fix. |
|
I'm trying to understand why |
|
An important point is that:
There is one, though, and it is |
|
I have looked at the example provided by @rwestrel , and it seems true that when the speculative type is empty, the node is speculatively unreachable ( |
That's how I crafted the test to get conflicting profiles. I also had to use Another way to get conflicting profiles would be to make sure profile collection only happens when some particular value is returned and not when some other is returned. Profile collection doesn't start on first execution and stops as soon as a method is compiled by c2. So if a method is called very often, 1) initially returns some type and 2) then only later some other type, and if compilation with c2 happens between 1) and 2), then profile data only reports the type collected in 1). |
I don't remember the details but I think the rationale is that if we're seeing conflicting profile, there's a good chance profile data is inaccurate and we can as well ignore it. |
|
Given that this is not a Valhalla specific issue, I think this should be fixed in mainline. We can disable the verification in Valhalla or disable the command line in the test until the fix is merged. |
That makes sense. It's more practical than ignoring it everywhere we use it when it's oddly too specific. Indeed, speculative types don't need to be sound (or complete), and they start with a (probably) unsound premise: profiling data. I wonder if we are implicitly expecting too much soundness from the speculative type in the verification process. |
Analysis
Obervationally
IGVN
During IGVN, in
PhiNode::Value, aPhiNodehas 2 inputs. Their types are:We compute the join (HS' meet):
valhalla/src/hotspot/share/opto/cfgnode.cpp
Lines 1310 to 1317 in 412ec88
But the current
_type(of thePhiNodeas aTypeNode) isWe filter
tby_typevalhalla/src/hotspot/share/opto/cfgnode.cpp
Line 1332 in 412ec88
and we get
which is what we return. After the end of
Value, the returned becomes the newPhiNode's_type.valhalla/src/hotspot/share/opto/phaseX.cpp
Lines 2150 to 2164 in 412ec88
and
valhalla/src/hotspot/share/opto/node.cpp
Lines 1127 to 1133 in 412ec88
Verification
On verification,
in(1),in(2)have the same value, so doest. But this timeand so after filtering
tby (new)_typeand we getwhich is retuned. Verification gets angry because the new
ftis not the same as the previous one.But why?!
Details on type computation
In short, we are doing
and observing that
ft != ft'. It seems our lattice doesn't ensure(a /\ b) /\ b = a /\ bwhich is problematic for this kind of verfication that will just "try again and see if something change".To me, the surprising fact was that the intersection
What happened to the speculative type? Both
MyValue2andMyValue3are inheritingMyAbstract(and implementingMyInterface). So the code correctly find that the intersection of these speculative type isThe interesting part is that it's
AnyNull: indeed, what else is aMyValue2andMyValue3at the same time? And then,above_centerlinedecides it's not useful enough (too precise, too clone from HS' top/normal bottom) and remove the speculative type.valhalla/src/hotspot/share/opto/type.cpp
Lines 2886 to 2888 in 412ec88
But on the verification run,
compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *is intersected with the speculative type ofjava/lang/Object *, which is unknown (HS' bottom/normal top), so we are simply gettingMyValue2. If we did not discardAnyNullusingabove_centerline, we would have the intersection ofMyValue2andAnyNull, givingAnyNull, which is indeed stable.Ok, but the types are weird?
Indeed, they are! How can we get a speculative type
MyValue3on thePhiNodewhen inputs are bothObject, and one is speculated to be aMyValue2? This comes from incremental inlining. It seems that we have some profiling information on the returned type of a callee, that happens to beMyValue3, which propagate to thePhiNode. Later, the callee is inlined, and we get new type information (MyValue2) from its body (from the returned type of a callee of our callee, if I remember well), that reaches the input of ourPhiNode.Reproducing
In Valhalla
This crash is quite rare because:
PhiNode::Valueis called a second time, it will stabilize the_typefield before verification.To limitate the influence of 2., I've tested with an additional assert that would immediately do
in
PhiNode::Valueand compareftandft_. Indeed, we are never sure a run ofValueis not the last one: it should always be legal to stop anywhere (even if in a particular case), it was going to run further.With this extra check, the crash a bit more common, but still pretty rare. Tests that have been witness to crash then at least once:
compiler/valhalla/inlinetypes/TestCallingConvention.javacompiler/valhalla/inlinetypes/TestIntrinsics.javacompiler/valhalla/inlinetypes/TestArrays.javacompiler/valhalla/inlinetypes/TestBasicFunctionality.javaAll in
compiler/valhalla/inlinetypeswhile I was also testing with mainline tests. Suspicious, uh.In mainline
With the aforementioned extra check, I've tried to see if it could happen on mainline since the involved code seems not to be valhalla-specific. As we could expect given that only valhalla tests have been seen to crash, no such crash at all.
Crafting an example
I've tried to craft an example that would create a similar situation, but without luck. I never managed to reach a correct setup of incremental inlining, conflicting type profiling...
Fixing
I think changing the type system would be quite risky: it is all over the place. Also, fixing would require not to drop the speculative type when
above_centerline. This is not like something missing or a corner case that one didn't think of, it's rather removing a feature that is explicitly here, so probably on purpose.As a first approach, one could simply run
filter_speculativetwice, that should be enough as the second filter will simply select the non empty speculative type if there is only one, and this one won't beabove_centerline, or it would not exist as a speculative type already.To try to be a bit less aggressive, we can rather do that in case where we know it cannot be useful. If
ftobtained fromfilter_speculativehas no speculative type, butthas, we can know that it might be because it has been dropped, and computingt->filter_speculative(ft)could pick the speculative type oft. The speculative type can still be removed if the non-speculative type offtis exact and non null for instance, but we've still reached a fixpoint, so it's correct, but a little bit too much work.That being said, I'm not claiming it's a great solution: it seems that many parts do their job as expected, but the result is unfortunate. Ignoring speculative types in verification (or maybe of
PhiNodes only) would also work. Anyway, the problem is an unprecise type, but it is still sound: working with it shouldn't be an issue.I was told that maybe @rwestrel would have an opinion, or an idea to do differently.
Thanks,
Marc
Progress
Issue
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/valhalla.git pull/1717/head:pull/1717$ git checkout pull/1717Update a local copy of the PR:
$ git checkout pull/1717$ git pull https://git.openjdk.org/valhalla.git pull/1717/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 1717View PR using the GUI difftool:
$ git pr show -t 1717Using diff file
Download this PR as a diff file:
https://git.openjdk.org/valhalla/pull/1717.diff
Using Webrev
Link to Webrev Comment