-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F #26457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
👋 Welcome back galder! A progress list of the required criteria for merging this PR into |
❗ This change is not yet ready to be integrated. |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this, it looks really good :)
I'll need to do some testing later though.
@@ -0,0 +1,125 @@ | |||
package org.openjdk.bench.java.lang; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this benchmark belongs with the other vectorization benchmarks under
test/micro/org/openjdk/bench/vm/compiler/Vector*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not really a language feature, more for vm/compiler
@@ -0,0 +1,125 @@ | |||
package org.openjdk.bench.java.lang; | |||
|
|||
import org.openjdk.jmh.annotations.Benchmark; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file might also need a Copyright?
|
@@ -1815,6 +1827,17 @@ Node* VectorReinterpretNode::Identity(PhaseGVN *phase) { | |||
return this; | |||
} | |||
|
|||
bool VectorReinterpretNode::implemented(int opc, uint vlen, BasicType src_type, BasicType dst_type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opc
is not used in this method. Do we need this parameter here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup not needed
} | ||
|
||
@Benchmark | ||
public long[] doubleToLongBits() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would something like this be more concise (and maybe more readable as well) -
@Benchmark
public long[] doubleToLongBits() {
for (int i = 0; i < doubles.length; i++) {
resultLongs[i] = Double.doubleToLongBits(doubles[i]);
}
return resultLongs;
}
The loop should still get vectorized (if vectorizable).
Same for other benchmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe but there's a reason why I wrote these benchmark methods this way. Keeping each line doing one thing makes it easier to map each line to the assembly (e.g. perfasm
) and related IR nodes (e.g. PrintIdeal
). That IMO is more important than the conciseness of the benchmark. What do others think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Thanks!
Although this is not in the scope of this patch, but I wonder if we could rename |
@@ -1815,6 +1827,17 @@ Node* VectorReinterpretNode::Identity(PhaseGVN *phase) { | |||
return this; | |||
} | |||
|
|||
bool VectorReinterpretNode::implemented(int opc, uint vlen, BasicType src_type, BasicType dst_type) { | |||
if ((src_type == T_FLOAT && dst_type == T_INT) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a suggestion, do you feel a switch-case
could be more readable/clear in this case? Something like this -
bool VectorReinterpretNode::implemented(uint vlen, BasicType src_type, BasicType dst_type) {
switch (src_type) {
case T_FLOAT:
if (dst_type != T_INT) return false;
break;
case T_INT:
if (dst_type != T_FLOAT) return false;
break;
case T_DOUBLE:
if (dst_type != T_LONG) return false;
break;
case T_LONG:
if (dst_type != T_DOUBLE) return false;
break;
default:
return false;
}
return Matcher::match_rule_supported_auto_vectorization(Op_VectorReinterpret, vlen, dst_type);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both options look just fine to me, but I'm happy to re-write it like that if others also feel the same way.
You're suggesting to modify
|
That sounds reasonable to me |
Yes, that's right. I believe |
Internal testing of the latest commit looked good (not a review). |
@merykitty thanks for the approval. I've run tier1-3 tests for 147633f and they all passed, and the benchmark results are the same as in the description. Thanks @chhagedorn for running the tests! |
@@ -1632,6 +1632,8 @@ bool SuperWord::implemented(const Node_List* pack, const uint size) const { | |||
retValue = ReductionNode::implemented(opc, size, arith_type->basic_type()); | |||
} else if (VectorNode::is_convert_opcode(opc)) { | |||
retValue = VectorCastNode::implemented(opc, size, velt_basic_type(p0->in(1)), velt_basic_type(p0)); | |||
} else if (VectorNode::is_reinterpret_opcode(opc)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this affect Op_ReinterpretHF2S
that is also in VectorNode::is_reinterpret_opcode
?
I'm afraid that we need to test this with hardware or Intel's SDE, to make sure we have it running on a VM that actually supports Float16. Otherwise these instructions may not be used, and hence not tested right.
@galderz Can you run the relevant tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you run specific tiers in those platforms? Just hotspot compiler? Or individual tests such as ConvF2HFIdealizationTests
and TestFloat16ScalarOperations
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I don't know, I'd have to do the research myself. Probably focusing on the Float16 tests would be good enough. No other test would really use Float16, so running anything else would not be that useful probably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done some testing on x86_64 and aarch64 and the tests pass.
I also made sure that the test output demonstrated execution of the expected IR rule as per the requirements of each platform.
c7gn.2xlarge
Graviton3
==============================
Test summary
==============================
TEST TOTAL PASS FAIL ERROR SKIP
jtreg:test/hotspot/jtreg/compiler/c2/irTests/ConvF2HFIdealizationTests.java
1 1 0 0 0
jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java
1 1 0 0 0
jtreg:test/hotspot/jtreg/compiler/loopopts/superword/TestCompatibleUseDefTypeSize.java
1 1 0 0 0
==============================
TEST SUCCESS
$ tail ConvF2HFIdealizationTests.jtr
Messages from Test VM
---------------------
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in test1: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
----------System.err:(3/35)----------
JavaTest Message: Test complete.
result: Passed. Execution successful
test result: Passed. Execution successful
$ tail TestFloat16ScalarOperations.jtr
Messages from Test VM
---------------------
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testDivByPOT: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testMulByTWO: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testInexactFP16ConstantPatterns: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testSNaNFP16ConstantPatterns: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testQNaNFP16ConstantPatterns: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testExactFP16ConstantPatterns: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testRandomFP16ConstantPatternSet1: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testRandomFP16ConstantPatternSet2: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testRounding1: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testRounding2: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testMax: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testAddConstantFolding: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testDivConstantFolding: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testMin: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testMinConstantFolding: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testEliminateIntermediateHF2S: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testDivByOne: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testFMAConstantFolding: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testMaxConstantFolding: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testMul: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testconvF2HFAndS2HF: Feature constraint not met (applyIfCPUFeature): avx512_fp16, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testDiv: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testSqrtConstantFolding: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testSqrt: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testMulConstantFolding: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testFma: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testAdd1: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testAdd2: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testSubConstantFolding: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
[IREncodingPrinter] Disabling IR matching for rule 1 of 2 in testSub: None of the feature constraints met (applyIfCPUFeatureOr): avx512_fp16, true, zfh, true
----------System.err:(3/35)----------
JavaTest Message: Test complete.
result: Passed. Execution successful
test result: Passed. Execution successful
c7i.xlarge
Intel(R) Xeon(R) Platinum 8488C (saphire rapids):
==============================
Test summary
==============================
TEST TOTAL PASS FAIL ERROR SKIP
jtreg:test/hotspot/jtreg/compiler/c2/irTests/ConvF2HFIdealizationTests.java
1 1 0 0 0
jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java
1 1 0 0 0
jtreg:test/hotspot/jtreg/compiler/loopopts/superword/TestCompatibleUseDefTypeSize.java
1 1 0 0 0
==============================
TEST SUCCESS
$ tail ConvF2HFIdealizationTests.jtr
Messages from Test VM
---------------------
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in test1: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
----------System.err:(3/35)----------
JavaTest Message: Test complete.
result: Passed. Execution successful
test result: Passed. Execution successful
$ tail TestFloat16ScalarOperations.jtr
Messages from Test VM
---------------------
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testDivByPOT: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testMulByTWO: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testInexactFP16ConstantPatterns: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testSNaNFP16ConstantPatterns: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testQNaNFP16ConstantPatterns: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testExactFP16ConstantPatterns: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testRandomFP16ConstantPatternSet1: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testRandomFP16ConstantPatternSet2: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testRounding1: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testRounding2: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testMax: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testAddConstantFolding: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testDivConstantFolding: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testMin: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testMinConstantFolding: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testEliminateIntermediateHF2S: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testDivByOne: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testFMAConstantFolding: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testMaxConstantFolding: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testMul: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testconvF2HFAndS2HF: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testDiv: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testSqrtConstantFolding: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testSqrt: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testMulConstantFolding: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testFma: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testAdd1: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testAdd2: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testSubConstantFolding: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
[IREncodingPrinter] Disabling IR matching for rule 2 of 2 in testSub: Not all feature constraints are met (applyIfCPUFeatureAnd): fphp, true, asimdhp, true
----------System.err:(3/35)----------
JavaTest Message: Test complete.
result: Passed. Execution successful
test result: Passed. Execution successful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, I've noticed that TestFloat16ScalarOperations
does not have package
definition. Is that an oversight? It runs fine in spite of not having it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, as you mostly touched the auto-vectorization part of c2, could you please run these float16 tests as well (most of these enable auto-vectorization for Float16) -
compiler/vectorization/TestFloat16VectorOperations.java
compiler/vectorization/TestFloatConversionsVectorNaN.java
compiler/vectorization/TestFloatConversionsVector.java
compiler/vectorization/TestFloat16ToFloatConv.java
compiler/vectorization/TestFloat16VectorConvChain.java
compiler/intrinsics/float16/*
I've added support to vectorize
MoveD2L
,MoveL2D
,MoveF2I
andMoveI2F
nodes. The implementation follows a similar pattern to what is done with conversion (Conv*
) nodes. The tests inTestCompatibleUseDefTypeSize
have been updated with the new expectations.Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes. On darwin/aarch64 it shows:
The improvements observed are a result of vectorization. The lack of vectorization in
doubleToLongBits
andfloatToIntBits
demonstrates that these changes do not affect their performance. These methods do not vectorize because of flow control.I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26457/head:pull/26457
$ git checkout pull/26457
Update a local copy of the PR:
$ git checkout pull/26457
$ git pull https://git.openjdk.org/jdk.git pull/26457/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 26457
View PR using the GUI difftool:
$ git pr show -t 26457
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26457.diff
Using Webrev
Link to Webrev Comment