-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8372460: Use EnumMap instead of HashMap for DateTimeFormatter parsing to improve performance #28471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
👋 Welcome back swen! A progress list of the required criteria for merging this PR into |
|
@wenshao This change is no longer ready for integration - check the PR body for details. |
1. ShellWe run the following Shell command 2. Raw Benchmark DataPerformance data running on a MacBook M1 Pro: 3. Performance ComparisonPerformance Comparison: b649557 vs d8742d7
|
src/java.base/share/classes/java/time/format/DateTimeFormatterBuilder.java
Outdated
Show resolved
Hide resolved
The existing tests above can cover the cases where there are no non-ChronoFields, so no additional tests are needed. |
Webrevs
|
liach
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think instead of checking each component printer parser, we should check the public methods on DateTimeFormatterBuilder that can take a TemporalField and track the onlyChronoField there.
This is better because this is where users can actaully pass in non-ChronoField. For example, I last time discovered text printer parser, and now have discovered DefaultValueParser is problematic too.
So I believe guarding where users can pass custom TemporalField and adding a boolean field on a DateTimeFormatterBuilder to keep track of this is better.
|
Spreading out and duplicating the state across multiple classes isn't very satisfactory. |
+1. Never seen non-ChronoField in the wild |
I also plan to upgrade EnumMap to a custom ChronoFieldMap, like this: wenshao@b1cbc62 Keeping the current implementation would be easier.
If we upgrade to ChronoFieldMap, it will throw a ClassClastException not only in put, but also in other methods such as get/constainsKey, which would require too many changes. |
|
We should place more processing logic in the pattern parsing stage, rather than the text parsing stage. |
liach
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think from your experiments, maintaining onlyChronoField is indeed way too painful. So I support updating the map in Parsed to use a custom implemented map. This should be not as risky as that map is never exposed to the public users.
This reverts commit b2b19e1.
liach
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks reasonable in principle. However, we need to verify we indeed won't run into putting non-chronofield into an enum map by accident, and this is a bit hard...
/reviewers 2 reviewer
| */ | ||
| Parsed() { | ||
| @SuppressWarnings("unchecked") | ||
| Parsed(boolean onlyChronoField) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you know that only ChronoFields are used then imho the loop over the entries of fieldValues in method resolveFields can be skipped (line 290ff).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion, but that should be a separate PR
| * Flag indicating whether this formatter only uses ChronoField instances. | ||
| * This is used to optimize the storage of parsed field values in the Parsed class. | ||
| */ | ||
| final boolean onlyChronoField; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you add to DateTimePrinterParser the method:
public default boolean onlyChronoFields() {
return true;
}
and override in CompositePrinterParser, NumberPrinterParser, TextPrinterParser, DefaultValueParser with obvious implementations you should be able to get rid of this field, same in DateTimeFormatterBuilder. (Or keep the field, but initialize in the constructor from printerParser).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initial version was similar to what you suggested. In the discussion above, I accepted liach's suggestion and modified it into the current implementation. I prefer the current implementation, and it will be easier to calculate chronoFieldsBitSet in the next step.
|
This version isn't well encapsulated and has changes across multiple files. |
|
|
I just noted that custom TemporalField implementations must be able to put any TemporalField they like into this map: jdk/src/java.base/share/classes/java/time/temporal/TemporalField.java Lines 364 to 379 in 2d09284
Roger's model will fail if a non-TemporalField puts into this map. This PR's model is safe because all |
RogerRiggs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR has been through too many incremental changes.
I suspect a better solution is to implement a fit-for-purpose Map, optimized for ChronoFields but taking into account the possibility of unknown TemporalFields. All within the implementation of a Map<TemporalField, Long>.
I'd like to see this PR closed and take a fresh look with all that is learned by the attempt.
I believe that tasks that can be performed during the build process should not be done during the parse process. The process of building a DateTimeBuilder is executed once, while the parsing process is executed N times. For example, a pattern like Therefore, I think we should check whether chronoFieldOnly is used in DateTimeFormatterBuilder. |
|
Given early comments about parsing, I'd expect further work to allow queries of the Map testing for the fields needed by common patterns. A specialized Map could use a bitmap/array for the ChronoFields and test for multiple fields at a time. |
|
I think this work may be accepted for now for its immediate performance gain. Properly implementing a Map is a more complex task that is no less error prone compared to this onlyChronoField boolean tracker field. |
|
I primarily object to the spread of state across multiple classes where it is not needed. |


This PR optimizes the parsing performance of DateTimeFormatter by replacing HashMap with EnumMap in scenarios where the keys are exclusively ChronoField enum values.
When parsing date/time strings, DateTimeFormatter creates HashMaps to store intermediate parsed values. HashMap has more overhead for operations compared to specialized map implementations.
Since ChronoField is an enum and all keys in these maps are ChronoField instances, we can use EnumMap instead, which provides better performance for enum keys due to its optimized internal structure.
Parsing scenarios show improvements from 12% to 95%
Progress
Issue
Reviewers
Reviewers without OpenJDK IDs
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28471/head:pull/28471$ git checkout pull/28471Update a local copy of the PR:
$ git checkout pull/28471$ git pull https://git.openjdk.org/jdk.git pull/28471/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 28471View PR using the GUI difftool:
$ git pr show -t 28471Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28471.diff
Using Webrev
Link to Webrev Comment