@wenshao
Contributor

@wenshao wenshao commented Nov 24, 2025

This PR optimizes the parsing performance of DateTimeFormatter by replacing HashMap with EnumMap in scenarios where the keys are exclusively ChronoField enum values.

When parsing date/time strings, DateTimeFormatter creates HashMaps to store intermediate parsed values. HashMap has more per-operation overhead (hashing, plus a node allocation for each entry) than a map specialized for enum keys.

Since ChronoField is an enum and all keys in these maps are ChronoField instances, we can use EnumMap instead, which provides better performance for enum keys due to its optimized internal structure.
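
A minimal sketch of the idea, not the actual patch. The class and method names (`FieldValuesFactory`, `newFieldValues`) are hypothetical, and the unchecked cast is only an assumption about how an EnumMap keyed by ChronoField could be handed out as a `Map<TemporalField, Long>`:

```java
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalField;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the names are illustrative and not taken from the JDK sources.
final class FieldValuesFactory {

    // Chooses the map implementation once, based on a flag computed elsewhere.
    @SuppressWarnings("unchecked")
    static Map<TemporalField, Long> newFieldValues(boolean onlyChronoField) {
        if (onlyChronoField) {
            // EnumMap keeps values in an array indexed by ordinal(), so no hashing is needed.
            Map<?, ?> enumMap = new EnumMap<ChronoField, Long>(ChronoField.class);
            // Unchecked cast: only safe while every key really is a ChronoField.
            return (Map<TemporalField, Long>) enumMap;
        }
        return new HashMap<>();
    }

    public static void main(String[] args) {
        Map<TemporalField, Long> values = newFieldValues(true);
        values.put(ChronoField.YEAR, 2025L);
        values.put(ChronoField.MONTH_OF_YEAR, 11L);
        System.out.println(values.get(ChronoField.YEAR));
    }
}
```

The cast is the fragile part: putting a key that is not a ChronoField into the EnumMap-backed map throws a ClassCastException, which is why the flag has to be reliable.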

The parsing benchmarks below show throughput improvements ranging from about 12% to 95%.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 2 Reviewers)

Issue

  • JDK-8372460: Use EnumMap instead of HashMap for DateTimeFormatter parsing to improve performance (Enhancement - P4)

Reviewers

Reviewers without OpenJDK IDs

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28471/head:pull/28471
$ git checkout pull/28471

Update a local copy of the PR:
$ git checkout pull/28471
$ git pull https://git.openjdk.org/jdk.git pull/28471/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28471

View PR using the GUI difftool:
$ git pr show -t 28471

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28471.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Nov 24, 2025

👋 Welcome back swen! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@wenshao wenshao changed the title Improve DateTimeFormatter::parse performance by using EnumMap Use EnumMap to improve DateTimeFormatter parse performance Nov 24, 2025
@openjdk

openjdk bot commented Nov 24, 2025

@wenshao This change is no longer ready for integration - check the PR body for details.

@openjdk

openjdk bot commented Nov 24, 2025

@wenshao The following labels will be automatically applied to this pull request:

  • core-libs
  • i18n

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@wenshao
Contributor Author

wenshao commented Nov 24, 2025

1. Shell

We ran the following shell commands:

# master
git checkout b6495573e9dc5470df268b63f8e7a93f38406cd2
make test TEST="micro:java.time.format.DateTimeFormatterParse"

# this pr
git checkout d8742d7514abfe0e36f105fa7310fdb1755ae546
make test TEST="micro:java.time.format.DateTimeFormatterParse"

2. Raw Benchmark Data

Performance data running on a MacBook M1 Pro:

# b649557 (master)
Benchmark                                           Mode  Cnt     Score     Error   Units
DateTimeFormatterParse.parseInstant                thrpt   15  2066.130 ± 126.134  ops/ms
DateTimeFormatterParse.parseLocalDate              thrpt   15  5014.987 ± 424.759  ops/ms
DateTimeFormatterParse.parseLocalDateTime          thrpt   15  3821.083 ± 390.928  ops/ms
DateTimeFormatterParse.parseLocalDateTimeWithNano  thrpt   15  3529.090 ± 209.195  ops/ms
DateTimeFormatterParse.parseLocalTime              thrpt   15  4275.904 ± 335.752  ops/ms
DateTimeFormatterParse.parseLocalTimeWithNano      thrpt   15  4596.255 ± 195.175  ops/ms
DateTimeFormatterParse.parseOffsetDateTime         thrpt   15  2330.924 ± 152.061  ops/ms
DateTimeFormatterParse.parseZonedDateTime          thrpt   15  1837.753 ± 107.873  ops/ms

# d8742d7 (this pr)
Benchmark                                           Mode  Cnt     Score     Error   Units
DateTimeFormatterParse.parseInstant                thrpt   15  2900.168 ±  56.079  ops/ms
DateTimeFormatterParse.parseLocalDate              thrpt   15  9787.592 ± 384.437  ops/ms
DateTimeFormatterParse.parseLocalDateTime          thrpt   15  5046.838 ± 271.451  ops/ms
DateTimeFormatterParse.parseLocalDateTimeWithNano  thrpt   15  3963.050 ± 434.662  ops/ms
DateTimeFormatterParse.parseLocalTime              thrpt   15  8196.707 ± 329.547  ops/ms
DateTimeFormatterParse.parseLocalTimeWithNano      thrpt   15  8387.213 ± 652.292  ops/ms
DateTimeFormatterParse.parseOffsetDateTime         thrpt   15  3291.076 ± 294.889  ops/ms
DateTimeFormatterParse.parseZonedDateTime          thrpt   15  2069.595 ± 293.385  ops/ms

3. Performance Comparison

Performance Comparison: b649557 vs d8742d7

Benchmark                                          b649557 (ops/ms)     d8742d7 (ops/ms)     Improvement Factor
DateTimeFormatterParse.parseInstant                2066.130 ± 126.134   2900.168 ±  56.079   1.404x
DateTimeFormatterParse.parseLocalDate              5014.987 ± 424.759   9787.592 ± 384.437   1.952x
DateTimeFormatterParse.parseLocalDateTime          3821.083 ± 390.928   5046.838 ± 271.451   1.321x
DateTimeFormatterParse.parseLocalDateTimeWithNano  3529.090 ± 209.195   3963.050 ± 434.662   1.123x
DateTimeFormatterParse.parseLocalTime              4275.904 ± 335.752   8196.707 ± 329.547   1.919x
DateTimeFormatterParse.parseLocalTimeWithNano      4596.255 ± 195.175   8387.213 ± 652.292   1.825x
DateTimeFormatterParse.parseOffsetDateTime         2330.924 ± 152.061   3291.076 ± 294.889   1.412x
DateTimeFormatterParse.parseZonedDateTime          1837.753 ± 107.873   2069.595 ± 293.385   1.126x
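
The improvement factor is the d8742d7 score divided by the b649557 score; the error margins are not propagated into it. A small sketch that recomputes two rows from the scores above (the class name `ImprovementFactor` is hypothetical):

```java
import java.util.Locale;

// Hypothetical helper that recomputes the "Improvement Factor" column.
final class ImprovementFactor {

    // Ratio of candidate throughput to baseline throughput (both in ops/ms).
    static double factor(double baseline, double candidate) {
        return candidate / baseline;
    }

    // Formats the ratio the same way the table does, e.g. "1.404x".
    static String format(double factor) {
        return String.format(Locale.ROOT, "%.3fx", factor);
    }

    public static void main(String[] args) {
        System.out.println(format(factor(2066.130, 2900.168))); // parseInstant
        System.out.println(format(factor(5014.987, 9787.592))); // parseLocalDate
    }
}
```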

@wenshao wenshao changed the title Use EnumMap to improve DateTimeFormatter parse performance Use EnumMap instead of HashMap for DateTimeFormatter parsing to improve performance Nov 24, 2025
@wenshao
Contributor Author

wenshao commented Nov 25, 2025

java/time/tck/java/time/temporal/TCKWeekFields.java
java/time/tck/java/time/temporal/TCKIsoFields.java
java/time/tck/java/time/temporal/TCKJulianFields.java
java/time/tck/java/time/format/TCKDateTimeParseResolver.java
java/time/tck/java/time/format/TCKLocalizedFieldParser.java
java/time/tck/java/time/format/TCKDateTimeFormatters.java
java/time/tck/java/time/format/TCKDateTimeFormatterBuilder.java

The existing tests above can cover the cases where there are no non-ChronoFields, so no additional tests are needed.

@wenshao wenshao changed the title Use EnumMap instead of HashMap for DateTimeFormatter parsing to improve performance 8372460: Use EnumMap instead of HashMap for DateTimeFormatter parsing to improve performance Nov 25, 2025
@wenshao wenshao marked this pull request as ready for review November 25, 2025 06:26
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 25, 2025
@mlbridge

mlbridge bot commented Nov 25, 2025

@wenshao wenshao changed the title 8372460: Use EnumMap instead of HashMap for DateTimeFormatter parsing to improve performance Use EnumMap instead of HashMap for DateTimeFormatter parsing to improve performance Nov 25, 2025
@wenshao wenshao changed the title Use EnumMap instead of HashMap for DateTimeFormatter parsing to improve performance 8372460: Use EnumMap instead of HashMap for DateTimeFormatter parsing to improve performance Nov 25, 2025
@wenshao wenshao requested a review from liach November 28, 2025 10:59
Member

@liach liach left a comment

I think instead of checking each component printer parser, we should check the public methods on DateTimeFormatterBuilder that can take a TemporalField and track the onlyChronoField there.

This is better because this is where users can actually pass in a non-ChronoField. For example, last time I discovered that the text printer parser is problematic, and now I have discovered that DefaultValueParser is too.

So I believe guarding where users can pass custom TemporalField and adding a boolean field on a DateTimeFormatterBuilder to keep track of this is better.
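
A rough sketch of what such tracking could look like. `TrackingBuilder` is a hypothetical stand-in for DateTimeFormatterBuilder and shows only the flag bookkeeping, not any real printer/parser logic:

```java
import java.time.temporal.ChronoField;
import java.time.temporal.JulianFields;
import java.time.temporal.TemporalField;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for DateTimeFormatterBuilder; only the tracking is shown.
final class TrackingBuilder {
    private final List<TemporalField> fields = new ArrayList<>();
    private boolean onlyChronoField = true;

    // Every public entry point that accepts a TemporalField goes through checkField.
    TrackingBuilder appendValue(TemporalField field) {
        checkField(field);
        fields.add(field);
        return this;
    }

    // Clears the flag the first time a field that is not a ChronoField is seen.
    private void checkField(TemporalField field) {
        Objects.requireNonNull(field, "field");
        if (!(field instanceof ChronoField)) {
            onlyChronoField = false;
        }
    }

    boolean onlyChronoField() {
        return onlyChronoField;
    }

    public static void main(String[] args) {
        System.out.println(new TrackingBuilder()
                .appendValue(ChronoField.YEAR)
                .onlyChronoField());
        System.out.println(new TrackingBuilder()
                .appendValue(ChronoField.YEAR)
                .appendValue(JulianFields.JULIAN_DAY)
                .onlyChronoField());
    }
}
```

The flag is computed while the formatter is being built, so nothing extra runs per parse call.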

@RogerRiggs
Contributor

Spreading out and duplicating the state across multiple classes isn't very satisfactory.
Since non-ChronoField is very unlikely, I'd suggest a more localized change confined to Parsed.
Always create the initial EnumMap and refactor the fieldValues.put() calls to a private utility method to catch the ClassCastException and upgrade the map to a HashMap.
That should retain the performance improvements without any extra overhead or non-local code changes for all of the normal cases.
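
A minimal sketch of this suggestion, assuming a hypothetical holder class `UpgradingFieldValues` in place of Parsed. It relies on EnumMap.put throwing ClassCastException for a key of the wrong type, and uses JulianFields.JULIAN_DAY as an example of a TemporalField that is not a ChronoField:

```java
import java.time.temporal.ChronoField;
import java.time.temporal.JulianFields;
import java.time.temporal.TemporalField;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the field storage inside Parsed.
final class UpgradingFieldValues {
    private Map<TemporalField, Long> fieldValues = newEnumMap();

    @SuppressWarnings("unchecked")
    private static Map<TemporalField, Long> newEnumMap() {
        Map<?, ?> m = new EnumMap<ChronoField, Long>(ChronoField.class);
        return (Map<TemporalField, Long>) m; // unchecked: keys must be ChronoField
    }

    // Single choke point for writes: fast path is the EnumMap, slow path upgrades once.
    Long put(TemporalField field, long value) {
        try {
            return fieldValues.put(field, value);
        } catch (ClassCastException e) {
            // First key that is not a ChronoField: copy the entries into a HashMap and retry.
            fieldValues = new HashMap<>(fieldValues);
            return fieldValues.put(field, value);
        }
    }

    Map<TemporalField, Long> view() {
        return fieldValues;
    }

    public static void main(String[] args) {
        UpgradingFieldValues values = new UpgradingFieldValues();
        values.put(ChronoField.YEAR, 2025);
        values.put(JulianFields.JULIAN_DAY, 2_460_000);
        System.out.println(values.view().getClass().getSimpleName());
    }
}
```

This only protects writes that go through the helper; code that holds a reference to the map and calls put directly would still see the exception.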

@naotoj
Member

naotoj commented Dec 1, 2025

Since non-ChronoField is very unlikely, I'd suggest a more localized change confined to Parsed.

+1. Never seen non-ChronoField in the wild

@wenshao
Contributor Author

wenshao commented Dec 2, 2025

Spreading out and duplicating the state across multiple classes isn't very satisfactory. Since non-ChronoField is very unlikely, I'd suggest a more localized change confined to Parsed. Always create the initial EnumMap and refactor the fieldValues.put() calls to a private utility method to catch the ClassCastException and upgrade the map to a HashMap. That should retain the performance improvements without any extra overhead or non-local code changes for all of the normal cases.

I also plan to upgrade EnumMap to a custom ChronoFieldMap, like this: wenshao@b1cbc62 Keeping the current implementation would be easier.

image

If we upgrade to ChronoFieldMap, it will throw a ClassCastException not only in put but also in other methods such as get and containsKey, which would require too many changes.

@wenshao
Contributor Author

wenshao commented Dec 2, 2025

We should place more processing logic in the pattern parsing stage, rather than the text parsing stage.

Member

@liach liach left a comment

Judging from your experiments, maintaining onlyChronoField is indeed far too painful, so I support changing the map in Parsed to a custom map implementation. This should not be as risky, since that map is never exposed to public users.

@wenshao wenshao requested a review from liach December 5, 2025 07:49
@wenshao
Contributor Author

wenshao commented Dec 9, 2025

image

As shown in the image above, the DateTimeFormatterBuilder#appendValue method does not need to call checkField; it only needs to be called within appendInternal.

Member

@liach liach left a comment

This looks reasonable in principle. However, we need to verify we indeed won't run into putting non-chronofield into an enum map by accident, and this is a bit hard...

/reviewers 2 reviewer

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 16, 2025
@openjdk
Copy link

openjdk bot commented Dec 16, 2025

@liach
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 2 Reviewers).

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Dec 16, 2025
@wenshao wenshao mentioned this pull request Dec 16, 2025
3 tasks
*/
Parsed() {
@SuppressWarnings("unchecked")
Parsed(boolean onlyChronoField) {

If you know that only ChronoFields are used then imho the loop over the entries of fieldValues in method resolveFields can be skipped (line 290ff).

Contributor Author

Good suggestion, but that should be a separate PR

* Flag indicating whether this formatter only uses ChronoField instances.
* This is used to optimize the storage of parsed field values in the Parsed class.
*/
final boolean onlyChronoField;

If you add to DateTimePrinterParser the method:

public default boolean onlyChronoFields() {
    return true;
} 

and override in CompositePrinterParser, NumberPrinterParser, TextPrinterParser, DefaultValueParser with obvious implementations you should be able to get rid of this field, same in DateTimeFormatterBuilder. (Or keep the field, but initialize in the constructor from printerParser).

Contributor Author

d8742d7

The initial version was similar to what you suggested. In the discussion above, I accepted liach's suggestion and modified it into the current implementation. I prefer the current implementation, and it will be easier to calculate chronoFieldsBitSet in the next step.

@RogerRiggs
Contributor

This version isn't well encapsulated and has changes across multiple files.
What was suggested at the beginning of December is prototyped in PR #28936.

@wenshao
Contributor Author

wenshao commented Dec 20, 2025

This version isn't well encapsulated and has changes across multiple files. What was suggested at the beginning of December is prototyped in PR #28936.

  1. Using exceptions for logic control seems like a bad practice.

  2. A DateTimeFormatter is reused multiple times. Our approach of calculating onlyChronoField only runs once during construction. However, using a runtime approach with try-catch ClassCastException executes the corresponding code every time parse is called. This is a choice between executing once and executing multiple times.

  3. In the DateTimeFormatter::checkField method, we could later add an int chronoFieldBitSet field to record which ChronoFields are used. This could further optimize other methods of Parsed.
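
Point 3 can be sketched as follows. This is an illustration only: `ChronoFieldBits` and its methods are hypothetical names, and packing the set into an int assumes ChronoField has at most 32 constants, which holds for current JDKs:

```java
import java.time.temporal.ChronoField;

// Hypothetical helper: one bit per ChronoField, indexed by ordinal().
final class ChronoFieldBits {

    static int bitOf(ChronoField field) {
        return 1 << field.ordinal();
    }

    // Builds the bit set for a group of fields, e.g. once while the formatter is built.
    static int maskOf(ChronoField... fields) {
        int mask = 0;
        for (ChronoField field : fields) {
            mask |= bitOf(field);
        }
        return mask;
    }

    // Tests several fields at once with a single AND and compare.
    static boolean containsAll(int set, int required) {
        return (set & required) == required;
    }

    public static void main(String[] args) {
        int used = maskOf(ChronoField.YEAR, ChronoField.MONTH_OF_YEAR, ChronoField.DAY_OF_MONTH);
        System.out.println(containsAll(used, maskOf(ChronoField.YEAR, ChronoField.DAY_OF_MONTH)));
        System.out.println(containsAll(used, maskOf(ChronoField.HOUR_OF_DAY)));
    }
}
```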

@liach
Member

liach commented Dec 20, 2025

I just noted that custom TemporalField implementations must be able to put any TemporalField they like into this map:

* @param fieldValues the map of fields to values, which can be updated, not null
* @param partialTemporal the partially complete temporal to query for zone and
* chronology; querying for other things is undefined and not recommended, not null
* @param resolverStyle the requested type of resolve, not null
* @return the resolved temporal object; null if resolving only
* changed the map, or no resolve occurred
* @throws ArithmeticException if numeric overflow occurs
* @throws DateTimeException if resolving results in an error. This must not be thrown
* by querying a field on the temporal without first checking if it is supported
*/
default TemporalAccessor resolve(
Map<TemporalField, Long> fieldValues,
TemporalAccessor partialTemporal,
ResolverStyle resolverStyle) {
return null;
}

Roger's model will fail if a custom field's resolve puts a non-ChronoField into this map. This PR's model is safe because every TemporalField here will be a ChronoField, and those won't try to do anything dangerous to the map.
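
The hazard can be reproduced outside the JDK with a small sketch. `ResolveHazard` and `resolveLikeCustomField` are hypothetical; the method merely stands in for what a custom TemporalField.resolve implementation is permitted to do with the map it receives:

```java
import java.time.temporal.ChronoField;
import java.time.temporal.JulianFields;
import java.time.temporal.TemporalField;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo of a resolve-style callback writing a non-ChronoField key.
final class ResolveHazard {

    // Stand-in for a custom resolve(): it may put any TemporalField into the map.
    static void resolveLikeCustomField(Map<TemporalField, Long> fieldValues) {
        fieldValues.put(JulianFields.JULIAN_DAY, 2_460_000L);
    }

    // Returns true if the map accepted the write, false if it threw ClassCastException.
    static boolean accepts(Map<TemporalField, Long> fieldValues) {
        try {
            resolveLikeCustomField(fieldValues);
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    @SuppressWarnings("unchecked")
    static Map<TemporalField, Long> enumBacked() {
        Map<?, ?> m = new EnumMap<ChronoField, Long>(ChronoField.class);
        return (Map<TemporalField, Long>) m; // unchecked: keys must be ChronoField
    }

    public static void main(String[] args) {
        System.out.println(accepts(new HashMap<>()));
        System.out.println(accepts(enumBacked()));
    }
}
```

Because the callback writes to the map directly, a try/catch around Parsed's own put calls would not intercept this exception.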

Contributor

@RogerRiggs RogerRiggs left a comment

This PR has been through too many incremental changes.
I suspect a better solution is to implement a fit-for-purpose Map, optimized for ChronoFields but taking into account the possibility of unknown TemporalFields. All within the implementation of a Map<TemporalField, Long>.
I'd like to see this PR closed and take a fresh look with all that is learned by the attempt.

@wenshao
Contributor Author

wenshao commented Dec 21, 2025

This PR has been through too many incremental changes. I suspect a better solution is to implement a fit-for-purpose Map, optimized for ChronoFields but taking into account the possibility of unknown TemporalFields. All within the implementation of a Map<TemporalField, Long>. I'd like to see this PR closed and take a fresh look with all that is learned by the attempt.

I believe that tasks that can be performed during the build process should not be done during the parse process.

Building a DateTimeFormatter happens once, while parsing happens N times.

For example, a pattern like yyyy-MM-dd HH:mm:ss.SSS requires calling the put method of the Map 7 times during the parsing process.

Therefore, I think the onlyChronoField check belongs in DateTimeFormatterBuilder, where it runs once per formatter rather than once per parse.

@RogerRiggs
Contributor

Given early comments about parsing, I'd expect further work to allow queries of the Map testing for the fields needed by common patterns. A specialized Map could use a bitmap/array for the ChronoFields and test for multiple fields at a time.
A specialized Map could have a putChronoField method that bypasses extra testing on the type; it would be used by the implementation in Parsed, maintaining encapsulation.
There is an edge case with a custom implementation of TemporalField: the TemporalField.resolve implementation for the new custom field could put a new non-ChronoField field into the map.
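
One possible shape for such a structure, as a sketch only. `HybridFieldStore` is a hypothetical name, it deliberately does not implement the full Map interface, and it assumes ChronoField has at most 32 constants so that presence fits in an int:

```java
import java.time.temporal.ChronoField;
import java.time.temporal.JulianFields;
import java.time.temporal.TemporalField;
import java.util.HashMap;
import java.util.Map;

// Hypothetical store: array + bitmask for ChronoField, lazy HashMap for anything else.
final class HybridFieldStore {
    private final long[] chronoValues = new long[ChronoField.values().length];
    private int present;                     // bit i set => ChronoField with ordinal i has a value
    private Map<TemporalField, Long> others; // created only if a non-ChronoField shows up

    // Fast path for callers that already know the key type.
    void putChronoField(ChronoField field, long value) {
        chronoValues[field.ordinal()] = value;
        present |= 1 << field.ordinal();
    }

    // General path, usable with any TemporalField (for example from a custom resolve).
    void put(TemporalField field, long value) {
        if (field instanceof ChronoField) {
            putChronoField((ChronoField) field, value);
        } else {
            if (others == null) {
                others = new HashMap<>();
            }
            others.put(field, value);
        }
    }

    Long get(TemporalField field) {
        if (field instanceof ChronoField) {
            int ordinal = ((ChronoField) field).ordinal();
            if ((present & (1 << ordinal)) == 0) {
                return null;
            }
            return Long.valueOf(chronoValues[ordinal]);
        }
        return others == null ? null : others.get(field);
    }

    // Tests for several ChronoFields with one mask comparison.
    boolean containsAll(ChronoField... fields) {
        int mask = 0;
        for (ChronoField field : fields) {
            mask |= 1 << field.ordinal();
        }
        return (present & mask) == mask;
    }

    public static void main(String[] args) {
        HybridFieldStore store = new HybridFieldStore();
        store.putChronoField(ChronoField.YEAR, 2025);
        store.putChronoField(ChronoField.MONTH_OF_YEAR, 12);
        store.put(JulianFields.JULIAN_DAY, 2_460_000);
        System.out.println(store.containsAll(ChronoField.YEAR, ChronoField.MONTH_OF_YEAR));
        System.out.println(store.get(JulianFields.JULIAN_DAY));
    }
}
```

A real replacement would still have to present itself as a `Map<TemporalField, Long>`, including entry iteration and removal, which this sketch leaves out.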

@liach
Member

liach commented Dec 22, 2025

I think this work may be accepted for now for its immediate performance gain. Properly implementing a Map is a more complex task that is no less error prone compared to this onlyChronoField boolean tracker field.

@RogerRiggs
Contributor

I primarily object to the spread of state across multiple classes where it is not needed.
Accepting short-term gains just puts off final solutions and tends to muddy the implementation.
