
Include :date, :datetime, and :time with minimal options #1083

Merged: 4 commits, merged Jul 21, 2025

eemeli
Collaborator

@eemeli eemeli commented Jun 23, 2025

See #1082 for some of the discussion leading to this proposal.

This is an alternative to #1077, where instead of dateStyle and timeStyle we start with the following options:

  • :datetime
    • dateFields
    • dateLength
    • timePrecision
    • timeZoneStyle
  • :date
    • fields
    • length
  • :time
    • precision
    • timeZoneStyle
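For reference, the option values that show up in the spec text quoted in the review threads below can be collected into TypeScript types. This is only a sketch and not part of the PR. The type and function names are hypothetical. The `day` field set is omitted because it was dropped on Jul 6. `timeZoneStyle` is shown with the reduced long/short set from the Jul 8 update, on the assumption that `never` remains its default. No default for the length option is quoted anywhere, so it is left unresolved.

```typescript
// Field sets quoted in the spec hunks below. `day` was dropped on Jul 6;
// `day-weekday` was also proposed for removal, so treat it as provisional.
type FieldSet =
  | "weekday"
  | "day-weekday"
  | "month-day"
  | "month-day-weekday"
  | "year-month-day"
  | "year-month-day-weekday";

type Length = "short" | "medium" | "long";
type TimePrecision = "hour" | "minute" | "second";
// Reduced to long/short on Jul 8; keeping `never` as the default is an assumption.
type TimeZoneStyle = "never" | "long" | "short";

// Options for :date and :time (hypothetical interface names).
interface DateOptions { fields?: FieldSet; length?: Length }
interface TimeOptions { precision?: TimePrecision; timeZoneStyle?: TimeZoneStyle }

// Options for :datetime, which prefixes the date and time parts.
interface DateTimeOptions {
  dateFields?: FieldSet;
  dateLength?: Length;
  timePrecision?: TimePrecision;
  timeZoneStyle?: TimeZoneStyle;
}

// Apply the defaults marked "(default)" in the quoted spec text.
// No default for the length is quoted, so an absent value stays undefined.
function resolveDateTime(o: DateTimeOptions): DateTimeOptions {
  return {
    dateFields: o.dateFields ?? "year-month-day",
    dateLength: o.dateLength,
    timePrecision: o.timePrecision ?? "minute",
    timeZoneStyle: o.timeZoneStyle ?? "never",
  };
}
```

For example, `resolveDateTime({ dateLength: "short", timePrecision: "second" })` fills in `dateFields: "year-month-day"` and `timeZoneStyle: "never"`.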

With these, we are able to represent each of the style option combinations rather succinctly:

| Style options (#1077) | This PR |
| --- | --- |
| :date style=short | :date length=short |
| :date style=medium | :date length=medium |
| :date style=long | :date length=long |
| :date style=full | :date fields=year-month-day-weekday length=long |
| :time style=short | :time precision=minute |
| :time style=medium | :time precision=second |
| :time style=long | :time precision=second timeZoneStyle=short |
| :time style=full | :time precision=second timeZoneStyle=long |
| :datetime dateStyle=short timeStyle=medium | :datetime dateLength=short timePrecision=second |

For the most part, dateStyle maps to dateLength (and dateFields to bring in the weekday), while timeStyle maps to timePrecision and timeZoneStyle.
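Read this way, the table is a lookup from style values to option sets. The sketch below encodes it in TypeScript. `DATE_STYLE`, `TIME_STYLE`, `datetimeOptions`, and `toOptionString` are hypothetical names. The :datetime case just combines the :date and :time rows using the prefixed option names, as the paragraph above describes.

```typescript
type DateStyle = "short" | "medium" | "long" | "full";
type TimeStyle = "short" | "medium" | "long" | "full";

interface DateOpts { fields?: string; length: string }
interface TimeOpts { precision: string; timeZoneStyle?: string }

// One entry per :date row of the table above.
const DATE_STYLE: Record<DateStyle, DateOpts> = {
  short: { length: "short" },
  medium: { length: "medium" },
  long: { length: "long" },
  // `full` brings in the weekday through the field set, not a longer length.
  full: { fields: "year-month-day-weekday", length: "long" },
};

// One entry per :time row of the table above.
const TIME_STYLE: Record<TimeStyle, TimeOpts> = {
  short: { precision: "minute" },
  medium: { precision: "second" },
  long: { precision: "second", timeZoneStyle: "short" },
  full: { precision: "second", timeZoneStyle: "long" },
};

// :datetime prefixes the date and time parts but keeps timeZoneStyle unprefixed.
function datetimeOptions(dateStyle: DateStyle, timeStyle: TimeStyle): Record<string, string> {
  const d = DATE_STYLE[dateStyle];
  const t = TIME_STYLE[timeStyle];
  const out: Record<string, string> = {};
  if (d.fields !== undefined) out.dateFields = d.fields;
  out.dateLength = d.length;
  out.timePrecision = t.precision;
  if (t.timeZoneStyle !== undefined) out.timeZoneStyle = t.timeZoneStyle;
  return out;
}

// Render options in the `name=value` form used in MF2 placeholders.
function toOptionString(opts: Record<string, string>): string {
  return Object.keys(opts).map((k) => `${k}=${opts[k]}`).join(" ");
}
```

`toOptionString(datetimeOptions("short", "medium"))` yields `dateLength=short timePrecision=second`, the last row of the table.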

This approach preserves the locale-specific optionality present in current CLDR data: for example, ps and th include the era with the year, and many locales include a day period (am/pm) indicator in times by default.
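To show how the options leave details like these to locale data, here is one way a JavaScript implementation *might* lower them onto `Intl.DateTimeFormat` component options. This is not something the PR specifies. In particular, the correspondence between the length option and month/weekday widths is an assumption made for the sketch. Neither era nor `hour12` is requested, so the locale data behind `Intl` decides whether an era or an am/pm marker appears.

```typescript
type Length = "short" | "medium" | "long";
type TimePrecision = "hour" | "minute" | "second";
type TimeZoneStyle = "never" | "short" | "long";

// Assumed correspondence between `length` and the Intl month width.
const MONTH_WIDTH: Record<Length, "numeric" | "short" | "long"> = {
  short: "numeric",
  medium: "short",
  long: "long",
};

// Request only the fields named in the field set; era is never requested here.
function toIntlDateOptions(fields: string, length: Length): Intl.DateTimeFormatOptions {
  const parts = new Set(fields.split("-"));
  const opts: Intl.DateTimeFormatOptions = {};
  if (parts.has("year")) opts.year = "numeric";
  if (parts.has("month")) opts.month = MONTH_WIDTH[length];
  if (parts.has("day")) opts.day = "numeric";
  if (parts.has("weekday")) opts.weekday = length === "long" ? "long" : "short";
  return opts;
}

// hour12 is not set, so the locale decides whether an am/pm marker appears.
function toIntlTimeOptions(precision: TimePrecision, zone: TimeZoneStyle): Intl.DateTimeFormatOptions {
  const opts: Intl.DateTimeFormatOptions = { hour: "numeric" };
  if (precision !== "hour") opts.minute = "2-digit";
  if (precision === "second") opts.second = "2-digit";
  if (zone !== "never") opts.timeZoneName = zone;
  return opts;
}
```

With `new Intl.DateTimeFormat("en-US", toIntlTimeOptions("minute", "never"))`, an am/pm marker should appear even though no option asked for it, because that is the en-US default hour cycle.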

The expectation is that these options are easy enough to use and understand without any external reference that we won't need to introduce dateStyle and timeStyle later, while still providing a sufficient foundation for filling in the remaining details.

This is not intended to be a complete solution for datetime formatting, but a step towards one. If we accept this, we will need to subsequently and separately consider additional option values like timePrecision=minuteOptional or dateFields=year-month, as well as additional options like yearStyle or fractionalSecondDigits, or even dateStyle and timeStyle. They have all been left out here to keep this as small as possible, while still providing a cohesive whole for many date/time formatting needs.

@eemeli eemeli requested review from aphillips, sffc and mihnita June 23, 2025 22:27
@sffc
Member

sffc commented Jun 27, 2025

I'm okay with this, although we should have a discussion with the broader group on the option naming bikeshed.

@aphillips aphillips added functions Issue pertains to the default function set normative Issue affects normative text in the specification LDML48 labels Jul 2, 2025
@eemeli
Collaborator Author

eemeli commented Jul 6, 2025

I've dropped the "day" option value from :datetime dateFields and :date fields because, as @sffc pointed out on the call, no formatting patterns are available for it.

I've kept in weekday and day-weekday, as those do appear to be supported by data and are more common in usage than day.

@aphillips
Member

@eemeli We can always add day back in the future. There are formatting requirements here, I guess, with messages like "You will receive your shipment on the 14th." (which suggests needing to select an ordinal, in-sentence form rather than a standalone one).

- `hour`
- `minute` (default)
- `second`
- `timeZoneStyle`
Member

timeZoneStyle or zoneStyle?

Collaborator Author

timeZoneStyle. As I mention in #1082 (comment),

I'd be fine with timeZoneStyle or timeZoneDisplay. Abbreviating out the "time" part would be a mistake -- it's not at all obvious to anyone who's not worked closely with datetime formatting that an unspecified "zone" is referring to a time zone [1].

[1] Unscientifically just verified this by asking three non-developer relatives about the meaning of the last option in the example at the top post here. They were quite confused, until I changed the name to timeZoneStyle.

Member

The experimental ICU4X syntax uses zoneStyle and the Rust API uses zone

- `minute` (default)
- `second`
- `timeZoneStyle`
- `never` (default)
Member

Is the "never" string used elsewhere in MF2? Seems like "hidden" or something might make more sense in this context

Collaborator Author

Yes, it's used for a number of number function options.

Member

A little discussion: "never" feels better when used as an adverb in context with always/never

Comment on lines 147 to 151
- `longGeneric`
- `longOffset`
- `short`
- `shortGeneric`
- `shortOffset`
Member

Observation: these are the ECMA names, which are not exactly the same as the names in the experimental ICU4X JSON syntax.

Comment on lines +36 to +42
- `dateFields`
- `weekday`
- `day-weekday`
- `month-day`
- `month-day-weekday`
- `year-month-day` (default)
- `year-month-day-weekday`
Member

Not super happy with mixing camel case and kebab case, but I don't really want the field sets to be in camel case

Member

@sffc sffc Jul 7, 2025

The casing was a major point of contention in the ICU4X TC, which is one reason we ended up landing on YMD. (I remember where I was when we had that discussion: I was in Helsinki walking to the Future Frontend speaker dinner)

Comment on lines +93 to +99
- `fields`
- `weekday`
- `day-weekday`
- `month-day`
- `month-day-weekday`
- `year-month-day` (default)
- `year-month-day-weekday`
Member

I think @eemeli removed the day fieldset based on an offhand remark I made at a meeting in June that the data for that fieldset was not high quality.

But, there is data for that fieldset in CLDR. It is just the number like "15", and my comment was that I think it would be better as "the 15th". But that's not my prerogative: it is something CLDR should decide, and it has clearly taken the position that it should be just the integer (this is from en.xml):

					<availableFormats>
						<dateFormatItem id="Bh">h B</dateFormatItem>
						<dateFormatItem id="Bhm">h:mm B</dateFormatItem>
						<dateFormatItem id="Bhms">h:mm:ss B</dateFormatItem>
						<dateFormatItem id="d">d</dateFormatItem>
						<dateFormatItem id="E">ccc</dateFormatItem>
						<dateFormatItem id="EBh">E h B</dateFormatItem>
						<dateFormatItem id="EBhm">E h:mm B</dateFormatItem>
						<dateFormatItem id="EBhms">E h:mm:ss B</dateFormatItem>
						<dateFormatItem id="Ed">d E</dateFormatItem>
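To make the point concrete, the sketch below pulls the skeleton patterns out of a few of the quoted en.xml lines and fills them in. It is a deliberate simplification. The `FIELD_SET_SKELETON` mapping from MF2 field set names to CLDR skeleton ids is an assumption. `render` ignores CLDR quoting rules and width-dependent forms, and the caller supplies the already-localized values.

```typescript
// A few lines from the en.xml excerpt above.
const EXCERPT = `
<dateFormatItem id="d">d</dateFormatItem>
<dateFormatItem id="E">ccc</dateFormatItem>
<dateFormatItem id="Ed">d E</dateFormatItem>`;

// Assumed correspondence between MF2 field sets and CLDR skeleton ids.
const FIELD_SET_SKELETON: Record<string, string> = {
  day: "d",
  weekday: "E",
  "day-weekday": "Ed",
};

// Collect id -> pattern pairs from <dateFormatItem> elements.
function parseAvailableFormats(xml: string): Map<string, string> {
  const formats = new Map<string, string>();
  const re = /<dateFormatItem id="([^"]+)">([^<]*)<\/dateFormatItem>/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(xml)) !== null) formats.set(m[1], m[2]);
  return formats;
}

// Replace each run of one pattern letter ("d", "E", "ccc", ...) with a supplied value.
// Note that the standalone weekday pattern uses the letter "c" rather than "E".
function render(pattern: string, values: Record<string, string>): string {
  return pattern.replace(/([A-Za-z])\1*/g, (run: string, letter: string) => values[letter] ?? run);
}

function formatFieldSet(fieldSet: string, values: Record<string, string>): string {
  const pattern = parseAvailableFormats(EXCERPT).get(FIELD_SET_SKELETON[fieldSet]);
  if (pattern === undefined) throw new Error(`no pattern for ${fieldSet}`);
  return render(pattern, values);
}
```

Here `formatFieldSet("day", { d: "15" })` returns the bare `"15"`, and `formatFieldSet("day-weekday", { d: "15", E: "Monday" })` returns `"15 Monday"`, with the number ahead of the weekday as described in the comment below.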

I have a similar complaint about day-weekday, which I think should be "Monday the 15th", but the data currently says "15 Monday"

Collaborator Author

@aphillips's comment above #1083 (comment) would indicate a similar sentiment.

Out of an abundance of caution, my preferred solution here would be to leave out day-weekday as well from this PR, so that it may be discussed later.

@sffc
Member

sffc commented Jul 7, 2025

Regarding options whose different values impact ICU4X data loading:

  • dateFields requires data loading. For example, it determines whether or not you need to load month names.
  • length requires data loading. For example, it determines whether you load the long month names or the short month names.
  • timePrecision does not currently require data loading, but I don't see a strong reason to treat it differently from the other options.
  • timeZoneStyle requires data loading. For example, it determines whether you need to load the time zone location names.
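One way to read this list: because these options decide which locale data is needed, an implementation can work out its data requirements from the message alone, before any input arrives. The sketch below illustrates that idea. The data key names are invented for illustration and do not correspond to actual ICU4X data markers.

```typescript
type Length = "short" | "medium" | "long";

interface MessageDateOptions {
  dateFields: string;
  length: Length;
  timePrecision: "hour" | "minute" | "second";
  timeZoneStyle: "never" | "short" | "long";
}

// Hypothetical data keys, invented for this sketch.
function requiredData(o: MessageDateOptions): string[] {
  const fields = new Set(o.dateFields.split("-"));
  const keys: string[] = [];
  // dateFields decides *whether* names are needed; length decides *which* width.
  // (A real implementation might skip month names for purely numeric forms.)
  if (fields.has("month")) keys.push(`month-names/${o.length}`);
  if (fields.has("weekday")) keys.push(`weekday-names/${o.length}`);
  // timeZoneStyle decides whether time zone names are needed at all.
  if (o.timeZoneStyle !== "never") keys.push(`time-zone-names/${o.timeZoneStyle}`);
  // timePrecision only selects among patterns, so it adds no keys here.
  return keys;
}
```

For a `month-day` field set at `long` length with no time zone, this returns only `["month-names/long"]`: no weekday, era, or zone data would need to ship.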

@eemeli
Collaborator Author

eemeli commented Jul 8, 2025

@sffc Given that you don't mention anything about the override options in your last comment, can we presume that those do not impose any data loading concerns?

@eemeli
Collaborator Author

eemeli commented Jul 8, 2025

I've applied some updates following yesterday's discussion:

  • The draft status note is re-introduced, and expanded to refer to Semantic Skeletons. Future expansion of the option set is specifically mentioned.
  • The set of valid timeZoneStyle option values is reduced to only long and short. The option is not renamed.
  • All non-override options are required to be literal.
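The "required to be literal" rule can be checked when a message is parsed. The sketch below uses an option-value shape loosely modeled on the MF2 data model (either a literal or a `$variable` reference). It treats the override options discussed later in this thread (`timeZone`, `hour12`, `calendar`) as the only ones allowed to take variables. `checkLiteralOptions` is a hypothetical name.

```typescript
// Option values loosely modeled on the MF2 data model: a literal or a $variable.
type OptionValue =
  | { type: "literal"; value: string }
  | { type: "variable"; name: string };

// Override options discussed later in this thread; all others must be literals.
const OVERRIDE_OPTIONS = new Set(["timeZone", "hour12", "calendar"]);

// Return one error message per non-override option that is given a variable.
function checkLiteralOptions(options: Record<string, OptionValue>): string[] {
  const errors: string[] = [];
  for (const name of Object.keys(options)) {
    const value = options[name];
    if (value.type === "variable" && !OVERRIDE_OPTIONS.has(name)) {
      errors.push(`Option ${name} must be a literal, not $${value.name}`);
    }
  }
  return errors;
}
```

Under this check, `{$when :datetime dateLength=short timeZone=$tz}` passes, while `{$when :datetime dateFields=$f}` is rejected.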

@sffc
Member

sffc commented Jul 8, 2025

@sffc Given that you don't mention anything about the override options in your last comment, can we presume that those do not impose any data loading concerns?

I was commenting on the options listed in #1083 (comment) which didn't mention the override options.

  • timeZone: We've gone out of our way in ICU4X to make this one be settable at format time
  • hour12: Impacts data loading because you need different patterns for the different hour cycles
  • calendar: Impacts data loading because you need to load different patterns and display names

Separate question: May I ask why you want to use hour12 instead of relying on -u-hc? Perhaps better for another discussion issue. I'm uncomfortable stabilizing that option without dissecting the pros and cons.

I already have an issue open for timeZone: #961

@aphillips aphillips merged commit 1bcc219 into main Jul 21, 2025
2 checks passed
@aphillips aphillips deleted the datetime-fields branch July 21, 2025 16:44
@sffc
Member

sffc commented Jul 21, 2025

As discussed today, there are actually 3 places where information can be injected:

  1. Features consistent across all locales
  2. Features reachable from the locale/environment but not the input
  3. Features reachable from the input

For example:

  1. A developer writes a message that uses datetime formatting with a Month+Day+Time field set. To format that message, ICU4X doesn't need to ship any data for era names and weekday names in any locale. This can happen at compile time, bucket (1).
  2. When creating a MessageFormatter, the developer loads the translation for the user's locale. The locale also specifies the preferred calendar system and time zone. This happens at formatter creation time, bucket (2).
  3. When using the MessageFormatter, the user provides the specific date, time, and time zone. This happens at format time, bucket (3).

The constraints on each option are:

  1. Bucket 1: should be fixed inside the message, not reading from input or environment
  2. Bucket 2: may read from the environment (the same as the locale), but may not change based on inputs
  3. Bucket 3: may change at any time, even based on an input
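The three buckets map naturally onto a formatter's lifecycle. The following is a minimal TypeScript sketch that assumes `Intl.DateTimeFormat` as the underlying engine, which is not something this thread proposes. Bucket-1 options are baked into the message, bucket-2 values are fixed when the formatter is constructed, and bucket-3 values arrive with each `format` call. `SketchFormatter` and its interfaces are hypothetical. Because `Intl.DateTimeFormat` takes `timeZone` at construction time, the sketch caches one instance per zone. An engine that accepts the time zone at format time, as ICU4X does per the earlier comment, would not need that cache.

```typescript
// Bucket 1: options fixed in the message, already lowered to Intl component options.
type MessageOptions = Pick<Intl.DateTimeFormatOptions, "year" | "month" | "day" | "hour" | "minute" | "second">;

// Bucket 2: known when the formatter is created (the locale and its preferences).
interface Environment { locale: string; calendar: string; hour12?: boolean }

// Bucket 3: known only at format time.
interface FormatInput { instant: Date; timeZone: string }

class SketchFormatter {
  private readonly message: MessageOptions;
  private readonly env: Environment;
  private readonly byZone = new Map<string, Intl.DateTimeFormat>();

  constructor(message: MessageOptions, env: Environment) {
    this.message = message;
    this.env = env;
  }

  format(input: FormatInput): string {
    let dtf = this.byZone.get(input.timeZone);
    if (dtf === undefined) {
      // The calendar travels as a -u-ca- Unicode extension on the locale tag.
      dtf = new Intl.DateTimeFormat(`${this.env.locale}-u-ca-${this.env.calendar}`, {
        ...this.message,
        hour12: this.env.hour12,
        timeZone: input.timeZone,
      });
      this.byZone.set(input.timeZone, dtf);
    }
    return dtf.format(input.instant);
  }
}
```

Using the values from the pseudocode later in this thread (2025-07-21 14:35:50 in America/Los_Angeles, that is 21:35:50 UTC), a formatter built with `{ month: "long", day: "numeric" }` for `en` prints July 21 in that zone, but July 22 for `Asia/Tokyo`. Only the bucket-3 input differs between the two calls.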

With this new terminology, I should clarify where each of the options lands:

| Option | Preferred Bucket | Theoretically Feasible Bucket |
| --- | --- | --- |
| dateFields | 1 | 1 |
| length | 1 | 2 |
| timePrecision | 1 | 3 |
| timeZoneStyle | 1 | 1 |
| timeZone | 2 | 3 |
| hour12 | 2 | 2 |
| calendar | 2 | 2 |

"Preferred Bucket" is the bucket where I think the option should fit in order to be most future-proof. "Theoretically Feasible Bucket" is a bucket where the option could be implemented and still work with ICU4X's current data model.

In other words, my position is:

  • I prefer for the options to be in their preferred bucket
  • I will not stop the WG from adopting the "Theoretically Feasible" bucket if they believe that there is a motivating use case and moving up to the next bucket is the way to achieve that use case
  • If the WG feels that there is a very strong reason to move one of these options to a higher bucket than the "Theoretically Feasible" one, it will require additional discussion and the ICU4X-TC may or may not support it.
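The table and the three positions above can be encoded as data plus a small classification function. This is a sketch. `BUCKETS` and `classify` are hypothetical names, and treating an option fixed *earlier* than its preferred bucket (for example a literal `timeZone` in the message) as always acceptable is an assumption.

```typescript
type Bucket = 1 | 2 | 3;
type Verdict = "preferred" | "feasible" | "needs-discussion";

// Rows of the table above.
const BUCKETS: Record<string, { preferred: Bucket; feasible: Bucket }> = {
  dateFields: { preferred: 1, feasible: 1 },
  length: { preferred: 1, feasible: 2 },
  timePrecision: { preferred: 1, feasible: 3 },
  timeZoneStyle: { preferred: 1, feasible: 1 },
  timeZone: { preferred: 2, feasible: 3 },
  hour12: { preferred: 2, feasible: 2 },
  calendar: { preferred: 2, feasible: 2 },
};

// Classify where an option would be bound, following the three positions above.
// Assumption: binding earlier than the preferred bucket is always acceptable.
function classify(option: string, bucket: Bucket): Verdict {
  const b = BUCKETS[option];
  if (b === undefined) throw new Error(`unknown option ${option}`);
  if (bucket <= b.preferred) return "preferred";
  if (bucket <= b.feasible) return "feasible";
  return "needs-discussion";
}
```

For example, reading `timePrecision` from the input at format time is `"feasible"`, while doing the same for `timeZoneStyle` lands in `"needs-discussion"`.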

@macchiati
Member

Can you add a bit more info?

1 A developer writes a message that uses datetime formatting with a Month+Day+Time field set. To format that message, ICU4X doesn't need to ship any data for era names and weekday names in any locale. This can happen at compile time, bucket (1).

Does that mean that if the message specified Month+Day+Time+Weekday, it would move into bucket 2?

2 When creating a MessageFormatter, the developer loads the translation for the user's locale. The locale also specifies the preferred calendar system and time zone. This happens at formatter creation time, bucket (2).

Does that mean that if the message specifies Month+Day+Time+Timezone, and the developer hasn't loaded "the translation for the user's locale"*, that the message formatting will fail? Or fallback to without the timezone?

* I take it not to mean literally "the translation for the user's locale", as in "English" for "en".

3 When using the MessageFormatter, the user provides the specific date, time, and time zone. This happens at format time, bucket (3).

Does "user" mean end-user (eg I enter in a date in a spreadsheet cell), or rather the caller (the code calling message format)? In either case, I'm confused.

Suppose the code makes the following (pseudocode) call.

userDate = 2025-07-21
userTime = 14:35:50
userZone = America/Los_Angeles

formatMessage(message, userDate, userTime, userZone)

Does that mean that no matter what the message is, this is in bucket 3?

@sffc
Member

sffc commented Jul 22, 2025

1 A developer writes a message that uses datetime formatting with a Month+Day+Time field set. To format that message, ICU4X doesn't need to ship any data for era names and weekday names in any locale. This can happen at compile time, bucket (1).

Does that mean that if the message specified Month+Day+Time+Weekday, it would move into bucket 2?

I'm putting options into buckets, not messages. The field set is always in bucket 1.

2 When creating a MessageFormatter, the developer loads the translation for the user's locale. The locale also specifies the preferred calendar system and time zone. This happens at formatter creation time, bucket (2).

Does that mean that if the message specifies Month+Day+Time+Timezone, and the developer hasn't loaded "the translation for the user's locale"*, that the message formatting will fail? Or fallback to without the timezone?

If the developer hasn't loaded the translation of the message into the user's locale, then they won't be able to instantiate their MessageFormatter.

3 When using the MessageFormatter, the user provides the specific date, time, and time zone. This happens at format time, bucket (3).

Does "user" mean end-user (eg I enter in a date in a spreadsheet cell), or rather the caller (the code calling message format)? In either case, I'm confused.

What I meant was that the specific date, time, and time zone are provided at runtime, likely by an end user. We don't know what they are either at build time (bucket 1) or formatter creation time (bucket 2); we only know them at format time (bucket 3).

Suppose the code makes the following (pseudocode) call.

userDate = 2025-07-21
userTime = 14:35:50
userZone = America/Los_Angeles

formatMessage(message, userDate, userTime, userZone)

Does that mean that no matter what the message is, this is in bucket 3?

userDate, userTime, and userZone are provided in a call to formatMessage, meaning that those three options are ones that I would consider to be bucket 3.

Labels
functions Issue pertains to the default function set normative Issue affects normative text in the specification

4 participants