Skip to content

Conversation

@pedberg-icu
Copy link
Contributor

@pedberg-icu pedberg-icu commented Sep 26, 2025

CLDR-11536

  • This PR completes the ticket.

  • Remove attempted quoting of literal text in intervalFormatFallback (literal text in that pattern should not be quoted; but in the cases here the attempt to quote was also using curly quotes)

ALLOW_MANY_COMMITS=true

Copy link
Member

@macchiati macchiati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the wrong approach. Will comment more

@macchiati
Copy link
Member

That is, I think all of the patterns in cldr that contain (0), ... are intended to be simple substitution formats, where all of the arguments are formatted before insertion in place of the placeholders. That means that the only ASCII ' needed are those to quote a literal { or ascii '

@pedberg-icu
Copy link
Contributor Author

@macchiati Responses:

That is, I think all of the patterns in cldr that contain (0), ... are intended to be simple substitution formats, where all of the arguments are formatted before insertion in place of the placeholders

That is not what the CLDR spec says (at least currently for dateTimeFormats), nor I think what is implemented in ICU (will check on that). For example with dateTimePatterns (e.g. "{1} 'at' {0}") it says: "Except for the substitution markers {0} and {1}, text in the dateTimeFormat is interpreted as part of a date/time pattern, and is subject to the same rules described in Date Format Patterns. This includes the need to enclose ASCII letters in single quotes if they are intended to represent literal text."

And the code substitutes the patterns (not formatted results) for {1} and {0} and then treats the entire thing as a date format, so that literal text including A-Za-z needs quoting, but nothing else.

Checking on the behavior with intervalFormats (intervalFormatFalback and greatestDifference)...

That means that the only ASCII ' needed are those to quote a literal { or ascii '

Not sure what you are requesting here. Compared to my text in the PR, are you requesting that I use add literal '{' as something that needs quoting? Are you requesting any changes to the data?

Anyway, I need to run some checks in ICU to see what the actual behavior is.

@macchiati
Copy link
Member

macchiati commented Sep 26, 2025

I agree that we should just make sure that the patterns in the xml files that you include work for ICU; which may involve using ' marks as you do — after verifying what ICU does. But I think we want to hold off on the spec changes until we resolve https://unicode-org.atlassian.net/browse/CLDR-10321 — because we don't want to change it and then later on perhaps change it back.

We need to determine which we want to do, quoting from Felipe's description:

  1. Use ''MessageFormat'' syntax to put in the patterns together, and then date/time syntax to format the date-time and get the end result, '''or'''
  2. Format the pieces with their respective patterns, and then use ''MessageFormat'' syntax to put the formatted strings together using the combination pattern.

It looks like we have the following patterns (as I recall, there is a test to verify that all {0} patterns are represented here), so I think the ideal case would be to make all of them work similarly with regard to { and '.

https://github.com/unicode-org/cldr/blob/main/tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/Placeholders.txt

@pedberg-icu
Copy link
Contributor Author

pedberg-icu commented Sep 26, 2025

OK, well... Unfortunately I cannot do any more on this until Sept 30 afternoon or Oct 1.

And actually I am not sure about the quoting requirements for intervalFormatFallback, will need to actually test before changing the data.

@macchiati
Copy link
Member

NOTE: we need to resolve what ICU does with the patterns (see the discussions) before release (can be part of the ICU feedback on the CLDR beta).

I'm worried that there may be a logKnownError for this that disables an ICU test, thus masking what would show up as an error in ICU-client software.

@pedberg-icu pedberg-icu marked this pull request as draft October 1, 2025 16:24
@pedberg-icu pedberg-icu force-pushed the CLDR-11536-fix-literal-quoting-in-intervalFormatFallback branch from ec05864 to 2058b15 Compare October 1, 2025 21:38
@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@pedberg-icu
Copy link
Contributor Author

@macchiati @markusicu OK, I ran an experiment in ICU (details will be added to the ticket for this) which made it clear that literal text in intervalFormatFallback should not have quoting of any kind (except what is intended as literal text); the pattern is treated as a Message Format. In contrast, ASCII-range literal text in the greatestDifference patter does need straight single quotes, since the entire pattern is treated as a date format. True in both ICU4C and ICU4J. So I have update this PR to just remove the erroneous quoting (which was using cury quotes anyway), and to remove any spec changes.

@pedberg-icu pedberg-icu marked this pull request as ready for review October 1, 2025 21:44
@pedberg-icu
Copy link
Contributor Author

@macchiati Mark, your "needs changes" (now addressed) is blocking merging of this even if approved by someone else...

Copy link
Member

@markusicu markusicu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that this was about syntactic quoting with ASCII apostrophes. Another look at the ticket tells me that I was mistaken.

The remaining changes here -- removing non-ASCII curly quotes which have no syntactic effect -- lgtm.

@markusicu
Copy link
Member

@macchiati Mark, your "needs changes" (now addressed) is blocking merging of this even if approved by someone else...

@pedberg-icu if Mark doesn't get around to approving, you can dismiss his negative review with a brief justifying comment.

@pedberg-icu pedberg-icu requested a review from macchiati October 1, 2025 23:48
@pedberg-icu pedberg-icu dismissed macchiati’s stale review October 1, 2025 23:49

The items have been addressed: spec changes reverted, and data fixed to work in ICU based on tests with ICU.

@pedberg-icu pedberg-icu merged commit dd1c861 into unicode-org:main Oct 1, 2025
11 checks passed
@pedberg-icu pedberg-icu deleted the CLDR-11536-fix-literal-quoting-in-intervalFormatFallback branch October 1, 2025 23:50
@pedberg-icu
Copy link
Contributor Author

@pedberg-icu if Mark doesn't get around to approving, you can dismiss his negative review with a brief justifying comment.

Thanks, did that.

@pedberg-icu pedberg-icu changed the title CLDR-11536 Fixed literal quoting in intervalFormatFallback, clarified quoting requirement in spec CLDR-11536 Fixed literal quoting in intervalFormatFallback Oct 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants