Merged
`docs/ldml/tr35.md`: 14 changes (7 additions, 7 deletions)
@@ -511,12 +511,12 @@ A _Unicode locale identifier_ is composed of a Unicode language identifier plus
| ----------------------------------------------------------------------------------------------------- | ----------------------------------------------- | ------------------- |
| <a name="unicode_locale_id" href="#unicode_locale_id">`unicode_locale_id`</a> | `= unicode_language_id`<br/>  `extensions*`<br/>  `pu_extensions? ;` |
| <a name="extensions" href="#extensions">`extensions`</a> | `= unicode_locale_extensions`<br/>`\| transformed_extensions`<br/>` \| other_extensions ;` |
-| <a name="unicode_locale_extensions" href="#unicode_locale_extensions">`unicode_locale_extensions`</a> | `= sep [uU]`<br/>  `((sep keyword)+`<br/>  `\|(sep attribute)+ (sep ufield)*) ;` |
+| <a name="unicode_locale_extensions" href="#unicode_locale_extensions">`unicode_locale_extensions`</a> | `= sep [uU]`<br/>  `((sep ufield)+`<br/>  `\|(sep attribute)+ (sep ufield)*) ;` |
| <a name="transformed_extensions" href="#transformed_extensions">`transformed_extensions`</a> | `= sep [tT]`<br/>  `((sep tlang (sep tfield)*)`<br/>  `\| (sep tfield)+) ;` |
| <a name="pu_extensions" href="#pu_extensions">`pu_extensions`</a> | `= sep [xX]`<br/>` (sep alphanum{1,8})+ ;` |
| <a name="other_extensions" href="#other_extensions">`other_extensions`</a> | `= sep [alphanum-[tTuUxX]]`<br/>` (sep alphanum{2,8})+ ;` |
| <a name="ufield" href="#ufield">`ufield`</a><br/>(Also known as `keyword`) | `= ukey (sep uvalue)? ;` |
-| <a name="ukey" href="#ukey">`ukey`</a><br/>(Also known as `key`) | `= alphanum alpha ;`<br/>(Note that this is narrower than in [[RFC6067](https://www.ietf.org/rfc/rfc6067.txt)], so that it is disjoint with tkey.) | [`validity`](#Key_Type_Definitions)<br/>[`latest-data`](https://github.com/unicode-org/cldr/blob/maint/maint-47/common/bcp47) |
+| <a name="ukey" href="#ukey">`ukey`</a><br/>(Also known as `key`) | `= alphanum alpha ;` | [`validity`](#Key_Type_Definitions)<br/>[`latest-data`](https://github.com/unicode-org/cldr/blob/maint/maint-47/common/bcp47) <br/>(Note that this is narrower than in [[RFC6067](https://www.ietf.org/rfc/rfc6067.txt)], so that it is disjoint with `tkey`.) |
| <a name="uvalue" href="#uvalue">`uvalue`</a><br/>(Also known as `type`) | `= alphanum{3,8}`<br/>` (sep alphanum{3,8})* ;` | [`validity`](#Key_Type_Definitions)<br/>[`latest-data`](https://github.com/unicode-org/cldr/blob/maint/maint-47/common/bcp47) |
The reference to `maint-47` should probably be updated to point to the latest version, but I will file a separate ticket for that (for CLDR 49 or later).

| `attribute` | `= alphanum{3,8} ;` |
| <a name="unicode_subdivision_id" href="#unicode_subdivision_id">`unicode_subdivision_id`</a> | `= `[`unicode_region_subtag`](#unicode_region_subtag)` unicode_subdivision_suffix ;` | [`validity`](#unicode_subdivision_subtag_validity)<br/>[`latest-data`](https://github.com/unicode-org/cldr/blob/maint/maint-47/common/validity/subdivision.xml) |
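The revised `unicode_locale_extensions` rule can be made concrete with a short example. Below is a minimal Python sketch that splits the subtags following a `u` singleton into `attribute`s and `ufield`s. It assumes the caller has already split the tag on its separators and isolated the subtags that belong to the 'u' extension. The name `parse_u_extension` is hypothetical and is not part of any CLDR or ICU API. The sketch checks only the syntactic shape given in the table, not the `validity` data.

```python
import re

# Patterns from the grammar rows above, matched case-insensitively.
UKEY = re.compile(r"[0-9a-z][a-z]", re.IGNORECASE)          # ukey = alphanum alpha
SUBTAG_3_8 = re.compile(r"[0-9a-z]{3,8}", re.IGNORECASE)    # attribute, or one uvalue subtag


def parse_u_extension(subtags):
    """Split the subtags following a 'u' singleton into attributes and ufields.

    Hypothetical helper. Returns (attributes, ufields), where ufields is a list
    of (ukey, [uvalue subtags]) pairs in input order. Raises ValueError if the
    subtags do not match the unicode_locale_extensions rule.
    """
    attributes, ufields = [], []
    i = 0
    # Any attributes must come first, before the first ufield.
    while i < len(subtags) and SUBTAG_3_8.fullmatch(subtags[i]):
        attributes.append(subtags[i])
        i += 1
    # Each ufield is a two-character ukey followed by zero or more uvalue subtags.
    while i < len(subtags):
        key = subtags[i]
        if not UKEY.fullmatch(key):
            raise ValueError(f"expected a ukey, got {key!r}")
        i += 1
        values = []
        while i < len(subtags) and SUBTAG_3_8.fullmatch(subtags[i]):
            values.append(subtags[i])
            i += 1
        ufields.append((key, values))
    # The rule requires at least one attribute or one ufield.
    if not attributes and not ufields:
        raise ValueError("a 'u' extension needs at least one subtag")
    return attributes, ufields


attrs, fields = parse_u_extension("foo-bar-nu-thai-ca-buddhist-kk-true".split("-"))
print(attrs)   # ['foo', 'bar']
print(fields)  # [('nu', ['thai']), ('ca', ['buddhist']), ('kk', ['true'])]
```

A `ukey` is exactly two characters, while `attribute` and `uvalue` subtags are three to eight. Subtag length alone therefore decides each subtag's role, and a single left-to-right pass works without backtracking.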
@@ -575,8 +575,8 @@ A [`unicode_locale_id`](#unicode_locale_id) has _canonical syntax_ when:
* Any variants are in alphabetical order (eg, en-fonipa-scouse, not en-scouse-fonipa)
* Any extensions are in alphabetical order by their singleton (eg, en-t-xxx-u-yyy, not en-u-yyy-t-xxx)
* All attributes are sorted in alphabetical order.
-* All keywords and tfields are sorted by alphabetical order of their keys, within their respective extensions.
-* Any type or tfield value "true" is removed.
+* All `ufield`s and `tfield`s are sorted by alphabetical order of their keys, within their respective extensions.
+* Any `ufield` or `tfield` value "true" is removed.

For example, the canonical form of "en-u-foo-bar-nu-thai-ca-buddhist-kk-true" is "en-u-bar-foo-ca-buddhist-kk-nu-thai". The attributes `"foo"` and `"bar"` in this example are provided only for illustration; no attribute subtags are defined by the current CLDR specification.
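Here is a minimal Python sketch of the three attribute/ufield rules above, applied only to a tag's 'u' extension. It sorts attributes, sorts `ufield`s by key, and drops a value of "true". It makes several assumptions. Separators are '-'. The tag carries no 't', other, or private-use extension. There are no duplicate attributes or keys. The input is otherwise well-formed. Only the extension part is lowercased, and the language-identifier prefix is left as given. The name `canonicalize_u` is hypothetical, and the sketch does not handle the variant or extension-ordering rules.

```python
import re

# ukey = alphanum alpha (checked after the extension is lowercased).
UKEY = re.compile(r"[0-9a-z][a-z]")


def canonicalize_u(tag):
    """Apply the attribute/ufield canonical-syntax rules to a tag's 'u' extension.

    Hypothetical sketch: assumes '-' separators, no other extensions, no
    duplicate attributes or keys, and well-formed subtags.
    """
    idx = tag.lower().find("-u-")
    if idx < 0:
        return tag                          # no 'u' extension: nothing to do
    prefix, ext = tag[:idx], tag[idx + 3:].lower()
    attributes, fields = [], {}
    key = None
    for sub in ext.split("-"):
        if UKEY.fullmatch(sub):             # two characters: starts a new ufield
            key = sub
            fields[key] = []
        elif key is None:                   # 3-8 characters before any key: an attribute
            attributes.append(sub)
        else:                               # 3-8 characters after a key: part of its uvalue
            fields[key].append(sub)
    parts = [prefix, "u", *sorted(attributes)]  # attributes in alphabetical order
    for k in sorted(fields):                    # ufields sorted by key
        parts.append(k)
        if fields[k] != ["true"]:               # a uvalue of "true" is removed
            parts.extend(fields[k])
    return "-".join(parts)


print(canonicalize_u("en-u-foo-bar-nu-thai-ca-buddhist-kk-true"))
# en-u-bar-foo-ca-buddhist-kk-nu-thai
```

This reproduces the canonical form given in the example above. The attributes `foo` and `bar` remain purely illustrative.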

@@ -943,7 +943,7 @@ These subtags are all in lowercase (that is the canonical casing for these subta

**The -u- Extension.** The syntax of 'u' extension subtags is defined by the rule `unicode_locale_extensions` in [Unicode locale identifier](#Unicode_locale_identifier), except the separator of subtags `sep` must be always hyphen '-' when the extension is used as a part of BCP 47 language tag.

-A 'u' extension may contain multiple `attribute` s or `keyword` s as defined in [Unicode locale identifier](#Unicode_locale_identifier). The canonical syntax is defined as in [Canonical Unicode Locale Identifiers](#Canonical_Unicode_Locale_Identifiers).
+A 'u' extension may contain multiple `attribute`s or `ufield`s as defined in [Unicode locale identifier](#Unicode_locale_identifier). The canonical syntax is defined as in [Canonical Unicode Locale Identifiers](#Canonical_Unicode_Locale_Identifiers).

_See also [Unicode Extensions for BCP 47](https://cldr.unicode.org/index/bcp47-extension) on the CLDR site._

@@ -1017,8 +1017,8 @@ The BCP 47 form for keys and types is the canonical form, and recommended. Other
<td><code>standard</code></td>
<td>The default ordering for each language. For root it is based on the [<a href="#DUCET">DUCET</a>] (Default Unicode Collation Element Table): see <i><a href="tr35-collation.md#Root_Collation">Root Collation</a></i>. Each other locale is based on that, except for appropriate modifications to certain characters for that language.</td></tr>
<tr><td><code>search</code></td>
-<td>A special collation type dedicated for string search—it is not used to determine the relative order of two strings, but only to determine whether they should be considered equivalent for the specified strength, using the string search matching rules appropriate for the language. Compared to the normal collator for the language, this may add or remove primary equivalences, may make additional characters ignorable or change secondary equivalences, and may modify contractions to allow matching within them, depending on the desired behavior. For example, in Czech, the distinction between ‘a’ and ‘á’ is secondary for normal collation, but primary for search; a search for ‘a’ should never match ‘á’ and vice versa. A search collator is normally used with strength set to PRIMARY or SECONDARY (should be SECONDARY if using “asymmetric” search as described in the [<a href="https://www.unicode.org/reports/tr41/#UTS10">UCA</a>] section Asymmetric Search). The search collator in root supplies matching rules that are appropriate for most languages (and which are different than the root collation behavior); language-specific search collators may be provided to override the matching rules for a given language as necessary.</td></tr>
-<tr><td colspan="2"><p>Other keywords provide additional choices for certain locales; <i>they only have effect in certain locales.</i></p></td></tr>
+<td>A special collation type dedicated for string search — it is not used to determine the relative order of two strings, but only to determine whether they should be considered equivalent for the specified strength, using the string search matching rules appropriate for the language. Compared to the normal collator for the language, this may add or remove primary equivalences, may make additional characters ignorable or change secondary equivalences, and may modify contractions to allow matching within them, depending on the desired behavior. For example, in Czech, the distinction between ‘a’ and ‘á’ is secondary for normal collation, but primary for search; a search for ‘a’ should never match ‘á’ and vice versa. A search collator is normally used with strength set to PRIMARY or SECONDARY (should be SECONDARY if using “asymmetric” search as described in the [<a href="https://www.unicode.org/reports/tr41/#UTS10">UCA</a>] section Asymmetric Search). The search collator in root supplies matching rules that are appropriate for most languages (and which are different than the root collation behavior); language-specific search collators may be provided to override the matching rules for a given language as necessary.</td></tr>
+<tr><td colspan="2"><p>Other ufields provide additional choices for certain locales; <i>they only have effect in certain locales.</i></p></td></tr>
<tr><td colspan="2">…</td></tr>
<tr><td><code>phonetic</code></td>
<td>Requests a phonetic variant if available, where text is sorted based on pronunciation. It may interleave different scripts, if multiple scripts are in common use.</td></tr>