8364007: Add no-argument codePointCount method to CharSequence and String #26461

tats-u · 2025-07-24T14:50:07Z

Adds codePointCount() overloads to String, Character, (Abstract)StringBuilder, and StringBuffer to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks.

if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) {
    throw new Exception("exceeding length");
}
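For readers unfamiliar with the distinction, here is a minimal sketch of what the proposed no-argument overload is meant to abbreviate. It relies only on the existing `String.codePointCount(int, int)`; the class name `CodePointLengthDemo` and the helper `codePointLength` are illustrative assumptions and not part of this PR.

```java
// Illustrative sketch: the helper below stands in for the no-argument
// codePointCount() proposed in this PR and uses only the existing overload.
class CodePointLengthDemo {
    // Hypothetical helper: the value s.codePointCount() is proposed to return.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "OK" followed by U+1F44D, which needs two UTF-16 code units.
        String s = "OK\uD83D\uDC4D";
        System.out.println(s.length());         // 4 (UTF-16 code units)
        System.out.println(codePointLength(s)); // 3 (code points)
    }
}
```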

Is a CSR required for this change?

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8364007: Add no-argument codePointCount method to CharSequence and String (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26461/head:pull/26461
$ git checkout pull/26461

Update a local copy of the PR:
$ git checkout pull/26461
$ git pull https://git.openjdk.org/jdk.git pull/26461/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26461

View PR using the GUI difftool:
$ git pr show -t 26461

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26461.diff

Using Webrev

Link to Webrev Comment

bridgekeeper · 2025-07-24T14:51:07Z

👋 Welcome back tats-u! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-07-24T14:51:23Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2025-07-24T14:52:00Z

@tats-u The following labels will be automatically applied to this pull request:

compiler
core-libs
i18n

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

mlbridge · 2025-07-24T14:59:30Z

Webrevs

RogerRiggs · 2025-07-24T15:10:37Z

The recommended process for proposing new APIs is to put the proposal to the OpenJDK core-libs-dev mail alias.
Putting the effort into a PR before there is some agreement on the value is premature.
And yes, every change to the spec needs a CSR.

RogerRiggs · 2025-07-24T15:12:38Z

To keep the proposal focused on the APIs, please drop the changes to modules other than java.base.

liach

Also, do we need codePointCount on CharSequence?

liach · 2025-07-24T16:18:41Z

src/java.base/share/classes/java/lang/AbstractStringBuilder.java

+        int count = this.count;
+        byte[] value = this.value;
+        if (isLatin1(coder)) {
+            return value.length;


Suggested change

-            return value.length;
+            return count;

I see, I fixed the argument passed to StringUTF16.codePointCount too.
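The difference between the two return values can be seen from the public API. This is a rough sketch that uses `capacity()` and `length()` as stand-ins for the internal `value.length` and `count` fields; treating them as equivalent for Latin-1 content is an assumption, and the class and method names are illustrative.

```java
// Sketch: a builder's backing storage is usually larger than its content,
// so the number of code points must be derived from the used length.
class BuilderCountDemo {
    // Counts code points over the used part of the builder only.
    static int usedCodePoints(StringBuilder sb) {
        return sb.codePointCount(0, sb.length());
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder(16); // room for at least 16 chars
        sb.append("abc");                         // only 3 are in use

        System.out.println(sb.capacity() > sb.length()); // true
        System.out.println(sb.length());                 // 3
        System.out.println(usedCodePoints(sb));          // 3
    }
}
```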

myankelev

Could you please add a bug number under @bug?

test/jdk/java/lang/StringBuilder/Supplementary.java

myankelev · 2025-07-24T22:07:38Z

src/java.base/share/classes/java/lang/Character.java

+        final int length = seq.length();
+        int n = length;
+        for (int i = 0; i < length; ) {
+            if (isHighSurrogate(seq.charAt(i++)) && i < length &&


Imo this is quite hard to read, especially with i++ inside of the if statement. What do you think about changing it to this?

for (int i = 1; i < length-1; i++) {
    if (isHighSurrogate(seq.charAt(i)) && isLowSurrogate(seq.charAt(i + 1))) {
        n--;
        i++;
    }
}

edit: fixed a typo in my example

To begin with, it yields an incorrect result for sequences whose first character is a supplementary character.

jshell> int len(CharSequence seq) {
   ...>     final int length = seq.length();
   ...>     int n = length;
   ...>     for (int i = 1; i < length-1; i++) {
   ...>         if (isHighSurrogate(seq.charAt(i)) &&
   ...>             isLowSurrogate(seq.charAt(i + 1))) {
   ...>             n--;
   ...>             i++;
   ...>         }
   ...>     }
   ...>     return n;
   ...> }
|  created method len(CharSequence), however, it cannot be invoked until method isHighSurrogate(char), and method isLowSurrogate(char) are declared

jshell> boolean isHighSurrogate(char ch) {
   ...>     return 0xd800 <= ch && ch <= 0xdbff;
   ...> }
|  created method isHighSurrogate(char)

jshell> boolean isLowSurrogate(char ch) {
   ...>     return 0xdc00 <= ch && ch <= 0xdfff;
   ...> }
|  created method isLowSurrogate(char)

jshell> len("𠮷");
$5 ==> 2

jshell> len("OK👍");
$6 ==> 3

jshell> len("👍👍");
$7 ==> 3

I would rather not change this method alone unless the existing overload int codePointCount(CharSequence seq, int beginIndex, int endIndex) is also going to be changed.
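For comparison, a readable variant that starts scanning at index 0 could look like the sketch below. It is not the code in this PR; the class and method names are illustrative, and the result is checked against the existing `Character.codePointCount(CharSequence, int, int)`.

```java
// Sketch of a surrogate-aware count that scans from index 0.
// Unpaired surrogates are counted as one code point each.
class SurrogateCountDemo {
    static int countCodePoints(CharSequence seq) {
        final int length = seq.length();
        int n = length;
        for (int i = 0; i < length - 1; i++) {
            if (Character.isHighSurrogate(seq.charAt(i))
                    && Character.isLowSurrogate(seq.charAt(i + 1))) {
                n--; // a well-formed pair is a single code point
                i++; // skip the low surrogate
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String[] samples = {
            "\uD842\uDFB7",             // U+20BB7 alone
            "OK\uD83D\uDC4D",           // "OK" + U+1F44D
            "\uD83D\uDC4D\uD83D\uDC4D"  // U+1F44D twice
        };
        for (String s : samples) {
            int expected = Character.codePointCount(s, 0, s.length());
            System.out.println(countCodePoints(s) + " (reference: " + expected + ")");
        }
    }
}
```

With these inputs the sketch returns 1, 3, and 2, which matches the reference method, whereas the loop starting at index 1 returned 2, 3, and 3 in the jshell session above.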

Co-authored-By: Chen Liang <[email protected]>

Co-authored-by: Mikhail Yankelevich <[email protected]>

tats-u · 2025-07-26T08:48:11Z

The recommended process for proposing new APIs is to put the proposal to the OpenJDK core-libs-dev mail alias.

I looked through https://mail.openjdk.org/pipermail/core-libs-dev/2025-July/thread.html and the archives for the past few months, but I could not work out how to post a message.

Judging from https://mail.openjdk.org/pipermail/core-libs-dev/2025-July/149338.html and its replies, the content of this PR seems to be forwarded to the mailing list.

Also, do we need codePointCount on CharSequence?

I did not add it because CharSequence has no existing codePointCount overload and there is a simple (though inefficient) workaround, codePoints().count(), but it would be nice to have.
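The workaround mentioned here, next to the static helper that already exists, might be sketched as follows. The class and method names are illustrative; `CharBuffer` is used only as an example of a `CharSequence` implementation other than `String`.

```java
import java.nio.CharBuffer;

// Sketch: two existing ways to count code points in an arbitrary CharSequence.
class CharSequenceCountDemo {
    // Stream-based workaround; works for any CharSequence and returns a long.
    static long viaStream(CharSequence seq) {
        return seq.codePoints().count();
    }

    // Existing static method on Character; returns an int.
    static int viaCharacter(CharSequence seq) {
        return Character.codePointCount(seq, 0, seq.length());
    }

    public static void main(String[] args) {
        CharSequence seq = CharBuffer.wrap("a\uD83D\uDC4Db"); // 'a', U+1F44D, 'b'
        System.out.println(seq.length());      // 4
        System.out.println(viaStream(seq));    // 3
        System.out.println(viaCharacter(seq)); // 3
    }
}
```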

Could you please add a bug number under @bug?

Which doc comments should I add it to?

P.S. Only the test classes (those containing the test-running methods), including Supplementary?

And yes, every change to the spec needs a CSR.

I got it, but do you know how non-Authors like me can create one?

tats-u · 2025-07-26T10:20:20Z

How and where can I add tests for the default method implementations in CharSequence?

jaikiran · 2025-07-26T14:21:39Z

Hello @tats-u,

The recommended process for proposing new APIs is to put the proposal to the OpenJDK core-libs-dev mail alias.

I looked through https://mail.openjdk.org/pipermail/core-libs-dev/2025-July/thread.html and the archives for the past few months, but I could not work out how to post a message.

The OpenJDK contribution guide has the necessary details on how to contribute to the project. Specifically this section https://openjdk.org/guide/#socialize-your-change is of relevance. In order to send a mail to the core-libs-dev mailing list, please first subscribe to that mailing list https://mail.openjdk.org/mailman/listinfo/core-libs-dev and initiate a discussion explaining the need and motivation for this new API. After there's some agreement about this proposal, the implementation changes in this PR can be pursued further.

AlanBateman · 2025-07-26T14:27:28Z

The addition to CharSequence will require static analysis to check for conflicts with implementation. It will also likely impact the CharBuffer spec.

tats-u · 2025-07-27T10:23:03Z

please first subscribe to that mailing list https://mail.openjdk.org/mailman/listinfo/core-libs-dev

Does this mailing list require subscribing before one can post to it? I would like to unsubscribe at least once this PR is merged, because I do not want my mailbox cluttered with emails unrelated to this change.

The addition to CharSequence will require static analysis to check for conflicts with implementation. It will also likely impact the CharBuffer spec.

The title of the JBS issue seems to have been changed by you, but given your concerns it looks like the default method for CharSequence should be dropped for now. No codePointCount methods have been added to CharSequence so it may be too early for us to add one to CharSequence. Do you think that you should replace CharSequence in the title with another class name?

AlanBateman · 2025-07-27T15:19:13Z

No codePointCount methods have been added to CharSequence so it may be too early for us to add one to CharSequence. Do you think that you should replace CharSequence in the title with another class name?

Can you clarify what you mean? Right now your PR is proposing to add a default method named codePointCount to CharSequence.

tats-u · 2025-07-27T23:05:25Z

Right now your PR is proposing to add a default method named codePointCount to CharSequence.

If it should be excluded for this time, I will push an additional commit to remove it from the content in this PR.

AlanBateman · 2025-07-28T09:16:15Z

Right now your PR is proposing to add a default method named codePointCount to CharSequence.

If it should be excluded for this time, I will push an additional commit to remove it from the content in this PR.

I think we should mull over the addition of CharSequence::codePointCount. On the surface it looks like it fits but we can't rush it (CharSequence is widely implemented and additions to this interface have a history of disruption in the ecosystem).
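One way such an addition can disrupt existing implementations is sketched below. Everything here is hypothetical: `TextLike` stands in for CharSequence with the proposed default method, and `LegacyText` stands in for a third-party class that already declared a method with the same signature but a different meaning. This illustrates the general concern and does not claim that any such class exists.

```java
// Hypothetical stand-in for CharSequence with the proposed default method.
interface TextLike extends CharSequence {
    default int codePointCount() {
        return Character.codePointCount(this, 0, length());
    }
}

// Hypothetical pre-existing implementation whose own codePointCount()
// returns the number of DISTINCT code points.
class LegacyText implements TextLike {
    private final String s;
    LegacyText(String s) { this.s = s; }
    @Override public int length() { return s.length(); }
    @Override public char charAt(int index) { return s.charAt(index); }
    @Override public CharSequence subSequence(int start, int end) {
        return s.subSequence(start, end);
    }
    @Override public String toString() { return s; }

    // Silently overrides the interface default with different semantics.
    @Override public int codePointCount() {
        return (int) s.codePoints().distinct().count();
    }
}

class DefaultMethodConflictDemo {
    public static void main(String[] args) {
        TextLike t = new LegacyText("aab");
        System.out.println(t.codePointCount());                         // 2
        System.out.println(Character.codePointCount(t, 0, t.length())); // 3
    }
}
```

A class that had declared the method with a different return type, such as `long codePointCount()`, would instead stop compiling against the updated interface.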

What is the reason for proposing Character.codePointCount(CharSequence) as well?

tats-u · 2025-07-28T12:26:39Z

I think we should mull over the addition of CharSequence::codePointCount. On the surface it looks like it fits but we can't rush it (CharSequence is widely implemented and additions to this interface have a history of disruption in the ecosystem).

We might as well defer it to a separate JBS issue if it is too difficult to decide whether it should be included in this PR.

What is the reason for proposing Character.codePointCount(CharSequence) as well?

Character, like String and AbstractStringBuilder and unlike CharSequence, already has an overload that takes start and end indices.
It is less harmful than CharSequence::codePointCount because it is just a static method.
There are already (CharSequence, int, int) and (char[], int, int) overloads, and the (char[], int, int) overload is used in the test for String::codePointCount(int, int). We should add a (char[]) overload for the tests and a (CharSequence) overload for consistency.
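The two existing static overloads referred to here can be wrapped as in the sketch below. The helper name `countAll` is an assumption for illustration and only approximates what the proposed whole-sequence overloads would do. Note that the third parameter of the `char[]` overload is a count of chars, not an end index.

```java
// Sketch: whole-sequence wrappers over the existing Character overloads.
class CharacterOverloadsDemo {
    // Character.codePointCount(CharSequence seq, int beginIndex, int endIndex)
    static int countAll(CharSequence seq) {
        return Character.codePointCount(seq, 0, seq.length());
    }

    // Character.codePointCount(char[] a, int offset, int count)
    static int countAll(char[] chars) {
        return Character.codePointCount(chars, 0, chars.length);
    }

    public static void main(String[] args) {
        String s = "OK\uD83D\uDC4D"; // "OK" + U+1F44D
        System.out.println(countAll(s));               // 3
        System.out.println(countAll(s.toCharArray())); // 3
    }
}
```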

naotoj · 2025-07-28T16:08:32Z

The addition to CharSequence will require static analysis to check for conflicts with implementation. It will also likely impact the CharBuffer spec.

Looking at the original JSR 204 issue: https://bugs.openjdk.org/browse/JDK-4985217, it is interesting that the problem description included CharSequence but the proposed API did not. I tried to find the reason behind this, but could not find any relevant information so far.
As to the general comment, I am not so sure about adding the no-arg overloads, as they would simply be convenience methods for codePointCount(0, length()), which to me is not a significant benefit. My $0.02

tats-u · 2025-08-03T13:35:32Z

Its author may have prioritized the versatility of the APIs.

codePointCount(0, length())

This workaround is only practical if the receiver expression is short enough, or if storing it in a new temporary variable is acceptable. Having to write the expression twice just to get the number of code points in the whole string is a pain in the neck.
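The trade-off described here can be made concrete with a small sketch. The method `expensive()` is a hypothetical stand-in for a long expression that yields a String, and the call counter exists only to show how often it is evaluated.

```java
// Sketch: the existing overload forces either repeating the receiver
// expression or introducing a temporary variable.
class RepeatedExpressionDemo {
    static int calls = 0;

    // Hypothetical stand-in for a long expression yielding a String.
    static String expensive() {
        calls++;
        return "OK\uD83D\uDC4D"; // "OK" + U+1F44D
    }

    // The expression is written, and evaluated, twice.
    static int inlineCount() {
        return expensive().codePointCount(0, expensive().length());
    }

    // The expression is evaluated once, at the cost of a temporary variable.
    static int tempCount() {
        String s = expensive();
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(inlineCount() + " after " + calls + " calls"); // 3 after 2 calls
        calls = 0;
        System.out.println(tempCount() + " after " + calls + " calls");   // 3 after 1 calls
    }
}
```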

P.S. I subscribed to the mailing list (but changed the settings so that I do not receive any emails).

vicente-romero-oracle · 2025-08-11T21:06:58Z

/label remove compiler

openjdk · 2025-08-11T21:09:13Z

@vicente-romero-oracle
The compiler label was successfully removed.

8364007: Add overload without arguments to codePointCount in String etc.

18a4945

openjdk bot added the core-libs, compiler, and i18n labels Jul 24, 2025

Remove trailing spaces

6f2e1d2

openjdk bot added the rfr Pull request is ready for review label Jul 24, 2025

liach reviewed Jul 24, 2025

myankelev reviewed Jul 24, 2025

test/jdk/java/lang/StringBuilder/Supplementary.java

myankelev reviewed Jul 24, 2025

tats-u and others added 4 commits July 26, 2025 15:59

Fix test

12c12cb

Fix how to get code point count in StringBuilder

444163b

Co-authored-By: Chen Liang <[email protected]>

Fix copyright year

af3fe8b

Co-authored-by: Mikhail Yankelevich <[email protected]>

Discard changes out of than java.base

63eb4a7

tats-u added 4 commits July 26, 2025 18:03

Discard changes on code whose form is not `str.codePointCount(0, str.…

8d4840b

…length())`

Update @bug entries in test class doc comments

1e7da59

Add default implementation on codePointCount in CharSequence

4811c9d

Update @bug in correct file

0e55e35

tats-u changed the title ~~8364007: Add overload without arguments to codePointCount in String etc.~~ 8364007: Add no-argument codePointCount method to CharSequence and String Jul 27, 2025

openjdk bot removed the compiler label Aug 11, 2025
