-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8364007: Add no-argument codePointCount method to CharSequence and String #26461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
👋 Welcome back tats-u! A progress list of the required criteria for merging this PR into |
❗ This change is not yet ready to be integrated. |
Webrevs
|
The recommended process for proposing new APIs is to put the proposal to the OpenJDK core-libs-dev mail alias. |
To keep the proposal focused on the APIs, please drop the changes to modules other than java.base. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do we need codePointCount
on CharSequence
?
int count = this.count; | ||
byte[] value = this.value; | ||
if (isLatin1(coder)) { | ||
return value.length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return value.length; | |
return count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I fixed the argument passed to StringUTF16.codePointCount
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a bug number under @bug
?
final int length = seq.length(); | ||
int n = length; | ||
for (int i = 0; i < length; ) { | ||
if (isHighSurrogate(seq.charAt(i++)) && i < length && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imo this is quite hard to read, especially with i++
inside of the if statement. What do you think about changing it to this?
for (int i = 1; i < length-1; i++) {
if (isHighSurrogate(seq.charAt(i)) &&
isLowSurrogate(seq.charAt(i + 1))) {
n--;
i++;
}
}
edit: fixed a typo in my example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the first place it yields an incorrect result for sequences whose first character is a supplementary character.
jshell> int len(CharSequence seq) {
...> final int length = seq.length();
...> int n = length;
...> for (int i = 1; i < length-1; i++) {
...> if (isHighSurrogate(seq.charAt(i)) &&
...> isLowSurrogate(seq.charAt(i + 1))) {
...> n--;
...> i++;
...> }
...> }
...> return n;
...> }
| 次を作成しました: メソッド len(CharSequence)。しかし、 method isHighSurrogate(char), and method isLowSurrogate(char)が宣言されるまで、起動できません
jshell> boolean isHighSurrogate(char ch) {
...> return 0xd800 <= ch && ch <= 0xdbff;
...> }
| 次を作成しました: メソッド isHighSurrogate(char)
jshell> boolean isLowSurrogate(char ch) {
...> return 0xdc00 <= ch && ch <= 0xdfff;
...> }
| 次を作成しました: メソッド isLowSurrogate(char)
jshell> len("𠮷");
$5 ==> 2
jshell> len("OK👍");
$6 ==> 3
jshell> len("👍👍");
$7 ==> 3
I will not change it alone unless the existing overload int codePointCount(CharSequence seq, int beginIndex, int endIndex)
is also planned to be changed.
Co-authored-By: Chen Liang <[email protected]>
Co-authored-by: Mikhail Yankelevich <[email protected]>
I glanced over https://mail.openjdk.org/pipermail/core-libs-dev/2025-July/thread.html and those for some past months, but I did not get how to send one. According to https://mail.openjdk.org/pipermail/core-libs-dev/2025-July/149338.html and sub messages, the content in this PR seems to be transferred to the mailing list.
I did not add it because it does not have an existing overload and has a simple (but not efficient) workaround (
Which doc comments shall I add it? P.S. only classes for test (containing each test-running methods) including
I got it, but do you know how non-Authors like me create ones? |
How and where can I add tests for default implementing methods in |
Hello @tats-u,
The OpenJDK contribution guide has the necessary details on how to contribute to the project. Specifically this section https://openjdk.org/guide/#socialize-your-change is of relevance. In order to send a mail to the core-libs-dev mailing list, please first subscribe to that mailing list https://mail.openjdk.org/mailman/listinfo/core-libs-dev and initiate a discussion explaining the need and motivation for this new API. After there's some agreement about this proposal, the implementation changes in this PR can be pursued further. |
The addition to CharSequence will require static analysis to check for conflicts with implementation. It will also likely impact the CharBuffer spec. |
Does this mailing list system require us to subscribe the list to post a new mail to the list? I would like to leave it at least after this PR is merged because I would not like my mailbox to be messed up by emails not related to this change.
The title of the JBS issue seems to be changed by you but it looks like the default method for |
Can you clarify what you mean? Right now your PR is proposing to add a default method named codePointCount to CharSequence. |
If it should be excluded for this time, I will push an additional commit to remove it from the content in this PR. |
I think we should mull over the addition of CharSequence::codePointCount. On the surface it looks like it fits but we can't rush it (CharSequence is widely implemented and additions to this interface have a history of disruption in the eco system). What is the reason for proposing Character.codePointCount(CharSequence) aswell? |
We might as well defer it until another JBS issue if it is too difficult to decide whether it should be included in this PR.
|
Looking at the original JSR 204 issue: https://bugs.openjdk.org/browse/JDK-4985217, it is interesting that the problem description included |
Its author may have prioritized the versatility of the APIs.
This workaround is only effective if the instance expression is sufficiently short or can afford to be stored to a new temporary variable once. It can be a pain in the neck that you have to write the expression even twice to get the number of code points in the entire string instance. P.S. I subscribed the mailing list (but changed the settings not to receive any emails) |
/label remove compiler |
@vicente-romero-oracle |
Adds
codePointCount()
overloads toString
,Character
,(Abstract)StringBuilder
, andStringBuffer
to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks.Is a CSR required to this change?
Progress
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26461/head:pull/26461
$ git checkout pull/26461
Update a local copy of the PR:
$ git checkout pull/26461
$ git pull https://git.openjdk.org/jdk.git pull/26461/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 26461
View PR using the GUI difftool:
$ git pr show -t 26461
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26461.diff
Using Webrev
Link to Webrev Comment