Improve Character emulation #10113

zbynek · 2025-04-06T22:14:41Z

Fixes #9705
Fixes #1989 (not 100% of methods are covered, but some are just impossible without bundling unicode database into compiled code)

Implements several APIs using https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape that should be well supported in all browsers.

The implementation is mostly similar to https://groups.google.com/g/google-web-toolkit-contributors/c/73-aScAShs4/m/gZAhlUXiBAAJ , but the fallbacks for old Edge versions are not included.

user/super/com/google/gwt/emul/java/lang/Character.java

niloc132 · 2025-04-30T16:33:02Z

user/super/com/google/gwt/emul/java/lang/Character.java

+  // Known differences between Java 17 and Chrome 135
+  // 11f50 .. 11f59, 16ac0 .. 16ac9, 1e4f0 .. 1e4f9, 1fbf0 .. 1fbf9


Should we leave this method out if it is still wrong? Which side do we consider to be "wrong" here, Java or Chrome? Would it make sense to have a test (possibly ignored) that shows this, so it is easier to reevaluate later?

Also, consider using hex in digit() above, so that it is the same convention as here, or change this to be decimal?

Maybe instead of this comment I should just link to the compatibility table https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/Character.html#conformance
and mention that JRE behaviour on any Java < 24 won't match recent browser releases.

niloc132 · 2025-04-30T16:37:21Z

user/test/com/google/gwt/emultest/java/lang/CharacterTest.java

 /**
 * Tests for java.lang.Character.
 */
+@DoNotRunWith(Platform.HtmlUnitBug)


Can the specific failing tests be annotated with this, each with a comment about why it fails, rather than skip existing tests that pass?

I's pretty much all the tests that call any Character.is* method. They fail because the regexp doesn't even parse in HtmlUnit 2-4. If we want any meaningful test coverage we need #10115 + a custom build of HtmlUnit that includes mozilla/rhino#1848

niloc132 · 2025-04-30T16:37:48Z

user/super/com/google/gwt/emul/java/lang/CaseMapper.java

+  }
+
+  // If String.toUpperCase produces more than 1 codepoint, Character.toUpperCase should
+  // act either as identity or title-case conversion (not supported in GWT).


Maybe add a failing test and mark it ignored so we can see about restoring it when it works?

Co-authored-by: Colin Alworth <[email protected]>

zbynek · 2025-05-21T17:57:44Z

Marking as draft until HtmlUnit can be updated to 4.x.

niloc132 · 2025-09-02T18:45:11Z

Marking as draft until HtmlUnit can be updated to 4.x.

How about just a @DoNotRunWith that we can reevaluate later, and confirm that it runs in manual mode?

zbynek · 2025-09-02T22:43:30Z

How about just a @DoNotRunWith that we can reevaluate later, and confirm that it runs in manual mode?

Works for me, but it also means that any GwtTestcase out there using HtmlUnit as a runner that happens to call Character.isDigit will need to be ignored or switched to Selenium. Are we OK with that?

niloc132 · 2025-09-04T01:31:17Z

I get it, thanks, that is unreasonable to force at this time. Let's sit on this then as you suggested.

It will be a little tricky to do right - forcing new htmlunit also means dropping legacy dev mode tests or fixing the impls you have in your patch.

If we can't fix the new impl to drop legacy dev mode, we have to:

drop legacy dev mode once and for all (maybe its time?)
support old htmlunit (keep both old and new)
avoid updating htmlunit for a while longer (and thus avoid this patch until we stop avoiding it)

I really want to keep the "mavenize the build" release as light on features as possible so that projects can reliably upgrade just by changing dependencies and not expect to encounter any other changes along the way - my initial hope had been for 2.13 to ship soon with all of the code changes, followed a month or so later by 2.14. Then, 2.15 might do various other things like upgrade all kinds of dependencies, continue dropping old deprecated features, etc - teams that successfully move to 2.14 then might have issues with that next step, but at least won't be confusing those changes with the modularization efforts.

To that end, breaking "gwt-runstyle-htmlunit2" out of the main "this artifact contains the compiler" jar will make it at least possible for projects to supply an alternate implementation (of their own or another ecosystem/official build). Then, the existence of legacy dev mode is its own problem to solve.

zbynek marked this pull request as draft April 6, 2025 22:14

zbynek requested a review from Copilot April 6, 2025 22:25

This comment was marked as resolved.

Sign in to view

zbynek force-pushed the character-emul branch 6 times, most recently from 051a0d0 to 1ab960c Compare April 13, 2025 06:33

zbynek mentioned this pull request Apr 13, 2025

Update HtmlUnit to 4.11.1, update necessary dependencies #10115

Open

zbynek force-pushed the character-emul branch 2 times, most recently from d3a8f64 to 1ad6bc2 Compare April 16, 2025 00:20

Improve Character emulation

2f5f97a

zbynek force-pushed the character-emul branch from 1ad6bc2 to 2f5f97a Compare April 26, 2025 11:51

zbynek marked this pull request as ready for review April 26, 2025 11:51

Fix compile error

d9cd717

zbynek added the Category-JRE label Apr 27, 2025

niloc132 reviewed Apr 30, 2025

View reviewed changes

zbynek and others added 2 commits April 30, 2025 20:20

Call isDigit directly

32180fd

Co-authored-by: Colin Alworth <[email protected]>

Use hex codes, add documentation

2f55a39

zbynek marked this pull request as draft May 21, 2025 17:55

niloc132 mentioned this pull request Nov 21, 2025

A simpler, smaller and faster implementation of String.replace #10193

Open

niloc132 added this to the 2.14 milestone Nov 21, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve Character emulation #10113

Improve Character emulation #10113

zbynek commented Apr 6, 2025 •

edited

Loading

Uh oh!

This comment was marked as resolved.

Uh oh!

Uh oh!

niloc132 Apr 30, 2025

Uh oh!

zbynek Apr 30, 2025

Uh oh!

niloc132 Apr 30, 2025

Uh oh!

zbynek Apr 30, 2025

Uh oh!

niloc132 Apr 30, 2025

Uh oh!

zbynek commented May 21, 2025

Uh oh!

niloc132 commented Sep 2, 2025

Uh oh!

zbynek commented Sep 2, 2025

Uh oh!

niloc132 commented Sep 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		// Known differences between Java 17 and Chrome 135
		// 11f50 .. 11f59, 16ac0 .. 16ac9, 1e4f0 .. 1e4f9, 1fbf0 .. 1fbf9

Improve Character emulation #10113

Are you sure you want to change the base?

Improve Character emulation #10113

Conversation

zbynek commented Apr 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

Uh oh!

niloc132 Apr 30, 2025

Choose a reason for hiding this comment

Uh oh!

zbynek Apr 30, 2025

Choose a reason for hiding this comment

Uh oh!

niloc132 Apr 30, 2025

Choose a reason for hiding this comment

Uh oh!

zbynek Apr 30, 2025

Choose a reason for hiding this comment

Uh oh!

niloc132 Apr 30, 2025

Choose a reason for hiding this comment

Uh oh!

zbynek commented May 21, 2025

Uh oh!

niloc132 commented Sep 2, 2025

Uh oh!

zbynek commented Sep 2, 2025

Uh oh!

niloc132 commented Sep 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

zbynek commented Apr 6, 2025 •

edited

Loading