Skip to content

Conversation

zbynek
Copy link
Collaborator

@zbynek zbynek commented Apr 6, 2025

Fixes #9705
Fixes #1989 (not 100% of methods are covered, but some are just impossible without bundling unicode database into compiled code)

Implements several APIs using https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape that should be well supported in all browsers.

The implementation is mostly similar to https://groups.google.com/g/google-web-toolkit-contributors/c/73-aScAShs4/m/gZAhlUXiBAAJ , but the fallbacks for old Edge versions are not included.

@zbynek zbynek marked this pull request as draft April 6, 2025 22:14
@zbynek zbynek requested a review from Copilot April 6, 2025 22:25
Copilot

This comment was marked as resolved.

@zbynek zbynek force-pushed the character-emul branch 6 times, most recently from 051a0d0 to 1ab960c Compare April 13, 2025 06:33
@zbynek zbynek force-pushed the character-emul branch 2 times, most recently from d3a8f64 to 1ad6bc2 Compare April 16, 2025 00:20
@zbynek zbynek marked this pull request as ready for review April 26, 2025 11:51
Comment on lines 252 to 253
// Known differences between Java 17 and Chrome 135
// 11f50 .. 11f59, 16ac0 .. 16ac9, 1e4f0 .. 1e4f9, 1fbf0 .. 1fbf9
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we leave this method out if it is still wrong? Which side do we consider to be "wrong" here, Java or Chrome? Would it make sense to have a test (possibly ignored) that shows this, so it is easier to reevaluate later?

Also, consider using hex in digit() above, so that it is the same convention as here, or change this to be decimal?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead of this comment I should just link to the compatibility table https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/Character.html#conformance
and mention that JRE behaviour on any Java < 24 won't match recent browser releases.

/**
* Tests for java.lang.Character.
*/
@DoNotRunWith(Platform.HtmlUnitBug)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the specific failing tests be annotated with this, each with a comment about why it fails, rather than skip existing tests that pass?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I's pretty much all the tests that call any Character.is* method. They fail because the regexp doesn't even parse in HtmlUnit 2-4. If we want any meaningful test coverage we need #10115 + a custom build of HtmlUnit that includes mozilla/rhino#1848

}

// If String.toUpperCase produces more than 1 codepoint, Character.toUpperCase should
// act either as identity or title-case conversion (not supported in GWT).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a failing test and mark it ignored so we can see about restoring it when it works?

@zbynek zbynek marked this pull request as draft May 21, 2025 17:55
@zbynek
Copy link
Collaborator Author

zbynek commented May 21, 2025

Marking as draft until HtmlUnit can be updated to 4.x.

@niloc132
Copy link
Member

niloc132 commented Sep 2, 2025

Marking as draft until HtmlUnit can be updated to 4.x.

How about just a @DoNotRunWith that we can reevaluate later, and confirm that it runs in manual mode?

@zbynek
Copy link
Collaborator Author

zbynek commented Sep 2, 2025

How about just a @DoNotRunWith that we can reevaluate later, and confirm that it runs in manual mode?

Works for me, but it also means that any GwtTestcase out there using HtmlUnit as a runner that happens to call Character.isDigit will need to be ignored or switched to Selenium. Are we OK with that?

@niloc132
Copy link
Member

niloc132 commented Sep 4, 2025

I get it, thanks, that is unreasonable to force at this time. Let's sit on this then as you suggested.

It will be a little tricky to do right - forcing new htmlunit also means dropping legacy dev mode tests or fixing the impls you have in your patch.

If we can't fix the new impl to drop legacy dev mode, we have to:

  • drop legacy dev mode once and for all (maybe its time?)
  • support old htmlunit (keep both old and new)
  • avoid updating htmlunit for a while longer (and thus avoid this patch until we stop avoiding it)

I really want to keep the "mavenize the build" release as light on features as possible so that projects can reliably upgrade just by changing dependencies and not expect to encounter any other changes along the way - my initial hope had been for 2.13 to ship soon with all of the code changes, followed a month or so later by 2.14. Then, 2.15 might do various other things like upgrade all kinds of dependencies, continue dropping old deprecated features, etc - teams that successfully move to 2.14 then might have issues with that next step, but at least won't be confusing those changes with the modularization efforts.

To that end, breaking "gwt-runstyle-htmlunit2" out of the main "this artifact contains the compiler" jar will make it at least possible for projects to supply an alternate implementation (of their own or another ecosystem/official build). Then, the existence of legacy dev mode is its own problem to solve.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Character.toString(...) results in ERROR (compile) Various is*(char) methods on Character only support USASCII, codePoint versions are missing

2 participants