Improve Chinese character support with robust Unicode property union
Replace hard-coded Unicode range with comprehensive Unicode property approach
to fix incomplete Han character coverage in MCP tool name formatting.
Changes:
- Replace \u4e00-\u9fa5 range with union of Unicode Script and Block properties
- Use \p{IsHan} + \p{InCJK_Unified_Ideographs} + \p{InCJK_Compatibility_Ideographs}
- Fix boundary case where \u9fff was incorrectly excluded by script-only approach
- Add comprehensive test coverage for all Han character blocks and edge cases
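The replacement described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name `HanNameFormatSketch`, the method `format`, and the set of allowed ASCII characters (letters, digits, `_`, `-`) are assumptions made for the example. Only the three `\p{...}` properties come from the commit message.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; names and the ASCII allow-list are assumptions.
class HanNameFormatSketch {

    // Matches every character that should be stripped: anything that is not
    // an ASCII letter, digit, '_' or '-', and is not covered by the union of
    // the Han script and the two CJK ideograph blocks.
    private static final Pattern DISALLOWED = Pattern.compile(
            "[^\\p{IsHan}\\p{InCJK_Unified_Ideographs}"
            + "\\p{InCJK_Compatibility_Ideographs}a-zA-Z0-9_-]");

    static String format(String input) {
        return DISALLOWED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // The space and '!' are removed; the two Han characters are kept.
        System.out.println(format("get weather \u5929\u6c14!"));
    }
}
```

Compared with the old `\u4e00-\u9fa5` range, the property union also keeps characters such as `\u9fff` and `\u3400`, which fall outside that range.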
Technical details:
- Addresses Unicode Script vs Block classification differences across JDK versions
- \u9fff (鿿) is in the CJK Unified Ideographs block but is not classified as Han script in some JDKs
- Union approach ensures complete coverage while maintaining exclusion of other scripts
- Future-proof solution that automatically includes new Han characters in Unicode updates
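To see the script/block difference on a given JDK, a small probe along these lines may help. It is a sketch only: the class and method names are hypothetical, and the script reported for `\u9fff` depends on the Unicode version bundled with the running JDK, so no expected output is shown for it.

```java
// Hypothetical probe; class and method names are assumptions.
class ScriptVsBlockSketch {

    static final String HAN_UNION =
            "[\\p{IsHan}\\p{InCJK_Unified_Ideographs}"
            + "\\p{InCJK_Compatibility_Ideographs}]";

    static String asString(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Script-only matching: depends on how the JDK classifies the code point.
    static boolean matchesScriptOnly(int codePoint) {
        return asString(codePoint).matches("\\p{IsHan}");
    }

    // Union matching: script OR either of the two blocks.
    static boolean matchesUnion(int codePoint) {
        return asString(codePoint).matches(HAN_UNION);
    }

    public static void main(String[] args) {
        int cp = 0x9FFF;
        System.out.println("JDK:         " + System.getProperty("java.version"));
        System.out.println("script:      " + Character.UnicodeScript.of(cp));
        System.out.println("block:       " + Character.UnicodeBlock.of(cp));
        System.out.println("script-only: " + matchesScriptOnly(cp));
        System.out.println("union:       " + matchesUnion(cp));
    }
}
```

The block lookup is range-based, so it reports CJK Unified Ideographs for `\u9fff` even where the script lookup does not report Han. That is why the union matches regardless of the JDK's script data.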
Test coverage added:
- CJK Unified Ideographs boundary cases (\u4e00, \u9fff)
- CJK Extension A characters (\u3400)
- CJK Compatibility Ideographs (\uf900)
- Mixed character block scenarios
- Proper exclusion verification for non-Han scripts (Hiragana, Emoji, etc.)
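The cases listed above could be exercised with checks like the following. This is a dependency-free sketch rather than the tests added in the commit, which presumably use the project's test framework. The class `HanCoverageChecks` and its helpers are hypothetical, and the Katakana case is an extra illustration not named in the list.

```java
import java.util.regex.Pattern;

// Hypothetical, framework-free checks; names are assumptions.
class HanCoverageChecks {

    private static final Pattern HAN = Pattern.compile(
            "[\\p{IsHan}\\p{InCJK_Unified_Ideographs}"
            + "\\p{InCJK_Compatibility_Ideographs}]+");

    // True when every code point in s is covered by the union.
    static boolean isAllHan(String s) {
        return HAN.matcher(s).matches();
    }

    static void check(boolean condition, String label) {
        if (!condition) {
            throw new AssertionError(label);
        }
    }

    public static void main(String[] args) {
        // Boundaries of the CJK Unified Ideographs block.
        check(isAllHan("\u4e00"), "first code point of the block");
        check(isAllHan("\u9fff"), "last code point of the block");
        // Other Han blocks.
        check(isAllHan("\u3400"), "CJK Extension A");
        check(isAllHan("\uf900"), "CJK Compatibility Ideographs");
        check(isAllHan("\u4e00\u3400\uf900"), "mixed blocks");
        // Non-Han scripts must stay excluded.
        check(!isAllHan("\u3042"), "Hiragana");
        check(!isAllHan("\u30ab"), "Katakana");
        check(!isAllHan("\ud83d\ude00"), "Emoji");
        System.out.println("all checks passed");
    }
}
```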
Fixes incomplete Chinese character support while maintaining the backward compatibility
and minimal risk profile of the original change.
Signed-off-by: shishuiwuhen2009
Signed-off-by: Mark Pollack <[email protected]>
Auto-cherry-pick to 1.0.x
Fixes #4192