🐛 Bug: String case conversion functions produce inconsistent results across environments
Description
The toTitle(), toLower(), and toUpper() functions use icu::Locale::getDefault(), which reads system locale settings. This causes different outputs in different environments (dev vs staging vs production).
Environment
- ICU Version: [e.g., 70.1]
- OS: Linux
- Container: Docker/Kubernetes
- Locale Settings: Varies (LANG=C vs LANG=en_US.UTF-8)
Steps to Reproduce
- Run the following code in an environment with LANG=en_US.UTF-8:
  std::string result = toTitle("a.b,c");
  // Output: "A.b,C"
- Run the same code in a container with LANG=en_US_POSIX:
  std::string result = toTitle("a.b,c");
  // Output: "A.B,C"
Expected Behavior
The functions should produce consistent, deterministic output regardless of system locale settings, matching Spark SQL behavior:
toTitle("a.b,c") → "A.b,C" (always)
toLower("TITLE") → "title" (always)
toUpper("title") → "TITLE" (always)
Actual Behavior
Output varies based on environment variables (LC_ALL, LC_CTYPE, LANG):
✅ LANG=en_US.UTF-8: toTitle("a.b,c") → "A.b,C"
❌ LANG=en_US_POSIX: toTitle("a.b,c") → "A.B,C"
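To compare environments when triaging this, it can help to log which of the variables above is actually in effect. The following is a minimal sketch using only the standard library. The helper names `envOrEmpty` and `effectiveCtypeLocale` are hypothetical, and the order it checks (LC_ALL, then LC_CTYPE, then LANG) is the POSIX precedence for the LC_CTYPE category; ICU's own default-locale detection may consult a different set of variables, so treat the result as a diagnostic hint rather than as what ICU will pick.

```cpp
#include <cstdlib>
#include <initializer_list>
#include <string>

// Hypothetical helper: returns the value of an environment variable,
// or an empty string when it is unset.
std::string envOrEmpty(const char* name) {
    const char* value = std::getenv(name);
    return value ? std::string(value) : std::string();
}

// Hypothetical helper: resolves the locale name using POSIX precedence
// for LC_CTYPE (LC_ALL overrides LC_CTYPE, which overrides LANG).
// Empty values are treated as unset; the fallback is "C".
std::string effectiveCtypeLocale() {
    for (const char* name : {"LC_ALL", "LC_CTYPE", "LANG"}) {
        std::string value = envOrEmpty(name);
        if (!value.empty()) {
            return value;
        }
    }
    return "C";
}
```

Logging `effectiveCtypeLocale()` at startup in dev, staging, and production makes it easy to see whether two deployments that disagree on `toTitle("a.b,c")` are also running under different locale settings.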
Root Cause
Using icu::Locale::getDefault() makes the behavior environment-dependent.
Proposed Solution
Replace icu::Locale::getDefault() with icu::Locale::getRoot() in:
toTitle()
toLower()
toUpper()