String case conversion functions produce inconsistent results across environments #5

@yangzhg

🐛 Bug: String case conversion functions produce inconsistent results across environments

Description

The toTitle(), toLower(), and toUpper() functions use icu::Locale::getDefault(), which reads system locale settings. This causes different outputs in different environments (dev vs staging vs production).

Environment

  • ICU Version: [e.g., 70.1]
  • OS: Linux
  • Container: Docker/Kubernetes
  • Locale Settings: Varies (LANG=C vs LANG=en_US.UTF-8)

Steps to Reproduce

  1. Run the following code in an environment with LANG=en_US.UTF-8:
     std::string result = toTitle("a.b,c");
     // Output: "A.b,C"
  2. Run the same code in a container with LANG=en_US_POSIX:
     std::string result = toTitle("a.b,c");
     // Output: "A.B,C"

Expected Behavior

The functions should produce consistent, deterministic output regardless of system locale settings, matching Spark SQL behavior:

toTitle("a.b,c") → "A.b,C" (always)
toLower("TITLE") → "title" (always)
toUpper("title") → "TITLE" (always)

Actual Behavior

Output varies based on environment variables (LC_ALL, LC_CTYPE, LANG):

✅ LANG=en_US.UTF-8: toTitle("a.b,c") → "A.b,C"
❌ LANG=en_US_POSIX: toTitle("a.b,c") → "A.B,C"

Root Cause

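icu::Locale::getDefault() is derived from the process environment (LC_ALL, LC_CTYPE, LANG), so passing it to the case conversion calls makes their behavior environment-dependent. For toTitle() this likely affects not only the case mappings but also the locale-specific word boundaries that decide which characters start a new word.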

Proposed Solution

Replace icu::Locale::getDefault() with icu::Locale::getRoot() in:

toTitle()
toLower()
toUpper()

Labels: bug (Something isn't working)
