Skip to content

Commit 654ac0f

Browse files
CLDR-18745 README_llm_cldr_validator.md (#4904)
1 parent f14a542 commit 654ac0f

File tree

1 file changed

+161
-0
lines changed

1 file changed

+161
-0
lines changed
Lines changed: 161 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,161 @@
1+
# LLM CLDR Data Validator
2+
3+
![Language](https://img.shields.io/badge/Language-Python-blue.svg)
4+
![License](https://img.shields.io/badge/License-MIT-green.svg)
5+
6+
This Python script is an automated tool to evaluate the accuracy of Large Language Models (LLMs) concerning locale-specific unit preferences. It compares the LLM's knowledge against the official Unicode Common Locale Data Repository (CLDR) to verify data conformance.
7+
8+
---
9+
10+
## ✨ Features
11+
12+
* **Interactive Validation**: Accepts natural language questions from the command line.
13+
* **Live LLM Queries**: Uses the OpenAI API (`gpt-4o-mini`) to generate structured JSON data based on the prompt.
14+
* **Ground Truth Comparison**: Validates the LLM's output against a local `unitPreferenceData.json` file from the CLDR.
15+
* **Detailed Reporting**: Produces a final JSON report showing the LLM's response, the official CLDR data, and a unit-by-unit comparison with a "Match" or "Mismatch" status.
16+
* **Fallback Logic**: Correctly handles lookups for regions not explicitly listed in the CLDR data by using the world default (`001`).
17+
18+
---
19+
20+
## 🛠️ Prerequisites
21+
22+
* Python 3.8+
23+
* OpenAI Python library:
24+
25+
```bash
26+
pip install openai
27+
```
28+
29+
---
30+
31+
## ⚙️ Setup & Configuration
32+
33+
1. **Place Files**: Ensure the following two files are in the same project directory:
34+
35+
* `llm_cldr_validator.py`
36+
* `unitPreferenceData.json` (The CLDR data file)
37+
38+
2. **Add API Key**: Open `llm_cldr_validator.py` in a text editor. Find the following line and replace the placeholder with your actual OpenAI API key.
39+
40+
```python
41+
# In the generate_data_with_llm function:
42+
client = OpenAI(api_key="YOUR API KEY")
43+
```
44+
45+
> \[!WARNING]
46+
> **Security Alert**: Never commit files with hardcoded API keys to public repositories like GitHub. For production applications, always use environment variables or a dedicated secret manager.
47+
48+
---
49+
50+
## 🚀 How to Run
51+
52+
1. Open your terminal or command prompt.
53+
2. Navigate to the directory containing your files.
54+
55+
```bash
56+
cd path/to/your/project_folder
57+
```
58+
3. Execute the script:
59+
60+
```bash
61+
python llm_cldr_validator.py
62+
```
63+
4. The script will prompt you to enter a question. Type your question and press Enter.
64+
65+
---
66+
67+
## 📝 Examples
68+
69+
### Example 1: Successful Match
70+
71+
This example shows a straightforward case where the LLM's output perfectly matches the CLDR standard.
72+
73+
**Prompt:**
74+
75+
```bash
76+
Enter your question about local data: What is the unit for weather temperature in the United States?
77+
```
78+
79+
**Final Validation Report:**
80+
81+
```json
82+
{
83+
"ValidationInput": {
84+
"Prompt": "What is the unit for weather temperature in the United States?",
85+
"LLM_Entity": "United States",
86+
"LLM_CountryCode": "US",
87+
"CLDR_Lookup": "Category: 'temperature', Usage: 'weather', Region: 'US'"
88+
},
89+
"LLM_Units_Found": [
90+
"fahrenheit"
91+
],
92+
"CLDR_Units_Found": [
93+
"fahrenheit"
94+
],
95+
"Comparison": [
96+
{
97+
"Unit_1": {
98+
"LLM_Unit": "fahrenheit",
99+
"CLDR_Unit": "fahrenheit",
100+
"Status": "Match"
101+
}
102+
}
103+
]
104+
}
105+
```
106+
107+
### Example 2: In-Depth Mismatch Analysis
108+
109+
This example showcases the validator's ability to handle complex prompts and identify nuanced differences between an LLM's conversational output and the strict CLDR standard.
110+
111+
**Prompt:**
112+
113+
```bash
114+
Enter your question about local data: In a United Kingdom, english speaker context, what units are used for measuring human height?
115+
```
116+
117+
**Final Validation Report:**
118+
119+
```json
120+
{
121+
"ValidationInput": {
122+
"Prompt": "In a United Kingdom, english speaker context, what units are used for measuring human height?",
123+
"LLM_Entity": "United Kingdom",
124+
"LLM_CountryCode": "GB",
125+
"CLDR_Lookup": "Category: 'length', Usage: 'person-height', Region: 'GB'"
126+
},
127+
"LLM_Units_Found": [
128+
"feet-and-inches",
129+
"centimeters"
130+
],
131+
"CLDR_Units_Found": [
132+
"foot-and-inch",
133+
"inch"
134+
],
135+
"Comparison": [
136+
{
137+
"Unit_1": {
138+
"LLM_Unit": "feet-and-inches",
139+
"CLDR_Unit": "foot-and-inch",
140+
"Status": "Mismatch"
141+
}
142+
},
143+
{
144+
"Unit_2": {
145+
"LLM_Unit": "centimeters",
146+
"CLDR_Unit": "inch",
147+
"Status": "Mismatch"
148+
}
149+
}
150+
]
151+
}
152+
```
153+
154+
#### Analysis of the Mismatch
155+
156+
This result is a success for the validator, as it highlights key differences:
157+
158+
1. **Subtle Wording**: The LLM used the grammatically natural "feet-and-inches" (plural), while the CLDR standard specifies the canonical unit name "foot-and-inch" (singular).
159+
2. **Preference Order**: The LLM suggested "centimeters" as a logical secondary unit (common in medical settings). However, the official CLDR preference for the UK lists "inch" as the next preferred unit after "foot-and-inch".
160+
161+
This demonstrates the tool's value in catching discrepancies between an LLM's generalized knowledge and a formal data standard.

0 commit comments

Comments
 (0)