Description
Query Information
PPL Command/Query:
source=test-rename-bug
| rename status as http_status
| dedup http_status
| fields http_status
Expected Result:
The query should return deduplicated values from the status field under the renamed alias http_status:
{
"schema": [{"name": "http_status", "type": "string"}],
"datarows": [["200"], ["500"], ["404"]],
"total": 3,
"size": 3
}
Actual Result:
The query returns only null values:
{
"schema": [{"name": "http_status", "type": "string"}],
"datarows": [[null]],
"total": 1,
"size": 1
}
Dataset Information
Dataset/Schema Type: Custom (simple test schema)
Index Mapping:
{
"mappings": {
"properties": {
"status": { "type": "keyword" },
"service": { "type": "keyword" },
"value": { "type": "integer" }
}
}
}
Sample Data:
{"status":"200","service":"api","value":100}
{"status":"500","service":"web","value":200}
{"status":"200","service":"db","value":150}
{"status":"404","service":"api","value":50}
{"status":"500","service":"api","value":75}
Bug Description
Issue Summary:
When using the rename command to alias a field, subsequent dedup operations on the renamed field fail and return null values instead of the actual deduplicated data. This affects all field types (keyword, text, numeric, nested), not just nested fields.
Steps to Reproduce:
- Create an index with any field (e.g., status as keyword type)
- Insert documents with duplicate values in that field
- Execute: source=<index> | rename <field> as <alias> | dedup <alias> | fields <alias>
- Observe that the result contains only null values
Comparison:
✅ Working (without rename):
source=test-rename-bug | dedup status | fields status
Returns: ["200"], ["500"], ["404"]
❌ Failing (with rename):
source=test-rename-bug | rename status as http_status | dedup http_status | fields http_status
Returns: [null]
✅ Working (rename without dedup):
source=test-rename-bug | rename status as http_status | fields http_status
Returns: All 5 documents with correct values
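For reference, the expected values can be derived by hand from the sample data by keeping the first occurrence of each distinct status. The sketch below is independent of the plugin's implementation; the class and method names are illustrative only, and the order of rows returned by OpenSearch is not assumed to match.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class ExpectedDedupSketch {
    // Keeps the first occurrence of each distinct value, preserving input order.
    static List<String> dedup(List<String> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    public static void main(String[] args) {
        // status values of the five sample documents, in insertion order
        List<String> statuses = List.of("200", "500", "200", "404", "500");
        System.out.println(dedup(statuses)); // prints [200, 500, 404]
    }
}
```

Renaming the field should change only the column name in the schema, not these three values.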
Impact:
This bug breaks any query pipeline that deduplicates on a renamed field, which is a common pattern in data transformation and analysis. Users cannot rename a field for readability and then dedup on the alias; the only order that works is dedup before rename.
Environment Information
OpenSearch Version: 3.4.0-SNAPSHOT
Additional Details:
- Tested with Calcite query engine enabled (default in 3.x)
- Issue reproduced on both OTEL schema and simple custom schemas
- Related to issue [BUG] Rename does not work with nested fields #2740 but affects ALL field types, not just nested fields
Root Cause Analysis
Execution Plan Analysis
Using the _explain endpoint reveals the issue:
Working Query Plan (without rename):
Physical: CalciteEnumerableIndexScan(
PushDownContext=[[PROJECT->[status], FILTER->IS NOT NULL($0)]]
)
Failing Query Plan (with rename):
Logical: LogicalProject(http_status=[$2])
Physical: CalciteEnumerableIndexScan(
PushDownContext=[[PROJECT->[status], FILTER->IS NOT NULL($0)]]
)
The physical plan correctly pushes down the original field name (status) to OpenSearch, but the logical plan references the renamed field name (http_status).
Code-Level Root Cause
Disclaimer: This is a preliminary analysis and requires further investigation.
The bug is in OpenSearchDedupPushdownRule.java:
// Line 57-58
final List<String> fieldNameList = projectWithWindow.getInput().getRowType().getFieldNames();
List<Integer> selectColumns = PlanUtils.getSelectColumns(windows.getFirst().partitionKeys);
String fieldName = fieldNameList.get(selectColumns.getFirst());
// Line 60
CalciteLogicalIndexScan newScan = scan.pushDownCollapse(finalOutput, fieldName);
Problem: After a rename operation, projectWithWindow.getInput().getRowType().getFieldNames() returns the renamed field name (e.g., "http_status"), not the original field name (e.g., "status").
This renamed field name is then passed to pushDownCollapse() in CalciteLogicalIndexScan.java:
public CalciteLogicalIndexScan pushDownCollapse(Project finalOutput, String fieldName) {
ExprType fieldType = osIndex.getFieldTypes().get(fieldName);
if (fieldType == null) {
if (LOG.isDebugEnabled()) {
LOG.debug("Cannot pushdown the dedup '{}' due to it is not a index field", fieldName);
}
return null; // Fails silently
}
// ...
}
Problem: osIndex.getFieldTypes() only contains the original field names from the OpenSearch index mapping, not any renamed aliases. Looking up "http_status" therefore returns null, causing the dedup pushdown optimization to fail silently.
Without the pushdown optimization, the dedup operation falls back to a less efficient execution path that doesn't properly handle the renamed field, resulting in null values in the output.
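The lookup failure described above can be reproduced in isolation. The following is a minimal sketch, not the plugin's code: it assumes the field-type map is keyed only by names from the index mapping, uses plain strings in place of ExprType, and the names indexFieldTypes and canPushDownCollapse are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RenameLookupSketch {
    // Stand-in for osIndex.getFieldTypes(): keyed by names from the index mapping only.
    static Map<String, String> indexFieldTypes() {
        Map<String, String> types = new LinkedHashMap<>();
        types.put("status", "keyword");
        types.put("service", "keyword");
        types.put("value", "integer");
        return types;
    }

    // Mirrors the null check in pushDownCollapse: true only for a mapped index field.
    static boolean canPushDownCollapse(String fieldName) {
        return indexFieldTypes().get(fieldName) != null;
    }

    public static void main(String[] args) {
        System.out.println(canPushDownCollapse("status"));      // prints true
        System.out.println(canPushDownCollapse("http_status")); // prints false
    }
}
```

The alias never appears in the mapping, so any lookup keyed by the renamed name misses regardless of the field's type.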
Tentative Proposed Fix
Disclaimer: This is a preliminary analysis and requires further investigation.
Option 1: Resolve Renamed Field to Original Field Name
Modify OpenSearchDedupPushdownRule.java to get the original field name from the scan's row type instead of the renamed row type:
// Instead of:
String fieldName = fieldNameList.get(selectColumns.getFirst());
// Use the scan's original field names:
final List<String> originalFieldNames = scan.getRowType().getFieldNames();
String fieldName = originalFieldNames.get(selectColumns.getFirst());
Option 2: Use ExprType's getOriginalPath()
The codebase already has a mechanism to track original field paths via ExprType.getOriginalPath(). The dedup pushdown rule could leverage this to resolve renamed fields back to their original names.
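Both options amount to resolving the alias back to the mapped field name before the type lookup. The sketch below illustrates that idea with a plain alias-to-original map; it does not use ExprType.getOriginalPath() or any Calcite API, and resolveOriginal is a hypothetical helper introduced only for illustration.

```java
import java.util.Map;

public class AliasResolutionSketch {
    // Follows rename entries (alias -> original) until a name with no entry is reached.
    // The hop count is bounded by the map size so a cyclic rename cannot loop forever.
    static String resolveOriginal(Map<String, String> aliasToOriginal, String name) {
        String current = name;
        for (int hops = 0;
                hops < aliasToOriginal.size() && aliasToOriginal.containsKey(current);
                hops++) {
            current = aliasToOriginal.get(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Assumed inputs: one mapped field and one rename, as in the failing query.
        Map<String, String> indexFieldTypes = Map.of("status", "keyword");
        Map<String, String> renames = Map.of("http_status", "status");

        String resolved = resolveOriginal(renames, "http_status");
        System.out.println(resolved + " -> " + indexFieldTypes.get(resolved)); // prints status -> keyword
    }
}
```

With the name resolved first, the type lookup succeeds and the pushdown no longer has to bail out. How the rename information is actually carried through the plan is left open here.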
Workaround
There is no workaround for deduplicating on the renamed field itself. The only alternative is to dedup on the original field first and rename afterwards:
# Workaround: Dedup first, then rename
source=test-rename-bug
| dedup status
| rename status as http_status
| fields http_status
However, this workaround may not be suitable for all use cases, especially when the dedup field needs to be computed or transformed before deduplication.
Additional Testing
Test Case 1: Simple Keyword Field
source=test-rename-bug | rename status as http_status | dedup http_status | fields http_status
Result: ❌ Returns [null]
Test Case 2: Nested Field (OTEL Schema)
source=otel-v1-apm-span-000001
| rename `span.attributes.http@status_code` as my_precious
| dedup my_precious
| fields my_precious
Result: ❌ Returns [null]
Test Case 3: Rename + Fields (No Dedup)
source=test-rename-bug | rename status as http_status | fields http_status
Result: ✅ Works correctly, returns all 5 documents with proper values
Test Case 4: Dedup + Rename (Reversed Order)
source=test-rename-bug | dedup status | rename status as http_status | fields http_status
Result: ✅ Works correctly, returns deduplicated values
Related Issues
- [BUG] Rename does not work with nested fields #2740 - Reports a similar issue with nested fields, but this bug affects all field types