Description
Query Information
PPL Command/Query:
source=apache_http_logs
| eval ip_str=CAST(clientIP AS STRING)
| eval IPToCountry=geoip("my-datasource", ip_str, "country_iso_code")
| stats count() by span(datetime, 1M) as month, IPToCountry
Expected Result:
The query should successfully execute and return aggregated counts grouped by month and country ISO code derived from the IP addresses using the geoip function.
Actual Result:
{
"error": {
"reason": "Error occurred in OpenSearch engine: all shards failed",
"details": "Shard[0]: java.lang.IllegalStateException: Failed to deserialize RexNode and its required structure: {...\"class\": \"org.opensearch.sql.expression.function.UserDefinedFunctionBuilder$1\"...}",
"type": "SearchPhaseExecutionException"
},
"status": 500
}
Dataset Information
Dataset/Schema Type
- OpenTelemetry (OTEL)
- Simple Schema for Observability (SS4O)
- Open Cybersecurity Schema Framework (OCSF)
- Custom (Apache HTTP logs)
Index Mapping
{
"mappings": {
"properties": {
"clientIP": {
"type": "text",
"fields": {
"keyword": { "type": "keyword" }
}
},
"datetime": { "type": "date_nanos" },
"request": { "type": "text" },
"status": { "type": "text" },
"bytes": { "type": "integer" },
"API": { "type": "text" },
"protocol": { "type": "text" }
}
}
}
Sample Data
{
"clientIP": "192.168.1.100",
"datetime": "2025-07-09T01:47:14.900000Z",
"request": "GET",
"API": "/api/users",
"protocol": "HTTP/1.1",
"status": "200",
"bytes": 1024
}
Bug Description
Issue Summary:
The geoip() function fails with a deserialization error when its result is used as a grouping field in an aggregation (stats ... by). The same geoip() call works correctly when no aggregation is involved.
Steps to Reproduce:
- Create an index with IP address fields (text or ip type)
- Configure a geospatial datasource for geoip lookups
- Execute a query that uses geoip() result in stats grouping:
source=apache_http_logs
| eval ip_str=CAST(clientIP AS STRING)
| eval IPToCountry=geoip("my-datasource", ip_str, "country_iso_code")
| stats count() by IPToCountry
- Observe the deserialization error
Comparison with Working Query:
The geoip function works correctly without aggregation:
source=apache_http_logs
| eval ip_str=CAST(clientIP AS STRING)
| eval IPToCountry=geoip("my-datasource", ip_str, "country_iso_code")
| fields clientIP, IPToCountry
| head 5
Impact:
This bug prevents users from performing aggregations grouped by geolocation data derived from IP addresses, which is a critical use case for:
- Security analytics (grouping events by country/region)
- Log analysis (traffic analysis by geographic location)
- Observability dashboards (metrics by geographic distribution)
Users must work around this by avoiding aggregations on geoip results or performing aggregations client-side.
Environment Information
OpenSearch Version: 3.3.0-SNAPSHOT
Additional Details:
- The issue is specific to the Calcite-based query engine
- Related to issue #4468 ([BUG] geoip() function fails with IP type field reference) but distinct: this issue occurs even with the workaround of casting to STRING
- Related to issue #4137 ([BUG] Aggregation is pushed down through fields generated by window operator), which concerns aggregation pushdown with UDFs and was addressed by PR #4002 (Prevent aggregation push down when it has inner filter). That fix only prevented pushdown for window functions and inner filters, not for UDFs in a grouping context
Tentative Root Cause Analysis
This is a preliminary analysis and requires further investigation.
The root cause appears to be in the aggregation pushdown mechanism when User-Defined Functions (UDFs) are involved. When the query planner pushes down aggregations to OpenSearch as scripts, it serializes the RexNode expressions including the geoip UDF call. The serialization includes the class name org.opensearch.sql.expression.function.UserDefinedFunctionBuilder$1, which is an anonymous inner class created by the toUDF() method in UserDefinedFunctionBuilder.java.
During deserialization on the data node (in RelJsonSerializer.deserialize() and ExtendedRelJson.toRex()), the system attempts to instantiate this class using AvaticaUtils.instantiatePlugin(), which requires a no-argument constructor. An anonymous inner class cannot be instantiated this way, so deserialization fails.
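The instantiation failure described above can be reproduced in isolation. The following is a minimal sketch in plain Java, not the plugin's code: `ScalarFn`, `NamedFn`, `build`, and `canInstantiateByName` are hypothetical names, and the sketch assumes the deserializer looks up a public no-argument constructor reflectively, as this analysis suggests AvaticaUtils.instantiatePlugin() does.

```java
import java.lang.reflect.Constructor;

public class AnonymousClassDemo {
    /** Hypothetical stand-in for the function type that toUDF() returns. */
    public interface ScalarFn {
        String name();
    }

    /** A named class with a public no-argument constructor. */
    public static class NamedFn implements ScalarFn {
        public NamedFn() {}

        @Override
        public String name() {
            return "named";
        }
    }

    /** Mirrors the toUDF() pattern: returns an anonymous class that captures its argument. */
    public static ScalarFn build(final String functionName) {
        return new ScalarFn() {
            @Override
            public String name() {
                return functionName;
            }
        };
    }

    /** Returns true if the class can be created by name through a public no-arg constructor. */
    public static boolean canInstantiateByName(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            // getConstructor() only finds public constructors with exactly these parameter types.
            Constructor<?> ctor = clazz.getConstructor();
            ctor.newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String anonymousName = build("geoip").getClass().getName();
        String namedName = NamedFn.class.getName();
        System.out.println(anonymousName + " -> " + canInstantiateByName(anonymousName));
        System.out.println(namedName + " -> " + canInstantiateByName(namedName));
    }
}
```

The anonymous class has no public no-argument constructor (its compiler-generated constructor is non-public and takes the captured value), so the lookup fails for it and succeeds for the named class.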
The relevant code paths:
- Serialization: /opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RelJsonSerializer.java - serializes RexNode with UDF class information
- Deserialization: /opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/ExtendedRelJson.java (line ~450 in toRex() method) - attempts to deserialize and instantiate the UDF
- UDF Creation: /core/src/main/java/org/opensearch/sql/expression/function/UserDefinedFunctionBuilder.java (lines 42-56 in toUDF() method) - creates anonymous SqlUserDefinedFunction
Tentative Proposed Fix
This is a preliminary analysis and requires further investigation.
There are two potential approaches to fix this issue:
Option 1: Prevent Aggregation Pushdown for UDFs in Grouping Context
Similar to the fix in PR #4002 that prevented pushdown for window functions, add logic to detect when UDFs are used in aggregation grouping fields and prevent pushdown in those cases. This would be implemented in the aggregation pushdown optimization rules.
Location: /opensearch/src/main/java/org/opensearch/sql/opensearch/planner/logical/rule/ (aggregation pushdown rules)
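The check behind Option 1 could look roughly like the sketch below. It is a simplified model only: `Expr`, `containsUdf`, and `canPushDownAggregation` are hypothetical names, and the real rule would inspect Calcite RexNode trees inside the pushdown rules rather than this toy expression type.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PushdownGuardSketch {
    /** Hypothetical, simplified expression node standing in for a Calcite RexNode. */
    public static final class Expr {
        final String name;
        final boolean userDefined;
        final List<Expr> operands;

        Expr(String name, boolean userDefined, List<Expr> operands) {
            this.name = name;
            this.userDefined = userDefined;
            this.operands = operands;
        }

        static Expr field(String name) {
            return new Expr(name, false, Collections.<Expr>emptyList());
        }

        static Expr call(String name, boolean userDefined, Expr... operands) {
            return new Expr(name, userDefined, Arrays.asList(operands));
        }
    }

    /** Recursively checks whether an expression tree contains a user-defined function call. */
    public static boolean containsUdf(Expr expr) {
        if (expr.userDefined) {
            return true;
        }
        for (Expr operand : expr.operands) {
            if (containsUdf(operand)) {
                return true;
            }
        }
        return false;
    }

    /** Pushdown is allowed only when no grouping expression depends on a UDF. */
    public static boolean canPushDownAggregation(List<Expr> groupByExprs) {
        for (Expr expr : groupByExprs) {
            if (containsUdf(expr)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Models: stats count() by geoip(..., CAST(clientIP AS STRING), ...)
        Expr geoip = Expr.call("geoip", true, Expr.call("cast", false, Expr.field("clientIP")));
        System.out.println(canPushDownAggregation(Arrays.asList(geoip)));
        // Models: stats count() by status
        System.out.println(canPushDownAggregation(Arrays.asList(Expr.field("status"))));
    }
}
```

With a guard of this shape, a grouping key derived from geoip() would keep the aggregation on the coordinating node, while plain field groupings would still be pushed down.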
Option 2: Make UDF Classes Serializable/Deserializable
Modify the UDF registration mechanism to use concrete named classes instead of anonymous inner classes, or implement custom serialization/deserialization logic that can handle UDF reconstruction without relying on class instantiation.
This would require changes to:
- UserDefinedFunctionBuilder.toUDF() to create named classes or register UDFs in a way that allows reconstruction
- ExtendedRelJson.toRex() to add special handling for UDF deserialization using a registry pattern
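The registry pattern mentioned for Option 2 might be sketched as follows. Everything here is illustrative: `UdfRegistrySketch`, `register`, `serializeReference`, and `resolve` are hypothetical names, the function type is reduced to `Function<String, String>`, and `upper_demo` is a placeholder function rather than geoip.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public final class UdfRegistrySketch {
    // Name-to-implementation table; assumed to be populated identically on every node.
    private static final Map<String, Function<String, String>> REGISTRY = new ConcurrentHashMap<>();

    private UdfRegistrySketch() {}

    public static void register(String name, Function<String, String> implementation) {
        REGISTRY.put(name, implementation);
    }

    /** Sending side: emit only the registered name, never a Java class name. */
    public static String serializeReference(String name) {
        if (!REGISTRY.containsKey(name)) {
            throw new IllegalArgumentException("Unregistered UDF: " + name);
        }
        return name;
    }

    /** Receiving side: look the function up by name; no reflective constructor call is needed. */
    public static Function<String, String> resolve(String serializedName) {
        Function<String, String> implementation = REGISTRY.get(serializedName);
        if (implementation == null) {
            throw new IllegalStateException("Unknown UDF: " + serializedName);
        }
        return implementation;
    }

    public static void main(String[] args) {
        register("upper_demo", s -> s.toUpperCase(Locale.ROOT));
        String wire = serializeReference("upper_demo");
        System.out.println(resolve(wire).apply("us"));
    }
}
```

Because only a stable name crosses the wire, the implementation can remain an anonymous class or lambda, provided each node registers the same functions under the same names.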
Recommendation: Option 1 is safer and more consistent with the existing fix for window functions. Option 2 would provide better performance but requires more extensive changes and testing.
Workaround
Currently, there is no simple workaround for this issue. Users cannot perform aggregations grouped by geoip function results. The only alternatives are:
- Perform aggregations client-side after retrieving geoip-enriched data
- Pre-compute and store geolocation data in the index during ingestion
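The client-side alternative can be sketched as below, assuming the rows come from the working non-aggregating query (fields clientIP, IPToCountry) and have already been fetched into memory as two-element string arrays. The class and method names are hypothetical, and the addresses and country codes in `main` are invented sample values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ClientSideCountSketch {
    /** Counts rows per country; each row is assumed to be {clientIP, IPToCountry}. */
    public static Map<String, Long> countByCountry(List<String[]> rows) {
        return rows.stream()
                .collect(Collectors.groupingBy(
                        // groupingBy rejects null keys, so failed lookups get a placeholder.
                        row -> row[1] == null ? "UNKNOWN" : row[1],
                        TreeMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"192.0.2.10", "US"},
                new String[] {"198.51.100.7", "AU"},
                new String[] {"192.0.2.11", "US"},
                new String[] {"203.0.113.5", null});
        System.out.println(countByCountry(rows));
    }
}
```

This only scales to result sets that fit in client memory, which is why it is a stopgap rather than a substitute for server-side aggregation.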
Related Issues:
- #4468 - [BUG] geoip() function fails with IP type field reference
- #4137 - [BUG] Aggregation is pushed down through fields generated by window operator (fixed by PR #4002, Prevent aggregation push down when it has inner filter)