[BUG] geoip() function fails with deserialization error when used in aggregation grouping context #4478

@alexey-temnikov

Description

Query Information

PPL Command/Query:

source=apache_http_logs 
| eval ip_str=CAST(clientIP AS STRING) 
| eval IPToCountry=geoip("my-datasource", ip_str, "country_iso_code") 
| stats count() by span(datetime, 1M) as month, IPToCountry

Expected Result:
The query should successfully execute and return aggregated counts grouped by month and country ISO code derived from the IP addresses using the geoip function.

Actual Result:

{
  "error": {
    "reason": "Error occurred in OpenSearch engine: all shards failed",
    "details": "Shard[0]: java.lang.IllegalStateException: Failed to deserialize RexNode and its required structure: {...\"class\": \"org.opensearch.sql.expression.function.UserDefinedFunctionBuilder$1\"...}",
    "type": "SearchPhaseExecutionException"
  },
  "status": 500
}

Dataset Information

Dataset/Schema Type

  • OpenTelemetry (OTEL)
  • Simple Schema for Observability (SS4O)
  • Open Cybersecurity Schema Framework (OCSF)
  • Custom (Apache HTTP logs)

Index Mapping

{
  "mappings": {
    "properties": {
      "clientIP": { 
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      },
      "datetime": { "type": "date_nanos" },
      "request": { "type": "text" },
      "status": { "type": "text" },
      "bytes": { "type": "integer" },
      "API": { "type": "text" },
      "protocol": { "type": "text" }
    }
  }
}

Sample Data

{
  "clientIP": "192.168.1.100",
  "datetime": "2025-07-09T01:47:14.900000Z",
  "request": "GET",
  "API": "/api/users",
  "protocol": "HTTP/1.1",
  "status": "200",
  "bytes": 1024
}

Bug Description

Issue Summary:
The geoip() function fails with a deserialization error when its result is used as a grouping field in an aggregation (stats ... by). The same geoip() call works correctly when no aggregation is involved.

Steps to Reproduce:

  1. Create an index with IP address fields (text or ip type)
  2. Configure a geospatial datasource for geoip lookups
  3. Execute a query that uses geoip() result in stats grouping:
source=apache_http_logs 
| eval ip_str=CAST(clientIP AS STRING) 
| eval IPToCountry=geoip("my-datasource", ip_str, "country_iso_code") 
| stats count() by IPToCountry
  4. Observe the deserialization error

Comparison with Working Query:
The geoip function works correctly without aggregation:

source=apache_http_logs 
| eval ip_str=CAST(clientIP AS STRING) 
| eval IPToCountry=geoip("my-datasource", ip_str, "country_iso_code") 
| fields clientIP, IPToCountry 
| head 5

Impact:
This bug prevents users from performing aggregations grouped by geolocation data derived from IP addresses, which is a critical use case for:

  • Security analytics (grouping events by country/region)
  • Log analysis (traffic analysis by geographic location)
  • Observability dashboards (metrics by geographic distribution)

Users must work around this by avoiding aggregations on geoip results or performing aggregations client-side.

Environment Information

OpenSearch Version: 3.3.0-SNAPSHOT

Additional Details:

Tentative Root Cause Analysis

This is a preliminary analysis and requires further investigation.

The root cause appears to be in the aggregation pushdown mechanism when User-Defined Functions (UDFs) are involved. When the query planner pushes down aggregations to OpenSearch as scripts, it serializes the RexNode expressions including the geoip UDF call. The serialization includes the class name org.opensearch.sql.expression.function.UserDefinedFunctionBuilder$1, which is an anonymous inner class created by the toUDF() method in UserDefinedFunctionBuilder.java.

During deserialization on the data node (in RelJsonSerializer.deserialize() and ExtendedRelJson.toRex()), the system attempts to instantiate this class using AvaticaUtils.instantiatePlugin(), which requires a no-argument constructor. However, anonymous inner classes cannot be instantiated this way, so deserialization fails.
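The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not code from the plugin: `Udf`, `NamedUdf`, `toUdf` and `canInstantiateByName` are hypothetical stand-ins, and a plain `getConstructor().newInstance()` call is assumed to approximate what a no-argument, by-class-name instantiation does.

```java
class AnonymousClassDemo {
    /** Hypothetical stand-in for a UDF base type whose constructor takes arguments. */
    public abstract static class Udf {
        private final String name;

        protected Udf(String name) {
            this.name = name;
        }

        public String name() {
            return name;
        }
    }

    /** A named class with a public no-argument constructor. */
    public static class NamedUdf extends Udf {
        public NamedUdf() {
            super("named");
        }
    }

    /** Mirrors the toUDF() pattern: returns an anonymous subclass (compiled as ...$1). */
    public static Udf toUdf(String name) {
        return new Udf(name) {};
    }

    /** Tries to rebuild an object from its class name alone, via a public no-arg constructor. */
    public static boolean canInstantiateByName(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            clazz.getConstructor().newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // The anonymous class has no public no-arg constructor, so lookup fails here.
            return false;
        }
    }

    public static void main(String[] args) {
        Udf anonymous = toUdf("geoip");
        String anonymousName = anonymous.getClass().getName();
        System.out.println(anonymousName);
        System.out.println(canInstantiateByName(anonymousName));
        System.out.println(canInstantiateByName(NamedUdf.class.getName()));
    }
}
```

In this sketch the anonymous class only has the constructor it inherits its shape from (one `String` argument), so a receiver that knows nothing but the class name has no way to rebuild it, while the named class can be rebuilt.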

The relevant code paths:

  1. Serialization: /opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RelJsonSerializer.java - serializes RexNode with UDF class information
  2. Deserialization: /opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/ExtendedRelJson.java (line ~450 in toRex() method) - attempts to deserialize and instantiate the UDF
  3. UDF Creation: /core/src/main/java/org/opensearch/sql/expression/function/UserDefinedFunctionBuilder.java (lines 42-56 in toUDF() method) - creates anonymous SqlUserDefinedFunction

Tentative Proposed Fix

This is a preliminary analysis and requires further investigation.

There are two potential approaches to fix this issue:

Option 1: Prevent Aggregation Pushdown for UDFs in Grouping Context

Similar to the fix in PR #4002 that prevented pushdown for window functions, add logic to detect when UDFs are used in aggregation grouping fields and prevent pushdown in those cases. This would be implemented in the aggregation pushdown optimization rules.

Location: /opensearch/src/main/java/org/opensearch/sql/opensearch/planner/logical/rule/ (aggregation pushdown rules)
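The check behind Option 1 could look roughly like the sketch below. It uses a hypothetical, simplified expression model (`Expr`, `containsUdf`, `canPushDownAggregation`) rather than the real Calcite RexNode types or the plugin's actual rule classes, so it only illustrates the idea: walk each grouping expression and refuse pushdown if any node is a user-defined function.

```java
import java.util.Arrays;
import java.util.List;

class PushdownGuardSketch {
    /** Hypothetical expression node: an operator name, a UDF flag, and its operands. */
    public static final class Expr {
        final String operator;
        final boolean userDefined;
        final List<Expr> operands;

        public Expr(String operator, boolean userDefined, Expr... operands) {
            this.operator = operator;
            this.userDefined = userDefined;
            this.operands = Arrays.asList(operands);
        }
    }

    /** Returns true if the expression or any nested operand is a user-defined function. */
    public static boolean containsUdf(Expr expr) {
        if (expr.userDefined) {
            return true;
        }
        for (Expr operand : expr.operands) {
            if (containsUdf(operand)) {
                return true;
            }
        }
        return false;
    }

    /** Pushdown is allowed only when no grouping key involves a UDF. */
    public static boolean canPushDownAggregation(List<Expr> groupKeys) {
        return groupKeys.stream().noneMatch(PushdownGuardSketch::containsUdf);
    }

    public static void main(String[] args) {
        Expr ipStr = new Expr("CAST", false, new Expr("clientIP", false));
        Expr country = new Expr("GEOIP", true, ipStr);
        Expr month = new Expr("SPAN", false, new Expr("datetime", false));

        System.out.println(canPushDownAggregation(Arrays.asList(month)));
        System.out.println(canPushDownAggregation(Arrays.asList(month, country)));
    }
}
```

With a guard like this, the failing query would fall back to evaluating the aggregation on the coordinating node instead of shipping the serialized UDF to the shards.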

Option 2: Make UDF Classes Serializable/Deserializable

Modify the UDF registration mechanism to use concrete named classes instead of anonymous inner classes, or implement custom serialization/deserialization logic that can handle UDF reconstruction without relying on class instantiation.

This would require changes to:

  • UserDefinedFunctionBuilder.toUDF() to create named classes or register UDFs in a way that allows reconstruction
  • ExtendedRelJson.toRex() to add special handling for UDF deserialization using a registry pattern
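As a rough illustration of the registry pattern mentioned for Option 2, the sketch below serializes a function by its registered name instead of its Java class name and rebuilds it from a factory on the receiving side. All names here (`UdfRegistrySketch`, `Udf`, `register`, `serialize`, `deserialize`) are hypothetical and do not correspond to existing plugin or Calcite APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class UdfRegistrySketch {
    /** Hypothetical minimal UDF contract: only a stable, unique function name. */
    public interface Udf {
        String name();
    }

    // Maps a function name to a factory that can rebuild the function on any node.
    private static final Map<String, Supplier<Udf>> REGISTRY = new ConcurrentHashMap<>();

    public static void register(String name, Supplier<Udf> factory) {
        REGISTRY.put(name, factory);
    }

    /** Writes the function's registered name instead of its class name. */
    public static String serialize(Udf udf) {
        return udf.name();
    }

    /** Rebuilds the function from the registry; no reflection on class names is needed. */
    public static Udf deserialize(String name) {
        Supplier<Udf> factory = REGISTRY.get(name);
        if (factory == null) {
            throw new IllegalStateException("Unknown UDF: " + name);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        // The factory may still return an anonymous class; only the name crosses the wire.
        register("geoip", () -> new Udf() {
            @Override
            public String name() {
                return "geoip";
            }
        });

        String wire = serialize(deserialize("geoip"));
        System.out.println(wire);
    }
}
```

The design choice is that both sides must register the same functions under the same names, which the real fix would need to guarantee on every data node.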

Recommendation: Option 1 is safer and more consistent with the existing fix for window functions. Option 2 would provide better performance but requires more extensive changes and testing.

Workaround

Currently, there is no simple workaround for this issue. Users cannot perform aggregations grouped by geoip function results. The only alternatives are:

  1. Perform aggregations client-side after retrieving geoip-enriched data
  2. Pre-compute and store geolocation data in the index during ingestion
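For the first alternative, the client-side step can be as small as the sketch below. It assumes the caller has already run the working query (the one ending in `fields clientIP, IPToCountry`) and collected the IPToCountry values into a list; `ClientSideGeoCount` and `countByCountry` are hypothetical names, and mapping missing values to "UNKNOWN" is an illustrative choice.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class ClientSideGeoCount {
    /** Counts rows per country code; rows without a geoip match are grouped as "UNKNOWN". */
    public static Map<String, Long> countByCountry(List<String> isoCodes) {
        return isoCodes.stream()
            .map(code -> code == null ? "UNKNOWN" : code)
            .collect(Collectors.groupingBy(code -> code, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Values as they might come back from the non-aggregating query.
        List<String> isoCodes = Arrays.asList("US", "DE", "US", null);
        System.out.println(countByCountry(isoCodes));
    }
}
```

This moves every matching row to the client, so it is only practical for result sets small enough to retrieve in full.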

Related Issues:

Metadata

Labels

  • PPL: Piped processing language
  • bug: Something isn't working
  • pushdown: pushdown related issues
