chore(engine): Add "compatibility node" to physical plan to adhere with naming of "colliding labels" in v1 engine #19470
Conversation
Force-pushed from bda3a3b to 353a588
Signed-off-by: Christian Haudum <[email protected]>
The `ColumnCompat` node is used to guarantee compatibility with the old engine. In the new engine it is possible to have the same column name coming from different sources, such as labels or structured metadata. While the old engine suffixes colliding names with `_extracted`, the new engine returns them in both the labels and the structured metadata response. This new node keeps the old behaviour without implementing the logic directly in the engine; keeping it separate means it can easily be disabled again. Signed-off-by: Christian Haudum <[email protected]>
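For illustration, a hedged sketch of the v1 naming rule the node reproduces, reduced to plain Go maps (the real node operates on Arrow record columns; the function and its parameters are illustrative only):

// Illustrative sketch only: a structured metadata key that collides with a
// stream label is exposed under "<name>_extracted", matching the v1 engine.
func resolveCollisions(labels, metadata map[string]string) map[string]string {
	out := make(map[string]string, len(labels)+len(metadata))
	for k, v := range labels {
		out[k] = v
	}
	for k, v := range metadata {
		if _, collides := labels[k]; collides {
			out[k+"_extracted"] = v // v1 behaviour: suffix the colliding metadata key
		} else {
			out[k] = v
		}
	}
	return out
}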
Force-pushed from 0d27b2a to bddd381
lgtm
schema := batch.Schema()
for idx := range schema.NumFields() {
	ident, err := semconv.ParseFQN(schema.Field(idx).Name)
	if err != nil {
release batch here?
since we return a new record in the happy path, can we defer batch.Release() on line 22 instead? I really prefer the open/close pattern, using defer to close right after we open in the code.
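To make that concrete, a hedged sketch of the suggested open/close pattern (the receiver, the state return type, failureState, and buildCompatRecord are illustrative assumptions; successState and semconv.ParseFQN are taken from the snippets in this PR):

// Illustrative sketch only, not the PR's code: release the input record right
// after receiving it, so the error path and the happy path (which returns a
// newly built record) are both covered by a single defer.
func (c *columnCompat) process(batch arrow.Record) state {
	defer batch.Release()

	schema := batch.Schema()
	for idx := range schema.NumFields() {
		if _, err := semconv.ParseFQN(schema.Field(idx).Name); err != nil {
			return failureState(err) // no explicit batch.Release() needed on this path
		}
	}
	return successState(c.buildCompatRecord(batch)) // hypothetical helper that builds a new record
}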
Signed-off-by: Christian Haudum <[email protected]>
	slices.Reverse(groups)
}

// TODO(chaudum): Make it configurable to keep/remove this compatibility node
Is it correct for compatibility to happen this early in the pipeline? Don't we need it after parse stages and the like?
This one always needs to happen, and it needs to come before any filter node, since an ambiguous label filter evaluates on the COALESCE of the metadata and label columns.
In case of a parse stage, we need a second one, placed directly after the parse.
	slices.Reverse(groups)
}

// TODO(chaudum): Make it configurable to keep/remove this compatibility node
Should there be a compatibility node in the logical plan that is responsible for this? We want compatibility based on how LogQL is being used, so it does feel like the logical plan's responsibility for placing it.
Interesting, I hadn't thought about that, but you have a point here.
It could simply be a COMPAT operator, wdyt?
%1 = EQ label.env "prod"
%2 = MAKETABLE [selector=%1, predicates=[], shard=0_of_1]
...
%8 = LIMIT %7 [skip=0, fetch=1000]
%9 = COMPAT %8
RETURN %9
Would it be ok to add this in a separate PR?
COMPAT makes sense to me 👍 Maybe even LOGQL_COMPAT if we want to be extra verbose.
Sure, I'm fine with it being done in a separate PR.
not much to add here, but I like LOGQL_COMPAT fwiw
// TODO(chaudum): Make it configurable to keep/remove this compatibility node
compat := &ColumnCompat{
	id: "MetadataOverLabel",
This ID won't be unique for very long; supporting binary operations over vectors like sum(...) + sum(...) is likely to end up with two distinct MAKE_TABLE operations in the logical plan.
The ID here is not relevant at all.
It will be relevant very soon :) The scheduler will need unique IDs.
	})
}

// Create a new builder with the updated schema.
I'm not sure we need to create a new builder here since the contents of the arrays don't change for compatibility mapping.
Can we use the existing arrays and provide a new schema which handles the renaming via array.NewRecord? That would also be much faster than copying the data in the arrays.
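A hedged sketch of that suggestion, assuming the arrow and array packages from the Arrow Go module are imported; it only shows the zero-copy rename itself and ignores the per-row coalescing discussed below:

// Illustrative sketch only: rename a single column by rebuilding the schema
// and wrapping the existing arrays, without copying any column data.
func renameColumn(batch arrow.Record, idx int, newName string) arrow.Record {
	fields := make([]arrow.Field, batch.Schema().NumFields())
	copy(fields, batch.Schema().Fields())
	fields[idx].Name = newName // only the field metadata changes

	cols := make([]arrow.Array, batch.NumCols())
	for i := range cols {
		cols[i] = batch.Column(i) // reuse the existing arrays as-is
	}

	md := batch.Schema().Metadata()
	return array.NewRecord(arrow.NewSchema(fields, &md), cols, batch.NumRows())
}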
That's what I do with the columns that do not have conflicts.
However, I cannot just rename the full column of a batch, since we need to do that on a row basis.
I'm not sure I'm following yet. You can create a new *arrow.Schema where the field has the new resolved name, but give it the same underlying array (via array.NewRecord).
I second this, there should be a way to reuse the column data and just rename the field in the schema.
Line 147 creates the new batch using array.NewRecord(newSchema, newSchemaColumns, batch.NumRows()), where newSchemaColumns is the []arrow.Array that holds the existing unmodified columns (line 111) and the modified columns (lines 137 and 141).
Maybe I'm having a hard time following from the code why we need to modify the columns? It reads as if it were just copying data, but I guess you're saying it's doing more than that?
After talking offline, it's clearer to me now that this is similar to a multi-in, multi-out coalesce operation.
I do find the logic here hard to follow and validate: it seems like sourceFieldBuilder never has a non-NULL appended to it, but I don't think that's true? I don't have any suggestions how to make this easier to understand, and I don't want to block us having this, so I'm comfortable with it being merged and coming back to this later.
the need for the new column was finally made clear to me by the fact that the columns won't always conflict (either by rows in the same batch or across batches). I think this case is worth a test, at least for documentation purposes.
These cases are covered by tests
I do find the logic here hard to follow and validate: it seems like sourceFieldBuilder never has a non-NULL appended to it, but I don't think that's true?
Right, that is incorrect behaviour and will be fixed with 5b16cfb
not much to add, just a nit about releasing semantics and agreement with everyone else :)
looks good though!
// Return early if there are no colliding column names.
if len(duplicates) == 0 {
	return successState(batch)
ahh, I was wrong, we don't always create a new record. I wonder if we want to for cleaner release semantics?
We could still defer release on line 22, and explicitly retain here, as this is a short circuit.
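A hedged sketch of how that could look, building on the defer pattern sketched earlier (helper names are illustrative):

// Illustrative sketch only: with defer batch.Release() at the top of the
// function, the short-circuit path returns the same record to the caller and
// therefore has to retain it first.
if len(duplicates) == 0 {
	batch.Retain() // balances the deferred Release; the caller receives a live reference
	return successState(batch)
}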
name: "multiple duplicates", | ||
slice1: []string{"a", "b", "c", "d"}, | ||
slice2: []string{"c", "d", "e", "f"}, | ||
expected: []duplicate{ | ||
{ | ||
value: "c", | ||
s1Idx: 2, | ||
s2Idx: 0, | ||
}, | ||
{ | ||
value: "d", | ||
s1Idx: 3, | ||
s2Idx: 1, | ||
}, | ||
}, | ||
}, | ||
{ | ||
name: "duplicate with different positions", | ||
slice1: []string{"x", "y", "z"}, | ||
slice2: []string{"z", "y", "x"}, | ||
expected: []duplicate{ | ||
{ | ||
value: "x", | ||
s1Idx: 0, | ||
s2Idx: 2, | ||
}, | ||
{ | ||
value: "y", | ||
s1Idx: 1, | ||
s2Idx: 1, | ||
}, | ||
{ | ||
value: "z", | ||
s1Idx: 2, | ||
s2Idx: 0, | ||
}, | ||
}, | ||
}, |
I think these two cases are likely overlapping in the actual code they test.
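For reference, a hedged sketch of a helper that would produce exactly the expected values in the table above, using the standard library's slices.Index (the PR's actual implementation may differ):

// Illustrative sketch only: report values present in both slices together with
// their positions, ordered by their index in the first slice.
func findDuplicates(s1, s2 []string) []duplicate {
	var dups []duplicate
	for i, v := range s1 {
		if j := slices.Index(s2, v); j >= 0 {
			dups = append(dups, duplicate{value: v, s1Idx: i, s2Idx: j})
		}
	}
	return dups
}

If the helper looks roughly like this, both table cases exercise the same lookup path, which supports the overlap point above.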
After parsing log lines, field names need to be checked for collisions with label field names. Follow-up on #19470
The v1 engine has a mechanism to rename labels in case they have the same name but different origin, such as labels, structured metadata, or parsed fields.
1. In case a log line has a structured metadata key with the same name as a label of the stream, the metadata key is suffixed with `_extracted`, such as `service_extracted`, if `service` exists in both `labels` and `metadata`.
2. In case a parser creates a parsed field with the same name as a label of the stream, the parsed key is suffixed with `_extracted` in the same way as case 1. However, if the field name also collides with a structured metadata key, then the extracted structured metadata is replaced with the extracted parsed field.
This PR only implements the second case. This PR is a follow-up on #19470
Signed-off-by: Christian Haudum <[email protected]>
Co-authored-by: Ivan Kalita <[email protected]>
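To make the second case concrete, a hedged, map-based sketch (the engine itself works on Arrow columns; the map form, function name, and parameters are illustrative only):

// Illustrative sketch only: "resolved" stands for the result of case 1, i.e.
// stream labels plus structured metadata with colliding keys already suffixed.
func applyParsedFields(labels, resolved, parsed map[string]string) map[string]string {
	for k, v := range parsed {
		if _, isLabel := labels[k]; isLabel {
			// A parsed field colliding with a stream label gets the "_extracted"
			// suffix and overwrites an extracted metadata key of the same name.
			resolved[k+"_extracted"] = v
		} else {
			resolved[k] = v
		}
	}
	return resolved
}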
Summary
The v1 engine has a mechanism to rename labels in case they have the same name but different origin, such as labels, structured metadata, or parsed fields.
1. In case a log line has a structured metadata key with the same name as a label of the stream, the metadata key is suffixed with `_extracted`, such as `service_extracted`, if `service` exists in both `labels` and `metadata`.
2. In case a parser creates a parsed field with the same name as a label of the stream, the parsed key is suffixed with `_extracted` in the same way as case 1. However, if the field name also collides with a structured metadata key, then the extracted structured metadata is replaced with the extracted parsed field.

This PR only implements the first case. As a follow-up PR, the second case needs to be implemented as well. Additionally, the newly introduced "compatibility node" should also be made optional with a feature flag and/or per-request.
in the same way as case 1. However, if the field name also collides with a structured metadata key, then the extracted structured metadata is replaced with the extracted parsed field.This PR only implements the first case. As a follow up PR, the second case needs to be implemented as well. Additionally, the newly introduced "compatibility node" should also be made optional with a feature flag and/or per-request.