-
Notifications
You must be signed in to change notification settings - Fork 3.9k
feat: implement unwrap as a projection
#19409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
de9b26a to
9007cf8
Compare
ba88390 to
0f7c19b
Compare
| UnaryOpUnwrapBytes // Unwrap string bytes to float value operation (unwrap). | ||
| UnaryOpUnwrapDuration // Unwrap string duration to float value operation (unwrap). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should have individual operations for unwrap bytes and unwrap duration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without the different types, how do we determine the correct conversion function? would you parse that during logical planning and pass that information through in a way other than the operation? the pattern for registering functions leaned heavily on operation types, so it seemed like a good place to make the differentiation to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chaudum and I agreed offline we'll need different operations, but they should be renamed to cast operations as that's what these actually represent, ie. how to do the cast.
| const ( | ||
| // UnwrapOpInvalid indicates an invalid unwrap operation. | ||
| UnwrapOpInvalid UnwrapOp = iota | ||
| Unwrap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this considered an unwrap INT or unwrap FLOAT, or generic unwrap NUMBER?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unwrap would be a unwrap FLOAT -> strconv.ParseFloat(v, 64)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed to cast and got made the base case more explicit by naming it CastFloat
5e75771 to
4ec6ce1
Compare
894b194 to
e2f6ebf
Compare
rfratto
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I really like how this flows as a projection and unary expression.
| UnaryOpCastFloat // Cast string to float value operation (unwrap). | ||
| UnaryOpCastBytes // Cast string bytes to float value operation (unwrap). | ||
| UnaryOpCastDuration // Cast string duration to float value operation (unwrap). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all of these say "to float value", but is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, because the type of the value column the casted result is put in will always be a float64, and the conversion functions all return a float64. The difference is in how they get that value (ie. strconv, time.Parse, or humanize)
| // Unwrap applies an [Projection] operation, with an [UnaryOp] unwrap operation, to the Builder. | ||
| func (b *Builder) Unwrap(identifier string, operation string) *Builder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I intended Builder to have an API which more closely reflects the logical plan rather than LogQL constructs.
Can we update this to Builder.Cast maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, interesting, I though of builder as the bridge between logql and the logical plan, so the parameters of this function match closely to what we get out of logql for the unwrap operation. I'm happy to go with Cast though, as I think the arguments are still consistent.
| // Check for unwrap in the LogRangeExpr | ||
| if e.Left.Unwrap != nil { | ||
| unwrapIdentifier = e.Left.Unwrap.Identifier | ||
| unwrapOperation = e.Left.Unwrap.Operation | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it valid for a LogQL query to have more than one unwrap? If so, this breaks.
Maybe in the meantime, we can return errUnimplemented if we see an unwrap but unwrapIdentifier is already set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can return errUnimplemented 👍 . The only way I can think of having multiple unwrap statements is on opposite sides of a binary operation, so in two separate range expressions. I haven't thought about how that would be handled in this PR.
| // Check for unwrap in the LogRangeExpr | ||
| if e.Left.Unwrap != nil { | ||
| unwrapIdentifier = e.Left.Unwrap.Identifier | ||
| unwrapOperation = e.Left.Unwrap.Operation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we should be validating that we recognize and support whatever e.Left.Unwrap.Operation is and return errUnimplmeneted otherwise.
| } | ||
| } | ||
|
|
||
| schema := arrow.NewSchema(fields, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this propagate the metadata from the input? Are we using metadata anywhere anymore? (Ditto with newKeepPipeline above)
| } | ||
|
|
||
| func newExpandPipeline(expressions []physical.UnaryExpression, evaluator *expressionEvaluator, input Pipeline) (*GenericPipeline, error) { | ||
| return newGenericPipeline(func(ctx context.Context, inputs []Pipeline) (arrow.Record, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ideally we validate that len(inputs) == 1 here so things don't silently break. Ditto with newKeepPipeline
| pipeline := NewFilterPipeline(filter, input, expressionEvaluator{}, alloc) | ||
| defer pipeline.Close() | ||
| e := newExpressionEvaluator(alloc) | ||
| pipeline := NewFilterPipeline(filter, input, e, alloc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're missing a defer pipeline.Close() here, it was there but got removed in the diff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh, good catch. many rebases
| if errTracker.hasErrors { | ||
| defer errorCol.Release() | ||
| defer errorDetailsCol.Release() | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove this since that's already handled by the defer to releaseBuilders on the line above
pkg/engine/internal/executor/cast.go
Outdated
| func buildOutputFields( | ||
| hasErrors bool, | ||
| ) []arrow.Field { | ||
| fields := make([]arrow.Field, 0, 3) | ||
|
|
||
| // Add value field. Not nullable in practice since we use 0.0 when conversion fails, but as of | ||
| // writing all coumns are marked as nullable, even Timestamp and Message, so staying consistent | ||
| fields = append(fields, semconv.FieldFromIdent(semconv.ColumnIdentValue, true)) | ||
|
|
||
| // Add error fields if needed | ||
| if hasErrors { | ||
| fields = append(fields, | ||
| semconv.FieldFromIdent(semconv.ColumnIdentError, true), | ||
| semconv.FieldFromIdent(semconv.ColumnIdentErrorDetails, true), | ||
| ) | ||
| } | ||
|
|
||
| return fields | ||
| } | ||
|
|
||
| func buildResult( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buildOutputFields and buildResult might be a little too generically named for being in the executorPackage, can we give them more precise names?
chaudum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing more to add than what has already been said.
What this PR does / why we need it:
This PR continues our efforts towards LogQL backwards compatibility, with a focus on queries needed for Drilldown, by implementing support for
unwrap. Unwrap was implemented as a projection.Special notes for your reviewer:
Checklist
CONTRIBUTING.mdguide (required)featPRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.docs/sources/setup/upgrade/_index.mddeprecated-config.yamlanddeleted-config.yamlfiles respectively in thetools/deprecated-config-checkerdirectory. Example PR