-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-38258][model] support model args in ptf and add ml_predict builtin ptf #26924
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
2eeadb8
to
ed213d6
Compare
@@ -147,7 +159,9 @@ private static void checkScalarArgsOnly(List<StaticArgument> defaultArgs) { | |||
checkPassThroughColumns(declaredArgs); | |||
|
|||
final List<StaticArgument> newStaticArgs = new ArrayList<>(declaredArgs); | |||
newStaticArgs.addAll(PROCESS_TABLE_FUNCTION_SYSTEM_ARGS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious what the system arguments mean. Is this something that the user needs to be aware of? I do not see this phrase in the Flip and there is no more information in the Jira. I suggest including a description and motivation behind this piece. It appears to be a type of static arg that will be added if the boolean flag is on, but I am not sure when this would/should be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to control whether uid
, ontime
field will be added to ptf input. This is currently used by ml_predict
because it doesn't need uid
and ontime
field. It's not exposed to PTF function user can define. Yes. I can add more description if this approach makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This in not entirely correct. A user-defined PTF can implement a TypeInference and avoid system args, but this is kind of second-level API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job @lihaosky! I added some comments to improve the contribution a bit but nothing major.
@@ -651,6 +652,12 @@ public Optional<TableSemantics> getTableSemantics(int pos) { | |||
return Optional.of(semantics); | |||
} | |||
|
|||
@Override | |||
public Optional<ModelSemantics> getModelSemantics(int pos) { | |||
// TODO: Add ModelReferenceExpression checks and TableApiModelSemantics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not leave TODO in the code base, sometimes they stay there forever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
@@ -298,6 +298,11 @@ public Builder staticArguments(StaticArgument... staticArguments) { | |||
return this; | |||
} | |||
|
|||
public Builder allowSystemArguments(boolean allowSystemArguments) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's call this disableSystemArguments
and in TypeInference. By default, this then can be false.
public interface ModelSemantics { | ||
|
||
/** | ||
* Input data type expected by the passed model. Extracting type from PTF class definition is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Input data type expected by the passed model. Extracting type from PTF class definition is | |
* Input data type expected by the passed model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop last sentence. It rather confuses.
DataType inputDataType(); | ||
|
||
/** | ||
* Output data type produced by the passed model. Extracting type from PTF class definition is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Output data type produced by the passed model. Extracting type from PTF class definition is | |
* Output data type produced by the passed model. |
if (tableSemantics == null) { | ||
if (throwOnFailure) { | ||
throw new ValidationException( | ||
"First argument must be a table for ML_PREDICT function."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An input type strategy is optional if static arguments have been declared. You can assume that this check has been done already. An input type strategy might only be useful if you want to do additional validation, like validateTableAndDescriptorArguments below
import org.apache.flink.table.types.DataType; | ||
|
||
/** Mock implementation of {@link ModelSemantics} for testing purposes. */ | ||
public class ModelSemanticsMock implements ModelSemantics { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move next to CallContextMock into utils package
@@ -62,6 +65,7 @@ public final class CallBindingCallContext extends AbstractSqlCallContext { | |||
private final List<DataType> argumentDataTypes; | |||
private final @Nullable DataType outputType; | |||
private final @Nullable List<StaticArgument> staticArguments; | |||
private final SqlValidator validator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validator does not really fit in here, can we avoid it? SqlModelCall should have resolved types already. I think TableArgCall has the same? we should synchronize the two if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored
@@ -1327,7 +1326,7 @@ public List<SqlGroupedWindowFunction> getAuxiliaryFunctions() { | |||
public static final SqlFunction SESSION = new SqlSessionTableFunction(); | |||
|
|||
// MODEL TABLE FUNCTIONS | |||
public static final SqlFunction ML_PREDICT = new SqlMLPredictTableFunction(); | |||
// public static final SqlFunction ML_PREDICT = new SqlMLPredictTableFunction(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove?
@@ -335,6 +335,8 @@ public RelNode visit(FunctionQueryOperation functionTable) { | |||
inputStack.add(relBuilder.build()); | |||
return tableArgCall; | |||
} | |||
// TODO: Check ModelReferenceExpression and construct | |||
// RexModelArgCall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you planning to fix this TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
util.addTemporarySystemFunction("f", NoSystemArgsTableFunction.class); | ||
assertThatThrownBy(() -> util.verifyRelPlan("SELECT * FROM f(r => TABLE t, i => 1);")) | ||
.satisfies( | ||
anyCauseMatches("Disabling uid/time attributes is not supported for PTF.")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
anyCauseMatches("Disabling uid/time attributes is not supported for PTF.")); | |
anyCauseMatches("Disabling system arguments is not supported for user-defined PTF yet.")); |
if (tableSemantics == null) { | ||
if (throwOnFailure) { | ||
throw new ValidationException( | ||
"First argument must be a table for ML_PREDICT function."); | ||
} else { | ||
return Optional.empty(); | ||
} | ||
} | ||
|
||
// Check that second argument is a model | ||
ModelSemantics modelSemantics = callContext.getModelSemantics(1).orElse(null); | ||
if (modelSemantics == null) { | ||
if (throwOnFailure) { | ||
throw new ValidationException( | ||
"Second argument must be a model for ML_PREDICT function."); | ||
} else { | ||
return Optional.empty(); | ||
} | ||
} | ||
|
||
// Check that third argument is a descriptor with column names | ||
Optional<ColumnList> descriptorColumns = callContext.getArgumentValue(2, ColumnList.class); | ||
if (descriptorColumns.isEmpty()) { | ||
if (throwOnFailure) { | ||
throw new ValidationException( | ||
"Third argument must be a descriptor with simple column names for ML_PREDICT function."); | ||
} else { | ||
return Optional.empty(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twalthr , I added these checks back since I have tests in MLPredictInputTypeStrategyTest
testing invalid argument as well. Also tableSemantics
etc are needed below. Maybe doesn't hurt to do extra check and give meaningful error message.
@flinkbot run azure |
What is the purpose of the change
ml_predict
builtin ptf definitionBrief change log
ModelSemantics
inCallContext
model
inStaticArgument
disableSystemArguments
to control whether adduid
androwtime
in ptf input/outputml_predict
builtin ptfVerifying this change
ml_predict
type inferenceDoes this pull request potentially affect one of the following parts:
@Public(Evolving)
: (no)Documentation