feat: Add auto_explain mode
#19316
base: main
```rust
self.cache =
    Self::compute_properties(&self.input, Arc::clone(&self.input.schema()));
```
The plan properties need to be recomputed here, since the operator's output changes: in `auto_explain` mode it emits the input's schema.
```rust
if auto_explain {
    if duration.as_millis() >= auto_explain_min_duration as u128 {
        export_auto_explain(out, &auto_explain_output)?;
    }
    concat_batches(&inner_schema, &batches).map_err(DataFusionError::from)
} else {
    Ok(out)
}
```
In `auto_explain` mode, the operator returns the input's batches instead of the analyze output.
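The hunk and comment above boil down to a small rule: in `auto_explain` mode the plan is exported only when the query took at least `auto_explain_min_duration` milliseconds, and the caller still receives the query's own data. A minimal std-only sketch of that pass-through follows; `auto_explain_passthrough` is a hypothetical name, a generic `T` stands in for the concatenated `RecordBatch`, and `plan_text` stands in for the rendered analyze output.

```rust
use std::io::{self, Write};
use std::time::Duration;

/// Mirrors the `duration.as_millis() >= auto_explain_min_duration as u128` check:
/// true when a query that took `elapsed` is slow enough to have its plan exported.
fn should_export(elapsed: Duration, min_duration_ms: u64) -> bool {
    elapsed.as_millis() >= u128::from(min_duration_ms)
}

/// Hypothetical helper for the auto_explain branch: writes `plan_text` to `sink`
/// when the threshold is met, then hands back the query's own result unchanged.
fn auto_explain_passthrough<T>(
    result: T,
    elapsed: Duration,
    min_duration_ms: u64,
    plan_text: &str,
    sink: &mut dyn Write,
) -> io::Result<T> {
    if should_export(elapsed, min_duration_ms) {
        writeln!(sink, "{plan_text}")?;
    }
    // The caller always gets its data back, whether or not a plan was exported.
    Ok(result)
}
```

Passing a `Vec<u8>` as the sink makes the behaviour easy to check without touching stdout or the filesystem.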
```rust
let fd: &mut dyn Write = match output {
    "stdout" => &mut io::stdout(),
    "stderr" => &mut io::stderr(),
    _ => &mut OpenOptions::new().create(true).append(true).open(output)?,
};
```
Does this need any kind of validation of the file location?
Or is it left to the developer/admin to make sure it is a safe place?
Does this need some kind of synchronisation when a file path is used for the output? Two or more DataFusion sessions using the same config may try to write to the same file simultaneously.
> Does this need any kind of validation of the file location?
> Or is it left to the developer/admin to make sure it is a safe place?

I think it's better to leave this to the user (either way, an error is returned).
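For comparison, the writer selection in the hunk above can also return an owned `Box<dyn Write>` instead of a borrowed `&mut dyn Write`. The borrowed form relies on temporary lifetime extension through `match` arms (stabilized in Rust 1.79), while the boxed form does not and can be stored or passed around freely. In this sketch `open_output` is a hypothetical name; as in the PR, an unusable path simply surfaces as an `io::Error` rather than being validated up front.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};

/// Hypothetical helper: maps an `auto_explain_output`-style value to a writer.
/// "stdout" and "stderr" select the standard streams; anything else is treated as a
/// file path that is created if missing and appended to otherwise.
fn open_output(output: &str) -> io::Result<Box<dyn Write>> {
    let writer: Box<dyn Write> = match output {
        "stdout" => Box::new(io::stdout()),
        "stderr" => Box::new(io::stderr()),
        // No path validation: a bad location is reported by `open` as an io::Error.
        path => Box::new(OpenOptions::new().create(true).append(true).open(path)?),
    };
    Ok(writer)
}
```

Because the file is opened in append mode, repeated exports accumulate in the same file instead of overwriting it.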
> Does this need some kind of synchronisation when a file path is used for the output? Two or more DF sessions using the same config may try to write to the same file simultaneously.

Again, I think the responsibility for this falls on the user. Is it common to use multiple sessions with the same config?
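If several sessions in one process did share an output file, one possible mitigation (not part of this PR) is to format each plan completely and append it with a single `write_all` while holding a process-wide lock, so entries from concurrent queries cannot interleave. The sketch assumes that design; `LOG_LOCK` and `append_plan` are hypothetical names, and it does nothing to coordinate separate processes writing to the same file.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;
use std::sync::Mutex;

/// Process-wide lock serializing auto_explain appends (hypothetical; not in DataFusion).
static LOG_LOCK: Mutex<()> = Mutex::new(());

/// Appends one fully formatted plan to `path` as a single write while holding the lock.
fn append_plan(path: &Path, plan_text: &str) -> io::Result<()> {
    // Build the whole entry first so it goes out in one write_all call.
    let entry = format!("{plan_text}\n");
    // A poisoned lock only means another writer panicked; the file is still usable.
    let _guard = LOG_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(entry.as_bytes())
}
```

The lock is held only for one open-and-append per plan, so contention stays limited to the moment of export.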
```
# test auto_explain

statement ok
set datafusion.explain.auto_explain_output = 'test_files/scratch/auto_explain.txt';
```
Does something assert the contents of this output file?
Does something remove this file at the end?
I originally tried loading the file into a table as CSV, since I think that is the only feasible way to check its contents, but because the file cannot be removed, the result would keep changing. I mainly added these sqllogictests to check the `set ...` commands.

As for removing the file, I'm not sure it is possible. That said, I don't think it is necessary, since the file is written to the sqllogictest temporary (scratch) directory.
Co-authored-by: Martin Grigorov <[email protected]>
Which issue does this PR close?

- `auto_explain` mode #19215.

Rationale for this change
Allow users to inspect execution plans without having to change their existing application.
The `auto_explain` mode can be enabled with the `datafusion.explain.auto_explain` config. In addition, there are two other configs:

- `datafusion.explain.auto_explain_output`: sets the output location of the plans. Supports `stdout`, `stderr`, and a file path.
- `datafusion.explain.auto_explain_min_duration`: only outputs plans whose duration is at least this value, in milliseconds (similar to Postgres' `auto_explain.log_min_duration`).

Example in `datafusion-cli`:

What changes are included in this PR?
- `AnalyzeExec` operator to support the `auto_explain` mode.
- `AnalyzeExec` operator when `auto_explain` is enabled.

Are these changes tested?
Yes.
Are there any user-facing changes?
New feature, but it's completely optional.
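The three settings listed under the rationale could be modelled roughly as below. This is a hypothetical sketch, not DataFusion's actual config code: `AutoExplainSettings`, `AutoExplainOutput`, `from_map`, and the fallback values used when a key is absent (disabled, `stdout`, 0 ms) are all assumptions for illustration. The millisecond unit follows the `as_millis()` comparison in the `AnalyzeExec` hunk.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Destination for plans, following the documented values of
/// `datafusion.explain.auto_explain_output`.
#[derive(Debug, PartialEq)]
enum AutoExplainOutput {
    Stdout,
    Stderr,
    File(PathBuf),
}

impl From<&str> for AutoExplainOutput {
    fn from(s: &str) -> Self {
        match s {
            "stdout" => AutoExplainOutput::Stdout,
            "stderr" => AutoExplainOutput::Stderr,
            // Any other value is treated as a file path.
            path => AutoExplainOutput::File(PathBuf::from(path)),
        }
    }
}

/// Hypothetical typed view of the three `datafusion.explain.auto_explain*` settings.
#[derive(Debug, PartialEq)]
struct AutoExplainSettings {
    enabled: bool,
    output: AutoExplainOutput,
    min_duration_ms: u64,
}

impl AutoExplainSettings {
    /// Parses the settings from string key/value pairs. Missing keys fall back to
    /// disabled / stdout / 0 ms; these fallbacks are this sketch's choice, not DataFusion's.
    fn from_map(opts: &HashMap<&str, &str>) -> Result<Self, String> {
        let enabled = match opts.get("datafusion.explain.auto_explain") {
            None => false,
            Some(v) => v.parse::<bool>().map_err(|e| format!("auto_explain: {e}"))?,
        };
        let output = AutoExplainOutput::from(
            opts.get("datafusion.explain.auto_explain_output")
                .copied()
                .unwrap_or("stdout"),
        );
        let min_duration_ms = match opts.get("datafusion.explain.auto_explain_min_duration") {
            None => 0,
            Some(v) => v
                .parse::<u64>()
                .map_err(|e| format!("auto_explain_min_duration: {e}"))?,
        };
        Ok(Self { enabled, output, min_duration_ms })
    }
}
```

Parsing everything up front means a malformed duration is reported once, when the settings are read, rather than on every query.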