@nuno-faria
Contributor

Which issue does this PR close?

Rationale for this change

This mode lets users inspect execution plans without changing the existing application.

The auto_explain mode can be enabled with the datafusion.explain.auto_explain config. In addition, there are two other configs:

  • datafusion.explain.auto_explain_output: sets the output location of the plans. Supports stdout, stderr, and a file path.
  • datafusion.explain.auto_explain_min_duration: only outputs plans whose duration, in milliseconds, is greater than or equal to this value (similar to Postgres' auto_explain.log_min_duration).

Example in datafusion-cli:

-- regular mode
> select 1;
+----------+
| Int64(1) |
+----------+
| 1        |
+----------+
1 row(s) fetched.

-- with auto_explain enabled (the plan is not actually part of the result, it is sent to stdout)
> select 1;
+-------------------+------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                         |
+-------------------+------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | ProjectionExec: expr=[1 as Int64(1)], metrics=[output_rows=1, elapsed_compute=21.50µs, output_bytes=8.0 B, output_batches=1] |
|                   |   PlaceholderRowExec, metrics=[]                                                                                             |
|                   |                                                                                                                              |
+-------------------+------------------------------------------------------------------------------------------------------------------------------+
+----------+
| Int64(1) |
+----------+
| 1        |
+----------+
1 row(s) fetched.

What changes are included in this PR?

  • Extended the existing AnalyzeExec operator to support the auto_explain mode.
  • Added new explain configs.
  • Wrapped plans in an AnalyzeExec operator when auto_explain is enabled.
  • Added tests.

Are these changes tested?

Yes.

Are there any user-facing changes?

New feature, but it's completely optional.

@github-actions bot added labels on Dec 14, 2025: documentation (Improvements or additions to documentation); core (Core DataFusion crate); sqllogictest (SQL Logic Tests, .slt); common (Related to common crate); physical-plan (Changes to the physical-plan crate)
Comment on lines +127 to +128
self.cache =
    Self::compute_properties(&self.input, Arc::clone(&self.input.schema()));
Contributor Author

The cached properties need to be recomputed since the output schema changes.

Comment on lines 256 to 263
if auto_explain {
    if duration.as_millis() >= auto_explain_min_duration as u128 {
        export_auto_explain(out, &auto_explain_output)?;
    }
    concat_batches(&inner_schema, &batches).map_err(DataFusionError::from)
} else {
    Ok(out)
}
Contributor Author

In auto_explain mode, the operator returns the input's batches instead of the analyze output.
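
For illustration, the gating behavior described above can be sketched outside DataFusion with only the standard library. The function name `run_with_auto_explain` and its closure parameters are hypothetical stand-ins, not part of this PR: `work` plays the role of executing the input plan, and `report` plays the role of exporting the plan with metrics.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the auto_explain branch: runs `work`, calls
/// `report` only when the elapsed time reaches `min_duration`, and always
/// hands back the result of `work` itself (never the report).
fn run_with_auto_explain<T>(
    min_duration: Duration,
    work: impl FnOnce() -> T,
    report: impl FnOnce(Duration),
) -> T {
    let start = Instant::now();
    let result = work();
    let elapsed = start.elapsed();
    // Same comparison direction as the snippet under review: >=, not >.
    if elapsed >= min_duration {
        report(elapsed);
    }
    result
}

fn main() {
    // A zero threshold means every execution is reported.
    let rows = run_with_auto_explain(
        Duration::ZERO,
        || vec![1, 2, 3],
        |elapsed| eprintln!("plan took {elapsed:?}"),
    );
    println!("{} row(s) fetched.", rows.len());
}
```

The point of the sketch is that the caller sees the same value whether or not the report is emitted, which is what lets the mode be enabled without touching the application.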

@nuno-faria
Contributor Author

let fd: &mut dyn Write = match output {
    "stdout" => &mut io::stdout(),
    "stderr" => &mut io::stderr(),
    _ => &mut OpenOptions::new().create(true).append(true).open(output)?,
Member

Does this need any kind of validation of the file location?
Or is it left to the developer/admin to make sure it is a safe place?

Member

Does this need some kind of synchronisation when a file path is used for the output? Two or more DF sessions using the same config may try to write to the same file simultaneously.

Contributor Author

Does this need any kind of validation of the file location?
Or is it left to the developer/admin to make sure it is a safe place?

I think it's better to leave this to the user (either way, an error is returned if the file cannot be opened).
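
As a standalone sketch of the sink selection discussed in this thread, the following uses only the standard library. The helper name `write_report` is hypothetical and is not the function in this PR. It boxes the writer rather than borrowing a temporary, and any failure to open or write the file surfaces as an `io::Error` to the caller, with no extra validation of the path.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};

/// Hypothetical helper: sends `report` to stdout, stderr, or appends it to
/// the file at `output`. The path is not validated; open errors propagate.
fn write_report(output: &str, report: &str) -> io::Result<()> {
    let mut sink: Box<dyn Write> = match output {
        "stdout" => Box::new(io::stdout()),
        "stderr" => Box::new(io::stderr()),
        // Any other value is treated as a file path, created on first use.
        path => Box::new(OpenOptions::new().create(true).append(true).open(path)?),
    };
    writeln!(sink, "{report}")?;
    sink.flush()
}

fn main() -> io::Result<()> {
    write_report("stdout", "plan goes here")?;
    // The file name below is illustrative only.
    let path = std::env::temp_dir().join("auto_explain_sketch.txt");
    write_report(&path.to_string_lossy(), "plan goes here")
}
```

Because the file is opened in append mode, repeated calls accumulate output rather than overwriting it.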

Contributor Author

Does this need some kind of synchronisation when a file path is used for the output? Two or more DF sessions using the same config may try to write to the same file simultaneously.

Again, I think this responsibility falls on the user. Is it common for multiple sessions to share the same config?
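
If a user did want to serialize writes on their side, one possible approach within a single process is a shared lock around the append. This is a sketch under that assumption, not something this PR implements: `REPORT_LOCK` and `append_report` are hypothetical names, and a process-local mutex does nothing for separate processes writing to the same file.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::sync::Mutex;
use std::thread;

/// Hypothetical process-wide lock guarding appends to the report file.
static REPORT_LOCK: Mutex<()> = Mutex::new(());

/// Appends one report line to `path` while holding the lock, so reports
/// written by different threads in this process do not interleave.
fn append_report(path: &str, report: &str) -> io::Result<()> {
    // A poisoned lock only means another writer panicked; keep going.
    let _guard = REPORT_LOCK.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(format!("{report}\n").as_bytes())
}

fn main() -> io::Result<()> {
    // The file name below is illustrative only.
    let path = std::env::temp_dir().join("auto_explain_sync_sketch.txt");
    let path = path.to_string_lossy().into_owned();
    // Four threads stand in for sessions sharing the same output setting.
    let workers: Vec<_> = (0..4)
        .map(|id| {
            let path = path.clone();
            thread::spawn(move || append_report(&path, &format!("plan from session {id}")))
        })
        .collect();
    for worker in workers {
        worker.join().expect("writer thread panicked")?;
    }
    Ok(())
}
```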

# test auto_explain

statement ok
set datafusion.explain.auto_explain_output = 'test_files/scratch/auto_explain.txt';
Member

Does something assert the contents of this output file?
Does something remove this file at the end?

Contributor Author

I originally tried to load the file into a table as CSV, as I think that is the only feasible way to check the contents, but since the file cannot be removed, the result would change on every run. I mainly added these sqllogictests to check the "set ..." commands.

As for removing the file, I'm not sure it is possible. That said, I don't think it is necessary, since the file is written to the sqllogictest temporary dir.

Development

Successfully merging this pull request may close these issues.

Add auto_explain mode

2 participants