
gt_plt_summary customization #160


Merged
27 commits merged on Aug 21, 2025

Conversation

ScottFB101 (Contributor)

Summary


This PR addresses 4 of the 5 enhancements proposed in Issue #146. The one enhancement not addressed was "Display times instead of just dates"; I felt that deserved a separate PR.

The changes made to the existing code are rather straightforward and don't need much background beyond knowing they address the 4 enhancements referenced above. That's my opinion, of course; I'm happy to elaborate on the code here if there's interest.

As for the context and purpose of my contribution, I use so many posit-backed packages and tools, that it only feels right to get involved in contributing to the ecosystem.

Related GitHub Issues and PRs

Checklist

Introduces a 'Mode' column to the summary table in gt_plt_summary, with logic to handle cases with no singular mode or multiple modes. Updates function arguments to allow hiding descriptive stats and mode, and adjusts .gitignore to exclude VSCode launch configuration.
Mode column is now conditionally dropped based on the add_mode flag, and descriptive stats columns are managed more flexibly. This improves clarity and control over which summary statistics are displayed.
Introduces an 'interactivity' parameter to summary plot functions, allowing toggling of interactive features such as hover tooltips and CSS. All relevant plotting and SVG generation functions now accept and respect this parameter, enabling more flexible plot rendering.

NOTE: The code that creates the tooltips still runs; it just isn't appended to the SVG element.
Introduces a color_mapping parameter to gt_plt_summary, allowing users to override the default color scheme. A helper function change_color_mapping updates the global COLOR_MAPPING when user overrides are provided.
Refactored summary DataFrame creation to conditionally include descriptive statistics and mode based on parameters. Enabled interactivity by default in category bar and histogram SVG plots. Commented out redundant DataFrame column drops in gt_plt_summary.
… to boolean and numeric plots.

Changed the default value of the 'interactivity' parameter to True in all summary plot functions to ensure interactive plots are generated by default.

Also
Added unit tests to verify that change_color_mapping correctly updates COLOR_MAPPING, handles None and empty dict inputs, and preserves existing keys. This improves test coverage for color mapping customization.
Adds checks to ensure that missing value handling and number formatting are only applied to columns present in the DataFrame. This prevents errors in Polars when columns like 'Mean', 'Median', 'SD', or 'Mode' are absent.
The .vscode/launch.json entry was deleted from .gitignore
Removed the change_color_mapping function and integrated color mapping overrides directly into gt_plt_summary via the new_color_mapping parameter. Updated related docstrings and cleaned up unused code and tests for change_color_mapping.
Added documentation for the 'interactivity' parameter in the gt_plt_summary function, explaining its purpose for toggling interactive features in Plot Overview column graphs.
@juleswg23 (Collaborator)

Thank you Scott! This is looking really awesome, I really think it upgrades the user experience for this function.

I've made some comments (all pretty small) to places I would consider edits. Take a look, and let me know if you think any of the changes are unreasonable, I'd be happy to consider other angles.

Once you feel like the PR is in a good place, the last thing to add are tests. This could probably be one snapshot test with all the optional parameters turned on, and then a few individual tests each checking one of the new parameters. When you run make test, it should report what lines are covered/uncovered... I'd love to be at full coverage for this PR!

Replaces the 'hide_desc_stats' parameter with 'show_desc_stats' in gt_plt_summary and related functions for clarity. Updates logic and docstrings to reflect the new parameter name and default value.
Improves the docstring for gt_plt_summary by clarifying the new_color_mapping parameter as a dictionary mapping data types to hex color codes, and adds a new example demonstrating custom color mapping with ocean swell data. Also corrects default value descriptions for show_desc_stats and add_mode.
Moved the creation and insertion of visual bars and tooltips inside the interactivity check in _make_categories_bar_svg. This ensures that these SVG elements are only added when interactivity is enabled, preventing unnecessary elements in static mode.
Adjusts the placement and conditional rendering of tooltips and visual bars in the _make_categories_bar_svg and _make_histogram_svg functions. Tooltips are now only created and appended when interactivity is enabled, preventing unnecessary elements in non-interactive SVGs.
@ScottFB101 (Author)

Of course! Happy to help, it's been good practice for me.

Alright I made all the changes, but will admit I'm struggling a bit figuring out the snapshot test (not something I've dealt with before), so I can't get 2 of the 500ish lines of summary.py to get coverage using pytest.

@juleswg23 (Collaborator)

juleswg23 commented Aug 13, 2025

I can't get 2 of the 500ish lines of summary.py to get coverage using pytest.

This is getting close! The snapshot test is great; it's not necessarily supposed to hit every branch, just to check a standard example's output. If you run make test-update and commit the test_summary.ambr file in the tests/__snapshots/ directory, those snapshot tests should pass, as you have them now, in all future make test executions.

To get coverage on the un-hit lines (for example, I am seeing the below line isn't covered by the snapshot) you can write tests similar to the rest of the ones in test_summary.py. So for example, to test the below branch:

# Limiting the number of modes displayed to two at maximum
elif len(mode_val) > 2:
    mode_val = "Greater than 2 Modes"
# Converting to string, then listing together
else:
    mode_val = ", ".join(str(i) for i in sorted(mode_val.to_list()))

We can use tests like:

@pytest.mark.parametrize("DataFrame", [pd.DataFrame, pl.DataFrame])
def test_gt_plt_summary_two_modes(DataFrame):
    df = DataFrame({"numeric": [1, 1, 2, 2, 3]})

    result = gt_plt_summary(df, add_mode=True)
    html = result.as_raw_html()

    assert '<td class="gt_row gt_left">1, 2</td>' in html

@pytest.mark.parametrize("DataFrame", [pd.DataFrame, pl.DataFrame])
def test_gt_plt_summary_greater_than_two_modes(DataFrame):
    df = DataFrame({"numeric": [1, 1, 2, 2, 3, 4, 4]})

    result = gt_plt_summary(df, add_mode=True)
    html = result.as_raw_html()

    assert '<td class="gt_row gt_left">Greater than 2 Modes</td>' in html

I know it's not the flashiest to do, but it would be nice to have these kinds of specific tests for each individual new parameter, and then optionally any edge cases you might want to include. (For example, testing behavior when add_mode=True and show_desc_stats=False.)

summary_df = _create_summary_df(df)
if new_color_mapping:
    global COLOR_MAPPING
    COLOR_MAPPING.update(new_color_mapping)
juleswg23 (Collaborator):

I am noticing a slight bug with this as I run the tests.

https://github.com/posit-dev/gt-extras/pull/160/checks#step:7:671

[image: screenshot of the failing test output]

Specifically, the color it is checking for has changed (line 671 it's counting the blue fills), because in a previous call to gt_plt_summary(), the globals were updated.

It's not the prettiest fix, but one option is to copy COLOR_MAPPING into a local color_mapping, update that copy, and pass it all the way down to the SVG functions.

I'm also open to other workarounds.
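That fix might look roughly like this. It is only a sketch: COLOR_MAPPING and the parameter names come from the discussion, the color values are placeholders, and the rest of the real function body is omitted.

```python
# Module-level default, as in summary.py (values here are placeholders).
COLOR_MAPPING = {"numeric": "#4e79a7", "categorical": "#f28e2b"}

def gt_plt_summary(df, new_color_mapping=None):
    # Copy the default instead of mutating the global, so one call's
    # overrides cannot leak into later calls.
    color_mapping = dict(COLOR_MAPPING)
    if new_color_mapping:
        color_mapping.update(new_color_mapping)
    # ... pass color_mapping down to the SVG helpers instead of having
    # them read the global ...
    return color_mapping  # returned here only to make the sketch checkable
```

Because each call works on its own copy, re-running the three-call example below would render the third table with the default colors again.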

ScottFB101 (Author):

How would you feel about using a pytest fixture that resets to the default color_mapping prior to each separate test?

juleswg23 (Collaborator):

I don't think it solves the root of the problem.

If I run these three in succession, the third one is colored red, rather than orange.

import pandas as pd
from gt_extras import gt_plt_summary

df = pd.DataFrame({"numeric": [1, 1, 2, 2, 3]})
gt_plt_summary(df)
df = pd.DataFrame({"numeric": [1, 1, 2, 2, 3]})
gt_plt_summary(df, new_color_mapping={"numeric":"red"})
df = pd.DataFrame({"numeric": [1, 1, 2, 2, 3]})
gt_plt_summary(df)
[image: rendered tables — the third summary is colored red rather than orange]

ScottFB101 (Author):

I must have commented in the wrong spot originally, so there may be a similar response elsewhere in this PR thread.

I could make a Pytest fixture that resets the color mapping to the preset from the summary.py script prior to each test.

juleswg23 (Collaborator):

Yes you could, and it's true that it would make the tests pass. But I'm not sure it would solve the underlying problem.

Take the above example: the user only has to change the colors once and it changes the color mapping for all subsequent calls, which feels like unexpected behavior. For that reason, changing the actual behavior of the function seems like a better approach to me.

Introduces tests to verify gt_plt_summary correctly handles cases with two modes and more than two modes in both pandas and polars DataFrames.
@ScottFB101 (Author)

ScottFB101 commented Aug 14, 2025

I can't get 2 of the 500ish lines of summary.py to get coverage using pytest.

Unfortunately, Pandas handles the Mode column as an object. I guess there's no native string column type? Polars can handle string columns, so the alignment in the data cell varies depending on dataframe type. For example, Pandas results in "gt_right", Polars in "gt_left".

For the time being, I strictly checked that the correct values were in the data cell, but didn't check alignment.

@juleswg23 (Collaborator)

Unfortunately, Pandas handles the Mode column as an object. I guess there's no native string column type? Polars can handle string columns, so the alignment in the data cell varies depending on dataframe type. For example, Pandas results in "gt_right", Polars in "gt_left".

For the time being, I strictly checked that the correct values were in the data cell, but didn't check alignment.

Yeah, I'm seeing this too. I'd guess there's some workaround within the dataframe, but we could supersede it and set the following:

gt = gt.cols_align(align="right", columns="Mode")

Just note that's not going to work if the mode column isn't present.

I also don't think it's that bad to leave it as is. In that case, the user can edit the table if they see fit.

Changed assertions in test_gt_plt_summary_two_modes and test_gt_plt_summary_greater_than_two_modes to verify the exact HTML table cell output instead of just substring matches. This ensures the tests are more robust and accurately reflect the rendered HTML structure.
Adds right alignment to the 'Mode' column in the summary table when add_mode is enabled
@ScottFB101 (Author)


I added that code to summary.py in e3d54d8

@juleswg23 juleswg23 merged commit 704e7b1 into posit-dev:main Aug 21, 2025
5 checks passed