
gt_plt_summary customization #160


Merged
27 commits merged on Aug 21, 2025

Conversation

ScottFB101 (Contributor)

Summary


This PR addresses 4 of the 5 enhancements proposed in Issue #146. The one enhancement not addressed was "Display times instead of just dates"; I felt that deserved a separate PR.

The changes made to the existing code are rather straightforward and don't need much background beyond knowing they address the 4 enhancements referenced above. That's my opinion, of course; I'm happy to elaborate on the code here if there's interest.

As for the context and purpose of my contribution, I use so many posit-backed packages and tools, that it only feels right to get involved in contributing to the ecosystem.

Related GitHub Issues and PRs

Checklist

Introduces a 'Mode' column to the summary table in gt_plt_summary, with logic to handle cases with no singular mode or multiple modes. Updates function arguments to allow hiding descriptive stats and mode, and adjusts .gitignore to exclude VSCode launch configuration.
Mode column is now conditionally dropped based on the add_mode flag, and descriptive stats columns are managed more flexibly. This improves clarity and control over which summary statistics are displayed.
Introduces an 'interactivity' parameter to summary plot functions, allowing toggling of interactive features such as hover tooltips and CSS. All relevant plotting and SVG generation functions now accept and respect this parameter, enabling more flexible plot rendering.

NOTE: The code that creates the tooltips still runs; it just isn't appended to the SVG element.
Introduces a color_mapping parameter to gt_plt_summary, allowing users to override the default color scheme. A helper function change_color_mapping updates the global COLOR_MAPPING when user overrides are provided.
Refactored summary DataFrame creation to conditionally include descriptive statistics and mode based on parameters. Enabled interactivity by default in category bar and histogram SVG plots. Commented out redundant DataFrame column drops in gt_plt_summary.
… to boolean and numeric plots.

Changed the default value of the 'interactivity' parameter to True in all summary plot functions to ensure interactive plots are generated by default.

Also
Added unit tests to verify that change_color_mapping correctly updates COLOR_MAPPING, handles None and empty dict inputs, and preserves existing keys. This improves test coverage for color mapping customization.
Adds checks to ensure that missing value handling and number formatting are only applied to columns present in the DataFrame. This prevents errors in Polars when columns like 'Mean', 'Median', 'SD', or 'Mode' are absent.
The .vscode/launch.json entry was deleted from .gitignore
Removed the change_color_mapping function and integrated color mapping overrides directly into gt_plt_summary via the new_color_mapping parameter. Updated related docstrings and cleaned up unused code and tests for change_color_mapping.
Added documentation for the 'interactivity' parameter in the gt_plt_summary function, explaining its purpose for toggling interactive features in Plot Overview column graphs.
@juleswg23 (Collaborator)

Thank you Scott! This is looking really awesome, I really think it upgrades the user experience for this function.

I've made some comments (all pretty small) to places I would consider edits. Take a look, and let me know if you think any of the changes are unreasonable, I'd be happy to consider other angles.

Once you feel like the PR is in a good place, the last thing to add are tests. This could probably be one snapshot test with all the optional parameters turned on, and then a few individual tests each checking one of the new parameters. When you run make test, it should report what lines are covered/uncovered... I'd love to be at full coverage for this PR!

Replaces the 'hide_desc_stats' parameter with 'show_desc_stats' in gt_plt_summary and related functions for clarity. Updates logic and docstrings to reflect the new parameter name and default value.
Improves the docstring for gt_plt_summary by clarifying the new_color_mapping parameter as a dictionary mapping data types to hex color codes, and adds a new example demonstrating custom color mapping with ocean swell data. Also corrects default value descriptions for show_desc_stats and add_mode.
Moved the creation and insertion of visual bars and tooltips inside the interactivity check in _make_categories_bar_svg. This ensures that these SVG elements are only added when interactivity is enabled, preventing unnecessary elements in static mode.
Adjusts the placement and conditional rendering of tooltips and visual bars in the _make_categories_bar_svg and _make_histogram_svg functions. Tooltips are now only created and appended when interactivity is enabled, preventing unnecessary elements in non-interactive SVGs.
@ScottFB101 (Author)

Of course! Happy to help, it's been good practice for me.

Alright I made all the changes, but will admit I'm struggling a bit figuring out the snapshot test (not something I've dealt with before), so I can't get 2 of the 500ish lines of summary.py to get coverage using pytest.

@juleswg23 (Collaborator)

juleswg23 commented Aug 13, 2025

I can't get 2 of the 500ish lines of summary.py to get coverage using pytest.

This is getting close! The snapshot test is great; it's not necessarily supposed to hit every branch, just to check a standard example's output. If you run make test-update and commit the test_summary.ambr file in the tests/__snapshots/ directory, those snapshot tests should pass, as you have them now, in all future make test executions.

To get coverage on the un-hit lines (for example, I am seeing the below line isn't covered by the snapshot) you can write tests similar to the rest of the ones in test_summary.py. So for example, to test the below branch:

# Limiting the number of modes displayed to two at maximum
elif len(mode_val) > 2:
    mode_val = "Greater than 2 Modes"
# Converting to string, then listing together
else:
    mode_val = ", ".join(str(i) for i in sorted(mode_val.to_list()))

We can use tests like:

@pytest.mark.parametrize("DataFrame", [pd.DataFrame, pl.DataFrame])
def test_gt_plt_summary_two_modes(DataFrame):
    df = DataFrame({"numeric": [1, 1, 2, 2, 3]})

    result = gt_plt_summary(df, add_mode=True)
    html = result.as_raw_html()

    assert '<td class="gt_row gt_left">1, 2</td>' in html

@pytest.mark.parametrize("DataFrame", [pd.DataFrame, pl.DataFrame])
def test_gt_plt_summary_greater_than_two_modes(DataFrame):
    df = DataFrame({"numeric": [1, 1, 2, 2, 3, 4, 4]})

    result = gt_plt_summary(df, add_mode=True)
    html = result.as_raw_html()

    assert '<td class="gt_row gt_left">Greater than 2 Modes</td>' in html

I know it's not the flashiest to do, but it would be nice to have these kinds of specific tests for each individual new parameter, and then optionally any edge cases you might want to include. (For example, testing behavior when add_mode=True and show_desc_stats=False.)

summary_df = _create_summary_df(df)
if new_color_mapping:
    global COLOR_MAPPING
    COLOR_MAPPING.update(new_color_mapping)
juleswg23 (Collaborator):

I am noticing a slight bug with this as I run the tests.

https://github.com/posit-dev/gt-extras/pull/160/checks#step:7:671

[image: screenshot of the failing test output]

Specifically, the color it is checking for has changed (line 671 it's counting the blue fills), because in a previous call to gt_plt_summary(), the globals were updated.

It's not the prettiest fix, but one option is to copy COLOR_MAPPING into a local color_mapping, update that copy, and pass it all the way down to the SVG functions.

I'm also open to other workarounds.
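That fix might look roughly like this. It is only a sketch: COLOR_MAPPING and the parameter names come from the discussion, the color values are placeholders, and the rest of the real function body is omitted.

```python
# Module-level default, as in summary.py (values here are placeholders).
COLOR_MAPPING = {"numeric": "#4e79a7", "categorical": "#f28e2b"}

def gt_plt_summary(df, new_color_mapping=None):
    # Copy the default instead of mutating the global, so one call's
    # overrides cannot leak into later calls.
    color_mapping = dict(COLOR_MAPPING)
    if new_color_mapping:
        color_mapping.update(new_color_mapping)
    # ... pass color_mapping down to the SVG helpers instead of having
    # them read the global ...
    return color_mapping  # returned here only to make the sketch checkable
```

Because each call works on its own copy, re-running the three-call example below would render the third table with the default colors again.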

ScottFB101 (Author):

How would you feel about using a pytest fixture that resets to the default color_mapping prior to each separate test?

juleswg23 (Collaborator):

I don't think it solves the root of the problem.

If I run these three in succession, the third one is colored red, rather than orange.

import pandas as pd
from gt_extras import gt_plt_summary

df = pd.DataFrame({"numeric": [1, 1, 2, 2, 3]})
gt_plt_summary(df)
df = pd.DataFrame({"numeric": [1, 1, 2, 2, 3]})
gt_plt_summary(df, new_color_mapping={"numeric":"red"})
df = pd.DataFrame({"numeric": [1, 1, 2, 2, 3]})
gt_plt_summary(df)
[image: rendered tables — the third summary is colored red rather than orange]

ScottFB101 (Author):

I must have commented in the wrong spot originally, so there may be a similar response elsewhere in this PR thread.

I could make a Pytest fixture that resets the color mapping to the preset from the summary.py script prior to each test.

juleswg23 (Collaborator):

Yes you could, and it's true that it would make the tests pass. But I'm not sure it would solve the underlying problem.

Take the above example: the user only has to change the colors once and it changes the color mapping for all subsequent calls, which feels like unexpected behavior. For that reason, changing the actual behavior of the function seems like a better approach to me.

Introduces tests to verify gt_plt_summary correctly handles cases with two modes and more than two modes in both pandas and polars DataFrames.
@ScottFB101 (Author)

ScottFB101 commented Aug 14, 2025

I can't get 2 of the 500ish lines of summary.py to get coverage using pytest.

Unfortunately, Pandas handles the Mode column as an object. I guess there's no native string column type? Polars can handle string columns, so the alignment in the data cell varies depending on dataframe type. For example, Pandas results in "gt_right", Polars in "gt_left".

For the time being, I strictly checked that the correct values were in the data cell, but didn't check alignment.

@juleswg23 (Collaborator)

Unfortunately, Pandas handles the Mode column as an object. I guess there's no native string column type? Polars can handle string columns, so the alignment in the data cell varies depending on dataframe type. For example, Pandas results in "gt_right", Polars in "gt_left".

For the time being, I strictly checked that the correct values were in the data cell, but didn't check alignment.

Yeah, I'm seeing this too. I'd guess there's some workaround within the dataframe, but we could supersede it and set the following:

gt = gt.cols_align(align="right", columns="Mode")

Just note that's not going to work if the mode column isn't present.

I also don't think it's that bad to leave it as is. In that case, the user can edit the table if they see fit.

Changed assertions in test_gt_plt_summary_two_modes and test_gt_plt_summary_greater_than_two_modes to verify the exact HTML table cell output instead of just substring matches. This ensures the tests are more robust and accurately reflect the rendered HTML structure.
Adds right alignment to the 'Mode' column in the summary table when add_mode is enabled
@ScottFB101 (Author)


I added that code to summary.py in e3d54d8

@juleswg23 juleswg23 merged commit 704e7b1 into posit-dev:main Aug 21, 2025
5 checks passed