
Conversation

@jordandeklerk
Member

@jordandeklerk jordandeklerk commented Oct 14, 2025

This significantly refactors and expands the test suite for arviz-stats. The goal here was to test core functionality and relevant edge cases for areas that either lacked test coverage or had very minimal exposure. I also moved all LOO-based testing into its own test directory so we could expand on those tests and separate the modules out.

While this PR primarily focuses on making the test suite more robust, there were also several changes to the source code along the way. Some of these are (hopefully) small improvements, while others came out of edge-case tests.

Source Changes

I think all fixes should be backward compatible. I know this is quite a large PR. Things escalated quickly to say the least 😄.


Resolves (first pass at least): #218


📚 Documentation preview 📚: https://arviz-stats--220.org.readthedocs.build/en/220/

@jordandeklerk jordandeklerk marked this pull request as ready for review October 15, 2025 02:35
@OriolAbril
Member

I will try to review later today; on a quick look it looks good. I think the main challenge will be having our CI jobs run the relevant tests in representative enough environments, given the complex relations between the optional dependencies.

It is also possible we have dead code lying around from discarded experiments.


@OriolAbril OriolAbril left a comment


I have started going over the tests; I will continue as it will take a while. So far the most recurring comment is adding tests of axis behaviour in array functions. The xarray-related tests are definitely passing the argument to the array function, so it might not have any effect on actual coverage if looked at from a global perspective.

However, there are already many functions with tests for axis, and given that we have a minimal test job that runs tests in an environment without xarray or arviz-base, I think it would be nice to try to get both the global coverage of the library and the coverage of the minimal test job (counted with respect to things inside base/* except the dataarray file).
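
For illustration, a minimal sketch of the kind of axis test meant here; `hdi_like` is a hypothetical stand-in for any array-layer function, not code from this PR:

```python
import numpy as np
import pytest


def hdi_like(ary, axis=-1):
    # Hypothetical array function that reduces over `axis` and appends an interval dimension
    return np.stack([ary.min(axis=axis), ary.max(axis=axis)], axis=-1)


@pytest.mark.parametrize("axis", [0, 1, -1])
def test_axis_behaviour(axis):
    rng = np.random.default_rng(0)
    ary = rng.normal(size=(4, 100, 3))
    result = hdi_like(ary, axis=axis)
    # Reducing over `axis` should drop that dimension and append the interval one
    expected_shape = tuple(np.delete(ary.shape, axis)) + (2,)
    assert result.shape == expected_shape
```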

The second common comment is to try to avoid np.random.function in new code and instead use default_rng() and then rng.function.
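
For reference, a minimal sketch of the two patterns (purely an example, not code from this PR):

```python
import numpy as np

# Legacy pattern (avoid in new code): draws from the global random state
legacy_draws = np.random.normal(size=100)

# Preferred pattern: create a Generator with default_rng(), optionally seeded,
# and call the same distribution methods on it
rng = np.random.default_rng(42)
draws = rng.normal(size=100)
```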

All source code related changes look good.

@jordandeklerk
Member Author

> I have started going over the tests; I will continue as it will take a while. So far the most recurring comment is adding tests of axis behaviour in array functions. The xarray-related tests are definitely passing the argument to the array function, so it might not have any effect on actual coverage if looked at from a global perspective.
>
> However, there are already many functions with tests for axis, and given that we have a minimal test job that runs tests in an environment without xarray or arviz-base, I think it would be nice to try to get both the global coverage of the library and the coverage of the minimal test job (counted with respect to things inside base/* except the dataarray file).
>
> The second common comment is to try to avoid np.random.function in new code and instead use default_rng() and then rng.function.
>
> All source code related changes look good.

I think I have made all of the changes related to axis testing you mentioned, and fixed the np.random.function issues. Thanks for bringing that up; I always forget about this.

I appreciate the review so far as well. I'm going to continue looking for places to test axis behavior where it makes sense. I feel pretty good about the LOO tests in terms of coverage and core functionality. For functions or modules with no coverage (survival.py, for example), I tried to keep things simple for now and was hoping for feedback on anything specific we want to test.

@jordandeklerk jordandeklerk changed the title Refactor and expand test suite Refactor and expand tests Oct 19, 2025

@OriolAbril OriolAbril left a comment


I think we can merge already, as this is a clear improvement on the current situation, but I think we still need to work on it, not so much in terms of coverage numbers or the tests themselves but more at the organizational level. This was already an issue before, but I am still not sure, if I make a change to the KDE, where the new tests should go, or, without even running them, where the tests to update will be.

Comment on lines +68 to +69
@pytest.mark.filterwarnings("ignore::UserWarning")
@pytest.mark.filterwarnings("ignore::RuntimeWarning")
Member


Not the first I have seen, but these are great, thanks! For legacy arviz we missed some warnings because the tests were too noisy, and didn't see some deprecations until the function was removed, for example.

Do you know if it is possible to restrict the ignores to matching messages too?

Member Author


The warning messages were getting wild in the test output because we're explicitly trying to raise a lot of these warnings. So I tried to go through and make sure we're not ignoring organic warnings that surface normally.

from arviz_stats.utils import ELPDData


@pytest.mark.filterwarnings("ignore:Estimated shape parameter:UserWarning")
Member


this looks like the pattern to ignore warnings with matching message, TIL ❤️
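
For reference, a minimal sketch of how the message-scoped filter works; the test and warning text here are just made-up examples:

```python
import warnings

import pytest


# The filterwarnings mark accepts a standard warning filter string,
# "action:message_regex:category", so only warnings whose message starts
# with the given regex are ignored.
@pytest.mark.filterwarnings("ignore:Estimated shape parameter:UserWarning")
def test_message_scoped_ignore():
    # This matching warning is silenced by the mark above
    warnings.warn("Estimated shape parameter looks large", UserWarning)
    # An unrelated UserWarning would still show up in the test report
    assert True
```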

@jordandeklerk
Member Author

jordandeklerk commented Oct 24, 2025

> I think we can merge already, as this is a clear improvement on the current situation, but I think we still need to work on it, not so much in terms of coverage numbers or the tests themselves but more at the organizational level. This was already an issue before, but I am still not sure, if I make a change to the KDE, where the new tests should go, or, without even running them, where the tests to update will be.

Agreed, I think the organization could definitely be better. Also, the workflows you mentioned on Slack for running different test suites based on dependencies would be very nice. The tests take slightly longer to run now given the new volume of tests, but I don't think it's too crazy.

I imagine the organization of where to put new tests can evolve more. I think for certain things it's obvious, like LOO-based tests, but to your point about KDE tests, right now there are a couple of places where they could live. So we can definitely centralize that more so it's more obvious.

@OriolAbril OriolAbril merged commit b962b0e into main Oct 27, 2025
4 checks passed
@OriolAbril OriolAbril deleted the improve-tests branch October 27, 2025 15:32