
Conversation

@jordandeklerk
Member

@jordandeklerk jordandeklerk commented Oct 14, 2025

This significantly refactors and expands the test suite for arviz-stats. The goal here was to test core functionality and relevant edge cases for areas that either lacked test coverage or had very minimal exposure. I also moved all LOO-based testing into its own test directory so we could expand on those tests and separate the modules out.

While this PR primarily focuses on making the test suite more robust, there were also several changes to the source code along the way. Some of these are (hopefully) small improvements, while others came out of edge-case tests.

Source Changes

I think all fixes should be backward compatible. I know this is quite a large PR. Things escalated quickly to say the least 😄.


Resolves (first pass at least): #218


📚 Documentation preview 📚: https://arviz-stats--220.org.readthedocs.build/en/220/

@jordandeklerk jordandeklerk marked this pull request as ready for review October 15, 2025 02:35
@OriolAbril
Member

I will try to review later today; on a quick look it looks good. I think the main challenge will be having our CI jobs run the relevant tests in representative enough environments, given the complex relations between the optional dependencies.

It is also possible we have dead code lying around from discarded experiments.


@OriolAbril OriolAbril left a comment


I have started going over the tests; I will continue as it will take a while. So far the most recurring comment is adding tests of axis behaviour in array functions. The xarray-related tests are definitely passing the argument to the array function, so it might not have any effect on actual coverage if looked at from a global perspective.

However, there are already many functions with tests for axis, and given that we have a minimal test job that runs tests in an environment without xarray or arviz-base, I think it would be nice to try to get both the global coverage of the library and the coverage of the minimal test job (counted with respect to things inside base/* except the dataarray file).
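
For illustration, a minimal sketch of the kind of axis test meant here; `hdi_like` is a hypothetical stand-in for any array-layer function, not code from this PR:

```python
import numpy as np
import pytest


def hdi_like(ary, axis=-1):
    # Hypothetical array function that reduces over `axis` and appends an interval dimension
    return np.stack([ary.min(axis=axis), ary.max(axis=axis)], axis=-1)


@pytest.mark.parametrize("axis", [0, 1, -1])
def test_axis_behaviour(axis):
    rng = np.random.default_rng(0)
    ary = rng.normal(size=(4, 100, 3))
    result = hdi_like(ary, axis=axis)
    # Reducing over `axis` should drop that dimension and append the interval one
    expected_shape = tuple(np.delete(ary.shape, axis)) + (2,)
    assert result.shape == expected_shape
```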

The second common comment is to try to avoid np.random.function in new code and instead use default_rng() and then rng.function.
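
For reference, a minimal sketch of the two patterns (purely an example, not code from this PR):

```python
import numpy as np

# Legacy pattern (avoid in new code): draws from the global random state
legacy_draws = np.random.normal(size=100)

# Preferred pattern: create a Generator with default_rng(), optionally seeded,
# and call the same distribution methods on it
rng = np.random.default_rng(42)
draws = rng.normal(size=100)
```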

All source code related changes look good.

@jordandeklerk
Member Author

> I have started going over the tests; I will continue as it will take a while. So far the most recurring comment is adding tests of axis behaviour in array functions. The xarray-related tests are definitely passing the argument to the array function, so it might not have any effect on actual coverage if looked at from a global perspective.
>
> However, there are already many functions with tests for axis, and given that we have a minimal test job that runs tests in an environment without xarray or arviz-base, I think it would be nice to try to get both the global coverage of the library and the coverage of the minimal test job (counted with respect to things inside base/* except the dataarray file).
>
> The second common comment is to try to avoid np.random.function in new code and instead use default_rng() and then rng.function.
>
> All source code related changes look good.

I think I have made all of the changes related to axis testing you mentioned, and fixed the np.random.function issues. Thanks for bringing that up; I always forget about this.

I appreciate the review so far as well. I'm going to continue looking for places to test axis behavior where it makes sense. I feel pretty good about the LOO tests in terms of coverage and core functionality. For functions or modules with no coverage (survival.py, for example), I tried to keep things simple for now and was hoping for feedback on anything specific we want to test.

@jordandeklerk jordandeklerk changed the title Refactor and expand test suite Refactor and expand tests Oct 19, 2025

@OriolAbril OriolAbril left a comment


I think we can merge already, as this is a clear improvement on the current situation, but I think we still need to work on it, not so much in terms of coverage numbers or the tests themselves but more at the organizational level. This was already an issue before, but I am still not sure, if I make a change to the KDE, where the new tests should go, or, without even running them, where the tests to update will be.

Comment on lines +68 to +69
@pytest.mark.filterwarnings("ignore::UserWarning")
@pytest.mark.filterwarnings("ignore::RuntimeWarning")
Member


Not the first I have seen, but these are great, thanks! For legacy arviz we missed some warnings because the tests were too noisy, and didn't see some deprecations until the function was removed, for example.

Do you know if it is possible to restrict the ignores to matching messages too?

Member Author


The warning messages were getting wild in the test output because we're explicitly trying to raise a lot of these warnings. So I tried to go through and make sure we're not ignoring organic warnings that surface normally.

from arviz_stats.utils import ELPDData


@pytest.mark.filterwarnings("ignore:Estimated shape parameter:UserWarning")
Member


this looks like the pattern to ignore warnings with matching message, TIL ❤️
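
For reference, a minimal sketch of how the message-scoped filter works; the test and warning text here are just made-up examples:

```python
import warnings

import pytest


# The filterwarnings mark accepts a standard warning filter string,
# "action:message_regex:category", so only warnings whose message starts
# with the given regex are ignored.
@pytest.mark.filterwarnings("ignore:Estimated shape parameter:UserWarning")
def test_message_scoped_ignore():
    # This matching warning is silenced by the mark above
    warnings.warn("Estimated shape parameter looks large", UserWarning)
    # An unrelated UserWarning would still show up in the test report
    assert True
```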

@jordandeklerk
Member Author

jordandeklerk commented Oct 24, 2025

> I think we can merge already, as this is a clear improvement on the current situation, but I think we still need to work on it, not so much in terms of coverage numbers or the tests themselves but more at the organizational level. This was already an issue before, but I am still not sure, if I make a change to the KDE, where the new tests should go, or, without even running them, where the tests to update will be.

Agreed, I think the organization could definitely be better. Also, the workflows you mentioned on Slack for running different test suites based on dependencies would be very nice. The tests take slightly longer to run now given the new volume of tests, but I don't think it's too crazy.

I imagine the organization of where to put new tests can evolve more. I think for certain things it's obvious, like LOO-based tests, but to your point about KDE tests, right now there are a couple of places where they could live. So we can definitely centralize that more so it's more obvious.

@OriolAbril OriolAbril merged commit b962b0e into main Oct 27, 2025
4 checks passed
@OriolAbril OriolAbril deleted the improve-tests branch October 27, 2025 15:32