Due to time constraints, our CI tests only compare the first five hypotheses generated during learning. It would help to add testing workflows that run infrequently but periodically (e.g., once a week) and check models generated later in the learning process (e.g., after 20 rounds). There are guides on how to do this with GitHub Actions, for example, here.
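As a rough sketch of what such a workflow might look like, the GitHub Actions config below uses a `schedule` trigger with a cron expression so that it runs once a week, plus `workflow_dispatch` so it can also be started by hand. The file name, the `LEARNING_ROUNDS` variable and the `./run-deep-tests.sh` script are hypothetical placeholders. The project's real setup steps and test command would go in their place.

```yaml
# Hypothetical file: .github/workflows/deep-learning-tests.yml
name: Deep learning tests (weekly)

on:
  schedule:
    # Runs every Sunday at 03:00 UTC
    - cron: "0 3 * * 0"
  # Also allows manual runs from the Actions tab
  workflow_dispatch:

jobs:
  deep-tests:
    runs-on: ubuntu-latest
    # Deep runs may take long; 360 minutes is the job limit on GitHub-hosted runners
    timeout-minutes: 360
    steps:
      - uses: actions/checkout@v4
      # Placeholder: add the project's actual environment/dependency setup here
      - name: Compare hypotheses from later learning rounds
        env:
          # Hypothetical variable telling the tests how many rounds to learn before comparing
          LEARNING_ROUNDS: "20"
        # Hypothetical script wrapping the project's test command
        run: ./run-deep-tests.sh
```

Scheduled workflows run on the latest commit of the repository's default branch. The regular per-push CI can therefore keep its five-hypothesis limit, while this job covers the deeper rounds.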