Skip to content

Conversation

gowerc
Copy link
Collaborator

@gowerc gowerc commented Aug 18, 2025

Closes #145

Hey @danielinteractive / @tobiasmuetze,

This contains my initial code proposal for implementing multiarm ancova support (note that it is currently missing documentation and tests).

Some key points:

  • To preserve backwards compatibility I've left the original ancova function unchanged. To support multiple groups a new naming scheme is required for the parameters which would break any existing code which is looking for the current names like trt / lsm_rf.

  • That is, if users want to use the new multi-group ancova they will have to explicitly set fun=ancova_m_groups in the call to analyse()

  • For the new naming convention I've just used the level numbers e.g. lsm_L0 = least squared means for the reference, lsm_L1 = least squared means for the first offset / coefficient. I've also used a more explicit trt_L1_L0 to mean the coefficient of the first offset from the reference.

  • As I think we agreed I'm currently only extracting the treatment effects against the reference e.g. L1 - L0 & L2 - L0. I'm not calculating / extracting all pairwise combinations.

  • For the naming convention I didn't want to use the user-provided group/level values as that can lead to a ton of special edge cases with whitespaces / special characters that are hard to predict and make value-extraction very error prone. Instead I opted for just renaming the group variable to rbmiGroup and re-leveling it to be L0, L1, etc. Only complication here was that the formula provided in theory can have interaction terms so had to create a function to recursively traverse the AST updating any references of group to rbmiGroup. I will need to put in extensive unit testing to make sure theres no edge cases here with things like I() functions.

If there any no main objections here I'll start trying to add docs and tests.

@gowerc gowerc linked an issue Aug 18, 2025 that may be closed by this pull request
@gowerc gowerc marked this pull request as ready for review August 18, 2025 11:43
@danielinteractive
Copy link
Collaborator

Hi @gowerc , cool, thanks for working on this!

Sorry if I had a look too late - my only initial high level comment would be that I don't really "like" that we would split out ANCOVA with multiple arms from the version with two arms. I would hope that we could extend the current naming scheme from ancova also to multiple arms. This would make maintenance as well as usage easier long term.

As a possible inspiration, please see https://github.com/johnsonandjohnson/junco/blob/main/R/ancova_rbmi.R where I implemented an ancova function with multiple arms (not focused on backwards compatibility for the naming scheme).

I think in the "worst" case one could just use a condition in the ancova function and have a different naming schedule for 2 arms vs. multiple arms.

What do you think?

@gowerc
Copy link
Collaborator Author

gowerc commented Aug 19, 2025

Hi @gowerc , cool, thanks for working on this!

Sorry if I had a look too late - my only initial high level comment would be that I don't really "like" that we would split out ANCOVA with multiple arms from the version with two arms. I would hope that we could extend the current naming scheme from ancova also to multiple arms. This would make maintenance as well as usage easier long term.

As a possible inspiration, please see https://github.com/johnsonandjohnson/junco/blob/main/R/ancova_rbmi.R where I implemented an ancova function with multiple arms (not focused on backwards compatibility for the naming scheme).

I think in the "worst" case one could just use a condition in the ancova function and have a different naming schedule for 2 arms vs. multiple arms.

What do you think?

Interestingly despite what their documentation says I don't think they actually support more than 2 arms, in particular they have near the top of the function checkmate::assert_factor(data[[group]], n.levels = 2L) which would error if there were more than 2 arms.

I am very conflicted here, I agree with everything you said and I am also not happy with splitting it out but that being said I don't see an easy way of merging them without breaking backwards compatibility as the naming scheme has to be different in order for it to make sense we currently have:

lsm_alt_visit_1
lsm_ref_visit_1
trt_visit_1

In hindsight these are already bad names that confuse users. If we were to extend this whilst maintaining backwards compatibility we'd end up with something like:

lsm_alt_visit_1
lsm_alt2_visit_1
lsm_ref_visit_1
trt_visit_1
trt_alt2_visit_1

Which I would argue is even worse.

Alternatively we could just have some code that dispatches to the different ANCOVA function based on the number of levels in group variable but this makes the documentation / explanation clunky e.g.

"If you have 2 groups then it will be formatted like this, O but if you have more than 2 it would be formatted like this even though half the values are the same"

Which is why I settled on just a separate function.

One option though to minimise maintenance is that we could deprecate anocva with a note that it will be removed in the future and replaced with ancova_m_groups then in 6 months to a years time we could fully remove it, we could also create an alias of ancova = ancova_m_groups after removing so we don't have such an awkward name.

(Despite my tone I am not confident on the best path forward here as all options appear bad to me so please do challenge if you still disagree)

@danielinteractive
Copy link
Collaborator

Interestingly despite what their documentation says I don't think they actually support more than 2 arms, in particular they have near the top of the function checkmate::assert_factor(data[[group]], n.levels = 2L) which would error if there were more than 2 arms.

Hmm... ok weird. But I am pretty sure we wanted this to work with multiple arms, that is also what the other code suggests... I will need to check in September when I am back.

I am very conflicted here, I agree with everything you said and I am also not happy with splitting it out but that being said I don't see an easy way of merging them without breaking backwards compatibility as the naming scheme has to be different in order for it to make sense we currently have:

lsm_alt_visit_1
lsm_ref_visit_1
trt_visit_1

Yeah I understand the need for backwards compatibility. But I wonder if an alternative solution here could be to have this old naming scheme used for 2 arms, and the new naming scheme used for more than 2 arms?

In hindsight these are already bad names that confuse users. If we were to extend this whilst maintaining backwards compatibility we'd end up with something like:

lsm_alt_visit_1
lsm_alt2_visit_1
lsm_ref_visit_1
trt_visit_1
trt_alt2_visit_1

Which I would argue is even worse.

Personally I think that would also be acceptable. The user will not see these names too much anyway if they just use the rbmi functions.

Alternatively we could just have some code that dispatches to the different ANCOVA function based on the number of levels in group variable but this makes the documentation / explanation clunky e.g.

Yeah I would not have two functions necessarily but just two naming schemes. Or just go with the naming scheme just described which I think is fine.

One option though to minimise maintenance is that we could deprecate anocva with a note that it will be removed in the future and replaced with ancova_m_groups then in 6 months to a years time we could fully remove it, we could also create an alias of ancova = ancova_m_groups after removing so we don't have such an awkward name.

(Despite my tone I am not confident on the best path forward here as all options appear bad to me so please do challenge if you still disagree)

Personally I would just go with the naming scheme you suggested, I think that is reasonable. Just for the sake of a better naming scheme I would not go the route of two functions or function deprecation. Just my 2 cents 😄

@tobiasmuetze
Copy link

Hi @gowerc / @danielinteractive,
I finally got around to looking at this.

I'd prefer to have a single ancova() function.

As for the naming convention, I agree with Daniel that the following

lsm_alt_visit_1
lsm_alt2_visit_1
lsm_ref_visit_1
trt_visit_1
trt_alt2_visit_1

is acceptable.

I am wondering if long term, it would be better to allow the user to match the group levels to ref, alt, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ancova to support > 2 arms
3 participants