atharvas
Contributor

This adds an abstract type AbstractPopMember. Right now, there is no clean way to store additional metadata that is relevant for calculating statistics of how each population is evolving. This PR serves as a first step towards a provenance framework. It enables subtyping the pop-member object to track additional information about each population member node.
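A minimal sketch of the subtyping idea, for illustration only. `AbstractPopMember` is the type this PR adds, but the type parameters, the field layout, the `SimplePopMember` and `TrackedPopMember` types, and the `get_loss` and `record_mutation!` helpers shown here are all hypothetical and do not reflect the actual definitions in SR.jl.

```julia
# Hypothetical, simplified stand-ins; not the actual SR.jl definitions.
abstract type AbstractPopMember{T,L} end

# Simplified default member: an expression placeholder plus its cost and loss.
mutable struct SimplePopMember{T,L} <: AbstractPopMember{T,L}
    tree::String   # placeholder for a real expression object
    cost::L
    loss::L
end

# A subtype that carries extra provenance metadata (assumed field name).
mutable struct TrackedPopMember{T,L} <: AbstractPopMember{T,L}
    tree::String
    cost::L
    loss::L
    mutation_counts::Dict{Symbol,Int}
end

# Generic code dispatches on the abstract type, so it works for both subtypes.
get_loss(m::AbstractPopMember) = m.loss

# Count how often each kind of mutation was applied to this member.
function record_mutation!(m::TrackedPopMember, kind::Symbol)
    m.mutation_counts[kind] = get(m.mutation_counts, kind, 0) + 1
    return m
end

member = TrackedPopMember{Float64,Float64}("x1 + 2.0", 0.5, 0.25, Dict{Symbol,Int}())
record_mutation!(member, :insert_node)
```

Code that only needs the loss or cost can stay written against the abstract type, while provenance-aware code can add methods for the tracked subtype.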

Some issues (this PR should NOT be merged until we have concrete answers for these questions)

  • AbstractPopMember: Is this functionality desired in SR.jl?
  • The current code does not yield concrete types for many SR.jl functions (get_pareto_frontier, extract_from_worker, etc.). I've marked them as unstable right now. I'm hoping they don't have a big hit in benchmarking but if they do, then we'll need to take a closer look.
  • Alternatively, we can use the metadata entry in SymbolicExpression objects instead. This is slightly unwieldy as I'll need to unpack and repack the PopMember objects during the mutate! calls when the metadata is updated. I want to consider this before moving forward.

atharvas added 5 commits April 7, 2025 23:13
This commit introduces a new abstract type (AbstractPopMember) for population members. This allows us to track additional information for each PopMember item, for instance the type of mutation that contributes most to each PopMember's performance.
…tPopMember around.

Long-term fix seems to be to move information in the metadata field of expression instead.
Contributor

github-actions bot commented Apr 21, 2025

Benchmark Results

|  | master | e1feb44... | master / e1feb44... |
|---|---|---|---|
| search/multithreading | 14.9 ± 0.51 s | 19 ± 0.51 s | 0.784 |
| search/serial | 26.7 ± 0.39 s | 33.6 ± 0.13 s | 0.794 |
| utils/best_of_sample | 1.54 ± 0.29 μs | 1.88 ± 0.34 μs | 0.819 |
| utils/check_constraints_x10 | 11.7 ± 3.1 μs | 11.8 ± 3.2 μs | 0.991 |
| utils/compute_complexity_x10/Float64 | 2.11 ± 0.14 μs | 2.15 ± 0.15 μs | 0.981 |
| utils/compute_complexity_x10/Int64 | 2.07 ± 0.13 μs | 2.06 ± 0.14 μs | 1 |
| utils/compute_complexity_x10/nothing | 1.54 ± 0.14 μs | 1.48 ± 0.16 μs | 1.04 |
| utils/insert_random_op_x10 | 4.93 ± 1.9 μs | 5.64 ± 1.9 μs | 0.874 |
| utils/next_generation_x100 | 0.346 ± 0.018 ms | 0.35 ± 0.026 ms | 0.988 |
| utils/optimize_constants_x10 | 0.0339 ± 0.0085 s | 0.0347 ± 0.008 s | 0.976 |
| utils/randomly_rotate_tree_x10 | 5.29 ± 0.66 μs | 5.36 ± 0.62 μs | 0.987 |
| time_to_load | 2.24 ± 0.0055 s | 2.35 ± 0.012 s | 0.952 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@MilesCranmer
Owner

  • AbstractPopMember: Is this functionality desired in SR.jl?

Sounds good to me!

  • The current code does not yield concrete types for many SR.jl functions (get_pareto_frontier, extract_from_worker, etc.). I've marked them as unstable right now. I'm hoping they don't have a big hit in benchmarking but if they do, then we'll need to take a closer look.

Hm. There's probably a way to fix this. @unstable should only be used as a last resort.
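As a rough illustration of the kind of problem that `@unstable` papers over (not a diagnosis of this PR's code): a container whose element type is abstract or only partially parametrized is not concrete, so the compiler cannot know the field types of its elements ahead of time. The `AbstractDemoMember` and `DemoMember` types below are hypothetical.

```julia
# Hypothetical types for illustration; not taken from SR.jl.
abstract type AbstractDemoMember end

struct DemoMember{T,L} <: AbstractDemoMember
    value::T
    loss::L
end

# Only the fully parametrized type is concrete.
abstract_is_concrete = isconcretetype(AbstractDemoMember)           # false
unionall_is_concrete = isconcretetype(DemoMember)                   # false
full_is_concrete = isconcretetype(DemoMember{Float64,Float64})      # true

# The same element stored in two differently typed vectors.
members_abstract = AbstractDemoMember[DemoMember(1.0, 0.5)]
members_concrete = [DemoMember(1.0, 0.5)]

# Inference sees a concrete element type only for the second vector.
abstract_eltype = eltype(members_abstract)   # AbstractDemoMember
concrete_eltype = eltype(members_concrete)   # DemoMember{Float64, Float64}
```

Functions that build or return collections of members can usually be made inferable by threading the fully parametrized type through, rather than the abstract one.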

  • Alternatively, we can use the metadata entry in SymbolicExpression objects instead. This is slightly unwieldy as I'll need to unpack and repack the PopMember objects during the mutate! calls when the metadata is updated. I want to consider this before moving forward.

I think we should just pick whatever is most semantically correct, and make it work. For AbstractExpression objects, they tend to only hold information that is independent of the dataset and search. So this is why the loss and cost are stored in the PopMember - because they are properties of the dataset and search, and should be updated if you pass a new dataset in, or restart the search. But expression objects are kind of independent of that sorta thing.

So with this in mind, I guess PopMember makes more sense for the types of metadata you'd want to attach?

  ) where {D<:Dataset}
      _validate_options(datasets, ropt, options)
-     state = _create_workers(datasets, ropt, options)
+     state = _create_workers(PopMember, datasets, ropt, options)
Owner

@MilesCranmer MilesCranmer Apr 23, 2025

Ah. This might be why it's unstable. This should probably be the fully parametrized type. Like PopMember{T,L,N} or whatever.

I would actually recommend storing the pop member type in options, then you can just get it directly from there without needing to pass it around. So you would write this as options.popmember_type{T,L,E} or whatever.

Though ideally you would have a dedicated function that turns PopMember into PopMember{T,L,N}, like with_type_parameters(::Type{<:AbstractPopMember}, options, dataset, ropt), so that if someone's type has additional type parameters, they can override the behaviour.
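A hedged sketch of how these two suggestions could fit together. Everything here is an assumption made for illustration: the struct layouts, the `DemoOptions` type, the `TaggedPopMember` type, and the simplified `with_type_parameters` signature, which takes the parameter types directly instead of `options, dataset, ropt`.

```julia
# Hypothetical stand-ins; none of these definitions are taken from SR.jl.
abstract type AbstractPopMember{T,L,N} end

mutable struct PopMember{T,L,N} <: AbstractPopMember{T,L,N}
    tree::N
    cost::L
    loss::L
end

# Simplified options object that stores the unparametrized member type.
struct DemoOptions{P}
    popmember_type::Type{P}
end

# Default behaviour: fill in T, L, and N to obtain a concrete type.
function with_type_parameters(
    ::Type{<:PopMember}, ::Type{T}, ::Type{L}, ::Type{N}
) where {T,L,N}
    return PopMember{T,L,N}
end

# A user-defined member with one extra type parameter for its metadata.
mutable struct TaggedPopMember{T,L,N,M} <: AbstractPopMember{T,L,N}
    tree::N
    cost::L
    loss::L
    tag::M
end

# The user overrides the function to supply the extra parameter.
function with_type_parameters(
    ::Type{<:TaggedPopMember}, ::Type{T}, ::Type{L}, ::Type{N}
) where {T,L,N}
    return TaggedPopMember{T,L,N,Symbol}
end

# String stands in for a real expression node type.
options = DemoOptions(PopMember)
P = with_type_parameters(options.popmember_type, Float64, Float64, String)

tagged_options = DemoOptions(TaggedPopMember)
Q = with_type_parameters(tagged_options.popmember_type, Float64, Float64, String)
```

With this shape, internal functions would only ever see the concrete types `P` or `Q`, and the member type would not need to be passed as a separate argument.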

Contributor Author

I see. This is similar to how node_type and expression_type are passed around. I can rework the code with this in mind.

@atharvas
Contributor Author

atharvas commented Apr 25, 2025

Sounds good. I'll make the necessary changes and push again. I wasn't impressed with the benchmarking results either so I definitely need to rework this to remove the @unstable's.

@MilesCranmer
Owner

Ping on this. If you want any help, just let me know.

@atharvas
Contributor Author

atharvas commented Aug 5, 2025

Hey, sorry for the delay on this. I'm going to restart work on this and hope to have it in a mergeable state by the end of the week.

@atharvas
Contributor Author

atharvas commented Aug 29, 2025

@MilesCranmer, apologies, but I don't think I'll have the bandwidth to finish this PR until the end of September (hopefully in time for v2.1, if I can't make v2.0).
