Output number of observations aggregated from scanpy.get.aggregate #3822

@ilan-gold

Description

What kind of feature would you like to request?

Additional function parameters / changed functionality / changed defaults?

Please describe your wishes

The current aggregate function does not report how many observations were combined into each aggregated row, e.g., cells into metacells, replicates into conditions, etc.

Offhand, since by can be a list, I would think we want at least two obs columns added:

  1. n_obs_aggregated represents the total number of observations that have been aggregated into a given row of the returned AnnData object
  2. f"n_{by[i]}_aggregated" for each entry of by counts how many of a given subgroup are present if by is a list. If you only aggregate by cell type, i.e., by="cell_type", this column would not be present because it would not make sense: that value is simply n_obs_aggregated. But if it were by=["patient", "cell_type"], then n_obs_aggregated is the number of cells present in each patient-celltype row, n_cell_type_aggregated is the number of cells of that cell type present in the row, and n_patient_aggregated is the number of patients.
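
To make point 1 concrete, here is a minimal, library-free sketch of the counting logic. It is not scanpy's implementation: the toy `obs` records and the helper `count_aggregated` are hypothetical names used only for illustration, and the list of dicts stands in for `adata.obs`.

```python
from collections import Counter

# Toy per-cell metadata standing in for adata.obs (hypothetical values).
obs = [
    {"patient": "P1", "cell_type": "T"},
    {"patient": "P1", "cell_type": "T"},
    {"patient": "P1", "cell_type": "B"},
    {"patient": "P2", "cell_type": "T"},
]


def count_aggregated(obs, by):
    """Return a Counter mapping each group key (a tuple) to its number of observations.

    `by` may be a single column name or a list of names, mirroring the
    `by` argument discussed above. This helper is hypothetical.
    """
    keys = [by] if isinstance(by, str) else list(by)
    return Counter(tuple(row[k] for k in keys) for row in obs)


# One count per patient-celltype combination, i.e. per row of the aggregated result.
n_obs_aggregated = count_aggregated(obs, by=["patient", "cell_type"])
for group, n in sorted(n_obs_aggregated.items()):
    print(group, n)
```

With a real AnnData object, the same counts could presumably be obtained from the pandas obs table with a group-by followed by a size count, then attached as an obs column of the aggregated result.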

If we do point 2, we should settle on the naming convention up front, because altering it down the line would basically be a breaking change (until scanpy 2.0).
