-
Notifications
You must be signed in to change notification settings - Fork 48
temp PR: Visium HD 4.0 #333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
for more information, see https://pre-commit.ci
| # Cell Segmentation keys | ||
| CELL_SEG_KEY_HD = 'cell_segmentations' | ||
| NUCLEUS_SEG_KEY_HD = 'nucleus_segmentations' | ||
| CELL_SEG_KEY_HD = "cell_segmentations" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and there you will see some code formatting changes done automatically from the pre-commit. Running pre-commit install on your machine should enable pre-commit and automatically change the code upon committing it.
| load_nucleus_segmentations | ||
| If `True` and nucleus segmentation files are present, load nucleus segmentation polygons and the corresponding | ||
| nucleus-filtered count table. The counts are aggregated from the 2 µm binned matrix using the provided | ||
| barcode mappings so that only bins under segmented nuclei contribute to each cell’s counts. Requires all of: | ||
| nucleus segmentation GeoJSON, barcode_mappings.parquet, and the 2 µm filtered_feature_bc_matrix.h5. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter description was missing.
| def _extract_geometries_from_geojson( | ||
| adata: AnnData, | ||
| geojson_path: Path, | ||
| ) -> GeoDataFrame: | ||
| """Extract geometries and create a GeoDataFrame from a GeoJSON features map. | ||
| Parameters | ||
| ---------- | ||
| cell_adata : anndata.AnnData | ||
| cell_adata | ||
| AnnData object containing cell data. | ||
| geojson_features_map : dict[str, Any] | ||
| Dictionary mapping cell IDs to GeoJSON features. | ||
| geojson_path | ||
| Path to the GeoJSON file containing cell segmentation geometries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I merged the functions _extract_geometries_from_geojson() and _make_geojson_features_map() by removing the second function. In fact, the return of that function was used exclusively by the first function.
|
@stephenwilliams22 I finished the code review. My changes to your PR can be seen in this diff view https://github.com/scverse/spatialdata-io/pull/333/files/0e891ec7c58d0583cae50a376024ad87b7bf7832..955a9a64058f42e62af308f16e1a48ae7b05823c. In general the PR looks great, thanks for the contribution! My comments above describe my code changes and there is only one missing thing (I cannot reproduce the tests), described in this comment here: https://github.com/scverse/spatialdata-io/pull/333/files/0e891ec7c58d0583cae50a376024ad87b7bf7832..955a9a64058f42e62af308f16e1a48ae7b05823c#r2467148123. To proceed, please merge the branch of this PR directly to the branch of your PR as I cannot write to yours (since it's a fork from the 10x Genomics org and not a personal fork). I will then merge back your branch to this branch. |
|
Thanks for going over this @LucaMarconato ! For my unit tests, how do you all typically run them? Is there built in infrastructure? My testing bundle is certainly smaller than a normal HD run but is not "tiny" at close to 300Mb. I don't want this to mess up any automated testing you might have. |
@LucaMarconato I have merged this branch into my branch. I'll have a look at tests and hopefully we'll be good to go after that. |
…aldata-io into stephen/spaceranger4.0
We have 2 types of tests. The tests for the "tiny" datasets are here in GitHub. As you observed, ideally the datasets would be smaller, but since we don't have many of them, we can proceed with them even if they are 300MB. Then we have a second set of tests, with larger datasets, that we run only right before making a release. They currently run on a AirFlow Pipeline that is only accessible to us. We are looking into way to change this so that we can also have that public, but for the moment the current setup is the most cost effective. |
Temporary PR to contribute to #328 (for which I have no push permission).
Instead of merging the current PR, we can merge this into the remote fork from which #328 is opened from. Merge #328 and close this PR.