We probably want to archive the Babel files generated using UMLS Level 0 data, to avoid some copyright/licensing issues.
I propose to archive Babel outputs in the following files:
- config.yaml (used to generate this Babel run): 7K
- compendia.tar.gz (Compendia files): 20G
- synonyms.tar.gz (Synonym files): 34G
- conflation.tar.gz (Conflation files): 171M
- reports.tar.gz (Reports): 124M
- intermediate.tar.gz (Intermediate files): 8.5G
- parquet.tar.gz (Parquet files from duckdb/parquet): 126G
- kgx.tar.gz (KGX files): 27G
- metadata.tar.gz (top-level metadata files): 11K
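A minimal sketch of how the per-directory tarballs above could be produced with Python's standard `tarfile` module. The directory names in `ARCHIVES` are assumptions about the Babel output layout (only `duckdb/parquet` is named in this issue), and `archive_babel_outputs` is a hypothetical helper, not part of Babel. `config.yaml` and the metadata files would be copied or packed separately.

```python
import tarfile
from pathlib import Path

# Hypothetical mapping from archive name to the Babel output directory it packs.
# Directory names are assumptions; adjust them to the real Babel output layout.
ARCHIVES = {
    "compendia.tar.gz": "compendia",
    "synonyms.tar.gz": "synonyms",
    "conflation.tar.gz": "conflation",
    "reports.tar.gz": "reports",
    "intermediate.tar.gz": "intermediate",
    "parquet.tar.gz": "duckdb/parquet",
    "kgx.tar.gz": "kgx",
}


def archive_babel_outputs(babel_dir: Path, dest_dir: Path) -> list[Path]:
    """Write one gzipped tarball per output directory that exists under babel_dir."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for archive_name, subdir in ARCHIVES.items():
        source = babel_dir / subdir
        if not source.is_dir():
            continue  # skip outputs that this run did not produce
        target = dest_dir / archive_name
        with tarfile.open(target, "w:gz") as tar:
            # arcname keeps member paths relative to the Babel output root
            tar.add(source, arcname=subdir)
        written.append(target)
    return written
```

Keeping each category in its own tarball lets someone fetch, say, only the 171M conflation archive without downloading the 126G Parquet one.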
Alternatively, we could combine the core Babel outputs as:
- babel-outputs.tar.gz (Compendia, synonyms and conflation files)
We should not archive the following directories:
- duckdb/duckdbs (DuckDB files): 171G
- duckdb/parquet (Parquet files): 146G
- sapbert-training-data/ (SAPBERT training data): 24G
- logs/ (Logs from errors in previous runs): 2.6G
- .snakemake/ (Logs, config, etc.): 16M
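If the output tree were instead archived in one pass, the directories listed above could be skipped with a `tarfile` filter. This is a sketch under the assumption that member paths are written relative to the Babel output root; `exclude_unarchived` and `archive_babel_tree` are hypothetical helpers. Returning `None` from a `tarfile` filter drops that member, and for a directory it also stops recursion into it.

```python
import tarfile
from pathlib import Path, PurePosixPath

# Directories not worth archiving, relative to the Babel output root.
EXCLUDED = ("duckdb/duckdbs", "duckdb/parquet", "sapbert-training-data", "logs", ".snakemake")


def exclude_unarchived(info: tarfile.TarInfo):
    """tarfile filter: drop any member that is, or lies inside, an excluded directory."""
    path = PurePosixPath(info.name)
    for excluded in EXCLUDED:
        ex = PurePosixPath(excluded)
        if path == ex or ex in path.parents:
            return None  # skipped; a skipped directory is not descended into
    return info


def archive_babel_tree(babel_dir: Path, target: Path) -> None:
    """Pack babel_dir into one tarball; target should live outside babel_dir."""
    with tarfile.open(target, "w:gz") as tar:
        for entry in sorted(babel_dir.iterdir()):
            # arcname=entry.name keeps member names relative to the output root
            tar.add(entry, arcname=entry.name, filter=exclude_unarchived)
```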