
Figure out how to archive Babel outputs #623

@gaurav

We probably want to archive the Babel files generated from the UMLS Level 0 data, to avoid some copyright/licensing issues.

I propose to archive Babel outputs in the following files:

  • config.yaml (used to generate this Babel run): 7K
  • compendia.tar.gz (Compendia files): 20G
  • synonyms.tar.gz (Synonym files): 34G
  • conflation.tar.gz (Conflation files): 171M
  • reports.tar.gz (Reports): 124M
  • intermediate.tar.gz (Intermediate files): 8.5G
  • parquet.tar.gz (Parquet files from duckdb/parquet): 126G
  • kgx.tar.gz (KGX files): 27G
  • metadata.tar.gz (top-level metadata files): 11K
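A minimal sketch of how these archives could be produced with Python's standard `tarfile` module. The directory names in `ARCHIVES`, the rule for choosing "top-level metadata files" (every regular file directly in the output directory), and the function names are hypothetical. They assume a Babel output directory containing `compendia/`, `synonyms/`, `duckdb/parquet/`, and so on, which may not match the real layout.

```python
import shutil
import tarfile
from pathlib import Path

# Hypothetical layout: archive name -> directory (relative to the Babel
# output directory) that it is built from. Adjust to the real layout.
ARCHIVES = {
    "compendia.tar.gz": "compendia",
    "synonyms.tar.gz": "synonyms",
    "conflation.tar.gz": "conflation",
    "reports.tar.gz": "reports",
    "intermediate.tar.gz": "intermediate",
    "parquet.tar.gz": "duckdb/parquet",
    "kgx.tar.gz": "kgx",
}


def make_archive(out_path: Path, base: Path, rel_dirs: list[str]) -> None:
    """Write a gzip-compressed tarball containing each of rel_dirs (relative to base)."""
    with tarfile.open(out_path, "w:gz") as tar:
        for rel in rel_dirs:
            # arcname keeps member paths relative, so the archive unpacks as e.g. compendia/...
            tar.add(base / rel, arcname=rel)


def archive_babel_outputs(babel_dir: Path, out_dir: Path, config: Path) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    # The config is tiny, so keep it uncompressed next to the archives.
    shutil.copy2(config, out_dir / "config.yaml")
    for name, rel in ARCHIVES.items():
        make_archive(out_dir / name, babel_dir, [rel])
    # metadata.tar.gz: regular files at the top level of babel_dir (hypothetical rule).
    with tarfile.open(out_dir / "metadata.tar.gz", "w:gz") as tar:
        for f in sorted(babel_dir.iterdir()):
            if f.is_file():
                tar.add(f, arcname=f.name)
```

For archives in the tens of gigabytes, a parallel compressor such as `pigz` driven from the shell may be noticeably faster than Python's single-threaded gzip; the sketch favors portability over speed.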

Alternatively, we could combine the core Babel outputs into a single archive:

  • babel-outputs.tar.gz (Compendia, synonyms and conflation files):
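A minimal sketch of the combined alternative, again using the standard `tarfile` module. The directory names in `CORE_DIRS` and the helper `make_core_archive` are hypothetical and assume the core outputs live in `compendia/`, `synonyms/` and `conflation/` under the Babel output directory.

```python
import tarfile
from pathlib import Path

# Hypothetical directory names for the core Babel outputs.
CORE_DIRS = ["compendia", "synonyms", "conflation"]


def make_core_archive(babel_dir: Path, out_path: Path) -> list[str]:
    """Bundle the core outputs into one tarball and return its member names."""
    with tarfile.open(out_path, "w:gz") as tar:
        for rel in CORE_DIRS:
            # Each directory keeps its own top-level name inside the archive.
            tar.add(babel_dir / rel, arcname=rel)
    # Re-open the archive and list its contents as a quick sanity check.
    with tarfile.open(out_path, "r:gz") as tar:
        return tar.getnames()
```

Bundling trades flexibility for convenience: a single `.tar.gz` is one download, but because gzip streams are not randomly accessible, someone who only needs the synonym files would still have to fetch and decompress the compendia as well.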

We should not archive the following directories:

  • duckdb/duckdbs (DuckDB files): 171G
  • duckdb/parquet (Parquet files): 146G
  • sapbert-training-data/ (SAPBERT training data): 24G
  • logs/ (Logs from errors in previous runs): 2.6G
  • .snakemake/ (Logs, config, etc.): 16M
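If the output tree were ever archived in a single pass rather than directory by directory, these exclusions could be expressed through the `filter` argument of `TarFile.add()`, which skips any member for which the filter returns `None`. A minimal sketch, assuming the excluded directories sit under the Babel output directory with exactly the names listed above; `EXCLUDED`, `is_excluded` and `exclude_filter` are hypothetical names.

```python
import tarfile
from pathlib import PurePosixPath
from typing import Optional

# Directories from the list above, relative to the Babel output directory
# (assumed layout).
EXCLUDED = ("duckdb/duckdbs", "duckdb/parquet", "sapbert-training-data", "logs", ".snakemake")


def is_excluded(rel_path: str) -> bool:
    """True if rel_path is one of the excluded directories or lies inside one."""
    # PurePosixPath drops "." components, so "./logs/a.log" -> ("logs", "a.log").
    parts = PurePosixPath(rel_path).parts
    for excluded in EXCLUDED:
        ex_parts = PurePosixPath(excluded).parts
        if parts[: len(ex_parts)] == ex_parts:
            return True
    return False


def exclude_filter(info: tarfile.TarInfo) -> Optional[tarfile.TarInfo]:
    # Returning None tells TarFile.add() to skip this member; for a directory,
    # its contents are not walked either.
    return None if is_excluded(info.name) else info
```

Usage would look like `tar.add(babel_dir, arcname=".", filter=exclude_filter)`. Matching on whole path components rather than string prefixes keeps a hypothetical `logs-old/` directory from being excluded by the `logs` rule.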
