This code benchmarks model performance for high-throughput cell line screens.
- drugfeats.pkl contains Mordred descriptors without 3D features, scaled and imputed
- imputer.pkl contains a dictionary {imputer, scaler} to be used for on-the-fly feature generation
- rnaseq.pkl contains columns labeled combo_auc.DRUG, combo_auc.AUC, combo_auc.CELL
- extended_combined_smiles contains SMILES matching combo_auc.DRUG
- cellpickle.pkl contains the RNAseq data frame with label lincs.CELL for merging
- extended_combined_mordred_descriptors contains precomputed Mordred descriptors for combo_auc.DRUG
- testsmiles.smi is a SMILES-formatted file with 100k randomly sampled SMILES for testing
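As a rough illustration of how a SMILES file such as testsmiles.smi could be read, here is a minimal sketch. It assumes the common .smi layout of one SMILES string per line, optionally followed by whitespace and a name; the actual layout of testsmiles.smi is not documented here, and the helper names `parse_smi_lines` and `read_smi` are hypothetical and not part of this repository.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_smi_lines(lines: Iterable[str]) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (smiles, name) pairs from the lines of a .smi file.

    Assumes one SMILES per line, optionally followed by whitespace
    and a free-text name. Blank lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first run of whitespace only, so names may contain spaces.
        parts = line.split(None, 1)
        smiles = parts[0]
        name = parts[1] if len(parts) > 1 else None
        yield smiles, name


def read_smi(path: str) -> List[Tuple[str, Optional[str]]]:
    """Read a whole .smi file into a list of (smiles, name) pairs."""
    with open(path) as handle:
        return list(parse_smi_lines(handle))
```

For example, `read_smi("data/testsmiles.smi")` would return the file's entries as a list, which can be handy for a quick sanity check on the input before running inference.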
Training is run with --mode set to one of graph, desc, or image (RNN SMILES coming soon). Use python train.py -h for options.
For this benchmark the following commands were used:
python train.py --mode graph -o saved_models/graph_model.pt -w 32 -s cell
python train.py --mode desc -o saved_models/desc_model.pt -w 32 -s cell
python train.py --mode image -o saved_models/image_model.pt -w 32 -s cell
For inference, again use python infer.py -h to see all options.
For this benchmark the following commands were used:
python infer.py --mode graph -o saved_models/graph_model.pt -w 32 -g 2 --smiles_file data/testsmiles.smi --output_file saved_models/graph_infers.txt
python infer.py --mode desc -o saved_models/desc_model.pt -w 32 -g 2 --smiles_file data/testsmiles.smi --output_file saved_models/desc_infers.txt
python infer.py --mode image -o saved_models/image_model.pt -w 32 -g 2 --smiles_file data/testsmiles.smi --output_file saved_models/image_infers.txt
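The training and inference commands differ only in the mode name, so they can be generated rather than typed out. The sketch below is one possible way to do that and is not part of this repository: the helper names `train_command` and `infer_command` are hypothetical, it uses only the flags and values shown in the commands above, and it assumes each inference run loads the model file written by the training run for the same mode.

```python
import shlex
from typing import List

# The three modes used in this benchmark.
MODES = ["graph", "desc", "image"]


def train_command(mode: str) -> List[str]:
    """Build the training command for one mode as an argument list."""
    return [
        "python", "train.py", "--mode", mode,
        "-o", f"saved_models/{mode}_model.pt",
        "-w", "32", "-s", "cell",
    ]


def infer_command(mode: str) -> List[str]:
    """Build the inference command for one mode as an argument list."""
    return [
        "python", "infer.py", "--mode", mode,
        "-o", f"saved_models/{mode}_model.pt",  # assumed: model saved by training
        "-w", "32", "-g", "2",
        "--smiles_file", "data/testsmiles.smi",
        "--output_file", f"saved_models/{mode}_infers.txt",
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for mode in MODES:
        print(shlex.join(train_command(mode)))
        print(shlex.join(infer_command(mode)))
```

To execute a command instead of printing it, pass the argument list to `subprocess.run(cmd, check=True)` from the standard library.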