Armen edited this page Oct 23, 2024 · 1 revision

make_cloud

Featurize SMILES strings, perform dimensionality reduction with UMAP, and save the resulting chemical point cloud. Mordred descriptors that produce errors or other non-numeric values are dropped with drop_non_numeric_columns.

Parameters

path : str
The path to the file containing SMILES strings. Should include file extension.

path_out : str, default='cloud_out'
The output file name for the resulting chemical point cloud.

sep : str, default=','
The separator used in the file containing SMILES strings.

position : int, default=0
The index of the column containing the SMILES strings.

exl : bool, default=False
Flag indicating whether the input file is an Excel sheet.

head : bool, default=False
Flag indicating whether the input SMILES file has a header row.

Returns

cloud : numpy.ndarray of shape (N, 3)
Coordinates for the resulting chemical point cloud (UMAP projection).
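The sep, position and head parameters describe where the SMILES column sits in the input file. As a rough illustration only, the sketch below reads one column from a delimited text file with the standard library. It is not the package's own reader: read_smiles_column is a hypothetical helper, and the semicolon-separated sample data is made up for the example.

```python
import csv
import io


def read_smiles_column(handle, sep=",", position=0, head=False):
    """Hypothetical helper (not part of the package): collect one column of SMILES."""
    reader = csv.reader(handle, delimiter=sep)
    if head:
        next(reader, None)  # skip the header row, if the file has one
    smiles = []
    for row in reader:
        # Ignore blank lines and rows too short to contain the requested column.
        if len(row) > position and row[position].strip():
            smiles.append(row[position].strip())
    return smiles


# Made-up input: SMILES in the second column, ';' as separator, one header row.
example = io.StringIO("id;smiles\n1;CCO\n2;c1ccccc1\n")
smiles = read_smiles_column(example, sep=";", position=1, head=True)
print(smiles)
```

With a real file, the io.StringIO object would be replaced by an open file handle, and the arguments would mirror the values passed to make_cloud.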

see_cloud

Visualize the chemical point cloud in a 3D scatter plot.

Parameters

f_out : str
The output file name or path (excluding file extension) for the plot.

points : numpy.ndarray of shape (N, 3)
The array of points representing the chemical point cloud.

save : bool, default=False
Flag indicating whether to save the plot.
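For orientation, a 3D scatter plot of an (N, 3) array can be drawn with matplotlib roughly as follows. This is a minimal sketch and may differ from what see_cloud actually does: the random points stand in for a real point cloud, and the axis labels, marker size and ".png" extension are assumptions made for the example.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data: 200 random points instead of a real UMAP projection.
rng = np.random.default_rng(0)
points = rng.normal(size=(200, 3))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(points[:, 0], points[:, 1], points[:, 2], s=5)
ax.set_xlabel("UMAP 1")  # axis labels are assumptions
ax.set_ylabel("UMAP 2")
ax.set_zlabel("UMAP 3")

save = False
f_out = "cloud_plot"  # file name without extension, as in the parameter description
if save:
    fig.savefig(f_out + ".png", dpi=300)  # the extension is an assumption

plt.close(fig)
```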

save_cloud

Save the chemical point cloud with corresponding SMILES strings to a human-readable file.

Parameters

smiles_loc : str
The path/name of the file containing the SMILES to be saved.

f_out : str
The path/name of the file to be saved.

points : str, default='chem_cloud.npy'
The file path/name for the chemical point cloud.

sep : str, default=','
The delimiter to be used when parsing the file with SMILES.

position : int, default=0
The index of the column to be used when reading SMILES.

indx : str or None, default=None
The path/name of the .npy file containing the downsampling indices. If the '-d' or '--down' flag was not used on the command line, or if this is otherwise left as None, the full point cloud is saved.

exl : bool, default=False
Flag indicating whether the input file is an Excel sheet. If set to True, the resulting output file will also be an Excel sheet.

heady : bool, default=False
Flag to indicate if the SMILES file contains a header line or not.

Returns

f_out : pd.DataFrame of shape (N, 4), where N is the number of saved points
DataFrame to be saved, containing the SMILES and the 3-D UMAP embeddings of a chemical point cloud.
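The four-column result described above can be pictured as one SMILES column next to three coordinate columns. The following sketch builds such a table from in-memory data. It is an illustration under stated assumptions, not the package's code: the column names are invented, and it assumes that the downsampling indices select matching rows from both the SMILES list and the full point cloud.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the SMILES file and the saved point cloud.
smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
points = np.array(
    [
        [0.1, 0.2, 0.3],
        [1.0, 1.1, 1.2],
        [2.0, 2.1, 2.2],
        [3.0, 3.1, 3.2],
    ]
)

# Optional downsampling indices; use None to keep the full cloud.
indx = np.array([0, 2])

if indx is not None:
    # Assumption: the same indices apply to the SMILES and to the coordinates.
    smiles = [smiles[i] for i in indx]
    points = points[indx]

# Column names are assumptions chosen for readability.
table = pd.DataFrame(points, columns=["x", "y", "z"])
table.insert(0, "smiles", smiles)
print(table)
```

A table like this could then be written out with the standard pandas writers, for example DataFrame.to_csv for delimited text or DataFrame.to_excel for an Excel sheet.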

drop_non_numeric_columns

Drop non-numeric columns from a DataFrame.

Parameters

df : pd.DataFrame
The input DataFrame.

Returns

df_out : pd.DataFrame
The modified DataFrame with non-numeric columns dropped.
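One common way to get this behaviour in pandas is DataFrame.select_dtypes, shown in the sketch below. Whether drop_non_numeric_columns is implemented this way is an assumption, and the small descriptor table is made up: a column that holds an error marker alongside numbers ends up with object dtype and is therefore dropped.

```python
import pandas as pd

# Made-up descriptor table; "ABC" mixes a string error marker with a number,
# so pandas stores it as an object column.
df = pd.DataFrame(
    {
        "MW": [46.07, 78.11],
        "nAtom": [9, 12],
        "ABC": ["error", 1.5],
    }
)

# Keep only columns with a numeric dtype (integers and floats here).
df_out = df.select_dtypes(include="number")
print(list(df_out.columns))
```

Dropping whole columns keeps the remaining matrix fully numeric, which is what a dimensionality-reduction step such as UMAP needs as input.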
