Description
This is a follow-up to the previous issue I reported. It turns out that the problem isn't with molecules that have multiple fragments. The problem is that a few of the RDKit descriptors return nan when they encounter atom types that are not parameterized. The affected descriptors are listed below. I suspect you could remove these from the descriptor set you're currently using without hurting performance. Then again, molecules like this are probably outside your applicability domain, so it may be more appropriate for MolSkillScorer.score to return nan for them.
from rdkit import Chem
from rdkit.Chem.Descriptors import BCUT2D_MWHI, MaxPartialCharge

# The selenium atom in this molecule is what triggers the nan values.
mol = Chem.MolFromSmiles("CCC[Se]CCC")
a = BCUT2D_MWHI(mol)
b = MaxPartialCharge(mol)
a, b
# (nan, nan)
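For anyone who wants to repeat the check on other molecules, here is a rough sketch of how the nan-returning descriptors can be found. The helper names `find_nan_descriptors` and `scan_smiles` are made up for illustration. The sketch assumes that `Descriptors._descList` (RDKit's list of `(name, function)` pairs) is the descriptor set of interest, which may not match what MolSkill actually uses.

```python
import math


def find_nan_descriptors(mol, descriptor_fns):
    """Return the names of descriptors whose value for `mol` is nan.

    `descriptor_fns` is an iterable of (name, callable) pairs, the same
    shape as RDKit's Descriptors._descList.
    """
    bad = []
    for name, fn in descriptor_fns:
        value = fn(mol)
        # float() lets integer-valued descriptors go through the same check.
        if math.isnan(float(value)):
            bad.append(name)
    return bad


def scan_smiles(smiles):
    """Hypothetical convenience wrapper; requires RDKit to be installed."""
    # Imported lazily so the generic helper above works without RDKit.
    from rdkit import Chem
    from rdkit.Chem import Descriptors

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return find_nan_descriptors(mol, Descriptors._descList)
```

Calling `scan_smiles("CCC[Se]CCC")` should then report the descriptors that come back as nan for that molecule; the exact set may depend on the RDKit version.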
Here are the problematic descriptors:
BCUT2D_MWHI
BCUT2D_MWLOW
BCUT2D_CHGHI
BCUT2D_CHGLO
BCUT2D_LOGPHI
BCUT2D_LOGPLOW
BCUT2D_MRHI
BCUT2D_MRLOW
MaxPartialCharge
MinPartialCharge
MaxAbsPartialCharge
MinAbsPartialCharge
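The two options mentioned above (dropping these descriptors, or returning nan for the whole molecule) could look roughly like this. This is only a sketch: `drop_nan_prone` and `guarded_score` are hypothetical helpers, and `score_fn` stands in for whatever the model actually calls internally. I have not checked how MolSkillScorer.score is implemented.

```python
import math

# The twelve descriptors listed above.
NAN_PRONE = frozenset({
    "BCUT2D_MWHI", "BCUT2D_MWLOW",
    "BCUT2D_CHGHI", "BCUT2D_CHGLO",
    "BCUT2D_LOGPHI", "BCUT2D_LOGPLOW",
    "BCUT2D_MRHI", "BCUT2D_MRLOW",
    "MaxPartialCharge", "MinPartialCharge",
    "MaxAbsPartialCharge", "MinAbsPartialCharge",
})


def drop_nan_prone(descriptor_fns, excluded=NAN_PRONE):
    """Option 1: filter a list of (name, callable) pairs by name."""
    return [(name, fn) for name, fn in descriptor_fns if name not in excluded]


def guarded_score(values, score_fn):
    """Option 2: return nan if any descriptor value is nan.

    `score_fn` is a placeholder for the real scoring call.
    """
    values = list(values)
    if any(math.isnan(float(v)) for v in values):
        return float("nan")
    return score_fn(values)
```

Option 1 changes the model inputs, so it would presumably need retraining. Option 2 leaves the model alone and just makes the failure explicit to the caller.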