Hydration free energies from kernel-based machine learning: Compound-database bias

The Journal of Chemical Physics 153 (2020)
Author

Clemens Rauer, Tristan Bereau

Published

2020-07-01

Doi

10.1063/5.0012230

We consider the prediction of a basic thermodynamic property, hydration free energies, across a large subset of the chemical space of small organic molecules. Our in silico study is based on computer simulations at the atomistic level with implicit solvent. We report on a kernel-based machine learning approach that is inspired by recent work in learning electronic properties but differs in key aspects: The representation is averaged over several conformers to account for the statistical ensemble. We also include an atomic-decomposition ansatz, which offers significant added transferability compared to molecular learning. Finally, we explore the existence of severe biases from databases of experimental compounds. By performing a combination of dimensionality reduction and cross-learning models, we show that the rate of learning depends significantly on the breadth and variety of the training dataset. Our study highlights the dangers of fitting machine-learning models to databases of a narrow chemical range.
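To make the idea of a conformer-averaged representation feeding a kernel model more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a generic fixed-length descriptor vector per conformer, a Gaussian kernel, and kernel ridge regression on synthetic data; the descriptor, the kernel choice, the hyperparameters `sigma` and `lam`, and all function names are illustrative assumptions that the abstract does not specify. The atomic-decomposition ansatz is not covered here.

```python
import numpy as np

def conformer_averaged_representation(conformer_descriptors):
    """Average per-conformer descriptor vectors into one molecular vector."""
    return np.mean(np.asarray(conformer_descriptors, dtype=float), axis=0)

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def fit_krr(X_train, y_train, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression weights alpha."""
    K = gaussian_kernel(X_train, X_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)

def predict_krr(X_train, alpha, X_new, sigma):
    """Predict targets for X_new from the fitted weights."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Synthetic stand-in data (assumption): 20 "molecules", 5 conformers each,
# 4-dimensional descriptors. These are not hydration free energies.
rng = np.random.default_rng(0)
base = rng.normal(size=(20, 1, 4))
conformers = base + 0.1 * rng.normal(size=(20, 5, 4))
X = np.stack([conformer_averaged_representation(c) for c in conformers])
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]  # arbitrary synthetic target

sigma, lam = 1.0, 1e-8  # illustrative hyperparameters
alpha = fit_krr(X, y, sigma, lam)
y_fit = predict_krr(X, alpha, X, sigma)
```

With such a small regularizer the model nearly interpolates its training set, so any assessment of learning rate or database bias would have to use held-out compounds rather than `y_fit`.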

from Orcid & CrossRef

This study considers the prediction of a basic thermodynamic property, hydration free energies, across a large subset of the chemical space of small organic molecules and reports on a kernel-based machine learning approach that is inspired by recent work in learning electronic properties but differs in key aspects.

from Semantic Scholar
@article{Rauer_2020,
  title={Hydration free energies from kernel-based machine learning: Compound-database bias},
  volume={153},
  ISSN={1089-7690},
  url={http://dx.doi.org/10.1063/5.0012230},
  DOI={10.1063/5.0012230},
  number={1},
  journal={The Journal of Chemical Physics},
  publisher={AIP Publishing},
  author={Rauer, Clemens and Bereau, Tristan},
  year={2020},
  month=jul
}
from doi2bib