SALTED: Symmetry-Adapted Learning of Three-dimensional Electron Densities

This repository contains an implementation of symmetry-adapted Gaussian Process Regression suitable to perform equivariant learning and prediction of the electron density of molecular and condensed-phase systems, together with its static linear response function to applied electric fields. This is done by representing the continuous scalar (density) and vector (density-response) fields on a linear basis of atom-centered radial functions and spherical harmonics.

Documentation

A quick-start guide is provided below; full documentation is also available.

References

  1. Andrea Grisafi, Alberto Fabrizio, David M. Wilkins, Benjamin A. R. Meyer, Clemence Corminboeuf, Michele Ceriotti, "Transferable Machine-Learning Model of the Electron Density", ACS Central Science 5, 57 (2019) [https://pubs.acs.org/doi/10.1021/acscentsci.8b00551]
  2. Alberto Fabrizio, Andrea Grisafi, Benjamin A. R. Meyer, Michele Ceriotti, Clemence Corminboeuf, "Electron density learning of non-covalent systems", Chemical Science 10, 9424 (2019) [https://pubs.rsc.org/en/content/articlelanding/2019/sc/c9sc02696g]
  3. Alan M. Lewis, Andrea Grisafi, Michele Ceriotti, Mariana Rossi, "Learning electron densities in the condensed-phase", Journal of Chemical Theory and Computation 17, 7203 (2021) [https://pubs.acs.org/doi/10.1021/acs.jctc.1c00576]
  4. Andrea Grisafi, Alan M. Lewis, Mariana Rossi, Michele Ceriotti, "Electronic-Structure Properties from Atom-Centered Predictions of the Electron Density", Journal of Chemical Theory and Computation 19, 4451 (2023) [https://pubs.acs.org/doi/10.1021/acs.jctc.2c00850]

Installation

In the SALTED directory, simply run make, followed by pip install .

Dependencies

--> featomic: installing featomic requires a Rust compiler. To install one, run:

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh && source "$HOME/.cargo/env"

featomic can then be installed using:

    pip install git+https://github.com/metatensor/featomic.git

--> mpi4py: required to use MPI parallelisation; SALTED can nonetheless be run without it. MPI parallelisation also requires a parallel h5py installation. Provided HDF5 has been compiled with MPI support, this can be installed by running:

    HDF5_MPI="ON" CC=mpicc pip install --no-cache-dir --no-binary=h5py h5py

--> meson, ninja: required to run f2py with the meson backend on Python >= 3.12. Both can be installed using pip install meson ninja
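Before enabling MPI, it can help to confirm which of the optional packages are visible to the current Python interpreter. The snippet below is a minimal sketch, not part of SALTED: the function name `check_optional_dependencies` is hypothetical, and the only library call it relies on is `h5py.get_config().mpi`, which reports whether h5py was built against a parallel HDF5.

```python
import importlib.util


def check_optional_dependencies():
    """Return which optional packages are importable and whether h5py has MPI support.

    Hypothetical helper for illustration; it is not provided by SALTED.
    """
    report = {}
    for name in ("featomic", "mpi4py", "h5py"):
        # find_spec returns None when the top-level package cannot be found
        report[name] = importlib.util.find_spec(name) is not None

    report["h5py_mpi"] = False
    if report["h5py"]:
        try:
            import h5py

            # True only if h5py was compiled against an MPI-enabled HDF5
            report["h5py_mpi"] = bool(h5py.get_config().mpi)
        except ImportError:
            # Package found on disk but not importable (e.g. broken build)
            report["h5py"] = False
    return report


if __name__ == "__main__":
    for key, value in check_optional_dependencies().items():
        print(f"{key}: {value}")
```

If `h5py_mpi` is reported as False, h5py is a serial build and would need to be reinstalled as described above before MPI parallelisation is used.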

Input file

SALTED input is provided in an inp.yaml file, which is structured in the following sections:

  • salted (required): define root storage directory, workflow label and learning target
  • system (required): define system parameters
  • qm (required): define information about quantum-mechanical reference
  • descriptor (required): define parameters of symmetry-adapted descriptors
  • gpr (required): define Gaussian Process Regression parameters
  • prediction (optional): manage predictions on unseen datasets
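As an illustration of how these sections nest, the sketch below assembles a skeleton and prints it as YAML text. It uses only key names that appear elsewhere in this README; every value is a placeholder chosen for the example, and the parameters of the qm and descriptor sections that this README does not name are left out, so the output is not a complete or validated input file. The `to_yaml` helper is a hypothetical stdlib-only renderer written for this example.

```python
# Skeleton of the inp.yaml sections. All values are PLACEHOLDERS (assumptions),
# and only keys named in this README are included.
config = {
    "salted": {"saltedpath": "./", "saltedname": "example", "saltedtype": "density"},
    "system": {"filename": "structures.xyz", "parallel": False},
    "qm": {},  # keys depend on the electronic-structure code; not listed in this README
    "descriptor": {"sparsify": {"ncut": 1000, "nsamples": 100}},  # optional subsection only
    "gpr": {
        "z": 2,
        "Menv": 100,
        "Ntrain": 100,
        "trainfrac": 1.0,
        "trainsel": "random",
        "regul": 0.001,
    },
    "prediction": {"filename": "unseen.xyz", "predname": "prediction"},
}


def to_yaml(mapping, indent=0):
    """Render a nested dict of scalars as YAML lines (hypothetical minimal helper)."""
    lines = []
    pad = " " * indent
    for key, value in mapping.items():
        if isinstance(value, dict) and value:
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(value, indent + 2))
        elif isinstance(value, dict):
            lines.append(f"{pad}{key}: {{}}")  # empty section
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


yaml_text = "\n".join(to_yaml(config))
print(yaml_text)
```

The code-specific examples remain the reference for the full set of required parameters in each section.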

Input Dataset

Input structures are required in extXYZ format; the corresponding filename must be specified in inp.system.filename. Training data consist of the expansion coefficients of the scalar/vector field over atom-centered basis functions made of radial functions and spherical harmonics. These coefficients are computed following density-fitting (DF) approximations, also known as resolution of the identity, which are commonly applied in electronic-structure codes. We assume orthonormalized real spherical harmonics defined with the Condon-Shortley phase convention; no restriction is imposed on the nature of the radial functions. Because the basis functions are non-orthogonal, the 2-center electronic integral matrices associated with the given density-fitting approximation are also required as input.

The electronic-structure codes currently interfaced with SALTED are:

  • FHI-aims
  • CP2K
  • PySCF

We refer to the code-specific examples for how to produce the required quantum-mechanical data.
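For orientation, the expansion described above can be written out as follows. This is a sketch under stated assumptions: an isolated system, radial functions $R_{nl}$ that may also depend on the atomic species, and an overlap metric for the 2-center matrix $\mathbf{S}$; a different density-fitting metric (for example Coulomb) would change the definition of $\mathbf{S}$. The symbols $c_{i,nlm}$, $\phi_{i,nlm}$ and $\tilde{\rho}$ are notation introduced here for illustration.

```latex
% Density expanded on atom-centered functions (atom i at position r_i)
\rho(\mathbf{r}) \approx \sum_{i} \sum_{nlm} c_{i,nlm}\, \phi_{i,nlm}(\mathbf{r}),
\qquad
\phi_{i,nlm}(\mathbf{r}) = R_{nl}\!\left(|\mathbf{r}-\mathbf{r}_i|\right)
                           Y_{lm}\!\left(\widehat{\mathbf{r}-\mathbf{r}_i}\right)

% 2-center matrix (overlap metric assumed)
S_{i\,nlm,\; j\,n'l'm'} = \int \phi_{i,nlm}(\mathbf{r})\, \phi_{j,n'l'm'}(\mathbf{r})\, d\mathbf{r}

% Integrated squared error between two densities expanded on the same basis
\int \left[ \tilde{\rho}(\mathbf{r}) - \rho(\mathbf{r}) \right]^{2} d\mathbf{r}
  = (\tilde{\mathbf{c}} - \mathbf{c})^{T}\, \mathbf{S}\, (\tilde{\mathbf{c}} - \mathbf{c})
```

The last line shows why the 2-center matrices are needed as input: with a non-orthogonal basis, the error between two densities is not the plain Euclidean distance between their coefficient vectors.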

Usage

The root directory used for storing SALTED data is specified in inp.salted.saltedpath. A SALTED workflow can be labelled by setting a string in inp.salted.saltedname, chosen consistently with the input parameters; this label defines the names of the output folders that are automatically generated during program execution. The type of SALTED target is selected with inp.salted.saltedtype: density to learn the electron density, or inp.salted.saltedtype: density-response to learn the linear response of the electron density to applied electric fields.

SALTED functions can be run either by importing the corresponding modules in Python or directly from the command line. MPI parallelization is activated by setting inp.system.parallel to True and can be used, wherever applicable, to parallelize the calculation of SALTED functions over training data. What follows is an example of a general command-line workflow:

  1. Initialize structural features defined from 3-body symmetry-adapted descriptors, $P^L$, as computed following PRL 120, 036002 (2018):

    python3 -m salted.initialize

    An optional sparsify subsection can be added to the inp.descriptor input section in order to reduce the feature space size down to ncut sparse features selected using a "farthest point sampling" (FPS) algorithm. To facilitate this procedure, it is possible to perform the FPS selection over a subset of nsamples configurations, selected at random from the entire training dataset.

  2. Find a sparse set of inp.gpr.Menv atomic environments in order to recast the SALTED problem into a low-dimensional space. The degree of non-linearity of the model must be defined at this stage by setting inp.gpr.z to a positive integer; z=1 corresponds to a linear model.

    python3 -m salted.sparse_selection

  3. Compute sparse vectors of descriptors $P^L_M$ for each atomic type and angular momentum:

    python3 -m salted.sparse_descriptor (MPI parallelizable)

  4. Compute sparse equivariant kernels $k^L_{MM}$ and find projector matrices over the Reproducing Kernel Hilbert Space (RKHS):

    python3 -m salted.rkhs_projector

  5. Compute equivariant kernels $k^L_{NM}$ over the entire dataset and project them on the RKHS to obtain the final SALTED input vectors:

    python3 -m salted.rkhs_vector (MPI parallelizable)

  6. Build the Hessian matrix of the quadratic RKHS problem over a maximum of inp.gpr.Ntrain training structures selected from the entire dataset; these can be either selected at random (inp.gpr.trainsel: random) or sequentially (inp.gpr.trainsel: sequential). The remaining structures will be automatically retained for validation. The variable inp.gpr.trainfrac can be used to define the fraction of the total training data to be used: this can go from 0 to 1 in order to make learning curves while keeping the validation set fixed.

    python3 -m salted.hessian_matrix (MPI parallelizable)

  7. Solve the regression problem with a given regularization parameter inp.gpr.regul.

    python3 -m salted.solve_regression

    NB: when the dimensionality exceeds $10^5$, it is recommended to perform a direct minimization of the SALTED loss function in place of an explicit matrix inversion (points 6 and 7). This can be run as follows:

    python3 -m salted.minimize_loss (MPI parallelizable)

  8. Validate predictions over the structures that have not been retained for training by computing the root mean square error, consistent with the definition of the SALTED loss function.

    python3 -m salted.validation (MPI parallelizable)

  9. Once the SALTED model has been trained and validated, SALTED predictions for a new unseen dataset can be handled according to the inp.prediction section. For that, an inp.prediction.filename must be specified in XYZ format, while an inp.prediction.predname string can be defined to label the prediction directories. Equivariant predictions can then be run as follows:

    python3 -m salted.prediction (MPI parallelizable)
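The training and validation steps above (1 to 8) can be chained from a small driver script. The following is a minimal sketch, assuming a serial run from the directory containing inp.yaml; the function names `build_commands` and `run_workflow` are hypothetical, `sys.executable` stands in for `python3`, and only the module names listed above are used. How to launch the MPI-parallelizable steps under MPI is not covered here, and the prediction step is left out because it needs its own input section.

```python
import subprocess
import sys

# Module names in the order given by the workflow above (steps 1 to 8).
WORKFLOW = [
    "salted.initialize",
    "salted.sparse_selection",
    "salted.sparse_descriptor",
    "salted.rkhs_projector",
    "salted.rkhs_vector",
    "salted.hessian_matrix",
    "salted.solve_regression",
    "salted.validation",
]


def build_commands(direct_minimization=False):
    """Return the command lines for the workflow as lists of arguments."""
    modules = list(WORKFLOW)
    if direct_minimization:
        # salted.minimize_loss replaces the Hessian build and the explicit solve
        start = modules.index("salted.hessian_matrix")
        modules[start:start + 2] = ["salted.minimize_loss"]
    return [[sys.executable, "-m", module] for module in modules]


def run_workflow(direct_minimization=False, dry_run=True):
    """Print each command and, unless dry_run is set, execute it serially."""
    for command in build_commands(direct_minimization):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the workflow at the first failing step
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_workflow(dry_run=True)
```

Calling `run_workflow(dry_run=False)` executes the steps in order; passing `direct_minimization=True` swaps steps 6 and 7 for the loss minimization recommended at high dimensionality.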

Contact

andrea.grisafi@ens.psl.eu

alan.m.lewis@york.ac.uk
