datagen

Subpackage for Kohn-Sham calculations, density fitting and training data generation.

The subpackage provides everything needed to translate molecular geometry datasets into neural network training data.

Overview

The subpackage is divided into two parts: low-level modules and high-level modules. The low-level modules operate on single molecules and perform the actual calculations. The high-level modules scale these calculations to whole datasets, handling parallelization and making sure the data is saved correctly.
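
The two-layer split can be pictured with a minimal sketch. All names here (`compute_single_molecule`, `process_dataset`, the `Molecule` alias) are hypothetical stand-ins and not the subpackage's API. The per-molecule function is a placeholder for the actual Kohn-Sham or density fitting work, and a thread pool stands in for whatever parallel backend the high-level modules use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: a molecule as (element symbol, (x, y, z)) pairs.
Molecule = List[Tuple[str, Tuple[float, float, float]]]


def compute_single_molecule(molecule: Molecule) -> Dict[str, float]:
    """Low-level step (placeholder): works on exactly one molecule.

    A real implementation would run the Kohn-Sham calculation or the
    density fitting here; this stand-in only counts atoms.
    """
    return {"n_atoms": float(len(molecule))}


def process_dataset(
    molecules: Sequence[Molecule],
    worker: Callable[[Molecule], Dict[str, float]] = compute_single_molecule,
    max_workers: int = 4,
) -> List[Dict[str, float]]:
    """High-level step: apply the low-level worker to every molecule in parallel.

    Results are returned in input order, so they can be saved per molecule.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, molecules))


water = [("O", (0.0, 0.0, 0.0)), ("H", (0.0, 0.76, 0.59)), ("H", (0.0, -0.76, 0.59))]
hydrogen = [("H", (0.0, 0.0, 0.0)), ("H", (0.0, 0.0, 0.74))]
results = process_dataset([water, hydrogen])
```

Keeping the worker free of any dataset or I/O logic is what lets the dataset-level driver swap in a different calculation or a different parallel backend without touching the per-molecule code.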

Low-level Modules (Methods)

High-level Modules

Datasets

Current Timings

Current timings on large machines, running in parallel on 64 cores (128 threads):

  • QM9 Kohn-Sham: 8 seconds per iteration -> 300 CPU hours

  • QM9 Density Fitting: 10 seconds per iteration -> 360 CPU hours
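
The totals above appear to follow from multiplying the time per iteration by the number of molecules. The sketch below makes that arithmetic explicit so it can be rescaled to other datasets. It assumes that one iteration corresponds to one molecule, and it uses a molecule count of roughly 134,000 for QM9; neither assumption is stated in the timings, and the function name `estimate_total_hours` is hypothetical. The results therefore land near the rounded figures above rather than exactly on them.

```python
def estimate_total_hours(seconds_per_iteration: float, n_molecules: int) -> float:
    """Total compute time in hours, assuming one iteration per molecule."""
    return seconds_per_iteration * n_molecules / 3600.0


# Assumption: QM9 contains roughly 134,000 molecules.
QM9_SIZE_APPROX = 134_000

kohn_sham_hours = estimate_total_hours(8.0, QM9_SIZE_APPROX)
density_fitting_hours = estimate_total_hours(10.0, QM9_SIZE_APPROX)

print(f"Kohn-Sham:       about {kohn_sham_hours:.0f} hours")
print(f"Density fitting: about {density_fitting_hours:.0f} hours")
```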

Modules

datasets

Contains the data generation dataset classes.

generate_labels_dataset

Compute labels for molecules in the dataset and save them as zarr.zip files.

kohn_sham_dataset

Compute Kohn-Sham data for a dataset.

methods

Methods for Kohn-Sham calculations, density fitting and label saving.

transform_dataset

Apply a basis transformation to the dataset.