datagen
Subpackage for Kohn-Sham calculations, density fitting and training data generation.
The subpackage provides the functionality needed to turn molecular geometry datasets into neural network training data.
Overview
The subpackage is divided into two parts: low-level modules and high-level modules. The low-level modules work on single molecules and do all the necessary calculations. The high-level modules scale these calculations to datasets and ensure correct saving of the data as well as parallelization.
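The split between the two layers can be pictured with a minimal sketch. The names `process_molecule` and `process_dataset` are hypothetical and are not part of mldft. The first stands in for a low-level, single-molecule calculation, and the second for a high-level driver that distributes the work and collects the results.

```python
from concurrent.futures import ThreadPoolExecutor


def process_molecule(molecule_id: int) -> dict:
    """Hypothetical stand-in for a low-level, single-molecule calculation."""
    # A real implementation would run the Kohn-Sham calculation here.
    return {"molecule_id": molecule_id, "status": "done"}


def process_dataset(molecule_ids: list[int], max_workers: int = 4) -> list[dict]:
    """Hypothetical high-level driver: maps the low-level function over a dataset."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with molecule_ids.
        return list(pool.map(process_molecule, molecule_ids))


if __name__ == "__main__":
    print(process_dataset([0, 1, 2]))
```

A thread pool keeps the sketch simple. For CPU-bound quantum chemistry workloads a process pool or a job scheduler would likely be the better fit.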
Low-level Modules (Methods)
mldft.datagen.methods.ksdft_calculation wraps and patches pyscf to save every iteration of the Kohn-Sham computation.
mldft.datagen.methods.density_fitting fits coefficients of a new basis to the coefficients of the Kohn-Sham basis.
mldft.datagen.methods.label_generation calculates labels for energies and gradients.
mldft.datagen.methods.save_labels_in_zarr_file saves computed labels in a .zarr file.
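As a toy illustration of the density-fitting idea (not the mldft implementation, which may use a different metric), one common formulation is a least-squares fit: with S the overlap matrix of the new basis functions and b the projections of the target density onto them, the coefficients c solve S c = b. The sketch below uses 1D Gaussians on a uniform grid; all names in it are hypothetical.

```python
import numpy as np


def gaussian(x, center, exponent):
    """Unnormalized 1D Gaussian basis function evaluated on the grid x."""
    return np.exp(-exponent * (x - center) ** 2)


def fit_coefficients(target, basis, dx):
    """Least-squares fit of `target` in the span of `basis` (one function per row)."""
    overlap = basis @ basis.T * dx      # S_ij = <chi_i | chi_j>, rectangle-rule quadrature
    projection = basis @ target * dx    # b_i  = <chi_i | rho>
    return np.linalg.solve(overlap, projection)


x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]

# Three basis functions sharing a center but with different exponents.
basis = np.stack([gaussian(x, 0.0, a) for a in (0.5, 1.0, 2.0)])

# Build a target that lies exactly in the span of the basis.
true_coeffs = np.array([0.3, 1.2, -0.4])
target = true_coeffs @ basis

coeffs = fit_coefficients(target, basis, dx)
```

Because the target here lies in the span of the basis, the fit recovers the coefficients it was built from up to numerical precision. For a target outside the span, the same equations give the best approximation in the least-squares sense.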
High-Level Modules
mldft.datagen.kohn_sham_dataset handles Kohn-Sham calculations on a dataset.
mldft.datagen.generate_labels_dataset handles density fitting, label generation and saving of labels on a dataset.
Datasets
mldft.datagen.datasets.dataset defines the interface for a dataset.
mldft.datagen.datasets.qm9 provides the QM9 dataset.
mldft.datagen.datasets.qmugs provides the QMUGS dataset.
mldft.datagen.datasets.misc provides a general dataset for arbitrary xyz files.
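For readers unfamiliar with the xyz format mentioned above, here is a minimal parsing sketch. It is not the loader used by mldft, and `parse_xyz` is a hypothetical helper. It assumes the standard layout: an atom count on the first line, a comment on the second, then one line per atom with an element symbol and three Cartesian coordinates.

```python
def parse_xyz(text: str) -> tuple[list[str], list[tuple[float, float, float]]]:
    """Parse a single-geometry xyz string into element symbols and coordinates."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])          # line 1: number of atoms
    symbols, coords = [], []
    # Line 2 is a free-form comment; the atom records start at line 3.
    for line in lines[2 : 2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        symbols.append(symbol)
        coords.append((float(x), float(y), float(z)))
    return symbols, coords


WATER_XYZ = """3
water
O 0.000 0.000 0.117
H 0.000 0.757 -0.469
H 0.000 -0.757 -0.469
"""

symbols, coords = parse_xyz(WATER_XYZ)
```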
Current Timings
Current timings on a large machine running in parallel on 64 cores (128 threads):
QM9 Kohn-Sham: 8 seconds per iteration -> 300 CPU hours
QM9 Density Fitting: 10 seconds per iteration -> 360 CPU hours
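The totals above are roughly consistent with reading "per iteration" as "per molecule" and multiplying by the size of QM9. This reading, and the dataset size of about 134,000 molecules used below, are assumptions rather than statements from the text. The helper `total_hours` is hypothetical.

```python
def total_hours(seconds_per_molecule: float, n_molecules: int) -> float:
    """Convert a per-molecule cost in seconds into a total in hours."""
    return seconds_per_molecule * n_molecules / 3600.0


N_QM9 = 134_000  # assumed approximate number of molecules in QM9

kohn_sham_hours = total_hours(8, N_QM9)
density_fitting_hours = total_hours(10, N_QM9)
```

Under these assumptions both estimates land within a few percent of the quoted figures.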
Modules
Contains the data generation dataset classes.
Compute labels for molecules in the dataset and save them as zarr.zip files.
Computing Kohn-Sham data for a dataset.
Methods for Kohn-Sham calculations, density fitting and label saving.
Applying a basis transformation to the dataset.