compute_dataset_statistics

Entry point for computing dataset statistics.

The statistics are computed from the training set and saved to disk as a .zarr file in the output_dir of the run. If extra_save_path is specified, the statistics are also saved to that path. By default, extra_save_path is set to the location from which the given training configuration loads the statistics.

By default, no transforms are applied to the dataset.

To compute the statistics with transforms applied, for example with local frames, specify the following in configs/ml/statistics.yaml (or override it from any other configuration file):

data:
  datamodule:
    transforms:
      - _target_: mldft.ml.data.components.convert_transforms.ToTorch
      - _target_: mldft.ml.data.components.convert_transforms.AddAtomCooIndices
      - _target_: mldft.ml.data.components.convert_transforms.ToLocalFrames
        matrices: (,)
      - _target_: mldft.ml.data.components.convert_transforms.ToNumpy
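
Each list entry names a class by its dotted path under `_target_`, and Hydra's instantiate utility imports that class and calls it with the remaining keys as keyword arguments. The stdlib-only sketch below illustrates that mechanism. It is not Hydra's implementation: `resolve_target` is a hypothetical helper, and the standard-library targets are stand-ins for the mldft transform classes.

```python
import importlib
from typing import Any


def resolve_target(entry: dict) -> Any:
    """Import the dotted path in `_target_` and call it with the other keys."""
    kwargs = {key: value for key, value in entry.items() if key != "_target_"}
    module_name, _, attr_name = entry["_target_"].rpartition(".")
    factory = getattr(importlib.import_module(module_name), attr_name)
    return factory(**kwargs)


# Stand-in entries (assumption): the real list would name the mldft transforms.
entries = [
    {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4},
    {"_target_": "datetime.timedelta", "hours": 1},
]

# The list comprehension preserves the order in which the entries are written.
objects = [resolve_target(entry) for entry in entries]
print(objects)
```

Because the entries form a list, their order in the configuration file is preserved when they are instantiated.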

Warning

Make sure that the transforms are consistent with what you plan to do during training!
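
One way to guard against a mismatch is to compare the transform lists of the statistics configuration and the training configuration before starting a run. The following is a minimal sketch, not part of mldft: `transform_names` and `transforms_match` are hypothetical helpers, the training configuration is assumed to keep its transforms under the same `data.datamodule.transforms` key, and the `ignore` argument reflects the assumption that pure conversion steps may legitimately differ between the two.

```python
from typing import Any, Iterable


def transform_names(cfg: dict, ignore: Iterable[str] = ()) -> list:
    """Ordered class names of the transforms under data.datamodule.transforms."""
    entries = cfg.get("data", {}).get("datamodule", {}).get("transforms") or []
    names = [entry["_target_"].rsplit(".", 1)[-1] for entry in entries]
    skipped = set(ignore)
    return [name for name in names if name not in skipped]


def transforms_match(stats_cfg: dict, train_cfg: dict, ignore: Iterable[str] = ()) -> bool:
    """True if both configs list the same transform classes in the same order."""
    ignored = tuple(ignore)
    return transform_names(stats_cfg, ignored) == transform_names(train_cfg, ignored)


PREFIX = "mldft.ml.data.components.convert_transforms."


def make_cfg(*class_names: str) -> dict:
    """Build a plain-dict config with the given transforms (illustration only)."""
    transforms = [{"_target_": PREFIX + name} for name in class_names]
    return {"data": {"datamodule": {"transforms": transforms}}}


stats_cfg = make_cfg("ToTorch", "AddAtomCooIndices", "ToLocalFrames", "ToNumpy")
# Hypothetical training config that stays in torch and omits the final ToNumpy.
train_cfg = make_cfg("ToTorch", "AddAtomCooIndices", "ToLocalFrames")

print(transforms_match(stats_cfg, train_cfg))
print(transforms_match(stats_cfg, train_cfg, ignore={"ToNumpy"}))
```

This check compares class names and their order only. Arguments such as `matrices` are not compared, so differing arguments would still need a manual review.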

main(cfg: DictConfig) → DatasetStatistics

Entry point for computing dataset statistics.

Parameters: cfg – DictConfig configuration composed by Hydra.
Returns: Dataset statistics object.