small_dataset

class SmallDataset(raw_data_dir: str, kohn_sham_data_dir: str, label_dir: str, filename: str, name: str, molecules: list[M], num_processes: int = 1)

Dataset class for a small dataset built from an explicitly provided list of molecules.

name

Name of the dataset.

raw_data_dir

Path to the directory containing the raw data.

kohn_sham_data_dir

Path to the directory containing the Kohn-Sham data.

num_processes

Number of processes to use for the computation.

num_molecules

Number of molecules in the dataset.

molecules

List of molecules in the dataset.

__init__(raw_data_dir: str, kohn_sham_data_dir: str, label_dir: str, filename: str, name: str, molecules: list[M], num_processes: int = 1)

Define molecules and initialize using parent class.

Parameters:
  • raw_data_dir – Path to the directory containing the raw data.

  • kohn_sham_data_dir – Path to the directory containing the Kohn-Sham data.

  • label_dir – Path to the directory containing the labels.

  • filename – The filename to use for the output files.

  • name – Name of the dataset, used as the folder name.

  • molecules – List of molecules in the dataset.

  • num_processes – Number of processes to use for verifying or loading the dataset.

download() → None

Create the raw data directory and save the molecules as npz files.
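
As a rough illustration of the "create the directory, then save each molecule as an npz file" step, the sketch below uses NumPy's np.savez. It is not the library's implementation: the helper name save_molecules_as_npz, the (charges, positions) representation of a molecule, the array keys, and the molecule_0000.npz naming scheme are all assumptions made for this example.

```python
import tempfile
from pathlib import Path

import numpy as np


def save_molecules_as_npz(raw_data_dir, molecules):
    """Write one .npz file per molecule and return the written paths.

    Assumption: each molecule is a (charges, positions) pair. The file
    naming scheme and the array keys are hypothetical.
    """
    raw_dir = Path(raw_data_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)  # create the raw data directory
    paths = []
    for index, (charges, positions) in enumerate(molecules):
        path = raw_dir / f"molecule_{index:04d}.npz"
        np.savez(path, charges=np.asarray(charges), positions=np.asarray(positions))
        paths.append(path)
    return paths


# Hypothetical example data: an H2-like and a water-like geometry.
example_molecules = [
    ([1, 1], [[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]]),
    ([8, 1, 1], [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]),
]

with tempfile.TemporaryDirectory() as tmp:
    written = save_molecules_as_npz(Path(tmp) / "raw", example_molecules)
    written_names = [p.name for p in written]
    # Read one file back to confirm the arrays round-trip.
    with np.load(written[1]) as data:
        reloaded_charges = data["charges"].tolist()
```

Writing one file per molecule keeps each entry independently loadable by index, which is one plausible reason for such a layout; the actual on-disk format may differ.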

get_all_atomic_numbers() → ndarray

Get all atomic numbers in the dataset.
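
The description does not say whether the returned array contains one entry per atom or only the distinct elements. The sketch below shows the second reading, assuming the result is the sorted set of unique atomic numbers across all molecules. The function name all_atomic_numbers and the input layout (one charge array per molecule) are hypothetical.

```python
import numpy as np


def all_atomic_numbers(charges_per_molecule):
    """Return the sorted, de-duplicated atomic numbers across all molecules.

    Assumption: "all atomic numbers" means the distinct elements present.
    """
    stacked = np.concatenate([np.asarray(c) for c in charges_per_molecule])
    return np.unique(stacked)  # np.unique sorts and removes duplicates


# Hypothetical charge arrays for three molecules (H2, H2O, CH4).
charges = [np.array([1, 1]), np.array([8, 1, 1]), np.array([6, 1, 1, 1, 1])]
atomic_numbers = all_atomic_numbers(charges)
```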

get_ids() → ndarray

Get the indices of the molecules in the dataset.

get_num_molecules() → int

Get the number of molecules in the dataset.

load_charges_and_positions(id: int) → tuple[ndarray, ndarray]

Load the nuclear charges and positions of the molecule with the given index.

Parameters:
  • id – Index of the molecule to load.
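
To make the return shape concrete, here is a minimal sketch of reading a (charges, positions) pair for one molecule index from an npz file with np.load. The helper name load_molecule_arrays, the file naming scheme, and the array keys are assumptions for illustration, not the library's actual storage layout.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_molecule_arrays(raw_data_dir, molecule_id):
    """Return (charges, positions) for one molecule index.

    Assumption: each molecule lives in its own .npz file with the
    hypothetical keys "charges" and "positions".
    """
    path = Path(raw_data_dir) / f"molecule_{molecule_id:04d}.npz"
    with np.load(path) as data:
        return data["charges"], data["positions"]


with tempfile.TemporaryDirectory() as tmp:
    # Write a hypothetical two-atom molecule so the example is self-contained.
    np.savez(
        Path(tmp) / "molecule_0000.npz",
        charges=np.array([1, 1]),
        positions=np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]]),
    )
    charges, positions = load_molecule_arrays(tmp, 0)
    charges_shape = charges.shape
    positions_shape = positions.shape
```

In this sketch the charges array has one entry per atom and the positions array has one row of three coordinates per atom.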