small_dataset

class SmallDataset(raw_data_dir: str, kohn_sham_data_dir: str, label_dir: str, filename: str, name: str, molecules: list[M], num_processes: int = 1)[source]

Dataset class for a small set of molecules that is passed in directly as a list.

name: Name of the dataset.

raw_data_dir: Path to the directory containing the raw data.

kohn_sham_data_dir: Path to the directory containing the Kohn-Sham data.

num_processes: Number of processes to use for the computation.

num_molecules: Number of molecules in the dataset.

molecules: List of molecules in the dataset.

__init__(raw_data_dir: str, kohn_sham_data_dir: str, label_dir: str, filename: str, name: str, molecules: list[M], num_processes: int = 1)[source]

Define molecules and initialize using parent class.

Parameters:

raw_data_dir – Path to the directory containing the raw data.
kohn_sham_data_dir – Path to the directory containing the Kohn-Sham data.
label_dir – Path to the directory containing the labels.
filename – The filename to use for the output files.
name – Name of the dataset, used as the folder name.
molecules – List of molecules in the dataset.
num_processes – Number of processes to use for verifying or loading the dataset.

download() → None[source]: Create the raw data directory and save the molecules as npz files.
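
The behaviour described for download() can be sketched roughly as follows. This is a minimal illustration and not the library's implementation: the tuple representation of a molecule, the helper name save_molecules, the file naming scheme mol_XXXX.npz, and the array keys nuclear_charges and positions are all assumptions made for the example.

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical molecule representation: (nuclear charges, positions).
# The coordinates are illustrative values only.
molecules = [
    (np.array([1, 1]), np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.4]])),
    (
        np.array([8, 1, 1]),
        np.array([[0.0, 0.0, 0.0], [0.0, 1.43, 1.11], [0.0, -1.43, 1.11]]),
    ),
]


def save_molecules(raw_data_dir, molecules):
    """Create raw_data_dir if needed and write one npz file per molecule."""
    raw_data_dir = Path(raw_data_dir)
    raw_data_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for idx, (charges, positions) in enumerate(molecules):
        # Assumed naming scheme; the real class may name files differently.
        path = raw_data_dir / f"mol_{idx:04d}.npz"
        np.savez(path, nuclear_charges=charges, positions=positions)
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    saved_paths = save_molecules(Path(tmp) / "raw", molecules)
    saved_names = [p.name for p in saved_paths]
    all_exist = all(p.exists() for p in saved_paths)
```

Writing one file per molecule keeps each entry independently loadable by index, which matches the per-molecule signature of load_charges_and_positions.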

get_all_atomic_numbers() → ndarray[source]: Get all atomic numbers in the dataset.

get_ids() → ndarray[source]: Get the indices of the molecules in the dataset.

get_num_molecules() → int[source]: Get the number of molecules in the dataset.
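
The three accessors above can be pictured with plain NumPy. This sketch assumes that the indices are consecutive integers starting at 0 and that get_all_atomic_numbers() returns the sorted unique atomic numbers; neither is confirmed by the documentation, and the variable charges_per_molecule is a hypothetical stand-in for the stored molecules.

```python
import numpy as np

# Hypothetical stand-in for the dataset: nuclear charges of three molecules.
charges_per_molecule = [
    np.array([1, 1]),           # H2
    np.array([8, 1, 1]),        # H2O
    np.array([6, 1, 1, 1, 1]),  # CH4
]

# Assumed behaviour of get_num_molecules(): number of stored molecules.
num_molecules = len(charges_per_molecule)

# Assumed behaviour of get_ids(): consecutive indices 0 .. num_molecules - 1.
ids = np.arange(num_molecules)

# Assumed behaviour of get_all_atomic_numbers(): sorted unique atomic numbers.
atomic_numbers = np.unique(np.concatenate(charges_per_molecule))

print(num_molecules)   # 3
print(ids)             # [0 1 2]
print(atomic_numbers)  # [1 6 8]
```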

load_charges_and_positions(id: int) → tuple[ndarray, ndarray][source]

Load the nuclear charges and positions of the molecule with the given index.

Parameters:

id – Index of the molecule to load.
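
As an illustration of what loading a single molecule from an npz file might look like, here is a minimal sketch. The function name load_charges_and_positions_sketch, the file naming scheme mol_XXXX.npz, and the array keys nuclear_charges and positions are assumptions for the example and may differ from what the class actually uses.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_charges_and_positions_sketch(raw_data_dir, mol_id):
    """Return (nuclear charges, positions) for a single molecule index."""
    # Assumed naming scheme and keys; not taken from the library.
    path = Path(raw_data_dir) / f"mol_{mol_id:04d}.npz"
    with np.load(path) as data:
        # Indexing reads each array into memory before the file is closed.
        return data["nuclear_charges"], data["positions"]


with tempfile.TemporaryDirectory() as tmp:
    # Write one illustrative molecule so the loader has something to read.
    np.savez(
        Path(tmp) / "mol_0000.npz",
        nuclear_charges=np.array([1, 1]),
        positions=np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.4]]),
    )
    charges, positions = load_charges_and_positions_sketch(tmp, 0)

print(charges)          # [1 1]
print(positions.shape)  # (2, 3)
```

The returned pair mirrors the documented return type tuple[ndarray, ndarray]: one array with a charge per atom and one array with a row of coordinates per atom.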