qm9
QM9 dataset.
Contains 129,133 molecules from the QM9 dataset. Each molecule's id is the index of its xyz file.
- class QM9(raw_data_dir: str, kohn_sham_data_dir: str, label_dir: str, filename: str, name: str = 'QM9', num_processes: int = 1)
Class for the QM9 dataset.
- name
Name of the dataset.
- raw_data_dir
Path to the raw data directory.
- kohn_sham_data_dir
Path to the Kohn-Sham data directory.
- __init__(raw_data_dir: str, kohn_sham_data_dir: str, label_dir: str, filename: str, name: str = 'QM9', num_processes: int = 1)
Initialize the QM9 dataset.
- Parameters:
raw_data_dir – Path to the raw data directory.
kohn_sham_data_dir – Path to the Kohn-Sham data directory.
label_dir – Path to the directory containing the labels.
filename – The filename to use for the output files.
name – Name of the dataset.
num_processes – Number of processes to use for verifying or loading the dataset.
- Raises:
AssertionError – If the subset is not in the list of available subsets.
- convert_xyz_files() → None
Convert the QM9 xyz files so that floating-point numbers use standard exponent notation (1e-6) instead of the Mathematica-style 1*^-6, which PySCF cannot read.
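The notation change described above can be illustrated with a minimal sketch. The helper name fix_exponent_notation and the sample line are hypothetical and are not part of this module; the sketch only assumes that every occurrence of "*^" in a file marks an exponent.

```python
def fix_exponent_notation(text: str) -> str:
    """Rewrite Mathematica-style exponents ("1*^-6") as "1e-6".

    Hypothetical helper for illustration; assumes "*^" never occurs
    in the text for any purpose other than marking an exponent.
    """
    return text.replace("*^", "e")


# Illustrative atom line, not taken from a real QM9 file.
line = "C  1.5*^-6  -2.0301*^-5  0.25"
fixed = fix_exponent_notation(line)

# After the rewrite, Python's float() can parse each coordinate.
values = [float(token) for token in fixed.split()[1:]]
```

A plain string replacement is enough here because the fix is purely textual; no numeric value changes.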
- get_all_atomic_numbers() → ndarray
Get the atomic numbers of all atoms in the dataset.
- Returns:
Array of atomic numbers.
- Return type:
np.ndarray
- get_ids() → ndarray
Get the indices of the molecules in the dataset.
- Returns:
Array of indices of the molecules in the dataset.
- Return type:
np.ndarray
- get_num_molecules() → int
Get the number of molecules in the dataset.
- Returns:
Number of molecules in the dataset.
- Return type:
int
- load_charges_and_positions(id: int) → tuple[list, list]
Load the nuclear charges and positions of the molecule with the given index from its .xyz file.
- Parameters:
id – Index of the molecule to load.
- Returns:
Tuple of the atomic numbers (A) and the atomic positions (A, 3).
- Return type:
tuple
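As a rough illustration of what loading charges and positions from an xyz file involves, the sketch below parses xyz-formatted text into a list of atomic numbers and a list of positions. The function parse_xyz, the lookup table, and the sample coordinates are assumptions made for this example and are not the module's implementation. It assumes the usual xyz layout: an atom count on the first line, a comment or property line, then one line per atom starting with the element symbol and three coordinates.

```python
# Lookup for the five elements that occur in QM9 (H, C, N, O, F).
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9}


def parse_xyz(text: str) -> tuple[list, list]:
    """Hypothetical parser returning (atomic numbers, positions)."""
    lines = text.strip().splitlines()
    num_atoms = int(lines[0])
    charges, positions = [], []
    # Line 0 holds the atom count, line 1 a comment; atom lines follow.
    for line in lines[2 : 2 + num_atoms]:
        fields = line.split()
        charges.append(ATOMIC_NUMBERS[fields[0]])
        # Tolerate Mathematica-style exponents such as "1*^-6".
        positions.append([float(v.replace("*^", "e")) for v in fields[1:4]])
    return charges, positions


# Illustrative water geometry; the coordinates are made up for the example.
sample = """3
illustrative comment line
O 0.0 0.0 0.119
H 0.0 0.763 -0.477
H 0.0 -0.763 -0.477
"""
charges, positions = parse_xyz(sample)
```

Any extra columns or trailing lines after the atom block are ignored, since only the first num_atoms atom lines and their first three coordinates are read.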
- class QM9Test(raw_data_dir: str, kohn_sham_data_dir: str, label_dir: str, filename: str, name: str = 'QM9', num_processes: int = 1)
- load_charges_and_positions(id: int) → tuple[list, list]
Load the nuclear charges and positions of the molecule with the given index from its .xyz file.
- Parameters:
id – Index of the molecule to load.
- Returns:
Tuple of the atomic numbers (A) and the atomic positions (A, 3).
- Return type:
tuple
- convert_folder_sorted_parallel(in_folder: Path, out_folder: Path, num_processes: int) → None
Apply the conversion function to all xyz files in the folder in parallel.
- Parameters:
in_folder – Path to the input folder.
out_folder – Path to the output folder.
num_processes – Number of processes to use.
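One way such a parallel folder conversion might be structured is sketched below with the standard library. The names convert_file and convert_folder_sketch are hypothetical, the per-file conversion (rewriting "*^" as "e") is an assumption, and so is the reading of "sorted" as processing files in filename order; the actual function may differ on all three points.

```python
import tempfile
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def convert_file(in_path: Path, out_path: Path) -> None:
    """Hypothetical per-file conversion: rewrite "*^" exponents as "e"."""
    out_path.write_text(in_path.read_text().replace("*^", "e"))


def convert_folder_sketch(in_folder: Path, out_folder: Path, num_processes: int) -> None:
    """Convert every .xyz file in in_folder, writing results to out_folder."""
    out_folder.mkdir(parents=True, exist_ok=True)
    in_paths = sorted(in_folder.glob("*.xyz"))  # assumed: filename order
    out_paths = [out_folder / p.name for p in in_paths]
    if num_processes <= 1:
        # Serial fallback avoids process start-up cost for small jobs.
        for src, dst in zip(in_paths, out_paths):
            convert_file(src, dst)
        return
    with ProcessPoolExecutor(max_workers=num_processes) as pool:
        # list() drains the iterator so worker exceptions are raised here.
        list(pool.map(convert_file, in_paths, out_paths))


# Small demonstration on temporary files with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    src_dir, dst_dir = Path(tmp) / "in", Path(tmp) / "out"
    src_dir.mkdir()
    (src_dir / "a.xyz").write_text("2.5*^-3\n")
    (src_dir / "b.xyz").write_text("1.0*^-6\n")
    convert_folder_sketch(src_dir, dst_dir, num_processes=1)
    converted = {p.name: p.read_text() for p in sorted(dst_dir.iterdir())}
```

When calling the sketch with num_processes greater than 1 from a script, place the call under an `if __name__ == "__main__":` guard so that worker processes can import the module safely on platforms that spawn new interpreters.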