qm9

QM9 dataset.

Contains 129,133 molecules from the QM9 dataset. The ids of the molecules are given by the index of the xyz file.

class QM9(raw_data_dir: str, kohn_sham_data_dir: str, label_dir: str, filename: str, name: str = 'QM9', num_processes: int = 1)

Class for the QM9 dataset.

name

Name of the dataset.

raw_data_dir

Path to the raw data directory.

kohn_sham_data_dir

Path to the Kohn-Sham data directory.

__init__(raw_data_dir: str, kohn_sham_data_dir: str, label_dir: str, filename: str, name: str = 'QM9', num_processes: int = 1)

Initialize the QM9 dataset.

Parameters:
  • raw_data_dir – Path to the raw data directory.

  • kohn_sham_data_dir – Path to the Kohn-Sham data directory.

  • label_dir – Path to the directory containing the labels.

  • filename – The filename to use for the output files.

  • name – Name of the dataset.

  • num_processes – Number of processes to use for verifying or loading the dataset.

Raises:

AssertionError – If the subset is not in the list of available subsets.

convert_xyz_files() → None

Convert the QM9 xyz files so that numbers use the exponent notation 1e-6 instead of 1*^-6, which PySCF cannot read.

download() → None

Download the raw data.

get_all_atomic_numbers() → ndarray

Get the atomic numbers of all atoms in the dataset.

Returns:

Array of atomic numbers.

Return type:

np.ndarray

get_ids() → ndarray

Get the indices of the molecules in the dataset.

Returns:

Array of indices of the molecules in the dataset.

Return type:

np.ndarray

get_num_molecules() → int

Get the number of molecules in the dataset.

Returns:

Number of molecules in the dataset.

Return type:

int

load_charges_and_positions(id: int) → tuple[list, list]

Load the nuclear charges and positions of the given molecule from its .xyz file.

Parameters:
  • id – Index of the molecule to load.

Returns:

Atomic numbers (A) and atomic positions (A, 3), where A is the number of atoms.

Return type:

tuple[list, list]
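
To illustrate what loading charges and positions from an .xyz file involves, here is a minimal sketch. It is not the library's implementation: parse_xyz and ATOMIC_NUMBERS are hypothetical names, the layout assumed is the usual xyz one (atom count, one comment line, then one line per atom), and the coordinates are made up.

```python
# Sketch only: parse_xyz and ATOMIC_NUMBERS are hypothetical, not library API.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9}  # elements occurring in QM9


def parse_xyz(text: str) -> tuple[list, list]:
    """Return atomic numbers (A) and positions (A, 3) from xyz-formatted text."""
    lines = text.splitlines()
    num_atoms = int(lines[0])  # first line: number of atoms
    charges, positions = [], []
    # The second line is a comment/property line, so atoms start at index 2.
    for line in lines[2:2 + num_atoms]:
        fields = line.split()
        charges.append(ATOMIC_NUMBERS[fields[0]])
        # Accept the raw QM9 exponent notation (1*^-6) as well as 1e-6.
        positions.append([float(f.replace("*^", "e")) for f in fields[1:4]])
    return charges, positions


# Made-up coordinates, for illustration only.
example = """3
illustrative molecule
O 0.0 0.0 0.0
H 0.0 0.0 1.0
H 1.0 2.5*^-6 0.0
"""
charges, positions = parse_xyz(example)
```

Any columns after the three coordinates are ignored, so a trailing per-atom column would not affect the result.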

class QM9Test(raw_data_dir: str, kohn_sham_data_dir: str, label_dir: str, filename: str, name: str = 'QM9', num_processes: int = 1)

download() → None

Download the raw data.

load_charges_and_positions(id: int) → tuple[list, list]

Load the nuclear charges and positions of the given molecule from its .xyz file.

Parameters:
  • id – Index of the molecule to load.

Returns:

Atomic numbers (A) and atomic positions (A, 3), where A is the number of atoms.

Return type:

tuple[list, list]

convert_folder_sorted_parallel(in_folder: Path, out_folder: Path, num_processes: int) → None

Apply the conversion function to all xyz files in the folder in parallel.

Parameters:
  • in_folder – Path to the input folder

  • out_folder – Path to the output folder

  • num_processes – Number of processes to use
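
The pattern described above, applying a per-file function to every xyz file of a folder in sorted order with several workers, might look roughly like the sketch below. It is an assumption-laden illustration, not the library's code: map_folder_sorted and copy_file are hypothetical names, and a thread pool from concurrent.futures stands in for the process-based workers that num_processes suggests.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def map_folder_sorted(
    in_folder: Path,
    out_folder: Path,
    worker: Callable[[Path, Path], None],
    num_workers: int,
) -> list:
    """Apply worker(xyz_path, out_folder) to every .xyz file, in sorted order."""
    out_folder.mkdir(parents=True, exist_ok=True)
    xyz_files = sorted(in_folder.glob("*.xyz"))
    # Thread pool used as a stand-in; the documented function takes num_processes.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # list() waits for all tasks and re-raises any worker exception here.
        list(pool.map(lambda path: worker(path, out_folder), xyz_files))
    return xyz_files


def copy_file(xyz_path: Path, out_folder: Path) -> None:
    """Hypothetical worker: copy the file unchanged into out_folder."""
    (out_folder / xyz_path.name).write_text(xyz_path.read_text())


# Small demonstration on a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "in", Path(tmp) / "out"
    src.mkdir()
    for name in ("b.xyz", "a.xyz"):
        (src / name).write_text("1\n\nH 0 0 0\n")
    processed = [p.name for p in map_folder_sorted(src, dst, copy_file, num_workers=2)]
    written = sorted(p.name for p in dst.iterdir())
```

Because each worker call writes its own output file, the calls are independent of each other and can run concurrently without coordination.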

convert_string_format(xyz_file_path: Path, out_folder: Path) → None

Convert a QM9 xyz file so that numbers use the exponent notation 1e-6 instead of 1*^-6, which PySCF cannot read.

Parameters:
  • xyz_file_path – Path to the xyz file

  • out_folder – Path to the output folder
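
The notation change itself can be sketched with a regular expression. This is a minimal illustration under stated assumptions, not the library's implementation: fix_exponents is a hypothetical helper and the sample line is made up.

```python
import re

# Matches the "*^" exponent marker when it sits between a mantissa and an exponent.
_EXPONENT = re.compile(r"(?<=[\d.])\*\^(?=[+-]?\d)")


def fix_exponents(text: str) -> str:
    """Rewrite numbers written like 1*^-6 as 1e-6 so that float() can parse them."""
    return _EXPONENT.sub("e", text)


# Made-up atom line, for illustration only.
raw_line = "C\t 1.5*^-6\t-2.0\t 3.25*^-5"
fixed_line = fix_exponents(raw_line)
values = [float(v) for v in fixed_line.split()[1:]]
```

The lookbehind and lookahead restrict the substitution to markers that sit inside a number, so other text in the file is left untouched.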