misc

Arbitrary dataset of molecules in .xyz format.

This dataset is intended for small test cases with molecules that are not part of any specific dataset.

class MiscXYZ(raw_data_dir: str, kohn_sham_data_dir: str, label_dir: str, filename: str, name: str = 'MiscXYZ', num_processes: int = 1)[source]

Dataset class for arbitrary molecules.

name

Name of the dataset.

raw_data_dir

Path to the raw data directory.

kohn_sham_data_dir

Path to the Kohn-Sham data directory.

__init__(raw_data_dir: str, kohn_sham_data_dir: str, label_dir: str, filename: str, name: str = 'MiscXYZ', num_processes: int = 1)[source]

Initialize the MiscXYZ dataset.

Parameters:
  • raw_data_dir – Path to the raw data directory.

  • kohn_sham_data_dir – Path to the Kohn-Sham data directory.

  • label_dir – Path to the directory containing the labels.

  • filename – The filename to use for the output files.

  • name – Name of the dataset.

  • num_processes – Number of processes to use when verifying or loading the dataset.

  • external_potential_modification – Configuration for external potential modification.

Raises:

AssertionError – If the subset is not in the list of available subsets.

download() None[source]

This is only a stub, as this kind of dataset cannot be downloaded.

get_all_atomic_numbers() ndarray[source]

Get all atomic numbers present in the dataset.

Iterates over all molecules in the dataset and collects all atomic numbers.

Returns:

Array of atomic numbers present in the dataset.

Return type:

np.ndarray
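
The collection step described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name `collect_atomic_numbers` is hypothetical, the molecules are given as plain lists of atomic numbers, and returning the unique values in sorted order is an assumption. Plain Python lists are used here instead of `np.ndarray` to keep the sketch dependency-free.

```python
def collect_atomic_numbers(molecules):
    """Collect every atomic number that occurs in any molecule.

    `molecules` is an iterable of per-molecule atomic-number lists
    (hypothetical input; the real class reads them from .xyz files).
    """
    seen = set()
    for charges in molecules:
        seen.update(charges)
    # Assumption: unique values in ascending order.
    return sorted(seen)


# Water, methane, and ammonia as lists of atomic numbers.
molecules = [[8, 1, 1], [6, 1, 1, 1, 1], [7, 1, 1, 1]]
unique_numbers = collect_atomic_numbers(molecules)
print(unique_numbers)
```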

get_ids() ndarray[source]

Get the indices of the molecules in the dataset.

Returns:

Array of indices of the molecules in the dataset.

Return type:

np.ndarray

get_num_molecules() int[source]

Get the number of molecules in the dataset.

Returns:

Number of molecules in the dataset.

Return type:

int

load_charges_and_positions(id: int) tuple[list, list][source]

Load nuclear charges and positions for the given molecule index from the corresponding .xyz file.

Parameters:

id – Index of the molecule to load.

Returns:

Atomic numbers (A) and atomic positions (A, 3).

Return type:

tuple[list, list]
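
To make the shape of this data concrete, here is a minimal sketch of reading atomic numbers and positions from the text of an .xyz file. It assumes the common .xyz layout (atom count on the first line, a comment on the second, then one `symbol x y z` line per atom). The function `parse_xyz` and the small `ATOMIC_NUMBERS` table are hypothetical and only illustrate the idea; the actual method may parse files differently.

```python
# Illustrative symbol -> atomic number table (assumption; extend as needed).
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9}


def parse_xyz(text: str) -> tuple[list[int], list[list[float]]]:
    """Parse .xyz text into (atomic numbers, positions)."""
    lines = text.strip().splitlines()
    num_atoms = int(lines[0])  # first line: number of atoms
    charges, positions = [], []
    # The second line is a comment, so atom records start at index 2.
    for line in lines[2 : 2 + num_atoms]:
        symbol, x, y, z = line.split()[:4]
        charges.append(ATOMIC_NUMBERS[symbol])
        positions.append([float(x), float(y), float(z)])
    return charges, positions


WATER_XYZ = """3
water
O 0.000 0.000 0.117
H 0.000 0.757 -0.469
H 0.000 -0.757 -0.469
"""

charges, positions = parse_xyz(WATER_XYZ)
print(charges)         # one atomic number per atom, length A
print(len(positions))  # A rows, each with 3 coordinates
```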