misc
Arbitrary dataset of molecules in .xyz format.
This dataset is intended for small test cases with molecules that are not part of specific datasets.
- class MiscXYZ(raw_data_dir: str, kohn_sham_data_dir: str, label_dir: str, filename: str, name: str = 'MiscXYZ', num_processes: int = 1)[source]
Dataset class for arbitrary molecules.
- name
Name of the dataset.
- raw_data_dir
Path to the raw data directory.
- kohn_sham_data_dir
Path to the Kohn-Sham data directory.
- __init__(raw_data_dir: str, kohn_sham_data_dir: str, label_dir: str, filename: str, name: str = 'MiscXYZ', num_processes: int = 1)[source]
Initialize the MiscXYZ dataset.
- Parameters:
raw_data_dir – Path to the raw data directory.
kohn_sham_data_dir – Path to the Kohn-Sham data directory.
label_dir – Path to the directory containing the labels.
filename – The filename to use for the output files.
name – Name of the dataset.
num_processes – Number of processes to use for verifying or loading the dataset.
external_potential_modification – Configuration for external potential modification.
- Raises:
AssertionError – If the subset is not in the list of available subsets.
- get_all_atomic_numbers() ndarray[source]
Get all atomic numbers present in the dataset.
Iterates over all molecules in the dataset and collects all atomic numbers.
- Returns:
Array of atomic numbers present in the dataset.
- Return type:
np.ndarray
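The collection step described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper `collect_atomic_numbers` and the sample data are hypothetical, and returning sorted unique values is an assumption.

```python
import numpy as np


def collect_atomic_numbers(molecules):
    """Return the sorted unique atomic numbers found across all molecules.

    `molecules` is a hypothetical iterable of per-molecule atomic number
    sequences; the real dataset reads them from its .xyz files.
    """
    seen = set()
    for atomic_numbers in molecules:
        seen.update(int(z) for z in atomic_numbers)
    # Assumption: the result is deduplicated and sorted.
    return np.array(sorted(seen), dtype=int)


# Hypothetical sample data: water (O, H, H) and methane (C, H, H, H, H).
molecules = [[8, 1, 1], [6, 1, 1, 1, 1]]
all_numbers = collect_atomic_numbers(molecules)
```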
- get_ids() ndarray[source]
Get the indices of the molecules in the dataset.
- Returns:
Array of indices of the molecules in the dataset.
- Return type:
np.ndarray
- get_num_molecules() int[source]
Get the number of molecules in the dataset.
- Returns:
Number of molecules in the dataset.
- Return type:
int
- load_charges_and_positions(id: int) tuple[list, list][source]
Load nuclear charges and positions for the given molecule index from the .xyz file.
- Parameters:
id – Index of the molecule to load.
- Returns:
Atomic numbers (A) and atomic positions (A, 3), where A is the number of atoms in the molecule.
- Return type:
tuple[list, list]
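For orientation, reading charges and positions from .xyz content might look like the sketch below. It assumes the common .xyz layout (atom count on the first line, a comment on the second, then one `symbol x y z` record per atom). The helper `parse_xyz` and the small `ATOMIC_NUMBERS` table are hypothetical and not part of this package, and returning NumPy arrays is an assumption.

```python
import numpy as np

# Hypothetical, deliberately small symbol-to-atomic-number table.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9}


def parse_xyz(text):
    """Parse .xyz content into (atomic numbers (A,), positions (A, 3))."""
    lines = text.strip().splitlines()
    num_atoms = int(lines[0])
    # Line 2 is a free-form comment; atom records start on line 3.
    charges, positions = [], []
    for line in lines[2:2 + num_atoms]:
        symbol, x, y, z = line.split()[:4]
        charges.append(ATOMIC_NUMBERS[symbol])
        positions.append([float(x), float(y), float(z)])
    return np.array(charges), np.array(positions)


# Hypothetical sample: a water molecule.
WATER_XYZ = """3
water
O 0.000 0.000 0.117
H 0.000 0.757 -0.469
H 0.000 -0.757 -0.469
"""

charges, positions = parse_xyz(WATER_XYZ)
```

The arrays share their first dimension, so `charges[i]` is the atomic number of the atom located at `positions[i]`.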