of_data

Definition of the OFData class, whose instances are the inputs to the model.

class OFData(x: Tensor | None = None, edge_index: Tensor | None = None, edge_attr: Tensor | None = None, y: Tensor | int | float | None = None, pos: Tensor | None = None, time: Tensor | None = None, **kwargs)

PyTorch Geometric data object for OF-DFT data. Through PyTorch Geometric's dynamic inheritance, the properties and methods of OFData are also available on torch_geometric.data.Batch objects constructed from OFData instances.

pos

Atomic positions. Shape (n_atom, 3).

atomic_numbers

Atomic numbers. Shape (n_atom,).

atom_ind

Index of each atom's atomic number (element type) in the basis. Shape (n_atom,).

n_atom

Number of atoms in the molecule.

n_basis

Number of basis functions.

n_electron

Number of electrons in the molecule.

n_basis_per_atom

Number of basis functions per atom. Shape (n_atom,). Can be used with torch.split().

atom_ptr

Pointer tensor that can be used to split basis functions into atoms using np.split(). See OFData.split_field_by_atom. Shape (n_atom,).

basis_function_ind

Array holding the basis function index of each entry of the data fields. Can be used to map atomic-number-specific quantities to the data fields. Shape (n_basis,).

coeff_ind_to_node_ind

Array mapping coefficient indices to node (=atom) indices (we use ‘node’ instead of ‘atom’ to avoid confusion with atom_ind, which indexes different atomic numbers). Shape (n_basis,).

dual_basis_integrals

Dual basis integrals, required to compute the projected gradient. Shape (n_basis,).

coeffs

Density coefficients in the linear OF Ansatz. Shape (n_basis,).
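The sketch below illustrates how dual_basis_integrals, coeffs and a gradient fit together. It assumes that dual_basis_integrals holds the integrals w_i of the basis functions, so that the electron number is N = Σ_i w_i c_i, and it uses a plain Euclidean projection as one simple choice; the library's ProjectGradients may use a different metric. Plain Python lists stand in for tensors, and the function names are hypothetical.

```python
# Sketch only: plain lists stand in for the (n_basis,) tensors of an OFData sample.
# Assumption: dual_basis_integrals[i] is the integral of basis function i over space,
# so the electron number is the scalar product of these integrals with the coefficients.


def electron_number(coeffs: list[float], dual_basis_integrals: list[float]) -> float:
    """Return N = sum_i w_i * c_i."""
    return sum(w * c for w, c in zip(dual_basis_integrals, coeffs))


def project_gradient(gradient: list[float], dual_basis_integrals: list[float]) -> list[float]:
    """Remove the component of the gradient along w (plain Euclidean projection).

    The result is orthogonal to w, so a step c -> c - lr * result leaves w . c,
    and hence the electron number, unchanged.
    """
    w = dual_basis_integrals
    scale = sum(g * wi for g, wi in zip(gradient, w)) / sum(wi * wi for wi in w)
    return [g - scale * wi for g, wi in zip(gradient, w)]


coeffs = [1.0, 0.5, 2.0]
dual_basis_integrals = [2.0, 1.0, 0.5]
gradient_label = [0.3, -0.2, 0.8]

projected = project_gradient(gradient_label, dual_basis_integrals)
new_coeffs = [c - 0.1 * p for c, p in zip(coeffs, projected)]
print(electron_number(coeffs, dual_basis_integrals))      # 3.5
print(electron_number(new_coeffs, dual_basis_integrals))  # 3.5 up to rounding
```

Because the projected gradient has no component along w, gradient-descent steps with it stay on the manifold of fixed electron number.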

Note

Optional attributes (may be absent or None):
  • irreps_per_atom: Irreps of the basis functions per atom. Shape (n_atom,).

  • ground_state_coeffs: Ground-state density coefficients in the linear OF Ansatz. Shape (n_basis,).

  • gradient_label: Gradient of the (by default) kinetic energy w.r.t. the coefficients. Shape (n_basis,).

  • energy_label: Total energy of the molecule.

  • has_energy_label: Whether the energy label is present. (False for data from the initialisation.)

  • mol_id: ID of the molecule.

  • scf_iteration: Index of the SCF iteration.

  • atom_coo_indices: Indices of the basis functions in the molecule. Shape (2, n_basis). Only present if add_atom_coo_indices() was applied.

__cat_dim__(key: str, value: Any, *args, **kwargs) -> Any

Return the dimension along which the attribute key (with value value) is concatenated when creating a batch.

We change the default behavior for the key “atom_coo_indices”, of shape (2, n_basis), which should be concatenated along dimension 1 (the basis-function axis) rather than dimension 0.

__inc__(key: str, value: Any, *args, **kwargs) -> Any

Return the increment applied to the values of the attribute key when creating a batch. We change the default behavior for two keys, “atom_ptr” and “atom_coo_indices”, both of which hold basis function indices. Hence, they must be incremented by the number of basis functions of the preceding samples to reference the correct basis functions in the batch.

Furthermore, “coeff_ind_to_node_ind” must be incremented by the number of atoms of the preceding samples, because its values are node (=atom) indices.

See the docstring of torch_geometric.data.Data for general information on __inc__().
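To make the effect of these overrides concrete, the following stdlib-only sketch imitates, rather than calls, PyTorch Geometric's collation for the affected keys: index-valued keys are shifted by the running total of per-sample increments, and atom_coo_indices is concatenated along dimension 1. The dict-based samples and the helpers increment_for and collate_indices are hypothetical, and the index values are made up.

```python
# Simplified, stdlib-only imitation of how PyTorch Geometric batches the keys whose
# __cat_dim__ / __inc__ behaviour OFData changes. Samples are plain dicts here.


def increment_for(key: str, sample: dict) -> int:
    """Per-sample increment, following the rules documented for OFData.__inc__."""
    if key in ("atom_ptr", "atom_coo_indices"):
        return sample["n_basis"]  # values are basis-function indices
    if key == "coeff_ind_to_node_ind":
        return sample["n_atom"]  # values are node (= atom) indices
    return 0


def collate_indices(samples: list[dict]) -> dict:
    """Concatenate the index-valued keys of several samples into one batch."""
    batch = {"atom_ptr": [], "coeff_ind_to_node_ind": [], "atom_coo_indices": [[], []]}
    offsets = {key: 0 for key in batch}  # running sum of increments of earlier samples
    for sample in samples:
        # 1D keys: concatenate along dimension 0, shifted by the running offset.
        for key in ("atom_ptr", "coeff_ind_to_node_ind"):
            batch[key].extend(v + offsets[key] for v in sample[key])
        # atom_coo_indices has shape (2, n_basis): concatenate along dimension 1,
        # i.e. extend both rows, shifting every entry.
        for row in range(2):
            batch["atom_coo_indices"][row].extend(
                v + offsets["atom_coo_indices"] for v in sample["atom_coo_indices"][row]
            )
        for key in offsets:
            offsets[key] += increment_for(key, sample)
    return batch


# Two toy samples; the index values are made up for illustration.
sample_a = {"n_atom": 2, "n_basis": 3, "atom_ptr": [0, 2],
            "coeff_ind_to_node_ind": [0, 0, 1], "atom_coo_indices": [[0, 1, 2], [2, 0, 1]]}
sample_b = {"n_atom": 1, "n_basis": 2, "atom_ptr": [0],
            "coeff_ind_to_node_ind": [0, 0], "atom_coo_indices": [[0, 1], [1, 0]]}
batch = collate_indices([sample_a, sample_b])
print(batch["coeff_ind_to_node_ind"])  # [0, 0, 1, 2, 2]
```

After batching, the second sample's coefficients point to node 2, the first atom of that sample in the batch, which is exactly why the increment for coeff_ind_to_node_ind is the atom count rather than the basis-function count.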

__setattr__(key: str, value: Any)

Override the default __setattr__ to check whether fields are added without their transformation order being set.

__setitem__(key: str, value: Any)

Override the default __setitem__ to check whether fields are added without their transformation order being set.

add_item(key: str, value: Any, representation: Representation | str)

Add an item to the data object. The transformation order specifies how the item should be transformed.

Parameters:
  • key – Key of the item to add.

  • value – Value of the item to add.

  • representation – Transformation order of the item.
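The bookkeeping described by add_item, delete_item and the __setattr__ check can be pictured with the hypothetical toy class below: every field is stored together with its representation, and plain assignment to a field without a registered representation is rejected. Raising KeyError is an assumption (the real class may warn or behave differently), the real class stores fields on a torch_geometric Data object rather than in a dict, and the strings such as "vector" are placeholders for Representation values.

```python
from typing import Any


class TrackedFields:
    """Toy container: every field must carry a representation (transformation order)."""

    def __init__(self) -> None:
        # Bypass our own __setattr__ check for the internal dictionaries.
        object.__setattr__(self, "_values", {})
        object.__setattr__(self, "_representations", {})

    def add_item(self, key: str, value: Any, representation: str) -> None:
        """Register the representation first, then store the value."""
        self._representations[key] = representation
        self._values[key] = value

    def delete_item(self, key: str) -> None:
        """Remove the value together with its representation."""
        del self._values[key]
        del self._representations[key]

    def __setattr__(self, key: str, value: Any) -> None:
        # Plain assignment is only allowed for fields whose representation is known.
        if key not in self._representations:
            raise KeyError(f"Set a representation for {key!r} first, e.g. via add_item().")
        self._values[key] = value

    def __getattr__(self, key: str) -> Any:
        # Only called when normal attribute lookup fails.
        try:
            return self._values[key]
        except KeyError:
            raise AttributeError(key) from None


sample = TrackedFields()
sample.add_item("coeffs", [1.0, 2.0], "vector")
sample.coeffs = [3.0, 4.0]  # fine: representation already registered
try:
    sample.energy_label = 1.0  # no representation registered yet
except KeyError as err:
    print("rejected:", err)
```

Registering the representation in the same call that stores the value makes it impossible to end up with a field that later transformations would not know how to handle.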

classmethod construct_new(basis_info: BasisInfo, pos: ndarray, atomic_numbers: ndarray, coeffs: ndarray, ground_state_coeffs: ndarray = None, gradient_label: ndarray = None, energy_label: list[float] | ndarray = None, has_energy_label: bool = None, dual_basis_integrals: ndarray | str = None, add_irreps: bool = False, mol_id: str = None, scf_iteration: int = None, additional_representations: dict[str, Representation] = None, **kwargs) -> OFData

Construct a new OFData object from the given data, inferring additional fields using the given BasisInfo. We do not override __init__, because doing so interferes with PyTorch Geometric internals (somewhere in the loader).

Parameters:
  • basis_info – Basis information for the data.

  • pos – Atomic positions. Shape (n_atom, 3).

  • atomic_numbers – Atomic numbers. Shape (n_atom,).

  • coeffs – Density coefficients in the linear OF Ansatz. Shape (n_basis,).

  • ground_state_coeffs – Ground-state density coefficients in the linear OF Ansatz. Shape (n_basis,). Optional.

  • gradient_label – Gradient of the (by default) kinetic energy w.r.t. the coefficients. Shape (n_basis,). Optional.

  • energy_label – Total energy of the molecule. Optional.

  • has_energy_label – Whether the energy label is present. (False for data from the initialisation.) Optional.

  • dual_basis_integrals – Dual basis integrals, required to compute the projected gradient. If set to "infer_from_basis", they are computed from the basis and given in the untransformed basis.

  • add_irreps – Whether to add the irreps of the basis functions per atom to the sample.

  • mol_id – ID of the molecule. Optional.

  • scf_iteration – Index of the SCF iteration. Optional.

  • additional_representations – Dictionary specifying the transformation order of the data fields.

  • kwargs – Additional keyword arguments passed to the __init__ of torch_geometric.data.Data.
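The sketch below shows how per-element basis information can be expanded into the per-atom and per-coefficient fields listed for OFData, matching their documented shapes. BasisInfoSketch and its attributes atomic_numbers and basis_dim_per_atom are hypothetical stand-ins for BasisInfo; the reading of basis_function_ind as an index into the concatenated basis functions of all elements, and n_electron = Σ Z for a neutral molecule, are assumptions.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass
class BasisInfoSketch:
    """Hypothetical stand-in for BasisInfo: elements covered by the basis and their basis sizes."""

    atomic_numbers: list[int]      # elements the basis is defined for, in a fixed order
    basis_dim_per_atom: list[int]  # number of basis functions for each of those elements


def infer_fields(basis_info: BasisInfoSketch, atomic_numbers: list[int]) -> dict:
    """Expand per-element basis information into per-atom / per-coefficient fields."""
    atom_ind = [basis_info.atomic_numbers.index(z) for z in atomic_numbers]
    n_basis_per_atom = [basis_info.basis_dim_per_atom[i] for i in atom_ind]
    # Offset of each element's block when the basis functions of all elements are concatenated.
    element_offsets = list(accumulate([0] + basis_info.basis_dim_per_atom[:-1]))
    return {
        "atom_ind": atom_ind,
        "n_atom": len(atomic_numbers),
        "n_basis_per_atom": n_basis_per_atom,
        "n_basis": sum(n_basis_per_atom),
        "n_electron": sum(atomic_numbers),  # assumes a neutral molecule
        "coeff_ind_to_node_ind": [node for node, n in enumerate(n_basis_per_atom) for _ in range(n)],
        "basis_function_ind": [element_offsets[i] + k for i in atom_ind
                               for k in range(basis_info.basis_dim_per_atom[i])],
    }


basis_info = BasisInfoSketch(atomic_numbers=[1, 6, 8], basis_dim_per_atom=[2, 5, 4])
fields = infer_fields(basis_info, atomic_numbers=[8, 1, 1])  # water: O, H, H
print(fields["atom_ind"], fields["n_basis_per_atom"])  # [2, 0, 0] [4, 2, 2]
```

Both hydrogen atoms share the same basis_function_ind entries, which is what lets element-specific quantities be gathered onto the coefficients of every atom of that element.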

delete_item(key: str) -> None

Remove an item from the data object.

Parameters:

key – Key of the item to remove.

classmethod from_file(path: str | Path, scf_iteration: int, basis_info: BasisInfo, energy_key: str = 'e_kin', gradient_key: str = 'grad_kin', add_irreps: bool = False, additional_keys_at_scf_iteration: dict[str, Representation] = None, additional_keys_at_ground_state: dict[str, Representation] = None, additional_keys_per_geometry: dict[str, Representation] = None) -> OFData

Load a sample from the DFT data set.

Parameters:
  • path – Path to the .zarr file.

  • scf_iteration – Index of the SCF iteration to load.

  • basis_info – BasisInfo object containing information about the OF basis.

  • energy_key – Key of the energy to load.

  • gradient_key – Key of the gradient to load.

  • add_irreps – Whether to add the irreps of the basis functions per atom to the sample.

  • additional_keys_at_scf_iteration – Dictionary mapping additional keys to load from the zarr group to their representations. The arrays corresponding to these keys are indexed at the current SCF iteration. Optional.

  • additional_keys_at_ground_state – Dictionary mapping additional keys to load from the zarr group to their representations. The arrays corresponding to these keys are indexed at the final SCF iteration, i.e. the ground state. Optional.

  • additional_keys_per_geometry – Dictionary mapping additional keys to load from the zarr group to their representations. The arrays corresponding to these keys are not indexed. Optional.
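The three additional_keys_* parameters differ only in how the stored arrays are indexed. The stdlib sketch below reproduces that rule with a plain dict of lists standing in for the zarr group; the helper select_additional_fields, the key names and the values are hypothetical, and the representation strings are placeholders that the sketch ignores.

```python
def select_additional_fields(group: dict, scf_iteration: int, at_scf_iteration=(),
                             at_ground_state=(), per_geometry=()) -> dict:
    """Index stored arrays the way the additional_keys_* parameters describe.

    group maps keys to per-SCF-iteration sequences (or per-geometry values).
    The key collections may be dicts; their values (representations) are ignored here.
    """
    fields = {}
    for key in at_scf_iteration:
        fields[key] = group[key][scf_iteration]  # value at the requested SCF iteration
    for key in at_ground_state:
        fields[key] = group[key][-1]  # final SCF iteration, i.e. the ground state
    for key in per_geometry:
        fields[key] = group[key]  # stored once per geometry: not indexed
    return fields


# Hypothetical group contents: three SCF iterations for the iteration-resolved arrays.
group = {"e_hartree": [5.0, 4.8, 4.7], "e_xc": [-9.0, -9.1, -9.2], "charge": 0}
fields = select_additional_fields(
    group,
    scf_iteration=1,
    at_scf_iteration={"e_hartree": "scalar"},
    at_ground_state={"e_xc": "scalar"},
    per_geometry={"charge": "none"},
)
print(fields)  # {'e_hartree': 4.8, 'e_xc': -9.2, 'charge': 0}
```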

Returns:

PyTorch Geometric data object with the following fields:
  • pos: Atomic positions. Shape (n_atom, 3).

  • atomic_numbers: Atomic numbers. Shape (n_atom,).

  • atom_ind: Indices of the atoms in the basis. Shape (n_atom,).

  • n_basis_per_atom: Number of basis functions per atom. Shape (n_atom,). Can be used with torch.split().

  • atom_ptr: Pointer tensor that can be used to split basis functions into atoms using np.split(). Shape (n_atom,).

  • coeffs: Density coefficients in the linear OF Ansatz. Shape (n_basis,).

  • ground_state_coeffs: Ground-state density coefficients in the linear OF Ansatz. Shape (n_basis,).

  • gradient_label: Gradient of the (by default) kinetic energy w.r.t. the coefficients. Shape (n_basis,).

  • energy_label: Total energy of the molecule.

  • has_energy_label: Whether the energy/gradient label is meaningful. (Set to 0 for data from the initialisation.)

Return type:

OFData

classmethod from_file_with_all_gradients(path: str | Path, scf_iteration: int, basis_info: BasisInfo, energy_key: str = 'e_kin', gradient_key: str = 'grad_kin', add_irreps: bool = True) -> OFData

Load a sample containing all the keys in the zarr file.

get_n_atom_per_molecule()

Get the number of atoms per molecule.

Only available for Batch objects.

get_n_basis_per_molecule()

Get the number of basis functions per molecule.

Only available for Batch objects.
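For a batch, PyTorch Geometric stores a batch vector that assigns each node (atom) to its molecule. The hypothetical helpers below show one way to obtain the per-molecule counts from such a vector and from n_basis_per_atom; the actual methods may compute them differently (for example from the batch's ptr vector).

```python
def n_atom_per_molecule(batch: list[int], n_molecules: int) -> list[int]:
    """Count the atoms assigned to each molecule."""
    counts = [0] * n_molecules
    for molecule in batch:
        counts[molecule] += 1
    return counts


def n_basis_per_molecule(batch: list[int], n_basis_per_atom: list[int], n_molecules: int) -> list[int]:
    """Sum the per-atom basis sizes over the atoms of each molecule."""
    totals = [0] * n_molecules
    for molecule, n_basis in zip(batch, n_basis_per_atom):
        totals[molecule] += n_basis
    return totals


# A batch of water (3 atoms) and H2 (2 atoms).
batch = [0, 0, 0, 1, 1]
n_basis_per_atom = [4, 2, 2, 2, 2]
print(n_atom_per_molecule(batch, 2))                     # [3, 2]
print(n_basis_per_molecule(batch, n_basis_per_atom, 2))  # [8, 4]
```

This also explains why the methods are only available for Batch objects: a single OFData sample has no batch vector, and its counts are simply n_atom and n_basis.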

classmethod minimal_sample_from_mol(mol: Mole, basis_info: BasisInfo | None = None, add_transformation_matrix: bool = False) -> OFData

Construct an OFData sample from a molecule.

Used for running a (classical) OF-DFT calculation on a molecule from scratch. Assumes that the basis of the molecule is the one used for the OF-DFT calculation and that the basis functions present in the molecule provide sufficient information (as is the case for classical OF-DFT, but unlikely for ML functionals).

Parameters:
  • mol – The molecule.

  • basis_info – Basis information for the data. If None, a minimal basis info object is created.

  • add_transformation_matrix – Whether to add identity matrices for the transformation and inverse transformation matrices. Default is False.

Returns:

The OFData sample.

Return type:

OFData

split_field_by_atom(array_or_tensor: ndarray | Tensor, axis: int = 0) -> tuple[ndarray | Tensor, ...]

Split the given array or tensor into the individual atoms. The array or tensor is not required to be part of the OFData object, and if it is, the corresponding attribute is not overwritten.

Parameters:
  • array_or_tensor – Array or tensor to split.

  • axis – Axis along which to split.

Returns:

tuple of arrays or tensors, one for each atom.
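A stdlib sketch of the splitting for axis 0, using n_basis_per_atom as the chunk sizes; the helper split_by_atom is hypothetical and works on plain lists instead of arrays or tensors.

```python
from itertools import accumulate


def split_by_atom(values: list, n_basis_per_atom: list[int]) -> tuple[list, ...]:
    """Split a per-basis-function sequence into one chunk per atom (axis 0 only)."""
    if sum(n_basis_per_atom) != len(values):
        raise ValueError("values must have one entry per basis function")
    ends = list(accumulate(n_basis_per_atom))
    starts = [0] + ends[:-1]
    return tuple(values[start:end] for start, end in zip(starts, ends))


coeffs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]  # water: 4 + 2 + 2 basis functions
print(split_by_atom(coeffs, [4, 2, 2]))
# ([0.9, 0.8, 0.7, 0.6], [0.5, 0.4], [0.3, 0.2])
```

With tensors, torch.split(field, n_basis_per_atom.tolist(), dim=axis) yields the same chunks, which is why n_basis_per_atom is documented as usable with torch.split().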

class Representation(*values)

Enum for the representations of the fields in the OFData object.

  • NONE: Quantities that are not geometric objects and are thus not transformed.

  • SCALAR: Scalar quantities (e.g. energies). Stay constant under basis transformations.

  • VECTOR: Vector quantities that transform with the basis transformation matrix.

  • DUAL_VECTOR: Vectors that are applied to vectors using the scalar product. Gradients are dual vectors, but we differentiate between gradients and dual vectors because ProjectGradients is only applied to gradients.

  • GRADIENT: Dual vectors that are gradients and as such can be projected onto the electron-number-preserving manifold.

  • ENDOMORPHISM: Endomorphisms that are applied to vectors.

  • BILINEAR_FORM: Bilinear forms that are applied to vectors.

  • AO: Atomic orbitals.

For exact transformation formulas see transform_tensor().
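As a hedged reminder of the standard linear-algebra rules these categories usually correspond to, assume that coefficient vectors transform as c' = M c with an invertible matrix M. The symbol M and this convention are assumptions, not taken from transform_tensor(), which remains the authoritative reference; NONE and AO are not covered here.

```latex
\begin{aligned}
\text{SCALAR:}\quad & e' = e \\
\text{VECTOR:}\quad & c' = M c \\
\text{DUAL\_VECTOR, GRADIENT:}\quad & g' = M^{-\top} g
  \quad\Rightarrow\quad g'^{\top} c' = g^{\top} c \\
\text{ENDOMORPHISM:}\quad & A' = M A M^{-1}
  \quad\Rightarrow\quad A' c' = M (A c) \\
\text{BILINEAR\_FORM:}\quad & B' = M^{-\top} B M^{-1}
  \quad\Rightarrow\quad c'^{\top} B' c' = c^{\top} B c
\end{aligned}
```

Each rule is chosen so that the pairing with vectors is preserved, which is the defining property listed for each category above.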

__format__(format_spec, /)

Return a formatted version of the string as described by format_spec.

__new__(value)
__str__()

Return str(self).

static _generate_next_value_(name, start, count, last_values)

Return the lower-cased version of the member name.

class StrEnumwithCheck(new_class_name, /, names, *, module=None, qualname=None, type=None, start=1, boundary=None)

Enum with a check whether a value is in the enum.

__format__(format_spec, /)

Return a formatted version of the string as described by format_spec.

__new__(value)
__str__()

Return str(self).

static _generate_next_value_(name, start, count, last_values)

Return the lower-cased version of the member name.

classmethod check_key(value)

Check whether a value is in the enum.

If it is not, a ValueError is raised.
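A minimal sketch of a string enum with such a check, using only the standard enum module. StrEnumWithCheckSketch and RepresentationSketch are hypothetical stand-ins; returning the matching member from check_key and using lower-cased member names as values (as the documented _generate_next_value_ would produce with auto()) are assumptions.

```python
from enum import Enum


class StrEnumWithCheckSketch(str, Enum):
    """Hypothetical stand-in for StrEnumwithCheck: a str-valued enum with check_key()."""

    @classmethod
    def check_key(cls, value):
        """Return the member for value, raising ValueError if value is not in the enum."""
        try:
            return cls(value)
        except ValueError:
            valid = ", ".join(member.value for member in cls)
            raise ValueError(f"{value!r} is not a valid {cls.__name__}; expected one of: {valid}") from None


class RepresentationSketch(StrEnumWithCheckSketch):
    # Assumption: values are the lower-cased member names.
    NONE = "none"
    SCALAR = "scalar"
    VECTOR = "vector"
    DUAL_VECTOR = "dual_vector"
    GRADIENT = "gradient"
    ENDOMORPHISM = "endomorphism"
    BILINEAR_FORM = "bilinear_form"
    AO = "ao"


print(RepresentationSketch.check_key("gradient").name)  # GRADIENT
```

Because the members are also str instances, a plain string such as "vector" and the member RepresentationSketch.VECTOR pass the same check, which fits signatures like add_item(..., representation: Representation | str).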