miniff package¶

Submodules¶

miniff.ewald module¶

miniff.ewald.ewald_cutoffs(eta, volume, charge, scale=1, eps=1e-07)¶

Compute the required Ewald cutoffs for the given tolerance.

Parameters

eta (float) – Gaussian charge screening parameter.
volume (float) – Unit cell volume.
charge (float) – Sum of individual charges squared.
scale (float) – The scale: e^2 / 4 / π / ε0 = Hartree * aBohr
eps (float) – The required tolerance.

Returns

r_cut (float) – Cutoff value in real space.
k_cut (float) – Cutoff value in reciprocal space.
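
A generic way to turn an error estimate into a cutoff is bisection on an error function that decreases monotonically with the cutoff. The sketch below is not the miniff implementation: `toy_error` is a placeholder Gaussian tail (not Eq. 4 or Eq. 5 of the cited paper) and `smallest_cutoff` is a hypothetical helper introduced only for illustration.

```python
import math


def toy_error(cutoff, eta):
    # Placeholder error model (an assumption): a Gaussian tail that
    # decays monotonically with the cutoff.
    return math.exp(-(eta * cutoff) ** 2)


def smallest_cutoff(error, eps, lo=0.0, hi=1.0, iterations=60):
    """Find a cutoff with error(cutoff) <= eps by bisection."""
    # Grow the upper bracket until the tolerance is met.
    while error(hi) > eps:
        hi *= 2
    # Shrink the bracket; `hi` always satisfies the tolerance.
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if error(mid) > eps:
            lo = mid
        else:
            hi = mid
    return hi


r_cut = smallest_cutoff(lambda r: toy_error(r, eta=0.5), eps=1e-7)
```

The same search could in principle be run twice, once with a real-space estimate and once with a k-space estimate, to obtain both cutoffs for one tolerance.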

miniff.ewald.ewald_k_cutoff_error(k_cut, eta, volume, charge, scale=1)¶

An error associated with the k-space cutoff of Ewald summations. Eq. 5 in doi:10.1016/j.cplett.2005.05.106

Parameters

k_cut (float) – k-space cutoff.
eta (float) – Gaussian charge screening parameter.
volume (float) – Unit cell volume.
charge (float) – Sum of individual charges squared.
scale (float) – The scale: e^2 / 4 / π / ε0 = Hartree * aBohr

Returns

result – The error estimate.

Return type

float

miniff.ewald.ewald_real_cutoff_error(r_cut, eta, volume, charge, scale=1)¶

An error associated with the real cutoff of Ewald summations. Eq. 4 in doi:10.1016/j.cplett.2005.05.106

Parameters

r_cut (float) – Real-space cutoff.
eta (float) – Gaussian charge screening parameter.
volume (float) – Unit cell volume.
charge (float) – Sum of individual charges squared.
scale (float) – The scale: e^2 / 4 / π / ε0 = Hartree * aBohr

Returns

result – The error estimate.

Return type

float

miniff.ewald.stat_cell(cells, squeeze=True)¶

Computes statistics for cell(s): volumes and charges.

Parameters

cells (list, tuple, Cell) – Cell(s) to compute statistics for.
squeeze (bool) – If True, squeezes the output for a single cell.

Returns

volume (np.ndarray, float) – Volume(s).
charge (np.ndarray, float) – Sum(s) of charges squared.
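
The two statistics can be illustrated with plain Python. This is a minimal sketch, assuming that "sum of charges squared" means the sum of q_i ** 2 over atoms and that the cell is given as three row vectors; `cell_volume` and `charge_squared_sum` are hypothetical names, not miniff functions.

```python
def cell_volume(vectors):
    """Volume of a cell spanned by three row vectors: |a . (b x c)|."""
    a, b, c = vectors
    cross = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    return abs(sum(ai * xi for ai, xi in zip(a, cross)))


def charge_squared_sum(charges):
    # Assumption: "sum of charges squared" means sum(q_i ** 2).
    return sum(q * q for q in charges)


vectors = [(2.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, 0.0, 4.0)]
volume = cell_volume(vectors)
charge = charge_squared_sum([1.0, -1.0, 2.0, -2.0])
```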

miniff.kernel module¶

class miniff.kernel.Cell(vectors: numpy.ndarray, coordinates: numpy.ndarray, values: numpy.ndarray, meta: dict = <factory>, _vectors_inv: Optional[dataclasses.InitVar] = None)¶

Bases: object

Describes a unit cell.

cartesian¶

coordinates: numpy.ndarray¶

copy(vectors=None, coordinates=None, cartesian=None, values=None, meta=None, proto=None)¶: Creates a copy with optional modifications.

distances(cutoff=None, other=None)¶

Computes inter-point distances.

Parameters

cutoff (float) – Cutoff for obtaining distances.
other (Cell, np.ndarray) – Other cell to compute distances to.

Returns

result – The resulting distance matrix.

Return type

np.ndarray, csr_matrix

classmethod from_cartesian(vectors, cartesian, values, *args, proto=None, _vectors_inv=None, **kwargs)¶: Constructs an instance from cartesian coordinates.

classmethod from_state_dict(data)¶

classmethod load(f)¶

Load Cell(s) from stream.

Parameters: f (file) – File-like object.
Returns: result – The resulting Cell(s).
Return type: list, Cell

meta: dict¶

normalized()¶: Puts all points inside box boundaries and returns a copy.
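
As an illustration of what putting points inside the box can mean, the sketch below wraps crystal (fractional) coordinates into the half-open interval [0, 1). The interval convention and the helper name `wrap_fractional` are assumptions; the actual method works on a Cell and returns a copy.

```python
import math


def wrap_fractional(coordinates):
    """Map each fractional coordinate into the half-open interval [0, 1)."""
    return [[x - math.floor(x) for x in point] for point in coordinates]


wrapped = wrap_fractional([[1.25, -0.25, 0.5], [-1.0, 2.0, 0.75]])
```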

classmethod random(density, atoms, shape=None)¶

Prepares a unit cell with random atomic positions.

Parameters

density (float) – Atomic density.
atoms (dict) – A dictionary with specimen-count pairs.
shape ({"box"}) – The shape of the resulting cell.

Returns

result – The resulting unit cell.

Return type

UnitCell

repeated(*args)¶

Prepares a supercell.

Parameters: *args – Repeat counts along each vector.
Returns: supercell – The resulting supercell.
Return type: Cell
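
A supercell can be sketched in crystal coordinates as follows. This is an illustration under stated assumptions, not the library code: `repeat_cell` is a hypothetical helper, coordinates are fractional, and the ordering of the repeated atoms is arbitrary.

```python
from itertools import product


def repeat_cell(vectors, coordinates, values, counts):
    """Repeat a cell `counts[i]` times along lattice vector i."""
    # Each lattice vector is stretched by its repeat count.
    new_vectors = [[n * x for x in v] for n, v in zip(counts, vectors)]
    new_coordinates, new_values = [], []
    for shift in product(*(range(n) for n in counts)):
        for point, value in zip(coordinates, values):
            # Fractional coordinates are shifted, then rescaled to the
            # larger cell.
            new_coordinates.append(
                [(x + s) / n for x, s, n in zip(point, shift, counts)]
            )
            new_values.append(value)
    return new_vectors, new_coordinates, new_values


vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sc_vectors, sc_coordinates, sc_values = repeat_cell(
    vectors, [[0.0, 0.0, 0.0]], ["a"], (2, 1, 1)
)
```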

static save(cells, f)¶

Saves cells.

Parameters

cells (list, Cell) – Cells to save.
f (file) – File-like object.

size¶

state_dict()¶

values: numpy.ndarray¶

values_encoded¶

values_lookup¶

values_uq¶

vectors: numpy.ndarray¶

vectors_inv¶

volume¶

class miniff.kernel.CellImages(cell: miniff.kernel.Cell, cartesian: numpy.ndarray, shift: numpy.ndarray, distances: scipy.sparse.csr.csr_matrix, cutoff: float, reciprocal_grid: numpy.ndarray = None)¶

Bases: object

adf(theta, sigma, cutoff, fmt='{}-[{},{}]')¶

Computes the angular distribution function.

Parameters

theta (np.ndarray, float) – Angle values.
sigma (float) – Smearing.
cutoff (float) – Radial cutoff value.
fmt (str) – A format string for keys.

Returns

result – Angular distribution function values.

Return type

dict

cartesian: numpy.ndarray¶

cell: miniff.kernel.Cell¶

cutoff: float¶

distances: scipy.sparse.csr.csr_matrix¶

eval(potentials, kname, squeeze=True, ignore_missing_species=False, out=None, **kwargs)¶

Computes potentials: values, gradients and more.

Parameters

images (CellImages) – Cell and its images.
potentials (list, LocalPotential) – A list of potentials or a single potential.
kname (str, None) – Function to evaluate: ‘kernel’, ‘kernel_gradient’, or any other kernel function defined for all potentials in the list.
squeeze (bool) – If True, returns a single array whenever a single potential is passed.
ignore_missing_species (bool) – If True, no error is raised whenever a specimen in the potential description is not found in the cell.
out (np.ndarray) – The output buffer: [n_potentials, n_atoms] for kname == “kernel” and [n_potentials, n_atoms, n_atoms, 3] for kname == “kernel_gradient”. Any kind of reduction, including resolved=False and calls to self.total and self.grad, uses the buffer for intermediate results but still allocates a new array for the output.
kwargs – Other arguments to eval_potentials.

Returns

result – The result of the potential computation given the cell data.

Return type

np.ndarray

grad(potentials, kname='kernel_gradient', **kwargs)¶

Total energy gradients with respect to cartesian atomic coordinates.

Like self.total, this function ignores any symmetry issues related to double-counting, etc.

Parameters

images (CellImages) – Cell and its images.
potentials (list, LocalPotential) – A list of potentials or a single potential.
kname (str, None) – Function to evaluate: ‘kernel’, ‘kernel_gradient’, or any other kernel function defined for all potentials in the list.
kwargs – Other arguments to total.

Returns

gradients – Total energy gradients.

Return type

np.ndarray

grad_cell(potentials, kname='kernel_cell_gradient', **kwargs)¶

Total energy gradients with respect to cell vectors assuming cartesian atomic coordinates are fixed.

Parameters

images (CellImages) – Cell and its images.
potentials (list, LocalPotential) – A list of potentials or a single potential.
kname (str, None) – Function to evaluate: ‘kernel’, ‘kernel_gradient’, or any other kernel function defined for all potentials in the list.
kwargs – Other arguments to total.

Returns

gradients – Total energy gradients.

Return type

np.ndarray

n_images¶

pair_reduction_function(f, fmt='{}-{}')¶

Pair reduction function.

Parameters

f (Callable) – A function reducing pair-specific distances, see self.rdf for an example.
fmt (str) – A format string for keys.

Returns

result – Pair function values.

Return type

dict

rdf(r, sigma, fmt='{}-{}')¶

Computes the radial distribution function.

Parameters

r (np.ndarray, float) – Radius values.
sigma (float) – Smearing.
fmt (str) – A format string for keys.

Returns

result – Radial distribution function values.

Return type

dict
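
The role of `r`, `sigma` and `fmt` can be illustrated with a smeared pair-distance histogram. The sketch below is an assumption about the general idea only: it sums one normalized Gaussian per pair distance and omits the shell-volume and density normalization of a true radial distribution function. `smeared_pair_histogram` and the `pairs` input format are hypothetical.

```python
import math
from collections import defaultdict


def smeared_pair_histogram(pairs, r, sigma, fmt="{}-{}"):
    """Sum one normalized Gaussian per pair distance, keyed by species pair."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    result = defaultdict(lambda: [0.0] * len(r))
    for species_a, species_b, distance in pairs:
        key = fmt.format(species_a, species_b)
        for i, r_i in enumerate(r):
            weight = math.exp(-0.5 * ((r_i - distance) / sigma) ** 2)
            result[key][i] += norm * weight
    return dict(result)


# Each entry: (species of atom 1, species of atom 2, distance).
pairs = [("a", "b", 1.0), ("a", "b", 2.0), ("b", "b", 1.5)]
hist = smeared_pair_histogram(pairs, r=[1.0, 1.5, 2.0], sigma=0.1)
```

The returned dict has one key per species pair, formatted with `fmt`, and one value per entry of `r`.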

reciprocal_grid: numpy.ndarray = None¶

shift: numpy.ndarray¶

size¶

total(potentials, kname='kernel', squeeze=False, resolving=False, **kwargs)¶

Total energy as a sum of all possible potential terms.

Note that this function ignores any symmetry issues related to double-counting, etc.

Parameters

images (CellImages) – Cell and its images.
potentials (list, LocalPotential) – A list of potentials or a single potential.
kname (str, None) – Function to evaluate: ‘kernel’, ‘kernel_gradient’, or any other kernel function defined for all potentials in the list.
squeeze (bool) – If True, returns a single array whenever a single potential is passed.
resolving (bool) – If True, runs species-resolving kernels.
kwargs – Other arguments to eval.

Returns

energy – The total energy value.

Return type

float

class miniff.kernel.ScalarFunctionWrapper(sample, potentials, include_coordinates=True, include_vectors=False, normalize=None, prefer_parallel=None, cell_logger=None, track_potential_fidelity=False, **kwargs)¶

Bases: object

eval(coordinates, vectors)¶

Computes function and gradients.

Parameters

coordinates (np.ndarray) – Cell (crystal) coordinates.
vectors (np.ndarray) – Cell vectors.

Returns

cell (Cell) – The resulting cell with energy and gradients set.
f (float) – The energy value.
gc (np.ndarray) – Cartesian gradients.
gv (np.ndarray) – Vector gradients.

eval_(coordinates, vectors)¶

Computes function and gradients. (Skips saving into history.)

Parameters

coordinates (np.ndarray) – Cell (crystal) coordinates.
vectors (np.ndarray) – Cell vectors.

Returns

cell (Cell) – The resulting cell with energy and gradients set.
f (float) – The energy value.
gc (np.ndarray) – Cartesian gradients.
gv (np.ndarray) – Vector gradients.

eval_to_cell(coordinates: numpy.ndarray, vectors: numpy.ndarray) → miniff.kernel.Cell¶

f(coordinates: numpy.ndarray, vectors: numpy.ndarray) → float¶

gc(coordinates: numpy.ndarray, vectors: numpy.ndarray) → numpy.ndarray¶

gv(coordinates: numpy.ndarray, vectors: numpy.ndarray) → numpy.ndarray¶

make_cell(coordinates, vectors)¶

start_recording()¶: Starts recording of coordinates passed.

stop_recording() → miniff.kernel.SnapshotHistory¶: Stops the recording of coordinates and returns all cells recorded.

class miniff.kernel.SnapshotHistory(iterable=(), /)¶: Bases: list

miniff.kernel.batch_rdf(cells, *args, inner=<function CellImages.rdf>, **kwargs)¶

Averaged radial distribution function.

Parameters

cells (list, tuple) – A collection of wrapped cells to process.
inner (Callable) – The function computing distribution for a single cell.
args –
kwargs – Arguments to inner.

Returns

result – Radial distribution function values.

Return type

dict

miniff.kernel.common_cutoff(potentials)¶

The maximal (common) cutoff of many potentials.

Parameters: potentials (Iterable) – Potentials to compute the cutoff for.
Returns: result – The resulting cutoff.
Return type: float

miniff.kernel.compute_images(cell, cutoff, reciprocal_cutoff=None, pbc=True)¶

Compute cell images given image shift vectors and cutoff distance.

Parameters

cell (Cell) – Cell to process.
cutoff (float) – The distance cutoff value.
reciprocal_cutoff (float) – Optional reciprocal cutoff for the reciprocal grid.
pbc (bool) – If True, assumes periodic boundary conditions.

Returns

images – Images with neighbor and distance information.

Return type

CellImages

miniff.kernel.compute_reciprocal_grid(cell, cutoff)¶

Computes the reciprocal grid.

Parameters

cell (Cell) – Cell to process.
cutoff (float) – The value of the reciprocal cutoff.

Returns

grid – A 2D array with reciprocal grid points.

Return type

np.ndarray

miniff.kernel.compute_shift_vectors(cell, cutoff=None, pbc=True)¶

Computes shift vectors given a cell and its environment.

Parameters

cell (Cell) – Cell to process.
cutoff (float) – Maximal distance computed (smaller = faster).
pbc (bool) – If True, assumes periodic boundary conditions. Otherwise returns a single zero shift vector.

Returns

shift_vectors – A 2D array with shift vectors.

Return type

np.ndarray
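
One conservative way to enumerate periodic images for a given cutoff is to count how many lattice-plane spacings fit into the cutoff along each vector. The sketch below illustrates that idea for a general 3D cell; it is not taken from miniff, and `cross` and `image_shifts` are hypothetical helpers.

```python
import math
from itertools import product


def cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def image_shifts(vectors, cutoff):
    """Integer lattice shifts whose images may lie within `cutoff`."""
    a, b, c = vectors
    volume = abs(sum(x * y for x, y in zip(a, cross(b, c))))
    counts = []
    for u, v in ((b, c), (c, a), (a, b)):
        # Spacing between the lattice planes spanned by u and v.
        area = math.sqrt(sum(x * x for x in cross(u, v)))
        counts.append(math.ceil(cutoff / (volume / area)))
    return list(product(*(range(-n, n + 1) for n in counts)))


cell_vectors = [(2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 5.0)]
shifts = image_shifts(cell_vectors, cutoff=3.0)
```

For this box the spacings are 2, 2 and 5, so the sketch keeps shifts in the ranges -2..2, -2..2 and -1..1. Some of the enumerated images may still lie outside the cutoff; a distance check is needed afterwards.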

miniff.kernel.encode_potentials(potentials, lookup, default=None)¶

Encodes potentials to have species as integers.

Parameters

potentials (list, tuple) – Potentials to encode.
lookup (dict) – A lookup dictionary.
default (int) – The default value to replace non-existing entries. If None, raises KeyError.

Returns

result – The resulting list of potentials.

Return type

list

miniff.kernel.encode_species(species, lookup, default=None)¶

Transforms species into an array of integers encoding species.

Parameters

species (list, tuple, np.ndarray) – Species to encode.
lookup (dict) – A lookup dictionary.
default (int) – The default value to replace non-existing entries. If None, raises KeyError.

Returns

result – The resulting integer array.

Return type

np.ndarray
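
The lookup behaviour described above, including the KeyError when no default is given, can be sketched in plain Python. `encode` is a hypothetical stand-in that returns a list rather than an np.ndarray.

```python
def encode(species, lookup, default=None):
    """Replace each species label with its integer code from `lookup`."""
    encoded = []
    for label in species:
        if label in lookup:
            encoded.append(lookup[label])
        elif default is None:
            # Mirrors the documented behaviour: no default means KeyError.
            raise KeyError(label)
        else:
            encoded.append(default)
    return encoded


lookup = {"a": 0, "b": 1}
codes = encode(["b", "a", "b"], lookup)
codes_with_default = encode(["a", "x"], lookup, default=-1)
```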

miniff.kernel.eval(images, potentials, kname, squeeze=True, ignore_missing_species=False, out=None, **kwargs)¶

Computes potentials: values, gradients and more.

Parameters

images (CellImages) – Cell and its images.
potentials (list, LocalPotential) – A list of potentials or a single potential.
kname (str, None) – Function to evaluate: ‘kernel’, ‘kernel_gradient’, or any other kernel function defined for all potentials in the list.
squeeze (bool) – If True, returns a single array whenever a single potential is passed.
ignore_missing_species (bool) – If True, no error is raised whenever a specimen in the potential description is not found in the cell.
out (np.ndarray) – The output buffer: [n_potentials, n_atoms] for kname == “kernel” and [n_potentials, n_atoms, n_atoms, 3] for kname == “kernel_gradient”. Any kind of reduction, including resolved=False and calls to self.total and self.grad, uses the buffer for intermediate results but still allocates a new array for the output.
kwargs – Other arguments to eval_potentials.

Returns

result – The result of the potential computation given the cell data.

Return type

np.ndarray

miniff.kernel.grad(images, potentials, kname='kernel_gradient', **kwargs)¶

Total energy gradients with respect to cartesian atomic coordinates.

Like self.total, this function ignores any symmetry issues related to double-counting, etc.

Parameters

images (CellImages) – Cell and its images.
potentials (list, LocalPotential) – A list of potentials or a single potential.
kname (str, None) – Function to evaluate: ‘kernel’, ‘kernel_gradient’, or any other kernel function defined for all potentials in the list.
kwargs – Other arguments to total.

Returns

gradients – Total energy gradients.

Return type

np.ndarray

miniff.kernel.grad_cell(images, potentials, kname='kernel_cell_gradient', **kwargs)¶

Total energy gradients with respect to cell vectors assuming cartesian atomic coordinates are fixed.

Parameters

images (CellImages) – Cell and its images.
potentials (list, LocalPotential) – A list of potentials or a single potential.
kname (str, None) – Function to evaluate: ‘kernel’, ‘kernel_gradient’, or any other kernel function defined for all potentials in the list.
kwargs – Other arguments to total.

Returns

gradients – Total energy gradients.

Return type

np.ndarray

miniff.kernel.profile(potentials, f, *args, **kwargs)¶

Profiles a collection of potentials.

Parameters

potentials (list) – Potentials to profile.
f (Callable) – A function f(x1, …) -> UnitCell preparing a unit cell for the given set of parameters.
args – Sampling of x1, … arguments of the function f.
kwargs – Arguments to compute_images.

Returns

energy – Energies on the multidimensional grid defined by args.

Return type

np.ndarray

miniff.kernel.profile_directed_strain(potentials, cell, strain, direction, **kwargs)¶

Profiles a collection of potentials by applying strain.

Parameters

potentials (list) – Potentials to profile.
cell (UnitCell) – The original cell.
strain (Iterable) – The relative strain.
direction (list, tuple, np.ndarray) – The strain direction.
kwargs – Arguments to compute_images.

Returns

energy – Energies of strained cells.

Return type

np.ndarray

miniff.kernel.profile_strain(potentials, cell, *args, **kwargs)¶

Profiles a collection of potentials by applying strain.

Parameters

potentials (list) – Potentials to profile.
cell (UnitCell) – The original cell.
args – Relative strains along all vectors.
kwargs – Arguments to compute_images.

Returns

energy – Energies of strained cells.

Return type

np.ndarray

miniff.kernel.total(images, potentials, kname='kernel', squeeze=False, resolving=False, **kwargs)¶

Total energy as a sum of all possible potential terms.

Note that this function ignores any symmetry issues related to double-counting, etc.

Parameters

images (CellImages) – Cell and its images.
potentials (list, LocalPotential) – A list of potentials or a single potential.
kname (str, None) – Function to evaluate: ‘kernel’, ‘kernel_gradient’, or any other kernel function defined for all potentials in the list.
squeeze (bool) – If True, returns a single array whenever a single potential is passed.
resolving (bool) – If True, runs species-resolving kernels.
kwargs – Other arguments to eval.

Returns

energy – The total energy value.

Return type

float

miniff.ml module¶

class miniff.ml.Dataset(per_cell_dataset, *per_point_datasets)¶

Bases: Generic[torch.utils.data.dataset.T_co]

static assert_compatible(items)¶

Checks whether input datasets are compatible to be merged into one.

Parameters

items (list, tuple) – Items to merge.

Returns

n_samples (int)
n_atoms (int)
n_species (tuple)
n_features (tuple)
n_coords (int)
dtype (torch.dtype) – Resulting dataset dimensions and dtype.

static cat(items)¶

Merges multiple datasets into a single one.

Parameters: items (list, tuple) – Items to merge.
Returns: result – The resulting contiguous dataset.
Return type: Dataset

property dtype¶

static from_tensors(tensors, like=None)¶

Constructs a dataset from nested tensor structure.

Parameters

tensors (list, tuple) – Tensors (energy, features, etc).
like (Dataset) – If set, copies tags from the dataset provided.

Returns

dataset – The resulting dataset.

Return type

Dataset

property per_cell_dataset: miniff.ml.PerCellDataset¶

property per_point_datasets: Tuple[miniff.ml.PerPointDataset, ...]¶

class miniff.ml.MergedDataset(*datasets)¶

Bases: Generic[torch.utils.data.dataset.T_co]

to(dtype)¶

Converts this dataset to the provided type.

Parameters: dtype – The data type to convert to.
Returns: result – The dataset of the given type.
Return type: MergedDataset

class miniff.ml.NNPotentialFamily(parameters, cutoff, kernels, parameter_defaults=None, tag=None, proto=None, pre_compute_r=None, additional_inputs=None, complement_accumulating=True, complement_num_grad=True, doc=None)¶

Bases: miniff.potentials.NestedLocalPotentialFamily

get_state_dict(potential)¶

Retrieves a state dict.

Parameters: potential (NNPotential) – A potential to represent.
Returns: result – Potential parameters and other information.
Return type: dict

instance_from_state_dict(data)¶

Restores a potential from its dict representation. This routine attempts to guess the context of the serialized data: only potentials created using ml_util.behler_nn with default arguments can be properly restored.

Parameters: data (dict) – A dict with the data.
Returns: result – The restored potential.
Return type: NNPotential

class miniff.ml.NoneTolerantTensorDataset(*tensors)¶: Bases: Generic[torch.utils.data.dataset.T_co]

class miniff.ml.Normalization(energy_scale, features_scale, energy_offsets, features_offsets, length_scale=None, charges_scale=None, charges_offsets=None)¶

Bases: object

apply_to_module(module, specimen, fw=True, output='energy', simplify=True)¶

Wraps a module into normalization layers (input and output).

Parameters

module (torch.nn.Module) – The potential turning normalized descriptors into normalized energies.
specimen (int) – The specimen handle.
fw (bool) – If True, performs a “forward” operation: assuming the module accepts plain features and outputs plain energies, returns another module accepting normalized features and returning normalized energies. Otherwise performs the inverse.
output ({'energy', 'charge'}) – The output to scale: energy or charge.
simplify (bool) – Attempt to simplify the resulting module.

Returns

result – The resulting module with normalization layers added.

Return type

torch.nn.Sequential

static atom_counts(dataset)¶

Counts the atoms in each unit cell and assembles the counts into a single tensor.

Parameters: dataset (Dataset) – The dataset to process.
Returns: result – A 2D tensor [n_samples, len(per_point_datasets)] with counts.
Return type: torch.Tensor

bw(dataset, inplace=False)¶

Rescales dataset back to original values.

Parameters

dataset (Dataset) – The dataset to rescale.
inplace (bool) – If True, performs the operation in-place and returns the same dataset.

Returns

result – The original dataset.

Return type

Dataset

bw_charges(charges, specimen)¶

Rescales charges back to their original values.

Parameters

charges (torch.Tensor) – Charges to rescale.
specimen (int) – The index of the dataset charges belong to.

bw_energy(energy, atom_counts)¶

Rescales the energy back to its original values.

Parameters

energy (torch.Tensor) – Energy to rescale.
atom_counts (torch.Tensor) – A 2D matrix with atom counts per cell.

bw_energy_components(energy, specimen)¶

Rescales energy per-atom components back to their original values.

Parameters

energy (torch.Tensor) – Energy to rescale.
specimen (int) – The index of the dataset energies belong to (specimen index).

bw_energy_g(energy_g)¶

Rescales the energy gradients back to their original values.

Parameters: energy_g (torch.Tensor) – Energy gradients to rescale.

bw_features(features, specimen)¶

Rescales features back to their original values.

Parameters

features (torch.Tensor) – Features to rescale.
specimen (int) – The index of the dataset features belong to.

bw_features_g(features_g, specimen)¶

Rescales features’ gradients back to their original values.

Parameters

features_g (torch.Tensor) – Features’ gradients to rescale.
specimen (int) – The index of the dataset gradients belong to.

classmethod from_dataset(dataset, ignore_normalization_errors=False, pad=True, offset_energy=False, offset_features='mean', offset_charges=False, scale_energy=1, scale_features=2, scale_charges=1, scale_energy_gradients=1, pca_features=False)¶

Prepares normalization based on the dataset provided.

Parameters

dataset (Dataset) – The dataset to pick normalization for.
ignore_normalization_errors (bool) – If True, ignores dataset parts that cannot be normalized.
pad (bool) – Stabilize least-squares problem when determining per-specimen energy offsets.
offset_energy (bool) –
offset_features ({False, True, "mean"}) –
offset_charges (bool) – If True, offsets energies, descriptors, and/or charges.
scale_energy (float) –
scale_features (float) –
scale_charges (float) –
scale_energy_gradients (float) – If set, scales energies, descriptors, and/or charges to the value specified.
pca_features (float, int, bool, Callable) – If set, performs principal component analysis (e.g. SVD) and prepares a truncated linear transformation of descriptors as a part of normalization. A float value is a relative cutoff for singular values with respect to the maximal singular value. An integer value is the number of largest singular values to choose. A callable is expected to take the output of torch.svd and to return a bool mask of the singular entries chosen.

Returns

normalization – The resulting normalization.

Return type

Normalization

fw(dataset, inplace=False)¶

Rescales dataset to ranges suitable for machine learning.

Parameters

dataset (Dataset) – The dataset to rescale.
inplace (bool) – If True, performs the operation in-place and returns the same dataset.

Returns

result – The resulting scaled dataset.

Return type

Dataset
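
The fw and bw families of methods are inverses of each other. The sketch below shows that relationship for a single scalar offset and scale; the affine form, the argument order and the function names are assumptions, and the actual class keeps separate offsets and scales for energies, features and charges.

```python
def fw(values, offset, scale):
    # Assumed forward map: subtract the offset, then divide by the scale.
    return [(v - offset) / scale for v in values]


def bw(values, offset, scale):
    # Inverse of `fw`: multiply by the scale, then add the offset back.
    return [v * scale + offset for v in values]


original = [1.0, 3.0, 5.0]
normalized = fw(original, offset=3.0, scale=2.0)
restored = bw(normalized, offset=3.0, scale=2.0)
```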

fw_charges(charges, specimen)¶

Rescales charges.

Parameters

charges (torch.Tensor) – Charges to rescale.
specimen (int) – The index of the dataset charges belong to.

fw_energy(energy, atom_counts)¶

Rescales the energy.

Parameters

energy (torch.Tensor) – Energy to rescale.
atom_counts (torch.Tensor) – A 2D matrix with atom counts per cell.

fw_energy_components(energy, specimen)¶

Rescales energy per-atom components.

Parameters

energy (torch.Tensor) – Energy to rescale.
specimen (int) – The index of the dataset energies belong to (specimen index).

fw_energy_g(energy_g)¶

Rescales the energy gradients.

Parameters: energy_g (torch.Tensor) – Energy gradients to rescale.

fw_features(features, specimen)¶

Rescales features.

Parameters

features (torch.Tensor) – Features to rescale.
specimen (int) – The index of the dataset features belong to (specimen index).

fw_features_g(features_g, specimen)¶

Rescales features’ gradients.

Parameters

features_g (torch.Tensor) – Features’ gradients to rescale.
specimen (int) – The index of the dataset gradients belong to.

is_gradient_available() → bool¶: Determines whether gradient normalization data is present.

load_state_dict(d)¶

Loads state dictionary.

Parameters: d (dict) – Dictionary to load.

static lsq_energy_offsets(dataset, pad=True)¶

Solves a least-squares problem for the best representation of cell energies as a sum of per-atom components.

Parameters

dataset (Dataset) – The dataset to process.
pad (bool) – If True, stabilizes energy padding by minimizing padding values together with the residuals.

Returns

energy_offsets (torch.Tensor) – A 1D tensor with per-specimen energy offsets.
residuals (torch.Tensor) – A 2D tensor [n_samples, 1] with energy residuals after offsets have been subtracted.
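
The least-squares problem can be written as E_i ≈ sum over species s of N[i][s] * e[s], where N holds atom counts per cell. The sketch below solves it through the normal equations in plain Python; it does not reproduce the `pad` stabilization, and `solve_linear` and `lsq_offsets` are hypothetical helpers rather than miniff functions.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def lsq_offsets(atom_counts, energies):
    """Per-species offsets e minimizing |atom_counts @ e - energies|^2."""
    n_species = len(atom_counts[0])
    # Normal equations: (N^T N) e = N^T E.
    ntn = [
        [sum(row[i] * row[j] for row in atom_counts) for j in range(n_species)]
        for i in range(n_species)
    ]
    nte = [
        sum(row[i] * e for row, e in zip(atom_counts, energies))
        for i in range(n_species)
    ]
    offsets = solve_linear(ntn, nte)
    residuals = [
        e - sum(n * o for n, o in zip(row, offsets))
        for row, e in zip(atom_counts, energies)
    ]
    return offsets, residuals


atom_counts = [[2, 0], [1, 1], [0, 2], [3, 1]]
energies = [-2.0, -4.0, -6.0, -6.0]
offsets, residuals = lsq_offsets(atom_counts, energies)
```

The normal equations become singular when the atom-count columns are linearly dependent, for example when every cell has the same stoichiometry. That is presumably the kind of situation a stabilizing option such as `pad` is meant to handle.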

state_dict()¶

Returns a dict of parameters describing this normalization. No copies are made.

Returns: params – A dict of parameters.
Return type: dict

to(dtype)¶

Converts this normalization to the provided type.

Parameters: dtype – The data type to convert to.
Returns: result – The normalization of the given type.
Return type: Normalization

class miniff.ml.PerCellDataset(energy, mask=None, energy_g=None, reference=None)¶

Bases: Generic[torch.utils.data.dataset.T_co]

static assert_compatible(items)¶

Checks whether input datasets are compatible to be merged into one.

Parameters

items (list, tuple) – Items to merge.

Returns

n_samples (int)
n_atoms (int)
n_coords (int)
dtype (torch.dtype) – Resulting dataset dimensions and dtype.

static cat(items)¶

Merges multiple datasets into a single one.

Parameters: items (list, tuple) – Items to merge.
Returns: result – The resulting contiguous dataset.
Return type: PerCellDataset

property energy: torch.Tensor¶

property energy_g: torch.Tensor¶

static from_cells(cells, values, grad=False, **kwargs)¶

Prepares a per-cell dataset with total energies and energy gradients.

Parameters

cells (Iterable) – Cells to process.
values (torch.Tensor) – A matrix [n_samples, n_atoms] with encoded species.
grad (bool) – Include gradients.
kwargs – Arguments to empty tensor construction.

Returns

result – The resulting dataset.

Return type

PerCellDataset

is_gradient_available() → bool¶: Determines whether energy gradients data is present.

property mask: torch.Tensor¶

property reference: torch.Tensor¶

to(dtype)¶

Converts this dataset to the provided type.

Parameters: dtype – The data type to convert to.
Returns: result – The dataset of the given type.
Return type: PerCellDataset

class miniff.ml.PerPointDataset(features, mask, features_g=None, charges=None, energies_p=None, tag=None)¶

Bases: Generic[torch.utils.data.dataset.T_co]

static assert_compatible(items)¶

Checks whether input datasets are compatible to be merged into one.

Parameters

items (list, tuple) – Items to merge.

Returns

n_samples (int)
n_atoms (int)
n_species (int)
n_features (int)
n_coords (int)
dtype (torch.dtype) – Resulting dataset dimensions and dtype.

static cat(items, tag=None)¶

Merges multiple datasets into a single one.

Parameters

items (list, tuple) – Items to merge.
tag – An optional tag for this dataset.

Returns

result – The resulting contiguous dataset.

Return type

PerPointDataset

property charges: torch.Tensor¶

property energies_p: torch.Tensor¶

property features: torch.Tensor¶

property features_g: torch.Tensor¶

static from_cells(cells, descriptors, specimen, values, grad=False, charge=False, energies_p=False, tag=None, dtype=torch.float64, **kwargs)¶

Prepares a per-point dataset with features, feature gradients, energy gradients and partial energies.

Parameters

cells (Iterable) – CellImages objects to process.
descriptors (list) – A plain list of tagged descriptors.
specimen – The specimen this dataset is calculated for.
values (torch.Tensor) – A matrix [n_samples, n_atoms] with encoded species.
grad (bool) – Include gradients.
charge (bool) – Include atomic charges.
energies_p (bool) – Include partial energies.
tag – Optional tag for the dataset.
dtype – Tensor data type.
kwargs – Additional arguments to prepare_descriptor_data.

Returns

result – The resulting dataset.

Return type

PerPointDataset

get_features_hist(bins=100, margin=0)¶

Computes the histogram of features.

Parameters

bins (int) – Bin count.
margin (float) – Margins for binning range.

Returns

result – The resulting histogram as a [n_features, 2, bins + 1] tensor where result[:, 0] are bin edges and result[:, 1] are feature occurrence counts.

Return type

torch.Tensor

is_gradient_available() → bool¶: Determines whether features gradients data is present.

property mask: torch.Tensor¶

to(dtype)¶

Converts this dataset to the provided type.

Parameters: dtype – The data type to convert to.
Returns: result – The dataset of the given type.
Return type: PerPointDataset

exception miniff.ml.PotentialExtrapolationWarning¶: Bases: miniff.potentials.PotentialRuntimeWarning

miniff.ml.collect_atoms(cells)¶

Collects all atoms from all cells into a masked array.

Parameters: cells (tuple) – Cells to process.
Returns: values – An [n_samples, n_atoms] matrix with atoms.
Return type: np.ma.masked_array
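
Cells with different atom counts do not fit into a rectangular matrix without padding. The sketch below shows the idea with plain lists and a separate boolean mask, where True marks a padded slot as in NumPy masked arrays. `pad_species` is a hypothetical helper, not the library routine.

```python
def pad_species(cells_species, fill=None):
    """Pad per-cell species lists to equal length and mark padded slots."""
    n_atoms = max(len(s) for s in cells_species)
    values = [list(s) + [fill] * (n_atoms - len(s)) for s in cells_species]
    # True marks a padded (masked) slot.
    mask = [[i >= len(s) for i in range(n_atoms)] for s in cells_species]
    return values, mask


values, mask = pad_species([["a", "b"], ["a"], ["b", "b", "a"]])
```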

miniff.ml.collect_charges(cells, out, mask=None)¶

Collects cell charges into a tensor.

Parameters

cells (Iterable) – CellImages objects to process.
out (torch.Tensor) – The output tensor.
mask (np.ndarray) – An optional mask array for a particular specimen.

Returns

out – A [n_samples, n_atoms] tensor with charges per atom per cell.

Return type

torch.Tensor

miniff.ml.collect_energies(cells, out, mask=None)¶

Collects cell energies into a tensor.

Parameters

cells (Iterable) – CellImages objects to process.
out (torch.Tensor) – The output tensor.
mask (np.ndarray) – An optional mask array for a particular specimen.

Returns

out – A [n_samples, 1] tensor with the energy of each cell.

Return type

torch.Tensor

miniff.ml.collect_forces(cells, out, mask=None)¶

Collects cell forces into a tensor.

Parameters

cells (Iterable) – CellImages objects to process.
out (torch.Tensor) – The output tensor.
mask (np.ndarray) – An optional mask array for a particular specimen.

Returns

out – A [n_samples, n_atoms, n_coords] tensor with forces per atom per cell.

Return type

torch.Tensor

miniff.ml.collect_meta(field, cells, out, mask=None)¶

Collects cell metadata into a tensor.

Parameters

field (str) – The field to collect.
cells (Iterable) – CellImages objects to process.
out (torch.Tensor) – The output tensor.
mask (np.ndarray) – An optional mask array for a particular specimen.

Returns

out – A [n_samples, *] tensor with float values for each cell.

Return type

torch.Tensor

miniff.ml.collect_partial_energies(cells, out, mask=None)¶

Collects partial energies into a tensor.

Parameters

cells (Iterable) – CellImages objects to process.
out (torch.Tensor) – The output tensor.
mask (np.ndarray) – An optional mask array for a particular specimen.

Returns

out – A [n_samples, n_atoms] tensor with energies per atom per cell.

Return type

torch.Tensor

miniff.ml.cpu_copy(model)¶

Creates a CPU copy of the model.

Parameters: model (torch.nn.Module) – The module to copy.
Returns: model_copy – The CPU copy.
Return type: torch.nn.Module

miniff.ml.descriptor_fidelity_middleware(descriptor_values, descriptor_fidelity_histograms)¶

Evaluates descriptor fidelity based on how well the descriptor values are represented in the histogram data.

Parameters

descriptor_values (np.ndarray) – Descriptor values.
descriptor_fidelity_histograms (np.ndarray) – A 3-tensor with histograms (bins and values) representing the occurrence of descriptor values in the training data.

Returns

result – The resulting fidelity, one per atom.

Return type

np.ndarray

miniff.ml.encode_species(cells)¶

Encodes species into integers.

Parameters

cells (tuple, list, np.ndarray) – A collection of cells to process.

Returns

values_torch (torch.tensor) – An [n_samples, n_atoms] torch matrix with integers encoding species.
key (np.ndarray) – A 1D array with the key to values_torch.

miniff.ml.energy_gradients(net_output, features_g, resolve=False)¶

Computes total energy gradients.

Parameters

net_output (torch.Tensor, np.ndarray) – Output from the energy learning network: a 3D tensor of shape [n_samples, n_species, n_descriptors] with per-cell per-point energy gradients wrt descriptors.
features_g (torch.Tensor, np.ndarray) – A 5D tensor of shape [n_samples, n_species, n_descriptors, n_atoms, n_coords] with per-cell per-point descriptor gradients.
resolve (bool) – If True, returns energies per specimen.

Returns

result – A tensor of shape [n_samples, n_atoms, n_coords] if resolve == False or a 4D tensor [n_samples, n_points, n_atoms, n_coords] if resolve == True.

Return type

torch.Tensor
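
The shapes above describe a chain-rule contraction: the derivative of the energy with respect to each descriptor is multiplied by the derivative of that descriptor with respect to each atomic coordinate, then summed. The sketch below does this for a single sample with nested lists. `contract_gradients` is a hypothetical name, and summing over the second axis (called points here) is an assumption corresponding to the unresolved case.

```python
def contract_gradients(net_output, features_g):
    """Chain rule for one sample.

    result[a][c] = sum over points p and descriptors d of
    net_output[p][d] * features_g[p][d][a][c].
    """
    n_atoms = len(features_g[0][0])
    n_coords = len(features_g[0][0][0])
    result = [[0.0] * n_coords for _ in range(n_atoms)]
    for de_dd_point, dd_dx_point in zip(net_output, features_g):
        for de_dd, dd_dx in zip(de_dd_point, dd_dx_point):
            for a in range(n_atoms):
                for c in range(n_coords):
                    result[a][c] += de_dd * dd_dx[a][c]
    return result


# One point, two descriptors, two atoms, one coordinate.
net_output = [[2.0, -1.0]]
features_g = [[[[1.0], [0.5]], [[3.0], [0.0]]]]
gradients = contract_gradients(net_output, features_g)
```

With tensors, the same contraction is typically a single einsum over the point and descriptor axes.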

miniff.ml.eval_descriptors(r_indptr, r_indices, r_data, cartesian_row, cartesian_col, shift_vectors, descriptors, species_row, species_mask, kernel='kernel')¶

Computes descriptors or their gradients.

Parameters

r_indptr (np.ndarray) –
r_indices (np.ndarray) –
r_data (np.ndarray) –
cartesian_row (np.ndarray) –
cartesian_col (np.ndarray) –
shift_vectors (np.ndarray) – Common arguments to descriptor kernels specifying coordinates and neighbor relations.
descriptors (list) – A list of descriptors.
species_row (np.ndarray) –
species_mask (np.ndarray) –
kernel (str) – The desired descriptor kernel.

Returns

descriptor_values (np.ndarray) – A dense array with descriptors for matching atoms only.
descriptor_gradient_values (np.ndarray, optional) – Descriptor gradient values.

miniff.ml.forward(module, x, grad=False)¶

Computes energies and gradients.

Parameters

module (torch.nn.Module) – Module to propagate.
x (torch.Tensor) – Features input.
grad (bool) – If True, outputs gradients as well.

Returns

energies (torch.Tensor)
gradients (torch.Tensor)

miniff.ml.fw_cauldron(modules, dataset, grad=False, energies_p=False, normalization=None)¶

Propagates modules forward and assembles the total energy and gradients. This function will take care of all masking and padding.

Parameters

modules (list, tuple) – A list of modules mapping descriptors onto local energies.
dataset (Dataset, list, tuple) – The dataset with descriptors or tensors to assemble the dataset from.
grad (bool) – If True, computes gradients wrt descriptors.
energies_p (bool) – If True, presents total energy as a sum of per-atom contributions.
normalization (Normalization) – Optional normalization to apply (backward).

Returns

energy (Tensor, list) – A [n_samples, 1] tensor with total energies or a list of [n_samples, n_species, 1] tensors with per-atom contributions.
gradients (Tensor, optional) – A [n_samples, n_atoms, 3] tensor with total energy gradients.

miniff.ml.fw_cauldron_charges(modules, dataset, normalization=None)¶

Propagates modules forward and assembles atomic charges.

Parameters

modules (list, tuple) – A list of modules mapping descriptors onto atomic charges.
dataset (Dataset, list, tuple) – The dataset with descriptors or tensors to assemble the dataset from.
normalization (Normalization) – Optional normalization to apply (backward).

Returns

charges – A list of [n_samples, n_species] tensors with atomic charges.

Return type

list

miniff.ml.inplace_options(f)¶

Decorates functions that accept inplace options only, adding the option to operate on a copy instead.

Parameters: f (Callable) – A function to decorate.
Returns: result – The decorated function.
Return type: Callable

miniff.ml.kernel_g_nn(r_indptr, r_indices, r_data, cartesian_row, cartesian_col, shift_vectors, descriptors, nn, descriptor_fidelity_histograms, species_row, species_mask, out)¶