Toy function fitting
====================

``miniff`` is built around the idea of using neural-network fits to describe
inter-atomic interactions. This is done in three basic steps:

1. constructing the dataset;
2. fitting the data (= finding optimal map parameters);
3. using the fit.

This tutorial demonstrates the first two steps in the context of simple scalar
function fitting. We will pick a function and perform a least-squares fit.
Then, we will repeat the procedure with the help of the ``miniff.ml`` and
``miniff.ml_util`` modules to introduce the key concepts of dataset
construction and fitting in ``miniff``.

The toy problem
---------------

Neural-network fitting is useful for interpolating black-box functions which
otherwise require intensive computations. For pedagogical reasons, let us
instead try to fit a simple analytic function defined for positive ``r`` and
resembling some features of realistic interatomic potentials:

.. math::

    f \left( r \right) = \left( r^2 - r + 1 \right) e^{-r}.

Let's first plot the function:

.. jupyter-execute::

    import numpy as np
    from matplotlib import pyplot

    def f(r):
        return (r ** 2 - r + 1) * np.exp(-r)

    r = np.linspace(0.1, 10, 100)
    f_r = f(r)
    pyplot.plot(r, f_r)

The function has two extrema at ``r = 1, 2`` and decays rapidly: these are the
features we will attempt to reproduce. When fitting, we only have access to the
discretized sampling values ``f_r``, not to the functional form ``f(r)``. In
principle, we may still use ``f(r)`` as a black box to produce more sampling
points, but we consider this operation to be computationally expensive and thus
to be used only *before* the fitting.

1. Constructing the dataset
---------------------------

The fitting data (or the dataset) includes two pieces of information: black-box
outputs and descriptors. They can also be seen as the right-hand and left-hand
sides of the map ``g(r)->f(r)``, respectively, where ``f(r)`` are the
previously computed black-box function values ``f_r``.

What are descriptors?
.....................

The purpose of the descriptors ``g(r)`` is to describe the values of ``r``
where ``f(r)`` was computed in a physically reasonable way. Can we use the
values ``g(r)=r`` directly? Yes, but ``g(r)`` play the role of
pre-conditioners: we input some intuition about how ``f(r)`` would most
reasonably look while leaving the particular details to be determined during
the fitting process.

Additionally, we allow the black-box function to have an arbitrary number of
inputs, such as

.. math::

    f \left( \vec r \right) = \sum_i \left( r_i^2 - r_i + 1 \right) e^{-r_i}.

In this case we may want to construct the dataset from vectors of non-constant
dimension ``r = ([0], [0.1, 0.2], [2.2, 3, 2], ...)``. To employ neural-network
machinery, however, we usually need a constant number of inputs. This can be
achieved by using a descriptor of the form

.. math::

    g \left( \vec r \right) = \sum_i r_i

in place of ``r``. Importantly, the above descriptor also respects the
permutation symmetry of ``f(r)``: ``f(x, y) = f(y, x)``, ``g(x, y) = g(y, x)``.
Thus, the follow-up fitting process does not need to deduce this symmetry
numerically.
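To make this concrete, here is a minimal NumPy sketch; the helpers ``f_vec``
and ``g_sum`` are illustrative only and not part of ``miniff``. It shows that
inputs of different lengths all map to a single-number descriptor and that both
the black box and the descriptor are invariant under permutations of the input.

.. code-block:: python

    import numpy as np

    def f_vec(r):
        # Vector-valued black box: a sum of identical single-argument terms.
        r = np.asarray(r, dtype=float)
        return np.sum((r ** 2 - r + 1) * np.exp(-r))

    def g_sum(r):
        # Fixed-size descriptor: a single number regardless of len(r).
        return np.sum(r)

    # Inputs of different lengths all reduce to one scalar descriptor ...
    samples = [[0.0], [0.1, 0.2], [2.2, 3.0, 2.0]]
    print([g_sum(s) for s in samples])

    # ... and both f and g are invariant under permutations of the input.
    assert np.isclose(f_vec([0.1, 0.2]), f_vec([0.2, 0.1]))
    assert np.isclose(g_sum([0.1, 0.2]), g_sum([0.2, 0.1]))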
How to choose descriptors?
..........................

- ``g(r)`` should include as much intuition as possible (symmetries, asymptotic
  behavior, ...);
- ``g(r)`` should be simple functions that are fast to compute.

Here we choose the following family of descriptors, which satisfies both
recommendations:

.. math::

    g_i (r) = r^i e^{-r}.

We also know the exact form of the map to be fitted:

.. math::

    f(r) = g_2(r) - g_1(r) + g_0(r).

So, to complete the dataset, we compute ``g_0, g_1, g_2`` at the previously
chosen sampling points.

.. jupyter-execute::

    def g(r, i):
        return r ** i * np.exp(-r)

    g_r = np.array(tuple(g(r, i) for i in range(3)))

    pyplot.plot(r, f_r, label="target function")
    for i, _g in enumerate(g_r):
        pyplot.plot(r, _g, ls="--", label=f"g_{i}")
    pyplot.legend()

2. Fitting the data
-------------------

For this example we know there exists a linear combination of descriptors
perfectly reproducing the black-box function:

.. math::

    f(r) = \sum_i a_i \cdot g_i(r).

Thus, the least-squares fit will find the map parameters ``a_i``.

.. jupyter-execute::

    gf_map, res, rank, s = np.linalg.lstsq(g_r.T, f_r)
    print(gf_map)

    from numpy import testing
    testing.assert_allclose(gf_map, (1, -1, 1))

According to the last formula in the previous section, the answer is
``[1, -1, 1]``.
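As an optional sanity check (a short sketch added here, reusing ``g_r``,
``gf_map``, ``f_r``, and ``testing`` from the cells above), the fitted
coefficients can be contracted with the descriptors to reconstruct the target
function:

.. code-block:: python

    # Reconstruct f(r) from the fitted linear combination of descriptors
    # and confirm that it matches the black-box values.
    f_r_lstsq = g_r.T @ gf_map
    testing.assert_allclose(f_r_lstsq, f_r, atol=1e-10)

    pyplot.plot(r, f_r, label="target function")
    pyplot.plot(r, f_r_lstsq, ls="--", label="least-squares fit")
    pyplot.legend()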
miniff setting
--------------

The above solution can also be obtained with ``miniff`` following the same
workflow.

1. Constructing the dataset (miniff)
....................................

``miniff.ml`` contains handy pipelines for dataset construction and
pre-processing. ``PerCellDataset`` and ``PerPointDataset`` are the two key
containers where descriptors and target function values are stored. Some data
pieces (such as descriptors, but also atomic charges and more) can be assigned
to individual points (atoms) and are stored in ``PerPointDataset``. Other data
pieces (such as total interatomic energies) are cumulative values that are
stored in ``PerCellDataset``. Due to the nature of force-field fitting, there
can be only one ``PerCellDataset`` but one or more ``PerPointDataset``
objects. In this case we store ``f_r`` into the total interaction energy slot
of ``PerCellDataset`` to benefit from the energy fitting pipeline.

.. jupyter-execute::

    import torch
    from miniff import ml

    rhs = ml.PerCellDataset(
        energy=torch.tensor(f_r, dtype=torch.float32)[:, None],
    )

All dataset pieces have to be ``torch`` tensors of a fixed shape. In this case
``energy`` is required to be a column array with the second dimension equal to
1: otherwise a ``ValueError`` will be raised.

Descriptors are stored in ``PerPointDataset`` as a standard.

.. jupyter-execute::

    lhs = ml.PerPointDataset(
        features=torch.tensor(g_r.T, dtype=torch.float32)[:, None, :],
        mask=torch.ones(len(r), 1),
    )

Here, we additionally specify the mask which weights individual data points.
Again, the dimensions of the input tensors are well-defined. As a final step,
we collect the two pieces of fitting data into a ``Dataset`` object.

.. jupyter-execute::

    dataset = ml.Dataset(rhs, lhs)

At this point all tensor shapes are checked and we are ready to set up the
fitting.

2. Fitting the data (miniff)
............................

First, we need to define the functional form of the map. In this case the map
is linear, which can be declared using ``torch.nn.Linear``:

.. math::

    f = W \cdot g,

where :math:`W` stands for the weights.

.. jupyter-execute::

    nn = torch.nn.Linear(in_features=3, out_features=1, bias=False)

Here, ``3`` stands for the descriptor count, ``1`` is the output count (a
single energy), and ``bias=False`` indicates that no constant is added to the
linear form.

To obtain the best fit we use the energy fit optimization pipeline defined in
``miniff.ml_util``. First, we define the closure to be the energy difference
closure. In general, a closure defines the loss function to optimize, but the
corresponding ``ml_util`` object also includes default optimization settings.

.. jupyter-execute::

    from miniff import ml_util

    tolerance = dict(
        tolerance_change=1e-14,
        tolerance_grad=1e-14,
    )
    closure = ml_util.simple_energy_closure([nn], dataset, optimizer_kwargs=tolerance)

Let's check that the initial (random) guess is non-optimal.

.. jupyter-execute::

    print("Initial loss:", closure.loss().loss_value.item())

Finally, let's run the optimization and check that the loss function value
becomes small.

.. jupyter-execute::

    closure.optimizer_init()
    closure.optimizer_step()  # LBFGS algorithm by default
    loss = closure.last_loss.loss_value.item()
    print("Final loss:", loss)
    assert loss < 1e-8

The parameters are now stored in the neural-network module.

.. jupyter-execute::

    weights = nn.weight.detach().squeeze().numpy()
    print(weights)
    testing.assert_allclose(weights.squeeze(), [1, -1, 1], atol=1e-3)

Checking the result
...................

Finally, let's check whether the fit is reasonable by plotting it.

.. jupyter-execute::

    def nn_fit(r, descriptor_fns, nn):
        # Compute descriptors
        descriptors = np.array(tuple(g(r) for g in descriptor_fns))
        # Compute the map
        result = nn(torch.tensor(descriptors.T, dtype=torch.float32))
        return result.detach().numpy().squeeze()

    from functools import partial

    f_r_test = nn_fit(
        r,
        tuple(partial(g, i=i) for i in range(3)),
        nn,
    )

    pyplot.plot(r, f_r)
    pyplot.scatter(r, f_r_test)

The function ``nn_fit`` demonstrates how to propagate the map forward: one
needs to (1) compute the descriptors as a function of the input ``r`` and (2)
feed them to the neural network.

Advanced
--------

It does not make sense to use anything but a linear map for this particular
problem with a known exact solution. However, if we chose to, for example,
reduce the descriptors to ``g_0`` only, or to choose different descriptors,
then the linear map would become a crude approximation. In this case one
typically switches to a more general neural-network setting with non-linear
activation layers which help to map various curvature features in the dataset.
One can use ``torch`` machinery for this or otherwise increase ``n_layers``
when initializing the map. With ``n_layers=2``, for example, one sigmoid
activation layer is inserted between two linear layers; a plain ``torch``
sketch of such a map is shown at the end of this section.

The use of the ``ml_util`` functions is entirely optional. They aim to provide
default optimization settings to start with: loss functions (2-norm),
optimizers (LBFGS), descriptor sets (Behler recipe). You are free to modify and
replace these components at the point where you feel that the default choice is
no longer optimal.
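For illustration only, here is a plain ``torch`` sketch of such a non-linear
map: one sigmoid activation between two linear layers, mirroring the structure
described above for ``n_layers=2``. The hidden width of ``8`` is an arbitrary
choice for this sketch, not a ``miniff`` default.

.. code-block:: python

    import torch

    # A small non-linear map from 3 descriptors to 1 energy value:
    # linear -> sigmoid -> linear, i.e. two linear layers with one
    # activation layer in between.
    nonlinear_nn = torch.nn.Sequential(
        torch.nn.Linear(in_features=3, out_features=8),
        torch.nn.Sigmoid(),
        torch.nn.Linear(in_features=8, out_features=1),
    )

    # Forward pass on a batch of descriptor vectors.
    descriptors = torch.rand(5, 3)
    print(nonlinear_nn(descriptors).shape)  # torch.Size([5, 1])

In principle such a module can be plugged into the same closure-based workflow
as the linear map above, although the details depend on what ``miniff`` expects
from the module interface.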