Get started#

This brief introduction will show you how to install NUBO from the Python Package Index (PyPI) and how to set up a Bayesian optimisation loop to maximise a toy function using NUBO’s predefined Gaussian process as the surrogate model. You can also use one of our off-the-shelf algorithms to get started quickly. For more details, see the Off-the-shelf algorithms section in the menu on the left.

Installing NUBO#

Install NUBO and all its dependencies directly from the Python Package Index (PyPI) using the Python package manager pip with the command below. We recommend installing NUBO in a virtual environment.

pip install nubopy
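
If the installation was successful, the package can be imported under the module name nubo. As a quick sanity check, the imports used throughout this example should run without errors.

# quick sanity check: import the NUBO modules used in this example
from nubo.acquisition import UpperConfidenceBound
from nubo.models import GaussianProcess, fit_gp
from nubo.optimisation import single
from nubo.test_functions import Hartmann6D
from nubo.utils import gen_inputs

print("NUBO imported successfully")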

Optimising a toy function with NUBO#

First, we set up the toy function we want to optimise. In this case, we choose the 6-dimensional Hartmann function, a multi-modal function with one global optimum. This synthetic test function acts as a substitute for a black-box objective function, such as an experiment or a simulation. The bounds of the input space are defined as a two-dimensional torch.Tensor where the first row gives the lower bounds for all input dimensions and the second row gives the corresponding upper bounds.

import torch
from nubo.test_functions import Hartmann6D


# test function
func = Hartmann6D(minimise=False)
dims = 6

# specify bounds
bounds = torch.tensor([[0., 0., 0., 0., 0., 0.], [1., 1., 1., 1., 1., 1.]])
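
Because the toy function stands in for a black-box objective, it can simply be called on a batch of inputs. As a small illustration (the query point at the centre of the input space is chosen arbitrarily), a single evaluation should return a tensor with one output.

# illustrative evaluation of one arbitrary point at the centre of the input space
x_query = torch.full((1, dims), 0.5)
y_query = func(x_query)
print(y_query)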

Then, we generate some initial training data. We generate 5 data points per input dimension, resulting in a total of 30 data points.

from nubo.utils import gen_inputs


# training data
x_train = gen_inputs(num_points=dims*5,
                     num_dims=dims,
                     bounds=bounds)
y_train = func(x_train)

In NUBO, training inputs x_train should be a two-dimensional torch.Tensor (a matrix), where the rows are individual points and the columns are individual dimensions. In this example, our training data has size 30 x 6. The training outputs y_train should be a one-dimensional torch.Tensor (a vector) with one entry for each training input (here y_train has size 30).
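
A quick shape check can confirm these sizes, for example:

# check the sizes of the training data
print(x_train.shape)  # expected: torch.Size([30, 6])
print(y_train.shape)  # expected: torch.Size([30])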

Now we can prepare the Bayesian optimisation loop. We choose NUBO’s predefined Gaussian process that by default has a constant mean function and a Matérn 5/2 kernel. We also use the Gaussian likelihood to estimate observational noise. We estimate the Gaussian process’s hyper-parameters via maximum likelihood estimation (MLE) using the Adam optimiser. For the acquisition function, we implement the analytical upper confidence bound (UCB) with a trade-off parameter \(\beta = 1.96^2\) (corresponding to 95% confidence intervals for the Gaussian distribution) and optimise it with the L-BFGS-B algorithm using a multi-start approach with five starts. These multiple starts help to ensure that the optimiser does not get stuck in a local optimum. The Bayesian optimisation loop is run for 40 iterations, giving a total evaluation budget of 70 (30 initial points plus 40 new evaluations).
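
For reference, the analytical UCB acquisition function maximised in each iteration takes the standard form \(\alpha_{UCB}(\mathbf{x}) = \mu_n(\mathbf{x}) + \sqrt{\beta} \, \sigma_n(\mathbf{x})\), where \(\mu_n\) and \(\sigma_n\) are the posterior mean and standard deviation of the Gaussian process, so \(\beta = 1.96^2\) weights the predictive standard deviation by a factor of 1.96.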

from nubo.acquisition import UpperConfidenceBound
from nubo.models import GaussianProcess, fit_gp
from nubo.optimisation import single
from gpytorch.likelihoods import GaussianLikelihood


# Bayesian optimisation loop
iters = 40

for iter in range(iters):

    # specify Gaussian process
    likelihood = GaussianLikelihood()
    gp = GaussianProcess(x_train, y_train, likelihood=likelihood)

    # fit Gaussian process
    fit_gp(x_train, y_train, gp=gp, likelihood=likelihood, lr=0.1, steps=200)

    # specify acquisition function
    acq = UpperConfidenceBound(gp=gp, beta=1.96**2)

    # optimise acquisition function
    x_new, _ = single(func=acq, method="L-BFGS-B", bounds=bounds, num_starts=5)

    # evaluate new point
    y_new = func(x_new)

    # add to data
    x_train = torch.vstack((x_train, x_new))
    y_train = torch.hstack((y_train, y_new))

    # print new best
    if y_new > torch.max(y_train[:-1]):
        print(f"New best at evaluation {len(y_train)}: \t Inputs: {x_new.numpy().reshape(dims).round(4)}, \t Outputs: {-y_new.numpy().round(4)}")
New best at evaluation 31:   Inputs: [0.477  0.0444 0.0736 0.2914 0.3603 0.7323],    Outputs: [-1.9494]
New best at evaluation 34:   Inputs: [0.4453 0.0418 0.0483 0.3164 0.3478 0.6925],    Outputs: [-2.0684]
New best at evaluation 39:   Inputs: [0.4127 0.1638 0.     0.277  0.3385 0.679 ],    Outputs: [-2.1595]
New best at evaluation 40:   Inputs: [0.3715 0.1565 0.     0.3261 0.3372 0.7126],    Outputs: [-2.1843]
New best at evaluation 41:   Inputs: [0.3589 0.134  0.3895 0.2927 0.3222 0.7003],    Outputs: [-2.9809]
New best at evaluation 42:   Inputs: [0.2754 0.1478 0.425  0.2529 0.3054 0.6874],    Outputs: [-3.2027]
New best at evaluation 46:   Inputs: [0.1473 0.1864 0.427  0.2906 0.2993 0.666 ],    Outputs: [-3.2302]
New best at evaluation 51:   Inputs: [0.1764 0.1303 0.4576 0.3022 0.3029 0.6827],    Outputs: [-3.2657]
New best at evaluation 52:   Inputs: [0.2016 0.1447 0.4616 0.2798 0.3018 0.6716],    Outputs: [-3.31]
New best at evaluation 53:   Inputs: [0.2063 0.144  0.465  0.2787 0.3138 0.6519],    Outputs: [-3.3192]
New best at evaluation 58:   Inputs: [0.205  0.1516 0.4686 0.2725 0.3137 0.6614],    Outputs: [-3.3206]
New best at evaluation 66:   Inputs: [0.2096 0.142  0.4767 0.2757 0.3112 0.6573],    Outputs: [-3.3209]
New best at evaluation 70:   Inputs: [0.2076 0.1527 0.4728 0.2802 0.3109 0.6594],    Outputs: [-3.321]

Finally, we print the overall best solution: we get -3.3210 on evaluation 70, which approximates the true optimum of -3.3224.

# results
best_iter = int(torch.argmax(y_train))
print(f"Evaluation: {best_iter+1} \t Solution: {-float(y_train[best_iter]):.4f}")
Evaluation: 70       Solution: -3.3210
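
The inputs that produced this solution can be recovered from the training data in the same way, for example:

# retrieve the inputs of the best evaluation
x_best = x_train[best_iter]
print(x_best.numpy().round(4))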

The estimated parameters of the Gaussian process can be viewed as follows:

# estimated parameters
print(f"Mean function constant: {gp.mean_module.constant.item()}")
print(f"Covariance kernel output-scale: {gp.covar_module.outputscale.item()}")
print(f"Covariance kernel length-scale: {gp.covar_module.base_kernel.lengthscale.detach()}")
print(f"Estimated noise/nugget: {likelihood.noise.item()}")
Mean function constant: 0.1073
Covariance kernel output-scale: 0.2943
Covariance kernel length-scale: tensor([[0.5552, 0.5305, 0.6730, 0.3610, 0.2741, 0.3786]])
Estimated noise/nugget: 0.0001