Practical considerations#

This section covers considerations to take into account when designing a Bayesian optimisation loop with NUBO. It answers some of the most common questions about Bayesian optimisation and NUBO and is updated frequently.


How many initial data points do I need?

A rule of thumb for Gaussian process models is to have at least 10 points per input dimension [1] [4] [9]. However, empirical evidence shows that reducing this to 5 or even a single point per input dimension does not result in worse solutions for Bayesian optimisation [3].
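Such an initial design is usually generated with a space-filling method. The sketch below uses SciPy's Latin hypercube sampler as one illustrative option (the function name and the 5-points-per-dimension default are our own, not part of NUBO's API):

```python
from scipy.stats import qmc


def initial_design(n_dims, points_per_dim=5, seed=0):
    """Space-filling initial design with points_per_dim * n_dims
    points in the unit hypercube [0, 1]^n_dims."""
    sampler = qmc.LatinHypercube(d=n_dims, seed=seed)
    return sampler.random(n=points_per_dim * n_dims)


X = initial_design(n_dims=3)  # 15 points in [0, 1]^3
```

The points can then be rescaled to the actual input bounds of the problem before evaluating the objective.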

How does NUBO optimise a mixed parameter space with continuous and discrete variables?

NUBO supports optimisation over a mixed parameter space by fixing a combination of the discrete inputs and optimising over the remaining continuous inputs. The best point found over all possible discrete combinations is used. While this avoids issues due to rounding, it can become time-consuming when there are many discrete dimensions or many possible values per dimension, as the number of combinations grows exponentially.
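The strategy can be sketched with SciPy as the continuous optimiser; the function below is an illustrative stand-in, not NUBO's implementation (it assumes the acquisition takes the continuous inputs first, followed by the discrete inputs, and is to be minimised):

```python
import itertools

import numpy as np
from scipy.optimize import minimize


def optimise_mixed(acq, cont_bounds, disc_values, x0):
    """Fix each discrete combination, optimise the continuous inputs
    with L-BFGS-B, and keep the best point found overall."""
    best_x, best_val = None, np.inf
    for combo in itertools.product(*disc_values):
        # acquisition with the discrete inputs held fixed at `combo`
        def fixed_acq(x_cont):
            return acq(np.concatenate([x_cont, combo]))

        res = minimize(fixed_acq, x0, bounds=cont_bounds, method="L-BFGS-B")
        if res.fun < best_val:
            best_val = res.fun
            best_x = np.concatenate([res.x, combo])
    return best_x, best_val
```

For example, with one continuous input on [0, 1] and one discrete input taking the values 0, 1, or 2, the loop runs three continuous optimisations and returns the best of the three results.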

Gaussian process#

What prior mean function and prior covariance kernel should I use?

For practical Bayesian optimisation, a zero or constant mean function with a Matern 5/2 kernel is recommended [10]. Other kernels, such as the RBF kernel, might be too smooth to represent realistic experiments and simulations.

What likelihood should I specify?

For exact Gaussian processes, GPyTorch provides two main options that differ in how they treat the observational noise \(\sigma^2\): the GaussianLikelihood estimates the observational noise, while the FixedNoiseGaussianLikelihood holds it fixed. If you cannot measure the observational noise, the former likelihood is recommended. If you have a clear idea of the observational noise, the latter can be used instead; you can then decide whether the Gaussian process should also estimate any additional noise on top of the fixed observational noise [5].

Acquisition function#

Which acquisition function should I use?

NUBO supports two acquisition functions: expected improvement (EI) [6] and upper confidence bound (UCB) [11]. While both are widely used options that have proven to give good results, there is empirical evidence that UCB performs better on a wider range of synthetic test functions [3].
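For intuition, both acquisition functions have simple closed forms given the posterior mean and standard deviation at a candidate point. A minimal NumPy sketch for a maximisation problem (the function names and the `beta` default are illustrative, not NUBO's API):

```python
import numpy as np
from scipy.stats import norm


def expected_improvement(mean, std, best):
    """Analytical EI (maximisation): expected gain over the best
    observed value, given the posterior mean and standard deviation."""
    z = (mean - best) / std
    return (mean - best) * norm.cdf(z) + std * norm.pdf(z)


def upper_confidence_bound(mean, std, beta=4.0):
    """Analytical UCB: optimistic estimate with trade-off parameter beta."""
    return mean + np.sqrt(beta) * std
```

EI balances exploitation (the first term) and exploration (the second), while UCB makes the trade-off explicit through `beta`: larger values favour exploration of uncertain regions.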

Should I use analytical or Monte Carlo acquisition functions?

We recommend using analytical acquisition functions for sequential single-point optimisation problems. Where it is advantageous to evaluate potential solutions in parallel, Monte Carlo acquisition functions allow the computation of batches. Furthermore, if you want to continue the optimisation loop while some potential solutions are still being evaluated, Monte Carlo acquisition functions enable asynchronous optimisation [10] [12].
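Monte Carlo acquisition functions score a whole batch at once from joint posterior samples. A minimal sketch of Monte Carlo EI, assuming `samples` of shape `(n_samples, batch_size)` have already been drawn from the joint posterior over the candidate batch (the function name is illustrative):

```python
import numpy as np


def mc_expected_improvement(samples, best):
    """Monte Carlo EI for a batch: the improvement of a batch is that
    of its best point, averaged over the posterior samples."""
    improvement = np.maximum(samples.max(axis=1) - best, 0.0)
    return improvement.mean()
```

Because the batch is scored jointly, points that are redundant with each other add little expected improvement, which encourages diverse batches.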

Which optimiser should I choose?

We recommend L-BFGS-B [13] for analytical acquisition functions and SLSQP [8] for constrained analytical acquisition functions. For Monte Carlo acquisition functions, the stochastic optimiser Adam [7] should be used if the base samples are resampled. If you decide to fix the base samples, deterministic optimisers can be used in the same way as for the analytical acquisition functions. While fixing the base samples could introduce some sampling bias, there is empirical evidence that it does not affect performance negatively [2].
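With SciPy standing in for the optimisation step, the two deterministic choices look as follows; `neg_acq` is a hypothetical placeholder for a negated acquisition function (minimisers are used, so the acquisition is maximised by minimising its negative):

```python
import numpy as np
from scipy.optimize import minimize


def neg_acq(x):
    """Hypothetical negated acquisition function for illustration."""
    return (x[0] - 0.4) ** 2 + (x[1] - 0.6) ** 2


bounds = [(0.0, 1.0), (0.0, 1.0)]

# analytical acquisition function: deterministic L-BFGS-B
res = minimize(neg_acq, x0=[0.5, 0.5], method="L-BFGS-B", bounds=bounds)

# constrained analytical acquisition function: SLSQP with an
# inequality constraint x0 + x1 <= 0.8
cons = {"type": "ineq", "fun": lambda x: 0.8 - x[0] - x[1]}
res_con = minimize(neg_acq, x0=[0.5, 0.5], method="SLSQP",
                   bounds=bounds, constraints=cons)
```

For Monte Carlo acquisition functions with resampled base samples, the objective is stochastic and a gradient-based stochastic optimiser such as Adam is needed instead; with fixed base samples, the objective becomes deterministic and the setup above applies unchanged.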