# Bayesian optimisation

 This tutorial uses a 3rd party toolbox (DACE) that has to be included in the src folder Available here: http://www.sumo.intec.ugent.be/ooDACE

# Introduction

In this tutorial we will follow the Bayesian approach to optimise a problem of engineering design [1]. Engineering systems are often modeled with complex objective functions with a high number of variables which can take a long time to evaluate.

We will evaluate a small number of design points to build a metamodel using Gaussian processes. This metamodel will be much faster to evaluate than the actual objective function.

However, as we won't sample the whole design space of our objective function (this is computationally expensive), this metamodel will be incomplete. We will build an expected improvement acquisition function to evaluate our metamodel and select the next design point to observe in the objective function to complete our metamodel.

The pseudo-code for the method (maximisation) can be written as:

1. Place a Gaussian process prior on f(x) (kernel function).

2. Observe f(x) at n_0 points.

3. while n <= N do:

Update the posterior probability distribution on f(x) using all available data

Let x_n be a maximiser of the acquisition function over x, where the acquisition function is computed using the current posterior distribution

Observe y_n = f(x_n)

Increment n

end while

4. return solution: either the point evaluated with the largest f(x) or the point with largest posterior mean.

The first point is available with an OpenCossan script and won't be necessary to rewrite it.

The things we will need to do are:

1. Define the problem and the objective function f(x).
2. Observe f(x) at n_0 points (we will use Latin Hypercube Sampling available with OpenCossan).
3. Create the metamodel using OpenCossan.
4. Write a script for the acquisition function (expected improvement).
5. Run the acquisition function and update the metamodel N times.
6. Show optimal.

# Problem definition

This method is best suited for optimisation over continuous domains of less than 20 dimensions, and Gaussian processes can tolerate noise in function evaluations. In this case we will take advantage of the cantilever beam problem defined in OpenCossan.

This is a cantilever beam of length L and rectangular cross section of width b and height h, loaded at the tip by a concentrated point load P.

The displacement w at the tip can be determined with the following equation:

$w = \frac{\rho g b h L^4}{8EI}+\frac{PL^3}{3EI}; \qquad I = \frac{bh^3}{12}$

where $\rho$ is the beam material density, g is the acceleration of gravity, E is the Young's modulus and I is the moment of inertia.

We will find the combination of width (b) and height (h) that minimises the displacement for a given length (L), load (P) and density ($\rho$).

The values to introduce are:

1. Width (b): uniform distribution [0.05, 0.3] m
2. Height (h): uniform distribution [0.2, 0.5] m
4. Length (L): 5 m
5. Young's modulus (E): 10.0e9 N/m^2
6. Density (rho): 600 kg/m^3
7. Moment of inertia (I): function b*h^(3/12)

## Input data

OpenCossan.resetRandomNumberGenerator(51120) % we get the same random numbers everytime

Xin = Input('Sdescription', 'input parameters of model');
Xb = RandomVariable('Sdescription', 'width', 'Sdistribution', 'uniform', 'par1', 0.05, 'par2', 0.3);
Xh = RandomVariable('Sdescription', 'height', 'Sdistribution', 'uniform', 'par1', 0.2, 'par2', 0.5);
Xrvset = RandomVariableSet('Cmembers', {'Xb','Xh'});
Xin = Xin.add('Xmember', Xrvset, 'Sname', 'Xrvset');
XP = Parameter('Sdescription', 'load', 'value', 1.0e6);
Xin = Xin.add('Xmember', XP, 'Sname', 'XP');
XL = Parameter('Sdescription', 'length', 'value', 5);
Xin = Xin.add('Xmember', XL, 'Sname', 'XL');
XE = Parameter('Sdescription', 'Youngs modulus', 'value', 10e9);
Xin = Xin.add('Xmember', XE, 'Sname', 'XE');
Xrho = Parameter('Sdescription', 'density', 'value', 600);
Xin = Xin.add('Xmember', Xrho, 'Sname', 'Xrho');
XI = Function('Sdescription', 'moment of inertia', 'Sexpression', '<%Xb%>.*<%Xh%>.^3/12');
Xin = Xin.add('Xmember', XI, 'Sname', 'XI');


## Construct objective function

We first need to create the MIO object containing the displacement equation

Xmio = Mio('Sdescription', 'displacement', 'SScript',...
'for i=1:length(Tinput), Toutput(i).disp=(Tinput(i).Xrho*9.81*Tinput(i).Xb*Tinput(i).Xh*Tinput(i).XL^4)/(8*Tinput(i).XE*Tinput(i).XI) +
(Tinput(i).XP*Tinput(i).XL^3)/(3*Tinput(i).XE*Tinput(i).XI);end',...
'Cinputnames', {'Xb', 'Xh', 'XP', 'XL', 'XE', 'Xrho', 'XI'}, 'Coutputnames', {'disp'}, 'Liostructure', true, 'Lfunction', false);


Now we create the evaluator and add it to a model

Xev = Evaluator('Sdescription', 'displacement evaluator', 'XMio', Xmio);
Xmod = Model('XEvaluator', Xev, 'Xinput', Xin);


# Metamodel

To initialise our metamodel we need to first evaluate the objective function a number of times. The more evaluations we run, the more complete our metamodel will be. However, as we increase the number of evaluations, we increase the computational time required. This is a trade-off.

We want to sample our design space in an intelligent way; for example, by not evaluating the same point more than one time, or not evaluating points that are very close in the design space.

There are many techniques available for sampling such as: Sobol sequences, Halton sampling, Latin hypercube, uniform or even random sampling. We will use Latin Hypercube, as it is already implemented in OpenCossan.

With random sampling we may end up evaluating points that are very close each other, missing some regions of the design space. Latin hypercube takes into account the location in the design space of each previous evaluation to avoid sample clustering and make sure that the whole design space is equally evaluated.

Once we sample our objective function n times, we will build our metamodel using the Kriging matlab script available with OpenCossan. This metamodel will describe potential values for f(x) at a candidate point x, providing a Bayesian posterior probability distribution. In our metamodel, the value of the mean and variance of the n points from the sampling will be exactly the obtained after evaluating the objective function. The mean and variance of any other value will be inferred from them.

## Sampling and building the metamodel

We first need to create the metamodel object using KrigingModel in OpenCossan, with Xb and Xh being the inputs and displacement being the output.

We will use a zero order polynomial regression function and initial hyperparameters of [0.1 0.1].

Xkriging = KrigingModel('Sdescription', 'metamodel', 'SregressionType', 'regpoly0',...
'Coutputnames', {'disp'}, 'Cinputnames', {'Xb' 'Xh'}, 'Xfullmodel', Xmod,...
'VcorrelationParameter', [0.1 0.1]);


Now we can apply latin hypercube sampling to train our GP model. We will start with 8 samples:

Xlhs = LatinHypercubeSampling('Nsamples', 8);
Xkriging = Xkriging.calibrate('XSimulator', Xlhs);


## Validation

Now we can run a validation to check the regression error (R^2) of our model.

Xmc = MonteCarlo('Nsamples', 100);
Xkriging = Xkriging.validate('XSimulator', Xmc);
display(Xkriging);


We should get a R-squared of 0.85. If we increase the number of samples used in the latin hypercube sampling, we will increase the R-squared value. The same way we increase the R-squared value of our model by increasing the number of samples used to create it, we get a better estimate of the R-squared value if we increase the number of samples in the MonteCarlo validation (usually lower R-squared). With 8 latin hypercube samples, if we validate the model with 10000 MonteCarlo samples, we get a R-squared of 0.77483.

# Optimisation

Once we have built our metamodel, it is possible to visualise how it differs from the real objective function. It is possible to create a mesh and evaluate both metamodel and objective function on these points to create two figures.

MXX1 = repmat(linspace(0.05,0.3,201)',1,201);
MXX2 = repmat(linspace(0.2,0.5,201),201,1);
Vx1 = MMX1(:); Vx2 = MMX2(:);
Minput = [Vx1, Vx2];

Xs = Samples('Xrvset', Xrvset, 'MsamplesPhysicalSpace', Minput);
Xin = Xin.add('Xmember', Xs, 'Sname', 'Xs');
Xoutreal = Xmio.run(Xin);
Xoutkr = Xkriging.apply(Xin);

f1 = figure(1);
mesh(MXX1, MXX2, reshape(Xoutreal.getValues('Sname', 'disp'), 201, 201));
f2 = figure(2);
mesh(MXX1, MXX2, reshape(Xoutkr.getValues('Sname', 'disp'), 201, 201));


We should get something like this:

We see our current metamodel differs from the actual objective function. It locates both minimum and maximum in different areas. This is because we created the model with only 8 evaluations of the objective function. The acquisition function we are implementing will update our metamodel trying to gain knowledge in areas of the design space that would improve the prediction of the metamodel.

## Metamodel updating

Before applying the acquisition function we need to learn how to train a metamodel with a given sample of points (instead of using the LatinHypercubeSampling method embedded in OpenCossan), and update it with a new extra point added to the sample. First we collect in an array the points created with the LatinHypercubeSampling method (we can get the output too):

input_points = Xkriging1.XcalibrationInput.Xsamples.MsamplePhysicalSpace;
output_points = Xkriging1.XcalibrationOutput.getValues('Sname', 'disp');


Now let's train a metamodel with input_points instead of the LHS method:

Xs = Samples('Xrvset', Xrvset, 'MsamplesPhysicalSpace', input_points);
Xin = Xin.add('Xmember', Xs, 'Sname', 'Xs');
Xkriging1 = Xkriging.calibrate('XcalibrationInput', Xin);

Xmc = MonteCarlo('Nsamples', 10000);
Xkriging1 = Xkriging1.validate('XSimulator', Xmc);
display(Xkriging1);


We select a point arbitrarily (for the sake of the demonstration) and check the value and error we get from our metamodel, and compare it with the real value using the real objective function:

new_point = [0.1, 0.35];
[y, s2] = Xkriging1.TdaceModel.predict(new_point)

Xs = Samples('Xrvset', Xrvset, 'MsamplesPhysicalSpace', new_point);
Xin = Xin.add('Xmember', Xs, 'Sname', 'Xs');
Xoutreal = Xmio.run(Xin);
j = Xoutreal.Tvalues(9); %because this is the 9th point we added
display(j(1))


The value our metamodel gave us (y) was 14.7194 m (with std = 76.2263), and the real value (j) was 11.6663 m. This is because our metamodel was trained with only 8 points and doesn't predicts correctly the displacement for the whole design space. However, we can train the metamodel now with this extra point and we will see that the new metamodel will predict the real value:

Xkriging1 = Xkriging.calibrate('XcalibrationInput', Xin);
[y, s2] = Xkriging1.TdaceModel.predict(new_point)


Now we see that the metamodel gets the actual value with a very low standard deviation (y = 11.6663 and std=7.7517e^-13).

## Acquisition function

Before applying the acquisition function iteratively, we will see how it works with only one iteration. We will need the mean prediction of the points we used to train our metamodel, as we will use the minimum to find the optimum (this will be the current optimum):

output_points = Xkriging1.XcalibrationOutput.getValues('Sname', 'disp');
yPlug = min(output_points);


Now let's apply the expected improvements to two arbitrarily chosen points (again for the sake of the demonstration). This should be made with, for example, a MonteCarlo method and a larger number of points:

points = [0.22, 0.45; 0.27, 0.49];
[gpMean, gpStd] = Xkriging1.TdaceModel.predict(points);
gpVar = sqrt(gpStd);

inds = gpVar > 10^-16;

EI = (yPlug-gpMean).*(0.5+0.5*erf((yPlug-gpMean)./(sqrt(2)*gpStd)))+1/sqrt(2*pi)*gpVar.*exp(-(yPlug-gpMean).^2./(2*gpVar.^2));
EI(~inds) = 0;


EI will tell us the expected improvement of each point. To maximise the learning of our metamodel we should train it again in the point with the maximum expected improvement.

[maxEI, location] = max(EI);
display(points(location,:))


The expected improvement function is telling us that, from the pool of points we arbitrarily created, the one we should use to train the metamodel is the second one [0.27, 0.49].

new_point = points(location,:);
Xs = Samples('Xrvset', Xrvset, 'MsamplesPhysicalSpace', new_point);
Xin = Xin.add('Xmember', Xs, 'Sname', 'Xs');
Xkriging1 = Xkriging.calibrate('XcalibrationInput', Xin);


Now we have applied the expected improvement acquisition function to calibrate our metamodel. We should apply this iteratively N times and then find the optimum from the metamodel.

## Iterate

Let's just rewrite the previous lines of code in a loop, to automatise it for 200 points:

rng(1) % matlab random number generator seed, so we have the same results
N = 200;
n = 1;
while n <= N
output_points = Xkriging1.XcalibrationOutput.getValues('Sname', 'disp');
yPlug = min(output_points);
b_points = 0.05 + (0.3-0.05)*rand(200,1); % random numbers between 0.05 and 0.3
h_points = 0.2 + (0.5-0.2)*rand(200,1); % random numbers between 0.2 and 0.5
points = [b_points, h_points];
[gpMean, gpStd] = Xkriging1.TdaceModel.predict(points);
gpVar = sqrt(gpStd);
inds = gpVar > 10^-16;
EI = (yPlug-gpMean).*(0.5+0.5*erf((yPlug-gpMean)./(sqrt(2)*gpStd)))+1/sqrt(2*pi)*gpVar.*exp(-(yPlug-gpMean).^2./(2*gpVar.^2));
EI(~inds) = 0;
[maxEI, location] = max(EI);
display(points(location,:))
new_point = points(location,:);
Xs = Samples('Xrvset', Xrvset, 'MsamplesPhysicalSpace', new_point);
Xin = Xin.add('Xmember', Xs, 'Sname', 'Xs');
Xkriging1 = Xkriging.calibrate('XcalibrationInput', Xin);
n = n + 1;
end


The metamodel will look now similar to the real model:

## Solution

We can now just find the minimum of our metamodel (since evaluating this will be much faster than the actual objective function) from, lets say, 10^6 evaluations.

MXX1 = repmat(linspace(0.05,0.3,1000)',1,1000); % 1000 points from 0.05 to 0.3
MXX2 = repmat(linspace(0.2,0.5,1000),1000,1); % 1000 points from 0.2 to 0.5
Vx1 = MXX1(:); Vx2 = MXX2(:);
Minput = [Vx1, Vx2];
[gpMean, gpStd] = Xkriging1.TdaceModel.predict(Minput);
[optimum, optimum_location] = min(gpMean)


The minimum is 1.3016 m with a stardard deviation of 0.7526. Corresponding to b=0.2945 m and h=0.4979 m.

We can plot the points found by the acquisition function to train the metamodel:

This method can be improved in many ways, but one easy way to do it would be by increasing the number of points generated with MonteCarlo in the iteration part (we used 200, but if we use 10000 the metamodel looks different).

As the objective is to find the minimum, with 10000 random points the expected improvement acquisition function seems to identify better where the minimum should be. This is why we can see the cluster of sampling points in the top right part of the contour plot. The acquisition function is telling the GP method to sample there to get a better estimate of the minimum.

With 10000 points in the MonteCarlo section the minimum is 1.3353 m with a standard deviation of 0.0322 (much lower, as we have many more points around the area where the minimum is). Corresponding to b=0.2995 m and h=0.4997 m (closer to the actual minimum than the previous result).

# References

1. Jones, D.R., Schonlau, M. and Welch, W.J., 1998. Efficient global optimization of expensive black-box functions. Journal of Global optimization, 13(4), pp.455-492.