## Random forests surrogate tutorial

A random forest is a supervised learning algorithm that trains multiple decision trees on randomized versions of the data and merges their predictions into one ensemble, the forest.
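
To make the idea concrete, here is a minimal sketch of bagging in plain Julia: each "tree" is a one-split regression stump trained on a bootstrap resample, and the forest prediction is the average over the stumps. This is only an illustration. The names `fit_stump`, `predict_stump`, `fit_forest` and `predict_forest` are hypothetical and are not part of `Surrogates`; the surrogate type printed later in this tutorial mentions `XGBoost.Booster`, so the package itself appears to delegate training to XGBoost rather than to code like this.

```julia
using Random, Statistics

# Fit a one-split regression tree ("stump") to 1D data by minimizing squared error.
function fit_stump(x, y)
    best = (sse = Inf, threshold = x[1], left = mean(y), right = mean(y))
    for t in unique(x)
        mask = x .<= t
        all(mask) && continue            # a split must leave points on both sides
        l, r = mean(y[mask]), mean(y[.!mask])
        sse = sum(abs2, y[mask] .- l) + sum(abs2, y[.!mask] .- r)
        if sse < best.sse
            best = (sse = sse, threshold = t, left = l, right = r)
        end
    end
    return best
end

predict_stump(stump, x) = x <= stump.threshold ? stump.left : stump.right

# Train each stump on a bootstrap resample (sampling with replacement) of the data.
function fit_forest(x, y; n_trees = 25, rng = MersenneTwister(0))
    forest = map(1:n_trees) do _
        idx = rand(rng, 1:length(x), length(x))
        fit_stump(x[idx], y[idx])
    end
    return forest
end

# The forest prediction is the average of the individual stump predictions.
predict_forest(forest, x) = mean(predict_stump(s, x) for s in forest)

x = collect(range(2.7, 7.5; length = 20))
y = sin.(x) .+ sin.(10 / 3 .* x)
forest = fit_forest(x, y)
println(predict_forest(forest, 5.0))
```

Because every stump predicts a mean of some of the training targets, the averaged prediction is piecewise constant and always lies between the smallest and largest sampled value.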

We are going to use a random forest surrogate to optimize $f(x) = \sin(x) + \sin\left(\tfrac{10}{3} x\right)$.

First of all, import `Surrogates` and `Plots`.

```
using Surrogates
using Plots
default()
```

### Sampling

We choose to sample `f` at 5 points between 2.7 and 7.5 using the `sample` function. The sampling points are chosen using a Sobol sequence, which is done by passing `SobolSample()` to the `sample` function.

```
f(x) = sin(x) + sin(10/3 * x)
n_samples = 5
lower_bound = 2.7
upper_bound = 7.5
x = sample(n_samples, lower_bound, upper_bound, SobolSample())
y = f.(x)
scatter(x, y, label="Sampled points", xlims=(lower_bound, upper_bound))
plot!(f, label="True function", xlims=(lower_bound, upper_bound), legend=:top)
```
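
As a rough illustration of what a low-discrepancy sequence does, the sketch below generates the base-2 van der Corput sequence in plain Julia and scales it to the same bounds. This sequence is closely related to a one-dimensional Sobol sequence, but the helper `van_der_corput` is hypothetical and the points are not claimed to match the output or ordering of `SobolSample()`.

```julia
# Radical inverse of the integer n in the given base (van der Corput sequence).
function van_der_corput(n::Integer; base::Integer = 2)
    q, denom = 0.0, 1.0
    while n > 0
        denom *= base
        n, r = divrem(n, base)   # peel off the lowest digit of n
        q += r / denom           # mirror that digit behind the radix point
    end
    return q
end

lower_bound, upper_bound = 2.7, 7.5
us = [van_der_corput(i) for i in 1:5]                    # points in (0, 1)
xs = lower_bound .+ (upper_bound - lower_bound) .* us    # scaled to the bounds
println(xs)
```

Each new point falls into the largest remaining gap, so even a handful of samples covers the interval more evenly than independent uniform draws typically would.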

### Building a surrogate

With our sampled points we can build the random forest surrogate using the `RandomForestSurrogate` function.

`randomforest_surrogate` behaves like an ordinary function, which we can simply plot. Additionally, you can specify the number of trees created using the `num_round` parameter.

```
num_round = 2
randomforest_surrogate = RandomForestSurrogate(x, y, lower_bound, upper_bound, num_round = num_round)
plot(x, y, seriestype=:scatter, label="Sampled points", xlims=(lower_bound, upper_bound), legend=:top)
plot!(f, label="True function", xlims=(lower_bound, upper_bound), legend=:top)
plot!(randomforest_surrogate, label="Surrogate function", xlims=(lower_bound, upper_bound), legend=:top)
```

### Optimizing

Having built a surrogate, we can now use it to search for minima of our original function `f`.

To optimize using our surrogate we call the `surrogate_optimize` function. We choose Stochastic RBF as the optimization technique and, again, Sobol sampling as the sampling technique.

```
@show surrogate_optimize(f, SRBF(), lower_bound, upper_bound, randomforest_surrogate, SobolSample())
scatter(x, y, label="Sampled points")
plot!(f, label="True function", xlims=(lower_bound, upper_bound), legend=:top)
plot!(randomforest_surrogate, label="Surrogate function", xlims=(lower_bound, upper_bound), legend=:top)
```
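
To judge the result of `surrogate_optimize`, it helps to have a reference value. The sketch below is a brute-force check in plain Julia that does not use `Surrogates`: it evaluates `f` on a dense grid over the same bounds and reports the smallest value found. The grid size of 10 001 points is an arbitrary choice for illustration.

```julia
f(x) = sin(x) + sin(10 / 3 * x)
lower_bound, upper_bound = 2.7, 7.5

# Evaluate f on a dense grid and pick the smallest value.
grid = range(lower_bound, upper_bound; length = 10_001)
f_min, idx = findmin(f.(grid))
x_min = grid[idx]

println("grid minimum ≈ ", f_min, " at x ≈ ", x_min)
```

This is only feasible because `f` is cheap to evaluate; a surrogate is meant for cases where each evaluation of the true function is expensive.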

## Random Forest ND

First of all, we define the Bukin function N. 6, the function we are going to build a surrogate for.

```
# Bukin function N. 6; `x` is any 2-element indexable (tuple or vector).
function bukin6(x)
    x1 = x[1]
    x2 = x[2]
    term1 = 100 * sqrt(abs(x2 - 0.01 * x1^2))
    term2 = 0.01 * abs(x1 + 10)
    return term1 + term2
end
```

`bukin6 (generic function with 1 method)`
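
A quick sanity check of the definition, written so it runs on its own: both terms vanish when `x1 = -10` and `x2 = 0.01 * x1^2 = 1`, so the function should be approximately zero at `(-10, 1)`. That point is the commonly cited global minimizer of this test function (an outside fact, not something this tutorial states), and it lies outside the bounds `-5, 10` used for the first dimension here.

```julia
function bukin6(x)
    x1, x2 = x[1], x[2]
    return 100 * sqrt(abs(x2 - 0.01 * x1^2)) + 0.01 * abs(x1 + 10)
end

x_star = (-10.0, 1.0)          # assumed global minimizer, see the note above
println(bukin6(x_star))        # both terms vanish here, up to rounding
println(bukin6((0.0, 0.0)))    # only the second term contributes: 0.01 * 10
```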

### Sampling

Let's define our bounds; this time we are working in two dimensions. In particular, we want the first dimension to have bounds `-5, 10`, and the second dimension to have bounds `0, 15`. We take 50 samples of the space using a Sobol sequence and then evaluate our function at all of the sampling points.

```
n_samples = 50
lower_bound = [-5.0, 0.0]
upper_bound = [10.0, 15.0]
xys = sample(n_samples, lower_bound, upper_bound, SobolSample())
zs = bukin6.(xys);
```

```
50-element Array{Float64,1}:
337.5008932148196
50.077894542763865
278.1482001224633
144.741627014022
297.34172797144976
236.82569468282983
364.77730599103734
107.31580991739821
288.43082001226617
199.40108703355227
⋮
245.71026580746002
379.2949065629214
224.57861320373783
346.1622092497987
107.16557646894123
294.1946966016903
128.62475225964087
301.415259049581
220.7829314530664
```

### Building a surrogate

Using the sampled points we build the surrogate; the steps are analogous to the 1-dimensional case.

`RandomForest = RandomForestSurrogate(xys, zs, lower_bound, upper_bound)`

`(::RandomForestSurrogate{Array{Tuple{Float64,Float64},1},Array{Float64,1},XGBoost.Booster,Array{Float64,1},Array{Float64,1},Int64}) (generic function with 2 methods)`

### Optimizing

With our surrogate we can now search for the minima of the function.

During optimization, newly sampled points are added to the surrogate. We print `size(xys)` before and after the call to check whether the original sample array is affected; in the run recorded here it stays at `(50,)`.

`size(xys)`

`(50,)`

`surrogate_optimize(bukin6, SRBF(), lower_bound, upper_bound, RandomForest, SobolSample(), maxiters=20)`

`((8.9453125, 0.8203125), 14.376187341667872)`

`size(xys)`

`(50,)`
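
The pair returned by `surrogate_optimize` is a point and the function value at that point, so the result can be checked by evaluating `bukin6` there directly. The snippet below repeats the definition in compact form so that it runs on its own; the tolerance `1e-6` is an arbitrary choice for illustration.

```julia
bukin6(x) = 100 * sqrt(abs(x[2] - 0.01 * x[1]^2)) + 0.01 * abs(x[1] + 10)

# Point and value as reported by the optimizer run above.
x_opt, y_opt = (8.9453125, 0.8203125), 14.376187341667872

@show bukin6(x_opt)
@show isapprox(bukin6(x_opt), y_opt; atol = 1e-6)
```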