Random forests surrogate tutorial

Random forest is an ensemble supervised learning algorithm: it trains many decision trees on random subsets of the data and averages their predictions into a single model.
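The idea can be sketched in a few lines of plain Julia: train many weak regressors on bootstrap resamples of the data and average their predictions. This toy version (not the implementation used by Surrogates) fits depth-1 stumps instead of full trees and omits the per-split feature subsampling of a real random forest.

```julia
using Random, Statistics

# Fit a depth-1 stump: pick the threshold that minimizes squared error.
function fit_stump(x, y)
    best = (Inf, 0.0, 0.0, 0.0)
    for t in x
        left  = y[x .<= t]
        right = y[x .> t]
        (isempty(left) || isempty(right)) && continue
        ml, mr = mean(left), mean(right)
        err = sum(abs2, left .- ml) + sum(abs2, right .- mr)
        err < best[1] && (best = (err, t, ml, mr))
    end
    (t, ml, mr) = best[2:4]
    xq -> xq <= t ? ml : mr
end

# "Forest" = average of stumps, each trained on a bootstrap resample.
function fit_forest(x, y; n_trees = 50, rng = MersenneTwister(42))
    n = length(x)
    stumps = [begin
        idx = rand(rng, 1:n, n)            # bootstrap resample
        fit_stump(x[idx], y[idx])
    end for _ in 1:n_trees]
    xq -> mean(s(xq) for s in stumps)
end

x = collect(range(0, 2pi, length = 40))
y = sin.(x)
forest = fit_forest(x, y)
# The ensemble predicts higher values near the peak of sin than near the trough.
forest(pi/2), forest(3pi/2)
```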

We are going to use a random forest surrogate to optimize $f(x) = \sin(x) + \sin\left(\frac{10}{3}\,x\right)$.

First of all, import Surrogates and Plots.

using Surrogates
using Plots
default()

Sampling

We choose to sample f at 5 points between 2.7 and 7.5 using the sample function. The sampling points are chosen using a Sobol sequence; this can be done by passing SobolSample() to the sample function.

f(x) = sin(x) + sin(10/3 * x)
n_samples = 5
lower_bound = 2.7
upper_bound = 7.5
x = sample(n_samples, lower_bound, upper_bound, SobolSample())
y = f.(x)
scatter(x, y, label="Sampled points", xlims=(lower_bound, upper_bound))
plot!(f, label="True function", xlims=(lower_bound, upper_bound), legend=:top)
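For intuition about why a Sobol sequence is used: it is a low-discrepancy sequence, so its points cover the interval much more evenly than the same number of uniform random draws. In one dimension the Sobol sequence coincides with the base-2 van der Corput sequence, which is easy to sketch by hand (the actual SobolSample() implementation may skip or order points differently):

```julia
# Base-2 radical inverse of i: reflect the binary digits of i about the
# binary point, e.g. 1 -> 0.5, 2 -> 0.25, 3 -> 0.75, 4 -> 0.125, ...
vdc(i) = begin
    r, f = 0.0, 0.5
    while i > 0
        r += f * (i & 1)
        i >>= 1
        f /= 2
    end
    r
end

# Rescale the unit-interval sequence to our sampling bounds.
lb, ub = 2.7, 7.5
pts = [lb + (ub - lb) * vdc(i) for i in 1:5]
```

Plotting these points next to five uniform random draws makes the evenness of the coverage visible.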

Building a surrogate

With our sampled points, we can build the random forest surrogate using the RandomForestSurrogate function.

randomforest_surrogate behaves like an ordinary function, which we can simply plot. Additionally, you can specify the number of trees created using the parameter num_round.

num_round = 2
randomforest_surrogate = RandomForestSurrogate(x, y, lower_bound, upper_bound, num_round = num_round)
plot(x, y, seriestype=:scatter, label="Sampled points", xlims=(lower_bound, upper_bound), legend=:top)
plot!(f, label="True function",  xlims=(lower_bound, upper_bound), legend=:top)
plot!(randomforest_surrogate, label="Surrogate function",  xlims=(lower_bound, upper_bound), legend=:top)

Optimizing

Having built a surrogate, we can now use it to search for minima of our original function f.

To optimize using our surrogate, we call the surrogate_optimize function. We choose stochastic RBF (SRBF) as the optimization technique and, again, Sobol sampling as the sampling technique.

@show surrogate_optimize(f, SRBF(), lower_bound, upper_bound, randomforest_surrogate, SobolSample())
scatter(x, y, label="Sampled points")
plot!(f, label="True function",  xlims=(lower_bound, upper_bound), legend=:top)
plot!(randomforest_surrogate, label="Surrogate function",  xlims=(lower_bound, upper_bound), legend=:top)
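Conceptually, surrogate-assisted optimization alternates between proposing candidate points, ranking them by a merit function that trades off the surrogate's predicted value against the distance to already-evaluated points (the stochastic-RBF idea), and evaluating the true f at the winner before refitting. A heavily simplified sketch of that loop, with a nearest-neighbour stand-in for the surrogate model rather than the real SRBF algorithm:

```julia
f(x) = sin(x) + sin(10/3 * x)
lb, ub = 2.7, 7.5

# Initial design: 5 evenly spaced samples of the expensive function.
xs = collect(range(lb, ub, length = 5))
ys = f.(xs)

# Stand-in surrogate: predict the value of the nearest sampled point.
nearest(x, xs, ys) = ys[argmin(abs.(xs .- x))]

for _ in 1:10
    cand = range(lb, ub, length = 201)
    # Merit = predicted value minus an exploration bonus for being far
    # from existing samples (a crude version of the SRBF trade-off).
    merit = [nearest(c, xs, ys) - 0.5 * minimum(abs.(xs .- c)) for c in cand]
    xnew = cand[argmin(merit)]
    push!(xs, xnew)           # evaluate the true f at the winner
    push!(ys, f(xnew))        # and refit (here: just extend the data)
end
minimum(ys)   # best value found so far
```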

Random Forest ND

First of all, we will define the Bukin function N.6, for which we are going to build a surrogate.

function bukin6(x)
    x1 = x[1]
    x2 = x[2]
    term1 = 100 * sqrt(abs(x2 - 0.01 * x1^2))
    term2 = 0.01 * abs(x1 + 10)
    return term1 + term2
end
bukin6 (generic function with 1 method)
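As a quick sanity check, Bukin N.6 has its known global minimum f(-10, 1) = 0, a point that lies outside the box we sample below (the function is restated here so the snippet runs standalone):

```julia
function bukin6(x)
    x1 = x[1]
    x2 = x[2]
    term1 = 100 * sqrt(abs(x2 - 0.01 * x1^2))
    term2 = 0.01 * abs(x1 + 10)
    return term1 + term2
end

# At (-10, 1) both terms vanish: x2 - 0.01*x1^2 = 1 - 1 = 0 and x1 + 10 = 0.
bukin6((-10.0, 1.0))   # → 0.0
```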

Sampling

Let's define our bounds; this time we are working in two dimensions. In particular, we want the first dimension to range over [-5, 10] and the second over [0, 15]. We take 50 samples of the space using a Sobol sequence and then evaluate our function at all of the sampling points.

n_samples = 50
lower_bound = [-5.0, 0.0]
upper_bound = [10.0, 15.0]

xys = sample(n_samples, lower_bound, upper_bound, SobolSample())
zs = bukin6.(xys);
50-element Array{Float64,1}:
 337.5008932148196
  50.077894542763865
 278.1482001224633
 144.741627014022
 297.34172797144976
 236.82569468282983
 364.77730599103734
 107.31580991739821
 288.43082001226617
 199.40108703355227
   ⋮
 245.71026580746002
 379.2949065629214
 224.57861320373783
 346.1622092497987
 107.16557646894123
 294.1946966016903
 128.62475225964087
 301.415259049581
 220.7829314530664

Building a surrogate

Using the sampled points, we build the surrogate; the steps are analogous to the one-dimensional case.

RandomForest = RandomForestSurrogate(xys, zs, lower_bound, upper_bound)
(::RandomForestSurrogate{Array{Tuple{Float64,Float64},1},Array{Float64,1},XGBoost.Booster,Array{Float64,1},Array{Float64,1},Int64}) (generic function with 2 methods)

Optimizing

With our surrogate we can now search for the minima of the function.

Notice that the new points sampled during the optimization process are appended to the surrogate's internal data rather than to the xys array, which is why the size of xys is unchanged.

size(xys)
(50,)
surrogate_optimize(bukin6, SRBF(), lower_bound, upper_bound, RandomForest, SobolSample(), maxiters=20)
((8.9453125, 0.8203125), 14.376187341667872)
size(xys)
(50,)