What information you have access to can determine what type of solver you should use. In general, most solvers will require you to provide a piece of code which computes $f(x)$ for any $x$ (and $c(x)$ for any constraints). Usually you will also need to be able to compute the gradient of the objective (and all constraints), that is provide code which returns the $n$-dimensional vector $\nabla f(x)$ for any $x$ (where the $i$-th entry of $\nabla f(x)$ is the derivative of $f$ with respect to the $i$-th entry of $x$). You can implement a gradient calculation by explicitly calculating the derivatives (if you know the explicit mathematical form of $f$), or use finite differencing techniques or algorithmic differentiation software packages.
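For example, a minimal sketch in Python (the objective here is purely illustrative) showing an objective, its analytic gradient, and a forward finite-difference approximation of the gradient:

import numpy as np

def f(x):
    # illustrative objective: f(x) = sum_i x_i^4
    return np.sum(x**4)

def gradf(x):
    # analytic gradient: the i-th entry is 4*x_i^3
    return 4.0 * x**3

def gradf_fd(x, h=1e-7):
    # forward finite-difference approximation of the gradient
    g = np.zeros_like(x)
    fx = f(x)
    for i in range(len(x)):
        xp = x.copy()
        xp[i] += h
        g[i] = (f(xp) - fx) / h
    return g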
For nonlinear least-squares problems and problems with multiple constraints, often you will need to write code to return a vector of results (e.g. given $x$, return the length-$m$ vector $r(x)=[r_1(x) \: \cdots \: r_m(x)]$ for nonlinear least-squares problems). In this case, instead of the gradient $\nabla f(x)$ you will need to provide the Jacobian matrix of first derivatives. For instance, if you have a function $r(x)=[r_1(x) \: \cdots \: r_m(x)]$ where $x$ has $n$ entries, the Jacobian matrix has $m$ rows and $n$ columns and the $i$-th row is $\nabla r_i(x)$.
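For example, a small illustrative residual function with n=2 and m=3, and its Jacobian assembled row by row from the gradients of the residuals:

import numpy as np

def r(x):
    # illustrative residuals, m=3, n=2
    return np.array([x[0] - 1.0, x[1] - 2.0, x[0] * x[1]])

def jac(x):
    # 3x2 Jacobian; row i is the gradient of r_i
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [x[1], x[0]]])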
Some solvers also allow you to provide code to calculate the second derivatives of $f$ (the Hessian of $f$), but this is usually not necessary. In general the more information you can provide, the better the optimization solver will perform, so it is sometimes worth the effort of writing Hessian code.
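Continuing the illustrative objective f(x) = sum_i x_i^4 from the sketch above (numpy already imported there), Hessian code could look like:

def hessf(x):
    # Hessian of f(x) = sum_i x_i^4: diagonal matrix with entries 12*x_i^2
    return np.diag(12.0 * x**2)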
If you are writing code which returns very large matrices (e.g. Jacobians or Hessians), it is often useful to think about the sparsity of these matrices: which entries are always zero for every $x$? Many solvers allow you to provide Jacobians or Hessians as sparse matrices, which can give substantial savings on memory and computation time.
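For example, a sketch using scipy.sparse for a Jacobian whose off-diagonal entries are always zero (the residual function is illustrative):

import numpy as np
import scipy.sparse as sp

def sparse_jac(x):
    # Jacobian of r_i(x) = x_i^2: only the diagonal is ever nonzero,
    # so store it as a sparse matrix rather than a dense n-by-n array
    return sp.diags(2.0 * x)

n = 10000
J = sparse_jac(np.ones(n))
print(J.nnz)  # 10,000 stored entries instead of 10^8 for a dense matrix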
If your calculation of $f$ has some randomness (e.g. it requires a Monte Carlo simulation) or is very computationally expensive, then evaluating $\nabla f(x)$ may be very expensive or inaccurate. In this case, you may want to consider derivative-free solvers, which only require code to evaluate $f$.
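For instance, in this illustrative Monte Carlo objective two evaluations at the same point give different values, so finite-difference gradient estimates are unreliable:

import numpy as np

def f_noisy(x, nsamples=100):
    # illustrative stochastic objective: estimate E[(x - Z)^2] with Z ~ N(0,1) by sampling
    z = np.random.randn(nsamples)
    return np.mean((x - z)**2)

print(f_noisy(1.0), f_noisy(1.0))  # two different values at the same point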
These are some software packages which we have had some involvement with, directly or indirectly.
Software packages:
Other resources:
This section gives details of how to install the extensive CUTEst library of optimization test problems (paper) - earlier versions were called CUTE (1995) and CUTEr (2003) - as well as some other standard collections of test problems. CUTEst is a Fortran library, but we give instructions for installing wrappers for Matlab and Python. The other problem sets are implemented in Python.
mkdir cutest
cd cutest
git clone https://github.com/ralna/ARCHDefs ./archdefs
git clone https://github.com/ralna/SIFDecode ./sifdecode
git clone https://github.com/ralna/CUTEst ./cutest
git clone https://bitbucket.org/optrove/sif ./mastsif
Note that mastsif contains all the test problem definitions and is therefore quite large. If you're short on space you may want to copy only the *.SIF files for the problems you wish to test on.
Next, set the following environment variables in your ~/.bashrc to point to the installation directories used above:
# CUTEst
export ARCHDEFS=/path/to/cutest/archdefs/
export SIFDECODE=/path/to/cutest/sifdecode/
export MASTSIF=/path/to/cutest/mastsif/
export CUTEST=/path/to/cutest/cutest/
export MYARCH="pc64.lnx.gfo"
export MYMATLAB=/path/to/matlab/ # if using Matlab interface
Now you are ready to compile CUTEst using the interactive install script:
source ~/.bashrc # load above environment variables
cd ./cutest
$ARCHDEFS/install_optrove
Answer the questions as appropriate, in particular answer the following as shown:
Do you wish to install CUTEst (Y/n)? y
Do you require the CUTEst-Matlab interface (y/N)? y # if using Matlab interface (see below)
Select platform: # PC with generic 64-bit processor
Select operating system: # Linux
Select Matlab version: # R2016b or later
Would you like to compile SIFDecode ... (Y/n)? y
Would you like to compile CUTEst ... (Y/n)? y
CUTEst may be compiled in (S)ingle or (D)ouble precision or (B)oth.
Which precision do you require for the installed subset (D/s/b) ? D
If you do not already have Homebrew, install it first:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Then you can easily install CUTEst:
brew tap optimizers/cutest
brew install cutest --without-single --with-matlab # if using Matlab interface
brew install mastsif # if you want all the test problems
for f in "archdefs" "mastsif" "sifdecode" "cutest"; do \
echo ". $(brew --prefix $f)/$f.zshrc" >> ~/.zshrc; \
done
If you installed CUTEst with Matlab support, you also need to define an environment variable pointing to your local Matlab installation in your ~/.zshrc, e.g.,
export MYMATLAB=/Applications/MATLAB_R2023b.app
Then download the shell script below and place it somewhere on your PATH. I like to create ~/bin and add it to my PATH. Make the script executable:
mkdir ~/bin # if not already done
cp Downloads/cutest2matlab_macos.sh ~/bin/
chmod +x ~/bin/cutest2matlab_macos.sh
export PATH=~/bin:$PATH # if not already done; add this to your ~/.zshrc
Now you have to compile the CUTEst problems you wish to use.
For example, if you wish to use the ROSENBR problem (Rosenbrock function), create a folder for it and run cutest2matlab_macos.sh:
mkdir rosenbrock
cd rosenbrock
cutest2matlab_macos.sh ROSENBR
This should produce two shared libraries:
libROSENBR.dylib
mcutest.mexmaca64
In Matlab, add the CUTEst tools to your path and load up the problem:
addpath('/opt/homebrew/opt/cutest/libexec/src/matlab')
cd /the/directory/where/you/ran/cutest2matlab_macos.sh
prob = cutest_setup()
If that works, you should see a Matlab struct and you should be able to write, e.g.,
f = cutest_obj(prob.x)
~/Library/Application\ Support/Mathworks/MATLAB/R2018a/
(substitute your Matlab version).
Then download the shell script below and place it somewhere on your PATH. I like to create ~/bin and add it to my PATH. Make the script executable:
mkdir ~/bin # if not already done
cp Downloads/cutest2matlab_osx.sh ~/bin/
chmod +x ~/bin/cutest2matlab_osx.sh
export PATH=~/bin:$PATH # if not already done; add this to your ~/.zshrc
Now you have to compile the CUTEst problems you wish to use.
For example, if you wish to use the ROSENBR problem (Rosenbrock function), create a folder for it and run cutest2matlab_osx.sh:
mkdir rosenbrock
cd rosenbrock
cutest2matlab_osx.sh ROSENBR
This should produce two shared libraries:
libROSENBR.dylib
mcutest.mexmaci64
In Matlab, add the CUTEst tools to your path and load up the problem:
addpath('/usr/local/opt/cutest/libexec/src/matlab')
cd /the/directory/where/you/ran/cutest2matlab_osx.sh
prob = cutest_setup()
If that works, you should see a Matlab struct and you should be able to write, e.g.,
f = cutest_obj(prob.x)
It is possible to run Linux GUI programs too, but this is not enabled by default. For this, you need to:
Edit the .bashrc file in your Linux file system to include the lines
export DISPLAY=localhost:0.0
export LIBGL_ALWAYS_INDIRECT=1
Install wslu (e.g. via apt-get install wslu). You can then use wslusc to create a shortcut in Windows to start your desired GUI application (run man wslusc in bash for instructions). Alternatively, you can do this manually by creating a .bat Windows script file which contains (if you have Ubuntu 18.04)
C:\Windows\System32\wscript.exe C:\\Users\\username\wslu\runHidden.vbs C:\\Users\\username\\AppData\\Local\\Microsoft\\WindowsApps\\CanonicalGroupLimited.Ubuntu18.04onWindows_79rhkp1fndgsc\\ubuntu1804.exe run "cd ~; . /usr/share/wslu/wsl-integration.sh; /path/to/gui/executable; "
making sure to include your C:\\Users\\username directory and edit the /path/to/gui/executable. You should double-check the path to your Linux executable to make sure this runs. Double-click on the .bat file to execute it.
To obtain a Python interface to CUTEst, please install PyCUTEst: https://github.com/jfowkes/pycutest
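Once PyCUTEst is installed, basic usage looks roughly like the following (a sketch; see the PyCUTEst documentation for the full interface):

import pycutest

p = pycutest.import_problem('ROSENBR')   # decode and build the ROSENBR test problem
f, g = p.obj(p.x0, gradient=True)        # objective value and gradient at the initial point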
from MGH import ExtendedRosenbrock
rb = ExtendedRosenbrock(n=4, m=4)
x0 = rb.initial # initial point
rb.r(x0) # residual vector at x0
rb.jacobian(x0) # Jacobian at x0
where n is the problem dimension and m the number of residuals in the objective.
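The scalar least-squares objective is built from these residuals; continuing the example above (and assuming the usual convention f(x) = sum_i r_i(x)^2, with no factor of 1/2):

import numpy as np
r0 = rb.r(x0)         # residual vector at x0 (rb and x0 as defined above)
f0 = np.dot(r0, r0)   # corresponding scalar objective value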
from more_wild import *
f, x0, n, m = get_problem_as_scalar_objective(probnum)
or get the vector function of residuals:
from more_wild import *
r, x0, n, m = get_problem_as_residual_vector(probnum)
Outputs: the objective f (or the residual function r) as a callable, the starting point x0, the problem dimension n, and the number of residuals m. Noisy variants of the problems can also be generated:
from more_wild import *
r, x0, n, m = get_problem_as_residual_vector(probnum, noise_type='multiplicative_gaussian', noise_level=1e-2)
f, x0, n, m = get_problem_as_scalar_objective(probnum, noise_type='multiplicative_gaussian', noise_level=1e-2)
where the different options for noise_type are:

- smooth (default): no noise added
- multiplicative_deterministic: each $r_i(\mathbf{x})$ is replaced by $\sqrt{1+\sigma \phi(\mathbf{x})} r_i(\mathbf{x})$, where $\sigma$ is equal to noise_level and $\phi(\mathbf{x})$ is a deterministic high-frequency function taking values in $[-1,1]$ (see equation (4.2) of the Moré & Wild paper). The resulting function is still deterministic, but it is no longer smooth.
- multiplicative_uniform: each $r_i(\mathbf{x})$ is replaced by $(1+\sigma) r_i(\mathbf{x})$, where $\sigma$ is uniformly distributed between -noise_level and noise_level (sampled independently for each $r_i(\mathbf{x})$).
- multiplicative_gaussian: each $r_i(\mathbf{x})$ is replaced by $(1+\sigma) r_i(\mathbf{x})$, where $\sigma$ is normally distributed with mean 0 and standard deviation noise_level (sampled independently for each $r_i(\mathbf{x})$).
- additive_gaussian: each $r_i(\mathbf{x})$ is replaced by $r_i(\mathbf{x}) + \sigma$, where $\sigma$ is normally distributed with mean 0 and standard deviation noise_level (sampled independently for each $r_i(\mathbf{x})$).
- additive_chi_square: each $r_i(\mathbf{x})$ is replaced by $\sqrt{r_i(\mathbf{x})^2 + \sigma^2}$, where $\sigma$ is normally distributed with mean 0 and standard deviation noise_level (sampled independently for each $r_i(\mathbf{x})$).

To compare solvers, we use data and performance profiles as defined by Moré & Wild (2009). First, for each solver $\mathcal{S}$, each problem $p$ and for an accuracy level $\tau\in(0,1)$, we determine the number of function evaluations $N_p(\mathcal{S};\tau)$ required for a problem to be `solved': $$ N_p(\mathcal{S}; \tau) := \text{# objective evals required to get $f(\mathbf{x}_k) \leq f^* + \tau(f(\mathbf{x}_0) - f^*)$,} $$ where $f^*$ is an estimate of the true minimum $f(\mathbf{x}^*)$. (Note: sometimes $f^*$ is taken to be the best value achieved by any solver.) We define $N_p(\mathcal{S}; \tau)=\infty$ if this was not achieved in the maximum computational budget allowed. This measure is suitable for derivative-free solvers (where no gradients are available), but alternative measures of success can be used if gradients are available (see next section).
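As a concrete illustration, a minimal sketch of computing $N_p(\mathcal{S};\tau)$ from a single solver run's history of objective values (the function and variable names here are illustrative):

import numpy as np

def evals_to_solve(fvals, fstar, tau):
    # fvals[k] is the objective value after k+1 evaluations, so fvals[0] = f(x_0);
    # fstar is the estimate of the true minimum f(x^*)
    target = fstar + tau * (fvals[0] - fstar)
    solved = np.where(np.asarray(fvals) <= target)[0]
    return solved[0] + 1 if solved.size > 0 else np.inf   # N_p(S; tau)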
We can then compare solvers by looking at the proportion of test problems solved for a given computational budget. For data profiles, we normalise the computational effort by problem dimension, and plot (for solver $\mathcal{S}$, accuracy level $\tau\in(0,1)$ and problem suite $\mathcal{P}$) $$ d_{\mathcal{S}, \tau}(\alpha) := \frac{|\{p\in\mathcal{P} : N_p(\mathcal{S};\tau) \leq \alpha(n_p+1)\}|}{|\mathcal{P}|}, \qquad \text{for $\alpha\in[0,N_g]$,} $$ where $N_g$ is the maximum computational budget, measured in simplex gradients (i.e. $N_g(n_p+1)$ objective evaluations are allowed for problem $p$).
For performance profiles (originally proposed by Dolan & Moré (2002)), we normalise the computational effort by the minimum effort needed by any solver (i.e. by problem difficulty). That is, we plot $$ \pi_{\mathcal{S},\tau}(\alpha) := \frac{|\{p\in\mathcal{P} : N_p(\mathcal{S};\tau) \leq \alpha N_p^*(\tau)\}|}{|\mathcal{P}|}, \qquad \text{for $\alpha\geq 1$,} $$ where $N_p^*(\tau) := \min_{\mathcal{S}} N_p(\mathcal{S};\tau)$ is the minimum budget required by any solver.
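Both profiles are then straightforward to compute; a sketch, given a matrix of $N_p(\mathcal{S};\tau)$ values (one row per problem, one column per solver) and the problem dimensions $n_p$ (illustrative code, not the plotting script described below):

import numpy as np

def data_profile(Np, n_dims, alpha):
    # fraction of problems each solver solves within alpha simplex gradients,
    # i.e. within alpha*(n_p+1) objective evaluations
    budget = alpha * (np.asarray(n_dims)[:, None] + 1.0)
    return np.mean(Np <= budget, axis=0)

def performance_profile(Np, alpha):
    # fraction of problems each solver solves within alpha times the evaluations
    # needed by the best solver on that problem
    Nstar = np.min(Np, axis=1, keepdims=True)   # N_p^*(tau) for each problem
    return np.mean(Np <= alpha * Nstar, axis=0)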
When multiple test runs are used, we take average data and performance profiles over multiple runs of each solver; that is, for each $\alpha$ we take an average of $d_{\mathcal{S},\tau}(\alpha)$ and $\pi_{\mathcal{S},\tau}(\alpha)$. When plotting performance profiles, we took $N_p^*(\tau)$ to be the minimum budget required by any solver in any run.
The above was designed for comparing derivative-free solvers, where gradients are not available. If, however, derivative information is available, it may be better to replace the definition above with $$ N_p(\mathcal{S}; \tau) := \text{# objective evals required to get $\|\nabla f(\mathbf{x}_k)\| \leq \tau \|\nabla f(\mathbf{x}_0)\|$.} $$ The above script can handle this definition by a modification of the input files:
Problem number, evaluation number, objective value
As an example, the below files have the results for running the derivative-free solver BOBYQA (paper and code) on the Moré & Wild test set. The three files use three different settings (varying the number of interpolation points used to construct models of the objective: $n+2$, $2n+1$ and $(n+1)(n+2)/2$ respectively), and a budget of $10(n+1)$ objective evaluations.
For instance, the 'np2' file looks like:
1,1,71.999999999999986
1,2,72.410000006258514
...
53,89,1965397.5762139191
53,90,1953998.3397798422
showing that the first evaluation for the first problem had objective value 72, and the final (90th) evaluation for the last problem (53) had objective value 1953998.
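One simple way to produce a results file in this format is to wrap the objective so that every evaluation is recorded (a sketch; the helper name is hypothetical):

def make_logged_objective(f, problem_number, rows):
    # return a wrapped objective which appends
    # (problem number, evaluation number, objective value) to rows on every call
    count = [0]
    def wrapped(x):
        fx = f(x)
        count[0] += 1
        rows.append((problem_number, count[0], fx))
        return fx
    return wrapped

The accumulated rows can then be written out as a csv file with one line per evaluation.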
from plotting import *
problem_info_file = '../mw/more_wild_info.csv' # problem suite information
infile = 'raw_bobyqa_budget10_np2.csv'
outfile = 'clean_bobyqa_budget10_np2.csv'
solved_times = get_solved_times_for_file(problem_info_file, infile)
solved_times.to_csv(outfile)
The output file has, for each test problem: dimension, minimum value found by the solver, objective evaluations used, and the number of evaluations required to achieve accuracy $\tau$ for $\tau\in\{0.1, 0.01, \ldots, 10^{-10}\}$, where a value of -1 indicates the solver did not achieve the required accuracy within the allowed budget.
Setting,n,fmin,nf,tau1,tau2,tau3,tau4,tau5,tau6,tau7,tau8,tau9,tau10
1,9,36.0000000021001,100,18,29,41,49,54,64,71,85,93,98
...
53,8,1953998.3397798422,90,12,12,21,23,-1,-1,-1,-1,-1,-1
# each entry is tuple = (filename_stem, label, colour, linestyle, [marker], [markersize])
solver_info = []
solver_info.append(('clean_bobyqa_budget10_np2', r'$n+2$ points', 'b', '-', '.', 12))
solver_info.append(('clean_bobyqa_budget10_2np1', r'$2n+1$ points', 'r', '--'))
solver_info.append(('clean_bobyqa_budget10_nsq', r'$(n+1)(n+2)/2$ points', 'k', '-.'))
In its simplest form, we need an output file stem, the solver plotting information, problem suite information, the set of $\tau$ levels to plot, and the maximum budget (in simplex gradients, $N_g$):
from plotting import *
tau_levels = [1, 3]
budget = 10
outfile_stem = 'demo'
# solver_info defined as above
create_plots(outfile_stem, solver_info, tau_levels, budget)
This produces files demo_data1, demo_data3, demo_perf1 and demo_perf3 representing data and performance profiles for $\tau=10^{-1}$ and $\tau=10^{-3}$. In each case, we have image files and a csv containing the raw plot data. For the $\tau=10^{-1}$ case, the plots look like:
The create_plots function has several optional parameters: