Getting started

The methods implemented in ProximalAlgorithms are commonly referred to as (you've guessed it) proximal algorithms, in that they rely on the proximal operator (or mapping) to deal with non-differentiable terms in the objective. Loosely speaking, the algorithms in this package can be used to solve problems of the form

\[\operatorname*{minimize}_x\ \sum_{i=1}^N f_i(x)\]

where $N$ depends on the specific algorithm, together with specific assumptions on the terms $f_i$ (like smoothness, convexity, strong convexity). The problem above is solved by iteratively accessing specific first order information on the terms $f_i$, like their gradient $\nabla f_i$ or their proximal mapping $\operatorname{prox}_{f_i}$:

\[\mathrm{prox}_{\gamma f_i}(x) = \arg\min_z \left\{ f_i(z) + \tfrac{1}{2\gamma}\|z-x\|^2 \right\}\]

The literature on proximal operators and algorithms is vast: for an overview, one can refer to [1], [2].

To evaluate these first-order primitives, in ProximalAlgorithms:

$\nabla f_i$ falls back to using automatic differentiation (as provided by DifferentiationInterface.jl and all of its backends).
$\operatorname{prox}_{f_i}$ relies on the intereface of ProximalOperators (>= 0.15).

Both of the above can be implemented for custom function types, as documented here.

Note

Each of the implemented algorithms assumes a different structure of the objective to be optimized (e.g., a specific number $N$ of terms), with specific assumptions (e.g., smoothness, convexity). Furthermore, multiple algorithms can often be applied to the same problem (possibly through appropriate reformulation) and are expected to perform differently; sometimes, even the same algorithm can be applied in multiple ways to the same problem, by grouping (splitting) the terms in different ways.

Because of these reasons, ProximalAlgorithms does not offer a modeling language to automagically minimize any objective: rather, the user is expected to formulate their problem by providing the right objective terms to the algorithm of choice. Please refer to the this section of the manual for information on what terms can be provided and under which assumptions.

Interface to algorithms

At a high level, using algorithms from ProximalAlgorithms amounts to the following.

Instantiate the algorithm, with options like the termination tolerance, verbosity level, or other algorithm-specific parameters.
Call the algorithm on the problem description: this amounts to the initial point, the objective terms, and possibly additional required information (e.g. Lipschitz constants).

See here for the list of available algorithm constructors, for different types of problems. In general however, algorithms are instances of the IterativeAlgorithm type.

Example: box constrained quadratic

As a simple example, consider the minimization of a 2D quadratic function subject to box constraints, which we will solve using the fast proximal gradient method (also known as fast forward-backward splitting):

using LinearAlgebra
using Zygote
using DifferentiationInterface: AutoZygote
using ProximalOperators
using ProximalAlgorithms

quadratic_cost = ProximalAlgorithms.AutoDifferentiable(
    x -> dot([3.4 1.2; 1.2 4.5] * x, x) / 2 + dot([-2.3, 9.9], x),
    AutoZygote(),
)
box_indicator = ProximalOperators.IndBox(0, 1)

ffb = ProximalAlgorithms.FastForwardBackward(maxit = 1000, tol = 1e-5, verbose = true)

ProximalAlgorithms.IterativeAlgorithm{ProximalAlgorithms.FastForwardBackwardIteration, ProximalAlgorithms.var"#30#32"{Float64}, typeof(ProximalAlgorithms.default_solution), typeof(ProximalAlgorithms.default_display), Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}}}(1000, ProximalAlgorithms.var"#30#32"{Float64}(1.0e-5), ProximalAlgorithms.default_solution, true, 100, ProximalAlgorithms.default_display, Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}}())

Here, we defined the cost function quadratic_cost, and the constraint indicator box_indicator. Then we set up the optimization algorithm of choice, FastForwardBackward, with options for the maximum number of iterations, termination tolerance, verbosity. Finally, we run the algorithm by providing an initial point and the objective terms defining the problem:

solution, iterations = ffb(x0 = ones(2), f = quadratic_cost, g = box_indicator)

([0.6764689822083225, 0.0], 31)

We can verify the correctness of the solution by checking that the negative gradient is orthogonal to the constraints, pointing outwards: for this, we just evaluate the second output of value_and_gradient.

last(ProximalAlgorithms.value_and_gradient(quadratic_cost, solution))

2-element Vector{Float64}:
 -5.460491703246362e-6
 10.711762778649987

Or by plotting the solution against the cost function and constraint:

using Plots

contour(
    -1:0.1:2,
    -1:0.1:2,
    (x, y) -> quadratic_cost([x, y]),
    fill = true,
    framestyle = :none,
    background = nothing,
)
plot!(Shape([0, 1, 1, 0], [0, 0, 1, 1]), opacity = 0.5, label = "feasible set")
scatter!(
    [solution[1]],
    [solution[2]],
    color = :red,
    markershape = :star5,
    label = "computed solution",
)

Iterator interface

Under the hood, algorithms are implemented in the form of standard Julia iterators: constructing such iterator objects directly, and looping over them, allows for more fine-grained control over the termination condition, or what information from the iterations get logged.

Each iterator is constructed with the full problem description (objective terms and, if needed, additional information like Lipschitz constats) and algorithm options (usually step sizes, and any other parameter or option of the algorithm), and produces the sequence of states of the algorithm, so that one can do (almost) anything with it.

Note

Iterators only implement the algorithm iteration logic, and not additional details like stopping criteria. As such, iterators usually yield an infinite sequence of states: when looping over them, be careful to properly guard the loop with a stopping criterion.

Warning

To save on allocations, most (if not all) algorithms re-use state objects when iterating, by updating the state in place instead of creating a new one. For this reason:

one should not mutate the state object in any way, as this may corrupt the algorithm's logic;
one should not collect the sequence of states, since this will result in an array of identical objects.

Iterator types are named after the algorithm they implement, so the relationship should be obvious:

the ForwardBackward algorithm uses the ForwardBackwardIteration iterator type;
the FastForwardBackward algorithm uses the FastForwardBackwardIteration iterator type;
the DouglasRachford algorithm uses the DouglasRachfordIteration iterator type;

and so on.

Let's see what this means in terms of the previous example.

Example: box constrained quadratic (cont)

Let's solve the problem from the previous example by directly interacting with the underlying iterator: the FastForwardBackward algorithm internally uses a FastForwardBackwardIteration object.

ffbiter = ProximalAlgorithms.FastForwardBackwardIteration(
    x0 = ones(2),
    f = quadratic_cost,
    g = box_indicator,
)

ProximalAlgorithms.FastForwardBackwardIteration{Float64, Vector{Float64}, ProximalAlgorithms.AutoDifferentiable{Main.var"#1#2", ADTypes.AutoZygote}, ProximalOperators.IndBox{Int64, Int64}, Nothing, Nothing, Nothing}(ProximalAlgorithms.AutoDifferentiable{Main.var"#1#2", ADTypes.AutoZygote}(Main.var"#1#2"(), ADTypes.AutoZygote()), ProximalOperators.IndBox{Int64, Int64}(0, 1), [1.0, 1.0], 0.0, nothing, nothing, true, 1.0e-7, 0.5, 1.0, nothing)

We can now perform anything we want throughout the iteration, by just looping over the iterator: for example, we can store the sequence of iterates from the algorithm, to later plot them, and stop whenever two successive iterates are closer than a given tolerance.

xs = []
for state in ffbiter
    push!(xs, copy(state.x))
    if length(xs) > 1 && norm(xs[end] - xs[end-1]) / (1 + norm(xs[end])) <= 1e-5
        break
    end
end

contour(
    -1:0.1:2,
    -1:0.1:2,
    (x, y) -> quadratic_cost([x, y]),
    fill = true,
    framestyle = :none,
    background = nothing,
)
plot!(Shape([0, 1, 1, 0], [0, 0, 1, 1]), opacity = 0.5, label = "feasible set")
plot!(
    [x[1] for x in xs],
    [x[2] for x in xs],
    markershape = :circle,
    label = "algorithm trajectory",
)
scatter!(
    [solution[1]],
    [solution[2]],
    color = :red,
    markershape = :star5,
    label = "computed solution",
)

Note

Since each algorithm iterator type has its own logic, it will also have its own dedicated state structure. Interacting with the state then requires being familiar with its structure, and with the nature of its attributes.

This page was generated using Literate.jl.