Getting started
The methods implemented in ProximalAlgorithms are commonly referred to as (you've guessed it) proximal algorithms, in that they rely on the proximal operator (or mapping) to deal with non-differentiable terms in the objective. Loosely speaking, the algorithms in this package can be used to solve problems of the form
\[\operatorname*{minimize}_x\ \sum_{i=1}^N f_i(x)\]
where $N$ depends on the specific algorithm, together with specific assumptions on the terms $f_i$ (like smoothness, convexity, strong convexity). The problem above is solved by iteratively accessing specific first order information on the terms $f_i$, like their gradient $\nabla f_i$ or their proximal mapping $\operatorname{prox}_{f_i}$:
\[\mathrm{prox}_{\gamma f_i}(x) = \arg\min_z \left\{ f_i(z) + \tfrac{1}{2\gamma}\|z-x\|^2 \right\}\]
The literature on proximal operators and algorithms is vast: for an overview, one can refer to [1], [2].
To evaluate these first-order primitives, in ProximalAlgorithms:
- $\nabla f_i$ falls back to using automatic differentiation (as provided by DifferentiationInterface.jl and all of its backends).
- $\operatorname{prox}_{f_i}$ relies on the intereface of ProximalOperators (>= 0.15).
Both of the above can be implemented for custom function types, as documented here.
Each of the implemented algorithms assumes a different structure of the objective to be optimized (e.g., a specific number $N$ of terms), with specific assumptions (e.g., smoothness, convexity). Furthermore, multiple algorithms can often be applied to the same problem (possibly through appropriate reformulation) and are expected to perform differently; sometimes, even the same algorithm can be applied in multiple ways to the same problem, by grouping (splitting) the terms in different ways.
Because of these reasons, ProximalAlgorithms does not offer a modeling language to automagically minimize any objective: rather, the user is expected to formulate their problem by providing the right objective terms to the algorithm of choice. Please refer to the this section of the manual for information on what terms can be provided and under which assumptions.
Interface to algorithms
At a high level, using algorithms from ProximalAlgorithms amounts to the following.
- Instantiate the algorithm, with options like the termination tolerance, verbosity level, or other algorithm-specific parameters.
- Call the algorithm on the problem description: this amounts to the initial point, the objective terms, and possibly additional required information (e.g. Lipschitz constants).
See here for the list of available algorithm constructors, for different types of problems. In general however, algorithms are instances of the IterativeAlgorithm
type.
Example: box constrained quadratic
As a simple example, consider the minimization of a 2D quadratic function subject to box constraints, which we will solve using the fast proximal gradient method (also known as fast forward-backward splitting):
using LinearAlgebra
using Zygote
using DifferentiationInterface: AutoZygote
using ProximalOperators
using ProximalAlgorithms
quadratic_cost = ProximalAlgorithms.AutoDifferentiable(
x -> dot([3.4 1.2; 1.2 4.5] * x, x) / 2 + dot([-2.3, 9.9], x),
AutoZygote(),
)
box_indicator = ProximalOperators.IndBox(0, 1)
ffb = ProximalAlgorithms.FastForwardBackward(maxit = 1000, tol = 1e-5, verbose = true)
ProximalAlgorithms.IterativeAlgorithm{ProximalAlgorithms.FastForwardBackwardIteration, ProximalAlgorithms.var"#30#32"{Float64}, typeof(ProximalAlgorithms.default_solution), typeof(ProximalAlgorithms.default_display), Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}}}(1000, ProximalAlgorithms.var"#30#32"{Float64}(1.0e-5), ProximalAlgorithms.default_solution, true, 100, ProximalAlgorithms.default_display, Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}}())
Here, we defined the cost function quadratic_cost
, and the constraint indicator box_indicator
. Then we set up the optimization algorithm of choice, FastForwardBackward
, with options for the maximum number of iterations, termination tolerance, verbosity. Finally, we run the algorithm by providing an initial point and the objective terms defining the problem:
solution, iterations = ffb(x0 = ones(2), f = quadratic_cost, g = box_indicator)
([0.6764689822083225, 0.0], 31)
We can verify the correctness of the solution by checking that the negative gradient is orthogonal to the constraints, pointing outwards: for this, we just evaluate the second output of value_and_gradient
.
last(ProximalAlgorithms.value_and_gradient(quadratic_cost, solution))
2-element Vector{Float64}:
-5.460491703246362e-6
10.711762778649987
Or by plotting the solution against the cost function and constraint:
using Plots
contour(
-1:0.1:2,
-1:0.1:2,
(x, y) -> quadratic_cost([x, y]),
fill = true,
framestyle = :none,
background = nothing,
)
plot!(Shape([0, 1, 1, 0], [0, 0, 1, 1]), opacity = 0.5, label = "feasible set")
scatter!(
[solution[1]],
[solution[2]],
color = :red,
markershape = :star5,
label = "computed solution",
)
Iterator interface
Under the hood, algorithms are implemented in the form of standard Julia iterators: constructing such iterator objects directly, and looping over them, allows for more fine-grained control over the termination condition, or what information from the iterations get logged.
Each iterator is constructed with the full problem description (objective terms and, if needed, additional information like Lipschitz constats) and algorithm options (usually step sizes, and any other parameter or option of the algorithm), and produces the sequence of states of the algorithm, so that one can do (almost) anything with it.
Iterators only implement the algorithm iteration logic, and not additional details like stopping criteria. As such, iterators usually yield an infinite sequence of states: when looping over them, be careful to properly guard the loop with a stopping criterion.
To save on allocations, most (if not all) algorithms re-use state objects when iterating, by updating the state in place instead of creating a new one. For this reason:
- one should not mutate the state object in any way, as this may corrupt the algorithm's logic;
- one should not
collect
the sequence of states, since this will result in an array of identical objects.
Iterator types are named after the algorithm they implement, so the relationship should be obvious:
- the
ForwardBackward
algorithm uses theForwardBackwardIteration
iterator type; - the
FastForwardBackward
algorithm uses theFastForwardBackwardIteration
iterator type; - the
DouglasRachford
algorithm uses theDouglasRachfordIteration
iterator type;
and so on.
Let's see what this means in terms of the previous example.
Example: box constrained quadratic (cont)
Let's solve the problem from the previous example by directly interacting with the underlying iterator: the FastForwardBackward
algorithm internally uses a FastForwardBackwardIteration
object.
ffbiter = ProximalAlgorithms.FastForwardBackwardIteration(
x0 = ones(2),
f = quadratic_cost,
g = box_indicator,
)
ProximalAlgorithms.FastForwardBackwardIteration{Float64, Vector{Float64}, ProximalAlgorithms.AutoDifferentiable{Main.var"#1#2", ADTypes.AutoZygote}, ProximalOperators.IndBox{Int64, Int64}, Nothing, Nothing, Nothing}(ProximalAlgorithms.AutoDifferentiable{Main.var"#1#2", ADTypes.AutoZygote}(Main.var"#1#2"(), ADTypes.AutoZygote()), ProximalOperators.IndBox{Int64, Int64}(0, 1), [1.0, 1.0], 0.0, nothing, nothing, true, 1.0e-7, 0.5, 1.0, nothing)
We can now perform anything we want throughout the iteration, by just looping over the iterator: for example, we can store the sequence of iterates from the algorithm, to later plot them, and stop whenever two successive iterates are closer than a given tolerance.
xs = []
for state in ffbiter
push!(xs, copy(state.x))
if length(xs) > 1 && norm(xs[end] - xs[end-1]) / (1 + norm(xs[end])) <= 1e-5
break
end
end
contour(
-1:0.1:2,
-1:0.1:2,
(x, y) -> quadratic_cost([x, y]),
fill = true,
framestyle = :none,
background = nothing,
)
plot!(Shape([0, 1, 1, 0], [0, 0, 1, 1]), opacity = 0.5, label = "feasible set")
plot!(
[x[1] for x in xs],
[x[2] for x in xs],
markershape = :circle,
label = "algorithm trajectory",
)
scatter!(
[solution[1]],
[solution[2]],
color = :red,
markershape = :star5,
label = "computed solution",
)
Since each algorithm iterator type has its own logic, it will also have its own dedicated state structure. Interacting with the state then requires being familiar with its structure, and with the nature of its attributes.
This page was generated using Literate.jl.