Custom objective terms

ProximalAlgorithms relies on the first-order primitives defined in ProximalCore. While ProximalOperators provides a rich library of function types implementing these primitives, one may need to formulate problems using custom objective terms. In that case, one only needs to implement the appropriate first-order primitive, $\nabla f$ or $\operatorname{prox}_{\gamma f}$ or both, for algorithms to be able to work with $f$.

Defining the proximal mapping for a custom function type requires adding a method for ProximalCore.prox!.

To compute gradients, ProximalAlgorithms provides a fallback definition for ProximalCore.gradient!, relying on Zygote to use automatic differentiation. Therefore, you can provide any (differentiable) Julia function wherever gradients need to be taken, and everything will work out of the box.
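As a minimal sketch of this fallback (the function and point below are chosen only for illustration), gradients of plain Julia functions can be computed directly once ProximalAlgorithms is loaded:

```julia
using ProximalAlgorithms  # loads the Zygote-based fallback for gradients
using ProximalCore

# gradient of a plain (anonymous) Julia function via automatic differentiation;
# for f(x) = sum(x.^2) at [1.0, 2.0], the gradient is 2x = [2.0, 4.0] and f(x) = 5.0
grad, fx = ProximalCore.gradient(x -> sum(abs2, x), [1.0, 2.0])
```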

If however one would like to provide their own gradient implementation (e.g. for efficiency reasons), they can simply implement a method for ProximalCore.gradient!.
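For instance, one could provide a hand-coded gradient for a custom type along the following lines; the quadratic function here is a hypothetical example, not part of the library:

```julia
using ProximalCore

# hypothetical custom type representing f(x) = (1/2) ||x||^2
struct SqrNorm end

(::SqrNorm)(x) = sum(abs2, x) / 2

# hand-coded gradient, bypassing automatic differentiation:
# the gradient of (1/2) ||x||^2 is x itself
function ProximalCore.gradient!(y, ::SqrNorm, x)
    y .= x
    return sum(abs2, x) / 2  # value of f at x
end
```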

ProximalCore.prox — Function
prox(f, x, gamma=1)

Proximal mapping for f, evaluated at x, with stepsize gamma.

The proximal mapping is defined as

\[\mathrm{prox}_{\gamma f}(x) = \arg\min_z \left\{ f(z) + \tfrac{1}{2\gamma}\|z-x\|^2 \right\}.\]

Returns a tuple (y, fy) consisting of

  • y: the output of the proximal mapping of f at x with stepsize gamma
  • fy: the value of f at y

See also: prox!.

ProximalCore.prox! — Function
prox!(y, f, x, gamma=1)

In-place proximal mapping for f, evaluated at x, with stepsize gamma.

The proximal mapping is defined as

\[\mathrm{prox}_{\gamma f}(x) = \arg\min_z \left\{ f(z) + \tfrac{1}{2\gamma}\|z-x\|^2 \right\}.\]

The result is written to the (pre-allocated) array y, which should have the same shape/size as x.

Returns the value of f at y.

See also: prox.

ProximalCore.gradient — Function
gradient(f, x)

Gradient (and value) of f at x.

Returns a tuple (y, fx) consisting of

  • y: the gradient of f at x
  • fx: the value of f at x

See also: gradient!.

ProximalCore.gradient! — Function
gradient!(y, f, x)

In-place gradient (and value) of f at x.

The gradient is written to the (pre-allocated) array y, which should have the same shape/size as x.

Returns the value of f at x.

See also: gradient.

Example: constrained Rosenbrock

Let's try to minimize the celebrated Rosenbrock function, but constrained to the unit norm ball. The cost function is

rosenbrock2D(x) = 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
rosenbrock2D (generic function with 1 method)

To enforce the constraint, we define the indicator of the unit ball, together with its proximal mapping: this is simply projection onto the unit norm ball, so it is sufficient to normalize any given point that lies outside of the set.

using LinearAlgebra
using ProximalCore

struct IndUnitBall end

(::IndUnitBall)(x) = norm(x) > 1 ? eltype(x)(Inf) : eltype(x)(0)

function ProximalCore.prox!(y, ::IndUnitBall, x, gamma)
    if norm(x) > 1
        y .= x ./ norm(x)
    else
        y .= x
    end
    return zero(eltype(x))
end
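As a quick sanity check, assuming the definitions above, one can evaluate the out-of-place prox (which dispatches to the prox! method just defined) at a point outside the ball:

```julia
# [3.0, 4.0] has norm 5, so its projection onto the unit ball is [0.6, 0.8];
# the indicator evaluates to zero at the projected point
y, fy = ProximalCore.prox(IndUnitBall(), [3.0, 4.0])
```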

We can now minimize the function, using PANOC, a Newton-type method:

using ProximalAlgorithms

panoc = ProximalAlgorithms.PANOC()
solution, iterations = panoc(x0=-ones(2), f=rosenbrock2D, g=IndUnitBall())
([0.7864151541667157, 0.6176983125255732], 36)

Plotting the solution against the cost function contours and the constraint set gives an idea of its correctness.

using Plots

contour(-2:0.1:2, -2:0.1:2, (x,y) -> rosenbrock2D([x, y]), fill=true, framestyle=:none, background=nothing)
plot!(Shape(cos.(0:0.01:2*pi), sin.(0:0.01:2*pi)), opacity=.5, label="feasible set")
scatter!([solution[1]], [solution[2]], color=:red, markershape=:star5, label="computed solution")

Example: counting operations

It is often interesting to measure how many operations (gradient or prox evaluations) an algorithm performs. In algorithms involving backtracking or some other line-search logic, the iteration count may not be representative of the number of operations being performed; some implementations may also require additional operations when checking stopping conditions. All of this makes it difficult to quantify the exact operation count from the iteration count alone.

We can achieve this by wrapping functions in a dedicated Counting type:

mutable struct Counting{T}
    f::T
    gradient_count::Int
    prox_count::Int
end

Counting(f::T) where T = Counting{T}(f, 0, 0)
Main.Counting

Now we only need to intercept any call to gradient! and prox! and increase counters there:

function ProximalCore.gradient!(y, f::Counting, x)
    f.gradient_count += 1
    return ProximalCore.gradient!(y, f.f, x)
end

function ProximalCore.prox!(y, f::Counting, x, gamma)
    f.prox_count += 1
    return ProximalCore.prox!(y, f.f, x, gamma)
end
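Depending on the algorithm, the wrapped function may also be evaluated directly (for example within a line search); in that case one may want Counting to forward plain calls to the underlying function as well. A small sketch:

```julia
# forward direct evaluations f(x) to the wrapped function,
# so that a Counting wrapper can be used wherever f itself could
(f::Counting)(x) = f.f(x)
```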

We can run the previous example again, this time wrapping the objective terms within Counting:

f = Counting(rosenbrock2D)
g = Counting(IndUnitBall())

solution, iterations = panoc(x0=-ones(2), f=f, g=g)
([0.7864151541667157, 0.6176983125255732], 36)

and check how many operations were actually performed:

println(f.gradient_count)
println(g.prox_count)
115
79

This page was generated using Literate.jl.