
Algorithms (l1l2py.algorithms)

In order to describe the functions implemented in this module, we first fix some notation.

Assume a centered data matrix \mathbf{X} \in \mathbb{R}^{n \times p} and a column vector of regression values \mathbf{Y} \in \mathbb{R}^n or of binary labels \mathbf{Y} \in \{-1, 1\}^n. We want to minimize the regression/classification error by solving a Regularized Least Squares (RLS) problem.

This module implements two main algorithms. The first solves a classical RLS problem with a penalty on the \ell_2\text{-norm} of the vector \boldsymbol{\beta} (also called ridge_regression)

(1)\boldsymbol{\beta^*} =
    \argmin_{\boldsymbol{\beta}}
        \Big\{
        \frac{1}{n} \| \mathbf{Y} - \mathbf{X}\boldsymbol{\beta} \|_2^2
        + \mu \|\boldsymbol{\beta}\|_2^2
        \Big\},

with \mu > 0.

The second minimizes a functional with a linear combination of two penalties, on the \ell_1\text{-norm} and the \ell_2\text{-norm} of the vector \boldsymbol{\beta} (also called l1l2_regularization)

(2)\boldsymbol{\beta^*} =
    \argmin_{\boldsymbol{\beta}}
        \Big\{
        \frac{1}{n} \| \mathbf{Y} - \mathbf{X}\boldsymbol{\beta} \|_2^2
        + \mu \|\boldsymbol{\beta}\|_2^2
        + \tau \|\boldsymbol{\beta}\|_1
        \Big\},

with \mu > 0 and \tau > 0.

Implementation details

While (1) has a closed-form solution, (2) can be tackled by many different approaches. This module provides the Iterative Shrinkage-Thresholding Algorithm (ISTA) proposed in [DeMol09a], exploiting the faster variant (called FISTA) proposed in [Beck09].
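
For reference, setting the gradient of (1) to zero yields the well-known closed form (a textbook identity, stated here only for completeness):

\boldsymbol{\beta^*} = (\mathbf{X^T}\mathbf{X} + n\mu\mathbf{I})^{-1}\mathbf{X^T}\mathbf{Y}.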

Starting from the null vector \boldsymbol{\beta}^{(0)} = \mathbf{0}, each step updates the value of \boldsymbol{\beta} until convergence:

\boldsymbol{\beta}^{(k+1)} =
    \mathbf{S}_{\frac{\tau}{\sigma}} \Big(
                    \big(1 - \frac{\mu}{\sigma}\big)\boldsymbol{\beta}^{(k)} +
                    \frac{1}{n\sigma}
                        \mathbf{X^T}[\mathbf{Y} -
                                     \mathbf{X}\boldsymbol{\beta}^{(k)}]
                \Big)

where \mathbf{S}_{\gamma}, with \gamma > 0, is the soft-thresholding function

\mathbf{S}_{\gamma}(x) = \text{sign}(x) \max(0, | x | - \gamma/2).

The constant \sigma is a (theoretically optimal) step size which depends on the data:

\sigma = \frac{e}{n} + \mu,

where e is the maximum eigenvalue of the matrix \mathbf{X^T}\mathbf{X}.

Convergence is reached when, for each j \in \{0,\dots,p-1\}:

| \beta_j^{(k)} - \beta_j^{(k-1)} | \leq | \beta_j^{(k)} | \cdot (\text{tol}/k),

where \text{tol} > 0, provided k does not exceed a fixed maximum number of iterations.
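
The update rule above is straightforward to prototype. The following is a minimal NumPy sketch of the plain ISTA iteration, without the FISTA acceleration or the adaptive step size offered by the actual implementation; the names soft_threshold and ista_l1l2 are illustrative and are not part of the l1l2py API.

import numpy as np

def soft_threshold(x, gamma):
    # Component-wise soft-thresholding S_gamma(x) = sign(x) * max(0, |x| - gamma/2).
    return np.sign(x) * np.maximum(0.0, np.abs(x) - gamma / 2.0)

def ista_l1l2(X, Y, mu, tau, kmax=100000, tol=1e-5):
    # Plain ISTA for the functional (2); an illustrative sketch,
    # not the l1l2py implementation.
    n, p = X.shape
    # Step size sigma = e/n + mu, with e the largest eigenvalue of X^T X.
    e = np.linalg.eigvalsh(X.T @ X).max()
    sigma = e / n + mu
    beta = np.zeros(p)
    for k in range(1, kmax + 1):
        value = (1 - mu / sigma) * beta + X.T @ (Y - X @ beta) / (n * sigma)
        beta_next = soft_threshold(value, tau / sigma)
        # Stop when |beta_j^(k) - beta_j^(k-1)| <= |beta_j^(k)| * (tol/k) for all j.
        if np.all(np.abs(beta_next - beta) <= np.abs(beta_next) * (tol / k)):
            return beta_next
        beta = beta_next
    return beta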

Regularization Algorithms

l1l2py.algorithms.ridge_regression(data, labels, mu=0.0)

Implementation of the Regularized Least Squares solver.

It solves the ridge regression problem with parameter mu on the l2-norm.

Parameters :

data : (N, P) ndarray

Data matrix.

labels : (N,) or (N, 1) ndarray

Labels vector.

mu : float, optional (default is 0.0)

l2-norm penalty.

Returns :

beta : (P, 1) ndarray

Ridge regression solution.

Examples

>>> import numpy
>>> import l1l2py.algorithms
>>> X = numpy.array([[0.1, 1.1, 0.3], [0.2, 1.2, 1.6], [0.3, 1.3, -0.6]])
>>> beta = numpy.array([0.1, 0.1, 0.0])
>>> Y = numpy.dot(X, beta)
>>> beta = l1l2py.algorithms.ridge_regression(X, Y, 1e3).T
>>> len(numpy.flatnonzero(beta))
3
l1l2py.algorithms.l1l2_regularization(data, labels, mu, tau, beta=None, kmax=100000, tolerance=1e-05, return_iterations=False, adaptive=False)

Implementation of the Fast Iterative Shrinkage-Thresholding Algorithm to solve a least squares problem with l1l2 penalty.

It solves the l1l2 regularization problem with parameter mu on the l2-norm and parameter tau on the l1-norm.

Parameters :

data : (N, P) ndarray

Data matrix.

labels : (N,) or (N, 1) ndarray

Labels vector.

mu : float

l2-norm penalty.

tau : float

l1-norm penalty.

beta : (P,) or (P, 1) ndarray, optional (default is None)

Starting value for the iterations. If None, the iterations start from the empty model.

kmax : int, optional (default is 1e5)

Maximum number of iterations.

tolerance : float, optional (default is 1e-5)

Convergence tolerance.

return_iterations : bool, optional (default is False)

If True, the number of iterations performed is also returned. The algorithm has a predefined minimum number of iterations equal to 10.

adaptive : bool, optional (default is False)

If True, minimization is performed calculating an adaptive step size for each iteration.

Returns :

beta : (P, 1) ndarray

l1l2 solution.

k : int, optional

Number of iterations performed.

Examples

>>> import numpy
>>> import l1l2py.algorithms
>>> X = numpy.array([[0.1, 1.1, 0.3], [0.2, 1.2, 1.6], [0.3, 1.3, -0.6]])
>>> beta = numpy.array([0.1, 0.1, 0.0])
>>> Y = numpy.dot(X, beta)
>>> beta = l1l2py.algorithms.l1l2_regularization(X, Y, 0.1, 0.1)
>>> len(numpy.flatnonzero(beta))
1
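
The documented return_iterations flag is useful when tuning tolerance or kmax; with it, the call also returns the number of iterations actually performed (at least the predefined minimum of 10 and at most kmax):

>>> beta, k = l1l2py.algorithms.l1l2_regularization(X, Y, 0.1, 0.1,
...                                                 return_iterations=True)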

Utility Functions

l1l2py.algorithms.l1_bound(data, labels)

Estimation of a useful maximum bound for the l1 penalty term.

If mu is fixed close to 0.0 and the maximum value computed by this function is used as tau in the l1l2 regularization, the solution vector contains only zero elements.

For each value of tau smaller than this maximum bound, the solution vector contains at least one non-zero element.
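
This bound can be read directly off the iteration above: starting from \boldsymbol{\beta}^{(0)} = \mathbf{0} with \mu = 0, the first update is \mathbf{S}_{\tau/\sigma}(\mathbf{X^T}\mathbf{Y}/(n\sigma)), which is identically zero whenever

\tau \geq \frac{2}{n} \max_{j} | (\mathbf{X^T}\mathbf{Y})_j |,

so the null vector is a fixed point. (This formula is offered only as a consistency argument; whether l1_bound computes exactly this expression is an implementation detail.)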

Parameters :

data : (N, P) ndarray

Data matrix.

labels : (N,) or (N, 1) ndarray

Labels vector.

Returns :

tau_max : float

Maximum tau.

Examples

>>> import numpy
>>> import l1l2py.algorithms
>>> X = numpy.array([[0.1, 1.1, 0.3], [0.2, 1.2, 1.6], [0.3, 1.3, -0.6]])
>>> beta = numpy.array([0.1, 0.1, 0.0])
>>> Y = numpy.dot(X, beta)
>>> tau_max = l1l2py.algorithms.l1_bound(X, Y)
>>> l1l2py.algorithms.l1l2_regularization(X, Y, 0.0, tau_max).T
array([[ 0.,  0.,  0.]])
>>> beta = l1l2py.algorithms.l1l2_regularization(X, Y, 0.0, tau_max - 1e-5)
>>> len(numpy.flatnonzero(beta))
1
l1l2py.algorithms.l1l2_path(data, labels, mu, tau_range, beta=None, kmax=100000, tolerance=1e-05, adaptive=False)

Efficient solution of a sequence of l1l2 regularization problems for increasing values of the l1-norm parameter.

Finds the l1l2 regularization path for each value in tau_range and fixed value of mu.

The values in tau_range are used during the computation in reverse order, while the output path follows the same ordering as the tau values.

Note

For efficiency, if mu = 0.0 and the number of non-zero values is higher than N for a given value of tau (which means the algorithm has reached the limit of allowed iterations), the following solutions (for smaller values of tau) are simply the least squares solutions.

Warning

The number of solutions can differ from len(tau_range): the function returns only the solutions with at least one non-zero element. For values of tau greater than tau_max the solution contains only zero values.

Parameters :

data : (N, P) ndarray

Data matrix.

labels : (N,) or (N, 1) ndarray

Labels vector.

mu : float

l2-norm penalty.

tau_range : array_like of float

l1-norm penalties in increasing order.

beta : (P,) or (P, 1) ndarray, optional (default is None)

Starting value for the iterations. If None, the iterations start from the empty model.

kmax : int, optional (default is 1e5)

Maximum number of iterations.

tolerance : float, optional (default is 1e-5)

Convergence tolerance.

adaptive : bool, optional (default is False)

If True, minimization is performed calculating an adaptive step size for each iteration.

Returns :

beta_path : list of (P,) or (P, 1) ndarrays

l1l2 solutions with at least one non-zero element.

Note

The acceleration method based on warm starts, implemented in this function, has been theoretically proven in [Hale08].

[Hale08] E. T. Hale, W. Yin, Y. Zhang, “Fixed-point continuation for \ell_1-minimization: Methodology and convergence”, SIAM J. Optim., Volume 19, Issue 3, pp. 1107-1130, 2008.
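
Examples

A possible invocation, reusing the data from the previous examples (an illustrative sketch: the exact solutions depend on the chosen parameters, so the only output asserted below is the documented guarantee that at most len(tau_range) solutions are returned):

>>> import numpy
>>> import l1l2py.algorithms
>>> X = numpy.array([[0.1, 1.1, 0.3], [0.2, 1.2, 1.6], [0.3, 1.3, -0.6]])
>>> Y = numpy.dot(X, numpy.array([0.1, 0.1, 0.0]))
>>> beta_path = l1l2py.algorithms.l1l2_path(X, Y, 0.1, [0.01, 0.05, 0.1])
>>> len(beta_path) <= 3
True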
