
Tools (l1l2py.tools)

This module contains useful functions to be used in combination with the main functions of the package.

The functions included in this module are divided into four groups:

Range generators

l1l2py.tools.linear_range(min_value, max_value, number)

Linear range of values between min_value and max_value.

Sequence of number evenly spaced values from min_value to max_value.

Parameters :

min_value : float

max_value : float

number : int

Returns :

range : (number, ) ndarray

Examples

>>> l1l2py.tools.linear_range(min_value=0.0, max_value=10.0, number=4)
array([  0.        ,   3.33333333,   6.66666667,  10.        ])
>>> l1l2py.tools.linear_range(min_value=0.0, max_value=10.0, number=2)
array([  0.,  10.])
>>> l1l2py.tools.linear_range(min_value=0.0, max_value=10.0, number=1)
array([ 0.])
>>> l1l2py.tools.linear_range(min_value=0.0, max_value=10.0, number=0)
array([], dtype=float64)

l1l2py.tools.geometric_range(min_value, max_value, number)

Geometric range of values between min_value and max_value.

Sequence of number values from min_value to max_value generated by a geometric sequence.

Parameters :

min_value : float

max_value : float

number : int

Returns :

range : (number, ) ndarray

Raises :

ZeroDivisionError :

If min_value is 0.0 or number is 1

Examples

>>> l1l2py.tools.geometric_range(min_value=0.0, max_value=10.0, number=4)
Traceback (most recent call last):
    ...
ZeroDivisionError: float division
>>> l1l2py.tools.geometric_range(min_value=0.1, max_value=10.0, number=4)
array([ 0.1       ,  0.46415888,  2.15443469, 10.        ])
>>> l1l2py.tools.geometric_range(min_value=0.1, max_value=10.0, number=2)
array([  0.1,  10. ])
>>> l1l2py.tools.geometric_range(min_value=0.1, max_value=10.0, number=1)
Traceback (most recent call last):
    ...
ZeroDivisionError: float division
>>> l1l2py.tools.geometric_range(min_value=0.1, max_value=10.0, number=0)
array([], dtype=float64)

Note

The geometric sequence of n elements between a and b is

a,\ ar^1,\ ar^2,\ \dots,\ ar^{n-1}

where the ratio r is

r = \left(\frac{b}{a}\right)^{\frac{1}{n-1}}
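
The progression can be reproduced with plain NumPy as a quick check. The sketch below is illustrative only (geometric_range_sketch is not part of the l1l2py API) and, like the documented function, raises ZeroDivisionError when min_value is 0.0 or number is 1:

import numpy as np

def geometric_range_sketch(min_value, max_value, number):
    # Ratio r = (max_value / min_value) ** (1 / (number - 1)).
    ratio = (max_value / min_value) ** (1.0 / (number - 1))
    # Sequence a, a*r, a*r**2, ..., a*r**(number - 1).
    return min_value * ratio ** np.arange(number)

geometric_range_sketch(0.1, 10.0, 4)   # approx. [0.1, 0.464, 2.154, 10.0]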

Data normalizers

l1l2py.tools.center(matrix, optional_matrix=None, return_mean=False)

Center columns of a matrix setting each column to zero mean.

The function returns the centered matrix given as input. Optionally, it also centers optional_matrix with respect to the mean values computed on matrix.

Note

A one dimensional matrix is considered as a column vector.

Parameters :

matrix : (N,) or (N, P) ndarray

Input matrix whose columns are to be centered.

optional_matrix : (N,) or (N, P) ndarray, optional (default is None)

Optional matrix whose columns are to be centered using the mean of matrix. It must have the same number of columns as matrix.

return_mean : bool, optional (default is False)

If True, also returns the mean of matrix.

Returns :

matrix_centered : (N,) or (N, P) ndarray

Centered matrix.

optional_matrix_centered : (N,) or (N, P) ndarray, optional

Centered optional_matrix with respect to matrix.

mean : float or (P,) ndarray, optional

Mean of matrix columns.

Examples

>>> X = numpy.array([[1, 2, 3], [4, 5, 6]])
>>> l1l2py.tools.center(X)
array([[-1.5, -1.5, -1.5],
       [ 1.5,  1.5,  1.5]])
>>> l1l2py.tools.center(X, return_mean=True)
(array([[-1.5, -1.5, -1.5],
       [ 1.5,  1.5,  1.5]]), array([ 2.5,  3.5,  4.5]))
>>> x = numpy.array([[1, 2, 3]])             # 2-dimensional matrix
>>> l1l2py.tools.center(x, return_mean=True)
(array([[ 0.,  0.,  0.]]), array([ 1.,  2.,  3.]))
>>> x = numpy.array([1, 2, 3])               # 1-dimensional matrix
>>> l1l2py.tools.center(x, return_mean=True) # centered as a (3, 1) matrix
(array([-1.,  0.,  1.]), 2.0)
>>> l1l2py.tools.center(X, X[:,:2])
Traceback (most recent call last):
    ...
ValueError: shape mismatch: objects cannot be broadcast to a single shape
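
Conceptually, centering is a column-mean subtraction in which the mean computed on matrix is reused for optional_matrix. A plain NumPy sketch of that pattern (illustrative names, not the l1l2py source):

import numpy as np

def center_sketch(matrix, optional_matrix=None, return_mean=False):
    matrix = np.asarray(matrix, dtype=float)
    mean = matrix.mean(axis=0)            # column means (a scalar for 1-D input)
    out = [matrix - mean]
    if optional_matrix is not None:
        # The optional matrix is centered with the mean computed on `matrix`.
        out.append(np.asarray(optional_matrix, dtype=float) - mean)
    if return_mean:
        out.append(mean)
    return out[0] if len(out) == 1 else tuple(out)
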
l1l2py.tools.standardize(matrix, optional_matrix=None, return_factors=False)

Standardize columns of a matrix, setting each column to zero mean and unit standard deviation.

The function returns the standardized matrix given as input. Optionally, it also standardizes an optional_matrix with respect to the mean and standard deviation evaluated for matrix.

Note

A one dimensional matrix is considered as a column vector.

Parameters :

matrix : (N,) or (N, P) ndarray

Input matrix whose columns are to be standardized to mean 0 and standard deviation 1.

optional_matrix : (N,) or (N, P) ndarray, optional (default is None)

Optional matrix whose columns are to be standardized using the mean and standard deviation of matrix. It must have the same number of columns as matrix.

return_factors : bool, optional (default is False)

If True, returns mean and standard deviation of matrix.

Returns :

matrix_standardized : (N,) or (N, P) ndarray

Standardized matrix.

optional_matrix_standardized : (N,) or (N, P) ndarray, optional

Standardized optional_matrix with respect to matrix.

mean : float or (P,) ndarray, optional

Mean of matrix columns.

std : float or (P,) ndarray, optional

Standard deviation of matrix columns.

Raises :

ValueError :

If matrix has only one row.

Examples

>>> X = numpy.array([[1, 2, 3], [4, 5, 6]])
>>> l1l2py.tools.standardize(X)
array([[-0.70710678, -0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678,  0.70710678]])
>>> l1l2py.tools.standardize(X, return_factors=True)
(array([[-0.70710678, -0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678,  0.70710678]]), array([ 2.5,  3.5,  4.5]), array([ 2.12132034,  2.12132034,  2.12132034]))
>>> x = numpy.array([[1, 2, 3]])                     # 1 row matrix
>>> l1l2py.tools.standardize(x, return_factors=True)
Traceback (most recent call last):
    ...
ValueError: 'matrix' must have more than one row
>>> x = numpy.array([1, 2, 3])                       # 1-dimensional matrix
>>> l1l2py.tools.standardize(x, return_factors=True) # standardized as a (3, 1) matrix
(array([-1.,  0.,  1.]), 2.0, 1.0)
>>> l1l2py.tools.standardize(X, X[:,:2])
Traceback (most recent call last):
    ...
ValueError: shape mismatch: objects cannot be broadcast to a single shape
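
A typical train/test usage pattern, assuming the return order follows the Returns section above (the variable names are illustrative):

import numpy
import l1l2py.tools

X_train = numpy.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
X_test = numpy.array([[2., 3., 4.], [5., 6., 7.]])

# Standardize the training matrix and apply the same mean/std to the test matrix.
X_train_std, X_test_std, mean, std = l1l2py.tools.standardize(
    X_train, optional_matrix=X_test, return_factors=True)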

Error functions

l1l2py.tools.regression_error(labels, predictions)

Returns regression error.

The regression error is the sum of the squared differences between the label values and the predicted values, divided by the number of samples.

Parameters :

labels : array_like, shape (N,)

Regression labels.

predictions : array_like, shape (N,)

Predicted regression labels.

Returns :

error : float

Regression error calculated.

Note

The regression error is calculated using the formula

error = \frac{\sum_{i=1}^N{| l_i - p_i|^2}} {N}
    \qquad
    l_i \in\ labels,\, p_i \in\ predicted
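
A minimal NumPy sketch of this formula (illustrative only; regression_error_sketch is not part of the l1l2py API):

import numpy as np

def regression_error_sketch(labels, predictions):
    labels = np.asarray(labels, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    # Sum of squared differences divided by the number of samples.
    return np.mean((labels - predictions) ** 2)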

l1l2py.tools.classification_error(labels, predictions)

Evaluate the binary classification error.

The classification error is based on the sign of the predicted values compared with the sign of the labels.

The function assumes that labels contains positive values for one class and negative values for the other one.

Warning

For efficiency reasons, the values in labels are not checked by the function.

Parameters :

labels : array_like, shape (N,)

Classification labels (usually contains only 1s and -1s).

predictions : array_like, shape (N,)

Predicted classification labels.

Returns :

error : float

Classification error evaluated.

Examples

>>> l1l2py.tools.classification_error(labels=[1, 1, 1], predictions=[1, 1, 1])
0.0
>>> l1l2py.tools.classification_error(labels=[1, 1, 1], predictions=[1, 1, -1])
0.33333333333333331
>>> l1l2py.tools.classification_error(labels=[1, 1, 1], predictions=[1, -1, -1])
0.66666666666666663
>>> l1l2py.tools.classification_error(labels=[1, 1, 1], predictions=[-1, -1, -1])
1.0
>>> l1l2py.tools.classification_error(labels=[1, 1, 1], predictions=[10, -2, -3])
0.66666666666666663

Note

The classification error is calculated using this formula

error = \frac{\sum_{i=1}^N{f(l_i, p_i)}}{N} \qquad
        l_i \in\ labels,\, p_i \in\ predictions,

where

f(l_i, p_i) =
\left\{
    \begin{array}{l l}
      1 & \quad \text{if $sign(l_i) \neq sign(p_i)$}\\
      0 & \quad \text{otherwise}\\
    \end{array}
\right.

Warning

The classification error is calculated using the numpy.sign function. Keep in mind that sign(x) returns 0 if x == 0.
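
The following plain NumPy sketch reproduces the sign-mismatch count defined above (illustrative only, not the l1l2py source); as noted, a prediction of exactly 0 counts as an error against any nonzero label:

import numpy as np

def classification_error_sketch(labels, predictions):
    # 1 where the signs disagree, 0 otherwise; the mean is the error rate.
    mismatch = np.sign(labels) != np.sign(predictions)
    return np.mean(mismatch)

classification_error_sketch([1, 1, 1], [10, -2, -3])   # approx. 0.667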

l1l2py.tools.balanced_classification_error(labels, predictions, error_weights=None)

Returns the binary classification error balanced across class sizes.

This function returns a balanced classification error. With the default value for error_weights, the function assigns greater weight to the errors belonging to the smaller class.

Parameters :

labels : array_like, shape (N,)

Classification labels (usually contains only 1s and -1s).

predictions : array_like, shape (N,)

Classification labels predicted.

error_weights : array_like, shape (N,), optional (default is None)

Classification error weights. If None, the default weights are calculated by subtracting the mean of labels from each value in labels.

Returns :

error : float

Classification error calculated.

Examples

>>> l1l2py.tools.balanced_classification_error(labels=[1, 1, 1], predictions=[-1, -1, -1])
0.0
>>> l1l2py.tools.balanced_classification_error(labels=[-1, 1, 1], predictions=[-1, 1, 1])
0.0
>>> l1l2py.tools.balanced_classification_error(labels=[-1, 1, 1], predictions=[1, -1, -1])
0.88888888888888895
>>> l1l2py.tools.balanced_classification_error(labels=[-1, 1, 1], predictions=[1, 1, 1])
0.44444444444444442
>>> l1l2py.tools.balanced_classification_error(labels=[-1, 1, 1], predictions=[-1, 1, -1])
0.22222222222222224
>>> l1l2py.tools.balanced_classification_error(labels=[-1, 1, 1], predictions=[-1, 1, -1],
...                                            error_weights=[1, 1, 1])
0.33333333333333331

Note

The balanced classification error is calculated using this formula:

error = \frac{\sum_{i=1}^N{w_i \cdot f(l_i, p_i)}}
              {N} \qquad l_i \in\ labels,\, p_i \in\ predictions,

where f(l_i, p_i) is as defined above.

With the default weights the error function becomes:

error =
        \frac{\sum_{i=1}^N{|l_i - \overline{labels}| \cdot f(l_i, p_i)}}
              {N}
        \qquad
        l_i \in\ labels,\, p_i \in\ predicted

Warning

If labels contains values belonging to only one class, the function always returns 0.0, because l_i - \overline{labels} = 0 and therefore w_i = 0 for each i.
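
A plain NumPy sketch of the balanced error with the default weights w_i = |l_i - \overline{labels}| (illustrative only, not the l1l2py source); it reproduces the example values above:

import numpy as np

def balanced_classification_error_sketch(labels, predictions, error_weights=None):
    labels = np.asarray(labels, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    if error_weights is None:
        # Default weights: distance of each label from the mean label value,
        # which gives a larger weight to samples of the smaller class.
        error_weights = np.abs(labels - labels.mean())
    mismatch = np.sign(labels) != np.sign(predictions)
    return np.sum(error_weights * mismatch) / len(labels)

balanced_classification_error_sketch([-1, 1, 1], [1, -1, -1])   # approx. 0.889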

Cross Validation utilities

l1l2py.tools.kfold_splits(labels, k, rseed=0)

k-fold cross validation splits.

Given a list of labels, the function produces a list of k splits. Each split is a pair of lists containing the indexes of the training set and the indexes of the test set.

Parameters :

labels : array_like, shape (N,)

Data labels.

k : int, greater than 0

Number of splits.

rseed : int, optional (default is 0)

Random seed.

Returns :

splits : list of k tuples

Each tuple contains two lists with the training set and test set indexes.

Raises :

ValueError :

If k is less than 2 or greater than N.

Examples

>>> labels = range(10)
>>> l1l2py.tools.kfold_splits(labels, 2)
[([7, 1, 3, 6, 8], [9, 4, 0, 5, 2]), ([9, 4, 0, 5, 2], [7, 1, 3, 6, 8])]
>>> l1l2py.tools.kfold_splits(labels, 1)
Traceback (most recent call last):
    ...
ValueError: 'k' must be greater than one and smaller or equal than the number of samples
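
Since each split is a pair of index lists, it can be used directly to slice a data matrix. A usage sketch (variable names are illustrative):

import numpy
import l1l2py.tools

X = numpy.arange(20).reshape(10, 2)    # 10 samples, 2 features
y = numpy.arange(10)                   # one label per sample

for train_idx, test_idx in l1l2py.tools.kfold_splits(y, k=2):
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]
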
l1l2py.tools.stratified_kfold_splits(labels, k, rseed=0)

Stratified k-fold cross validation splits.

This function is a variation of kfold_splits, which returns stratified splits. The divisions are made by preserving the percentage of samples for each class, assuming that the problem is binary.

Parameters :

labels : array_like, shape (N,)

Data labels (usually contains only 1s and -1s).

k : int, greater than 0

Number of splits.

rseed : int, optional (default is 0)

Random seed.

Returns :

splits : list of k tuples

Each tuple contains two lists with the training set and test set indexes.

Raises :

ValueError :

If labels contains more than two class labels.

ValueError :

If k is less than 2 or greater than the number of positive or negative samples in labels.

Examples

>>> labels = range(10)
>>> l1l2py.tools.stratified_kfold_splits(labels, 2)
Traceback (most recent call last):
    ...
ValueError: 'labels' must contains only two class labels
>>> labels = [1, 1, 1, 1, 1, 1, -1, -1, -1, -1]
>>> l1l2py.tools.stratified_kfold_splits(labels, 2)
[([8, 9, 5, 2, 1], [7, 6, 3, 0, 4]), ([7, 6, 3, 0, 4], [8, 9, 5, 2, 1])]
>>> l1l2py.tools.stratified_kfold_splits(labels, 1)
Traceback (most recent call last):
    ...
ValueError: 'k' must be greater than one and smaller or equal than number of positive and negative samples
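
The stratification idea can be sketched by folding the positive and negative indexes separately and merging them fold by fold. The following is illustrative only and, unlike the l1l2py function, does not shuffle with rseed:

import numpy as np

def stratified_folds_sketch(labels, k):
    labels = np.asarray(labels)
    # Fold positive and negative sample indexes separately ...
    pos_folds = np.array_split(np.where(labels > 0)[0], k)
    neg_folds = np.array_split(np.where(labels <= 0)[0], k)
    # ... then merge them fold by fold, preserving the class proportions.
    folds = [np.concatenate((p, n)) for p, n in zip(pos_folds, neg_folds)]
    # Each split trains on all folds but one and tests on the held-out fold.
    return [(np.concatenate(folds[:i] + folds[i + 1:]).tolist(), folds[i].tolist())
            for i in range(k)]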

Note

Running these functions multiple times with the same value of rseed always gives the same result, allowing repeatable experiments. Note, moreover, that each of these functions resets the random seed to None at the end, restoring a random seed for subsequent uses of the random module (see random.seed).
