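This module contains utility functions to be used in combination with the main functions of the package.
The functions included in this module are divided into four groups: range generators (linear and geometric), data normalizers (centering and standardization), error functions (regression, classification and balanced classification error) and cross-validation utilities (k-fold and stratified k-fold splits).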
Linear range of values between min_value and max_value.
Sequence of number evenly spaced values from min_value to max_value.
| Parameters : | min_value : float max_value : float number : int |
|---|---|
| Returns : | range : (number, ) ndarray |
Examples
>>> l1l2py.tools.linear_range(min_value=0.0, max_value=10.0, number=4)
array([ 0. , 3.33333333, 6.66666667, 10. ])
>>> l1l2py.tools.linear_range(min_value=0.0, max_value=10.0, number=2)
array([ 0., 10.])
>>> l1l2py.tools.linear_range(min_value=0.0, max_value=10.0, number=1)
array([ 0.])
>>> l1l2py.tools.linear_range(min_value=0.0, max_value=10.0, number=0)
array([], dtype=float64)
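The behaviour shown in these examples can be approximated with NumPy alone. The following is a minimal sketch, not the package's implementation: `linear_range_sketch` is a hypothetical name, and delegating to `numpy.linspace` is an assumption that is consistent with the outputs above.

```python
import numpy as np


def linear_range_sketch(min_value, max_value, number):
    """Return `number` evenly spaced values from min_value to max_value."""
    # numpy.linspace includes both endpoints by default.
    return np.linspace(min_value, max_value, number)


values = linear_range_sketch(0.0, 10.0, 4)
print(values)
```

With `number=1` only `min_value` is returned, and with `number=0` the result is an empty array, as in the examples above.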
Geometric range of values between min_value and max_value.
Sequence of number values from min_value to max_value generated by a geometric sequence.
| Parameters : | min_value : float max_value : float number : int |
|---|---|
| Returns : | range : (number, ) ndarray |
| Raises : | ZeroDivisionError :
|
Examples
>>> l1l2py.tools.geometric_range(min_value=0.0, max_value=10.0, number=4)
Traceback (most recent call last):
...
ZeroDivisionError: float division
>>> l1l2py.tools.geometric_range(min_value=0.1, max_value=10.0, number=4)
array([ 0.1 , 0.46415888, 2.15443469, 10. ])
>>> l1l2py.tools.geometric_range(min_value=0.1, max_value=10.0, number=2)
array([ 0.1, 10. ])
>>> l1l2py.tools.geometric_range(min_value=0.1, max_value=10.0, number=1)
Traceback (most recent call last):
...
ZeroDivisionError: float division
>>> l1l2py.tools.geometric_range(min_value=0.1, max_value=10.0, number=0)
array([], dtype=float64)
Note
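The geometric sequence of $n$ elements between $a$ and $b$ is

$$a,\ ar^1,\ ar^2,\ \dots,\ ar^{n-1}$$

where the ratio $r$ is

$$r = \left(\frac{b}{a}\right)^{\frac{1}{n-1}}$$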
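The construction can be sketched in a few lines, assuming the ratio is (max_value / min_value) raised to the power 1 / (number - 1), which reproduces the example values above. `geometric_range_sketch` is a hypothetical name, and this is an illustration rather than the package's own code.

```python
import numpy as np


def geometric_range_sketch(min_value, max_value, number):
    """Return `number` values from min_value to max_value in geometric progression."""
    if number == 0:
        return np.empty(0)
    # Plain Python floats raise ZeroDivisionError when min_value is 0.0
    # (the division) or when number is 1 (the exponent 1 / 0).
    ratio = (float(max_value) / float(min_value)) ** (1.0 / (number - 1))
    # Element i of the sequence is min_value * ratio**i.
    return float(min_value) * ratio ** np.arange(number)


values = geometric_range_sketch(0.1, 10.0, 4)
print(values)
```

The two failure cases in the examples follow directly from the ratio: a zero `min_value` appears in a denominator, and `number=1` makes the exponent undefined.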
Center columns of a matrix setting each column to zero mean.
The function returns a centered copy of the input matrix. Optionally, it also centers optional_matrix with respect to the column means computed for matrix.
Note
A one dimensional matrix is considered as a column vector.
| Parameters : | matrix : (N,) or (N, P) ndarray
optional_matrix : (N,) or (N, P) ndarray, optional (default is None)
return_mean : bool, optional (default is False)
|
|---|---|
| Returns : | matrix_centered : (N,) or (N, P) ndarray
optional_matrix_centered : (N,) or (N, P) ndarray, optional
mean : float or (P,) ndarray, optional
|
Examples
>>> X = numpy.array([[1, 2, 3], [4, 5, 6]])
>>> l1l2py.tools.center(X)
array([[-1.5, -1.5, -1.5],
[ 1.5, 1.5, 1.5]])
>>> l1l2py.tools.center(X, return_mean=True)
(array([[-1.5, -1.5, -1.5],
[ 1.5, 1.5, 1.5]]), array([ 2.5, 3.5, 4.5]))
>>> x = numpy.array([[1, 2, 3]]) # 2-dimensional matrix
>>> l1l2py.tools.center(x, return_mean=True)
(array([[ 0., 0., 0.]]), array([ 1., 2., 3.]))
>>> x = numpy.array([1, 2, 3]) # 1-dimensional matrix
>>> l1l2py.tools.center(x, return_mean=True) # centered as a (3, 1) matrix
(array([-1., 0., 1.]), 2.0)
>>> l1l2py.tools.center(X, X[:,:2])
Traceback (most recent call last):
...
ValueError: shape mismatch: objects cannot be broadcast to a single shape
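A typical use of the optional matrix is to center test data with the means learned from training data. The sketch below illustrates the idea with NumPy only; `center_sketch` is a hypothetical helper, and unlike the documented function it always returns the mean.

```python
import numpy as np


def center_sketch(matrix, optional_matrix=None):
    """Center columns of `matrix`; reuse its column means for `optional_matrix`."""
    matrix = np.asarray(matrix, dtype=float)
    mean = matrix.mean(axis=0)  # per-column mean; a scalar for 1-D input
    if optional_matrix is None:
        return matrix - mean, mean
    optional_matrix = np.asarray(optional_matrix, dtype=float)
    # Broadcasting fails here if the column counts differ.
    return matrix - mean, optional_matrix - mean, mean


X_train = np.array([[1, 2, 3], [4, 5, 6]])
X_test = np.array([[2, 2, 2]])
train_centered, test_centered, mean = center_sketch(X_train, X_test)
print(train_centered, test_centered, mean)
```

Note that the test matrix is shifted by the training means, so its own columns are generally not zero mean afterwards.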
Standardize columns of a matrix, setting each column to zero mean and unit standard deviation.
The function returns a standardized copy of the input matrix. Optionally, it also standardizes optional_matrix with respect to the mean and standard deviation computed for matrix.
Note
A one dimensional matrix is considered as a column vector.
| Parameters : | matrix : (N,) or (N, P) ndarray
optional_matrix : (N,) or (N, P) ndarray, optional (default is None)
return_factors : bool, optional (default is False)
|
|---|---|
| Returns : | matrix_standardized : (N,) or (N, P) ndarray
optional_matrix_standardized : (N,) or (N, P) ndarray, optional
mean : float or (P,) ndarray, optional
std : float or (P,) ndarray, optional
|
| Raises : | ValueError :
|
Examples
>>> X = numpy.array([[1, 2, 3], [4, 5, 6]])
>>> l1l2py.tools.standardize(X)
array([[-0.70710678, -0.70710678, -0.70710678],
[ 0.70710678, 0.70710678, 0.70710678]])
>>> l1l2py.tools.standardize(X, return_factors=True)
(array([[-0.70710678, -0.70710678, -0.70710678],
[ 0.70710678, 0.70710678, 0.70710678]]), array([ 2.5, 3.5, 4.5]), array([ 2.12132034, 2.12132034, 2.12132034]))
>>> x = numpy.array([[1, 2, 3]]) # 1 row matrix
>>> l1l2py.tools.standardize(x, return_factors=True)
Traceback (most recent call last):
...
ValueError: 'matrix' must have more than one row
>>> x = numpy.array([1, 2, 3]) # 1-dimensional matrix
>>> l1l2py.tools.standardize(x, return_factors=True) # standardized as a (3, 1) matrix
(array([-1., 0., 1.]), 2.0, 1.0)
>>> l1l2py.tools.standardize(X, X[:,:2])
Traceback (most recent call last):
...
ValueError: shape mismatch: objects cannot be broadcast to a single shape
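The example outputs above are consistent with dividing by the sample standard deviation (one delta degree of freedom). The sketch below makes that assumption explicit; `standardize_sketch` is a hypothetical name and, for brevity, it always returns the factors.

```python
import numpy as np


def standardize_sketch(matrix, optional_matrix=None):
    """Scale columns of `matrix` to zero mean and unit sample standard deviation."""
    matrix = np.asarray(matrix, dtype=float)
    if matrix.shape[0] < 2:
        # With a single row the sample standard deviation is undefined.
        raise ValueError("'matrix' must have more than one row")
    mean = matrix.mean(axis=0)
    std = matrix.std(axis=0, ddof=1)  # assumption: sample std (ddof=1)
    if optional_matrix is None:
        return (matrix - mean) / std, mean, std
    optional_matrix = np.asarray(optional_matrix, dtype=float)
    return (matrix - mean) / std, (optional_matrix - mean) / std, mean, std


X = np.array([[1, 2, 3], [4, 5, 6]])
X_std, mean, std = standardize_sketch(X)
print(X_std, mean, std)
```

This sketch does not guard against constant columns, for which the standard deviation is zero.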
Returns regression error.
The regression error is the sum of the squared differences between the label values and the prediction values, divided by the number of samples.
| Parameters : | labels : array_like, shape (N,)
predictions : array_like, shape (N,)
|
|---|---|
| Returns : | error : float
|
Note
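The regression error is calculated using the formula

$$error = \frac{\sum_{i=1}^N |l_i - p_i|^2}{N}$$

where $l_i$ and $p_i$ are the $i$-th label and the $i$-th prediction.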
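As a worked illustration, the regression error is the mean of the squared differences between labels and predictions. The sketch below assumes exactly that definition; `regression_error_sketch` is a hypothetical name, not the package function.

```python
import numpy as np


def regression_error_sketch(labels, predictions):
    """Mean of squared differences between labels and predictions."""
    labels = np.asarray(labels, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    return float(np.sum((labels - predictions) ** 2) / labels.size)


# Differences are 1 and 3, so the error is (1 + 9) / 2.
error = regression_error_sketch([0.0, 0.0], [1.0, 3.0])
print(error)
```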
Evaluate the binary classification error.
The classification error is computed by comparing the sign of each prediction with the sign of the corresponding label.
The function assumes that labels contains positive values for one class and negative values for the other one.
Warning
For efficiency reasons, the values in labels are not checked by the function.
| Parameters : | labels : array_like, shape (N,)
predictions : array_like, shape (N,)
|
|---|---|
| Returns : | error : float
|
Examples
>>> l1l2py.tools.classification_error(labels=[1, 1, 1], predictions=[1, 1, 1])
0.0
>>> l1l2py.tools.classification_error(labels=[1, 1, 1], predictions=[1, 1, -1])
0.33333333333333331
>>> l1l2py.tools.classification_error(labels=[1, 1, 1], predictions=[1, -1, -1])
0.66666666666666663
>>> l1l2py.tools.classification_error(labels=[1, 1, 1], predictions=[-1, -1, -1])
1.0
>>> l1l2py.tools.classification_error(labels=[1, 1, 1], predictions=[10, -2, -3])
0.66666666666666663
Note
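The classification error is calculated using this formula

$$error = \frac{\sum_{i=1}^N f(l_i, p_i)}{N}$$

where

$$f(l_i, p_i) = \begin{cases} 1 & \text{if } sign(l_i) \neq sign(p_i) \\ 0 & \text{otherwise} \end{cases}$$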
Warning
The classification error is calculated using the numpy.sign function. Keep in mind that sign(x) returns 0 if x == 0, so a prediction equal to zero never matches the sign of a nonzero label.
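The sign comparison, including the zero case mentioned in the warning, can be sketched as follows. This assumes the error is the fraction of samples whose label and prediction signs differ; `classification_error_sketch` is a hypothetical name.

```python
import numpy as np


def classification_error_sketch(labels, predictions):
    """Fraction of samples whose prediction sign differs from the label sign."""
    labels = np.asarray(labels, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    mismatches = np.sign(labels) != np.sign(predictions)
    return float(np.sum(mismatches) / labels.size)


# Only the magnitude-independent signs matter: two of three predictions are wrong.
error = classification_error_sketch([1, 1, 1], [10, -2, -3])

# A zero prediction has sign 0, which differs from the sign of a nonzero label.
error_with_zero = classification_error_sketch([1], [0])
print(error, error_with_zero)
```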
Returns the binary classification error balanced across the size of classes.
This function returns a balanced classification error. With the default value for error_weights, the function assigns greater weight to the errors belonging to the smaller class.
| Parameters : | labels : array_like, shape (N,)
predictions : array_like, shape (N,)
error_weights : array_like, shape (N,), optional (default is None)
|
|---|---|
| Returns : | error : float
|
Examples
>>> l1l2py.tools.balanced_classification_error(labels=[1, 1, 1], predictions=[-1, -1, -1])
0.0
>>> l1l2py.tools.balanced_classification_error(labels=[-1, 1, 1], predictions=[-1, 1, 1])
0.0
>>> l1l2py.tools.balanced_classification_error(labels=[-1, 1, 1], predictions=[1, -1, -1])
0.88888888888888895
>>> l1l2py.tools.balanced_classification_error(labels=[-1, 1, 1], predictions=[1, 1, 1])
0.44444444444444442
>>> l1l2py.tools.balanced_classification_error(labels=[-1, 1, 1], predictions=[-1, 1, -1])
0.22222222222222224
>>> l1l2py.tools.balanced_classification_error(labels=[-1, 1, 1], predictions=[-1, 1, -1],
... error_weights=[1, 1, 1])
0.33333333333333331
Note
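The balanced classification error is calculated using this formula:

$$error = \frac{\sum_{i=1}^N w_i \cdot f(l_i, p_i)}{N}$$

where $w_i$ is the $i$-th error weight and $f(l_i, p_i)$ is 1 when the signs of $l_i$ and $p_i$ differ and 0 otherwise, as for the classification error.
With the default weights the error function becomes:

$$error = \frac{\sum_{i=1}^N |l_i - \overline{l}| \cdot f(l_i, p_i)}{N}$$

where $\overline{l}$ is the mean of the labels.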
Warning
If labels contains only values belonging to one class, the function always returns 0.0 because $\overline{l} = l_i$, and therefore $|l_i - \overline{l}| = 0$ for each $i$.
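The weighting can be checked numerically. The sketch below assumes the default weight of a sample is the absolute difference between its label and the mean label, which reproduces the example values above; `balanced_classification_error_sketch` is a hypothetical name.

```python
import numpy as np


def balanced_classification_error_sketch(labels, predictions, error_weights=None):
    """Weighted fraction of sign mismatches between labels and predictions."""
    labels = np.asarray(labels, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    if error_weights is None:
        # Assumed default: samples of the smaller class lie farther from the mean.
        error_weights = np.abs(labels - labels.mean())
    else:
        error_weights = np.asarray(error_weights, dtype=float)
    mismatches = np.sign(labels) != np.sign(predictions)
    return float(np.sum(error_weights[mismatches]) / labels.size)


# Mean label is 1/3, so the weights are 4/3 for the negative sample and 2/3 for
# each positive sample. Misclassifying only the negative sample gives (4/3) / 3.
error = balanced_classification_error_sketch([-1, 1, 1], [1, 1, 1])
print(error)
```

With labels from a single class every default weight is zero, which is why the error is 0.0 in that case regardless of the predictions.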
k-fold cross validation splits.
Given a list of labels, the function produces a list of k splits. Each split is a pair containing the list of training-set indices and the list of test-set indices.
| Parameters : | labels : array_like, shape (N,)
k : int, greater than 1
rseed : int, optional (default is 0)
|
|---|---|
| Returns : | splits : list of k tuples
|
| Raises : | ValueError :
|
Examples
>>> labels = range(10)
>>> l1l2py.tools.kfold_splits(labels, 2)
[([7, 1, 3, 6, 8], [9, 4, 0, 5, 2]), ([9, 4, 0, 5, 2], [7, 1, 3, 6, 8])]
>>> l1l2py.tools.kfold_splits(labels, 1)
Traceback (most recent call last):
...
ValueError: 'k' must be greater than one and smaller or equal than the number of samples
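The splitting scheme can be sketched with the standard library only. This is an illustration under assumptions, not the package's algorithm: `kfold_splits_sketch` is a hypothetical name, it uses a private `random.Random` instance instead of the global seed, and it will not reproduce the exact index order shown above.

```python
import random


def kfold_splits_sketch(labels, k, rseed=0):
    """Return k (train_indices, test_indices) pairs over shuffled sample indices."""
    n_samples = len(labels)
    if not 2 <= k <= n_samples:
        raise ValueError("'k' must be greater than one and not exceed "
                         "the number of samples")
    rng = random.Random(rseed)  # private generator: same rseed, same splits
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [index for j, fold in enumerate(folds) if j != i for index in fold]
        splits.append((train, test))
    return splits


splits = kfold_splits_sketch(list(range(10)), 2)
for train, test in splits:
    print(train, test)
```

Each sample index appears in exactly one test set, and every training set is the union of the remaining folds.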
Stratified k-fold cross validation splits.
This function is a variation of kfold_splits, which returns stratified splits. The divisions are made by preserving the percentage of samples for each class, assuming that the problem is binary.
| Parameters : | labels : array_like, shape (N,)
k : int, greater than 1
rseed : int, optional (default is 0)
|
|---|---|
| Returns : | splits : list of k tuples
|
| Raises : | ValueError :
ValueError :
|
Examples
>>> labels = range(10)
>>> l1l2py.tools.stratified_kfold_splits(labels, 2)
Traceback (most recent call last):
...
ValueError: 'labels' must contains only two class labels
>>> labels = [1, 1, 1, 1, 1, 1, -1, -1, -1, -1]
>>> l1l2py.tools.stratified_kfold_splits(labels, 2)
[([8, 9, 5, 2, 1], [7, 6, 3, 0, 4]), ([7, 6, 3, 0, 4], [8, 9, 5, 2, 1])]
>>> l1l2py.tools.stratified_kfold_splits(labels, 1)
Traceback (most recent call last):
...
ValueError: 'k' must be greater than one and smaller or equal than number of positive and negative samples
Note
Running these functions multiple times with the same value of the parameter rseed always gives the same result, which allows repeatable experiments. Note also that each of these functions sets the random seed to None before returning, to restore a random seed for subsequent uses of the random module (see random.seed).
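Stratification for a binary problem can be sketched by shuffling the indices of each class separately and dealing them into the folds. The code below is a minimal illustration under assumptions: `stratified_kfold_splits_sketch` is a hypothetical name, classes are identified by distinct label values, and a private `random.Random` instance replaces the global seed, so the index order differs from the examples above.

```python
import random


def stratified_kfold_splits_sketch(labels, k, rseed=0):
    """Return k (train, test) index pairs that preserve the two class proportions."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("'labels' must contain exactly two class labels")
    groups = [[i for i, label in enumerate(labels) if label == c] for c in classes]
    if not 2 <= k <= min(len(group) for group in groups):
        raise ValueError("'k' must be greater than one and not exceed "
                         "the size of the smaller class")
    rng = random.Random(rseed)  # private generator keeps results repeatable
    folds = [[] for _ in range(k)]
    for group in groups:
        rng.shuffle(group)
        # Deal each class round-robin so every fold gets its share.
        for position, index in enumerate(group):
            folds[position % k].append(index)
    splits = []
    for i, test in enumerate(folds):
        train = [index for j, fold in enumerate(folds) if j != i for index in fold]
        splits.append((train, test))
    return splits


labels = [1, 1, 1, 1, 1, 1, -1, -1, -1, -1]
splits = stratified_kfold_splits_sketch(labels, 2)
for train, test in splits:
    print(train, test)
```

With six positive and four negative labels and k equal to 2, each test set in this sketch holds three positive and two negative samples.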