A C++ library on regularized methods for feature selection and model assessment

LibMsFeatS (Model assessment for Feature Selection) is a platform-independent C++ toolbox that implements elastic net and group LASSO for binary and multi-class classification problems. In the multi-class case, the selected features (or groups of features) are those that best discriminate between all the classes simultaneously. This yields a single description valid for all the classes, unlike the classical approach of modeling the multi-class problem as several one-vs-all binary problems, which produces many different representations that must be computed at run time.

Our toolbox also provides model selection functionality, allowing the user to automatically select the best-performing regularization parameters. Input data files follow the libSVM format (http://www.csie.ntu.edu.tw/~cjlin/libsvm/), which eases interchange with other classification tools.
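In the libSVM text format, each line holds a label followed by sparse index:value pairs. The following is a minimal sketch of a parser for one such line, given only to illustrate the layout of the input data. The names Sample and parse_libsvm_line are hypothetical and are not part of the toolbox.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// One sample in libSVM text format: "<label> <index>:<value> <index>:<value> ..."
// Hypothetical type introduced for illustration only.
struct Sample {
    double label = 0.0;
    std::vector<std::pair<int, double>> features;  // (index, value) as written in the file
};

// Parse a single libSVM-format line. Hypothetical helper, not part of the toolbox.
Sample parse_libsvm_line(const std::string& line) {
    std::istringstream in(line);
    Sample s;
    if (!(in >> s.label)) {
        throw std::runtime_error("missing label");
    }
    std::string token;
    while (in >> token) {
        const std::size_t colon = token.find(':');
        if (colon == std::string::npos) {
            throw std::runtime_error("expected index:value, got '" + token + "'");
        }
        // Indices are kept exactly as they appear; no re-basing is attempted.
        const int index = std::stoi(token.substr(0, colon));
        const double value = std::stod(token.substr(colon + 1));
        s.features.emplace_back(index, value);
    }
    return s;
}
```

For instance, the line "2 1:0.5 3:-1.25 9:4" would be read as label 2 with three non-zero features at positions 1, 3 and 9.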

The toolbox commands are designed to be intuitive, so that both experts and non-experts on the topic can use them. Our goal is to provide a feature selection tool that can be tailored to different settings, prior knowledge of the problem, and specific requirements when these are available or, in their absence, can be used to compare different solutions and select the most appropriate one for a given task.

Installation

The toolbox source files can be downloaded from here.

The Boost (http://www.boost.org) and Eigen 3 (http://eigen.tuxfamily.org) libraries must be installed before building the toolbox. Boost is assumed to be available at the default system locations. If this is not the case, specify its location by setting the corresponding path variable.

Compilation can be performed with cmake using the CMakeLists.txt file provided with the code. Configure the build as follows

   ccmake TOOLBOX_FOLDER

and set the Boost and Eigen 3 path variables if needed. Then simply run

   make

to compile.

Functionalities

After compilation, two main executables will be available:

  • groupLASSO implements the binary version of elastic net and group LASSO

  • MCgroupLASSO implements the multi-class version of elastic net and group LASSO

Both executables support a number of options that can be specified at run time:

  --train-set arg        train data file. It can be either binary or in libSVM format.
  --tau arg (def. 0.1)   L1 penalty multiplier. It can be a scalar or a string in the form
                         min:n_samples:max. In the latter case a geometric series of
                         n_samples elements is used to sample the range [min, max].
  --mu arg (def. 0)      L2 penalty multiplier. Same format as --tau.
  --model arg            output model file. If not specified, information about the selected
                         variables is printed on stdout.
  --groups arg           groups file. It is a text file in which each row corresponds to a
                         group and lists the positions of the features belonging to it. If
                         specified, the algorithm runs structured feature selection;
                         otherwise it applies single feature selection.
  --folds arg            number of folds for the k-fold cross-validation (kCV) used for
                         model selection.
  --validation-set arg   validation data file for model selection.
  --roc arg              ROC base path and name (for groupLASSO). Each row of the ROC files
                         reports bias, False Positive Rate and True Positive Rate.
  --tier arg             tier base path and name (for MCgroupLASSO). Each row of the tier
                         files reports the number of top-ranked classes (0 means only the
                         winner is taken into account, 1 refers to the first and the
                         second, ...) together with the corresponding accuracy.
  --save-all             save all models and roc/tier files. If omitted, only the model
                         (and roc/tier) selected by model selection is saved.
  --debug arg            debug output file.
  --binary               use binary dataset files instead of ASCII libSVM format.
  --single-precision     use single precision floating point instead of double.
  --mt                   use multithreaded algorithms.
  --help                 produce this help message.
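
To make the layout of the groups file concrete, here is a minimal sketch of how such a file could be read, one group per row. It assumes that feature positions are whitespace-separated integers; the separator and the index base actually expected by the toolbox are not stated here, so treat both as assumptions. The function read_groups is hypothetical and is not part of the toolbox.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Read a groups description: one group per row, each row a list of feature positions.
// ASSUMPTION: positions are whitespace-separated integers. Hypothetical helper.
std::vector<std::vector<int>> read_groups(std::istream& in) {
    std::vector<std::vector<int>> groups;
    std::string row;
    while (std::getline(in, row)) {
        std::istringstream fields(row);
        std::vector<int> group;
        int position;
        while (fields >> position) {
            group.push_back(position);
        }
        // Extraction stopped before the end of the row: a non-numeric entry was found.
        if (!fields.eof()) {
            throw std::runtime_error("non-numeric entry in row: " + row);
        }
        if (!group.empty()) {  // blank rows are skipped
            groups.push_back(group);
        }
    }
    return groups;
}
```

Under these assumptions, a file with the three rows "0 1 2", "3 4" and "5 6 7 8" would describe three groups covering nine features.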

The toolbox also provides two utilities, ascii2binary and binary2ascii, to convert input files from one format to the other.

How to use the toolbox

Here we provide examples of how to use the toolbox. We focus on a multi-class problem and use a dataset, glass.centered, downloaded from http://www.ics.uci.edu/~mlearn/MLRepository.html and provided with the toolbox. The dataset includes 7 classes, and each sample is described by 9 features.

To obtain a sparser representation you just have to increase the value of tau. Examples:

./MCgrouplasso --train-set glass.centered --tau 1e-3     selects all variables

./MCgrouplasso --train-set glass.centered --tau 1e-2    selects 6 variables (indexes 1 2 3 4 5 7)

./MCgrouplasso --train-set glass.centered --tau 2e-2    selects 4 variables (indexes 1 2 3 7)

Let us now suppose we want to select an appropriate value for tau in the range [1e-10 : 1e-3] using 10-fold cross-validation and multithreading. Then simply run

./MCgrouplasso --train-set glass.centered --tau 1e-10:10:1e-3 --mt --folds 10

An example of the message printed on stdout:

No group file specified. Using MCL1L2 [specific feature selection algorithm adopted depending on the options (in this case Multi-Class L1L2 since no groups file was specified)]
Estimated accuracy:0.640187
Parameters selected: tau 2.78256e-05 mu 0
Selected variables: 0 1 2 3 4 5 6 7 8 [single feature indexes, starting from zero]
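
The candidate values explored for tau with a range such as 1e-10:10:1e-3 can be reproduced with a short sketch. It assumes that the geometric series includes both endpoints; the toolbox's own sampling code is not shown here and may differ. The function geometric_grid is a hypothetical name used only for this illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Geometric series of n values from lo to hi.
// ASSUMPTION: both endpoints are included. Hypothetical helper, not the toolbox code.
std::vector<double> geometric_grid(double lo, double hi, std::size_t n) {
    if (!(lo > 0.0) || !(hi > lo) || n < 2) {
        throw std::invalid_argument("need 0 < lo < hi and n >= 2");
    }
    // Constant ratio between consecutive elements.
    const double ratio = std::pow(hi / lo, 1.0 / static_cast<double>(n - 1));
    std::vector<double> grid(n);
    for (std::size_t k = 0; k < n; ++k) {
        grid[k] = lo * std::pow(ratio, static_cast<double>(k));
    }
    return grid;
}
```

Under this assumption, geometric_grid(1e-10, 1e-3, 10) returns ten values whose eighth element is approximately 2.78256e-05, which is consistent with the tau reported in the sample output above.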


Copyright notice

Copyright 2014 by Luca Zini, Nicoletta Noceti and Francesca Odone,
Department of Informatics, Bioengineering, Robotics, and Systems Engineering
University of Genova.

This program is free software: you can redistribute it and/or modify it under the terms of the CC-BY Public Licence, Version 4.0, 25 November 2013. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY, including but not limited to the warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the CC-BY Public License copy provided with this software for more details.

If you use libMsFeatS in your research, please cite

Structured multi-class feature selection with an application to face recognition, by L. Zini, N. Noceti, G. Fusco, F. Odone
Pattern Recognition Letters, 2014.

Contacts

For more information:
luca.zini <at>unige.it (Luca Zini)
nicoletta.noceti <at>unige.it (Nicoletta Noceti)
francesca.odone <at>unige.it (Francesca Odone)
