Extensions to Metric-Based Model Selection
Yoshua Bengio, Nicolas Chapados;
3(Mar):1209-1227, 2003.
Abstract
Metric-based methods have recently been introduced for model selection and regularization, often yielding very
significant improvements over the alternatives tried (including cross-validation). All these methods require unlabeled data
over which to compare functions and detect gross differences in behavior away from the training points. We introduce three
new extensions of the metric model selection methods and apply them to feature selection. The first extension takes
advantage of the particular case of time-series data in which the task involves prediction with a horizon h. The idea is to use, at time t, the h unlabeled examples that precede t for model selection. The second extension takes advantage of the
different error distributions of cross-validation and the metric methods: cross-validation tends to have a larger variance
and is unbiased. A hybrid combining the two model selection methods is rarely beaten by either of them. The third
extension deals with the case when unlabeled data is not available at all, using an estimated input density. Experiments
are described to study these extensions in the context of capacity control and feature subset selection.
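
To make the first extension concrete, the following is a minimal sketch (assumed for illustration, not the paper's exact procedure) of metric-based selection that treats the h most recent inputs as unlabeled data. It follows the spirit of the ADJ-style adjustment from the metric model selection literature; the callable model interface and the mean-absolute disagreement metric are assumptions.

    import numpy as np

    def disagreement(f, g, X):
        """Mean absolute disagreement between two predictors on inputs X."""
        return np.mean(np.abs(f(X) - g(X))) + 1e-12  # guard against division by zero

    def select_model(models, X_train, y_train, X_recent_h):
        """Pick a model using the h unlabeled inputs preceding time t.

        models     -- candidates ordered from simplest to most complex
        X_recent_h -- inputs whose targets lie h steps ahead and are thus
                      still unobserved at time t (hence 'unlabeled')
        """
        best, best_score = None, np.inf
        for i, f in enumerate(models):
            train_err = np.mean((f(X_train) - y_train) ** 2)
            # Penalize f when it diverges from simpler models on the
            # unlabeled points more than it does on the training points.
            ratios = [disagreement(f, g, X_recent_h) / disagreement(f, g, X_train)
                      for g in models[:i]]
            score = train_err * max(ratios, default=1.0)
            if score < best_score:
                best, best_score = f, score
        return best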
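
The abstract states only that the hybrid is rarely beaten by either criterion alone; one plausible combination rule (an assumption, not necessarily the rule used in the paper) is to average the per-model rankings produced by cross-validation and by the metric criterion, so that a single high-variance cross-validation estimate can be overruled:

    import numpy as np
    from scipy.stats import rankdata

    def hybrid_select(cv_errors, metric_scores):
        """Both arguments are per-model criteria; lower is better."""
        # Average the two rankings and pick the best-ranked model.
        avg_rank = (rankdata(cv_errors) + rankdata(metric_scores)) / 2.0
        return int(np.argmin(avg_rank))  # index of the selected model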
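
For the third extension, a hedged sketch of the density-based substitute for unlabeled data, assuming a kernel density estimator (the paper may use a different density model): fit a density to the training inputs, then draw pseudo-unlabeled points and use them wherever real unlabeled data were needed above.

    from sklearn.neighbors import KernelDensity

    def pseudo_unlabeled(X_train, n_samples=500, bandwidth=0.5, seed=0):
        """Draw pseudo-unlabeled inputs from an estimated input density."""
        kde = KernelDensity(bandwidth=bandwidth).fit(X_train)
        return kde.sample(n_samples, random_state=seed)

Such samples could then stand in for X_recent_h in the select_model sketch above.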