Bayes Point Machines
Ralf Herbrich, Thore Graepel, Colin Campbell; 1(Aug):245-279, 2001.
Abstract
Kernel classifiers comprise a powerful class of non-linear decision functions for binary classification. The support vector machine is an example of a learning algorithm for kernel classifiers that singles out the consistent classifier with the largest margin, i.e. the minimal real-valued output on the training sample, within the set of consistent hypotheses, the so-called version space. We suggest the Bayes point machine as a well-founded improvement which approximates the Bayes-optimal decision by the centre of mass of version space. We present two algorithms to stochastically approximate the centre of mass of version space: a billiard sampling algorithm and a sampling algorithm based on the well-known perceptron algorithm. It is shown how both algorithms can be extended to allow for soft boundaries in order to admit training errors. Experimentally, we find that, for the zero training error case, Bayes point machines consistently outperform support vector machines on both surrogate data and real-world benchmark data sets. In the soft-boundary/soft-margin case, the improvement over support vector machines is reduced. Finally, we demonstrate that the real-valued output of single Bayes points on novel test points is a valid confidence measure and leads to a steady decrease in generalisation error when used as a rejection criterion.
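
As a rough illustration of the perceptron-based approximation mentioned in the abstract, the sketch below averages normalised perceptron solutions obtained from random permutations of a linearly separable training sample; the function names, the primal (non-kernelised) formulation, and the stopping criteria are assumptions made for brevity, not the paper's exact procedure.

    import numpy as np

    def perceptron_consistent(X, y, max_epochs=100):
        # Classical perceptron run until it makes no mistakes on (X, y).
        # Returns a consistent weight vector, or None if none is found
        # within max_epochs (e.g. if the data are not separable).
        w = np.zeros(X.shape[1])
        for _ in range(max_epochs):
            mistakes = 0
            for xi, yi in zip(X, y):
                if yi * (w @ xi) <= 0:
                    w += yi * xi
                    mistakes += 1
            if mistakes == 0:
                return w
        return None

    def bayes_point_estimate(X, y, n_samples=50, seed=None):
        # Approximate the centre of mass of version space by averaging
        # normalised consistent perceptron solutions, each obtained from
        # a random permutation of the training sample (zero-error case).
        rng = np.random.default_rng(seed)
        ws = []
        for _ in range(n_samples):
            perm = rng.permutation(len(y))
            w = perceptron_consistent(X[perm], y[perm])
            if w is not None:
                ws.append(w / np.linalg.norm(w))
        if not ws:
            raise RuntimeError("no consistent perceptron solution found")
        return np.mean(np.stack(ws), axis=0)

The resulting averaged weight vector can then be used like any linear classifier, sign(w @ x); the paper's kernelised and soft-boundary variants extend this idea beyond the separable, primal setting sketched here.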