Efficient Algorithms for Decision Tree Cross-validation
Hendrik Blockeel, Jan Struyf
3(Dec):621-650, 2002.
Abstract
Cross-validation is a useful and generally applicable technique
often employed in machine learning, including decision
tree induction. An important disadvantage of a straightforward implementation
of the technique is its computational overhead. In this paper
we show that, for decision trees, the computational overhead of
cross-validation can be reduced significantly by integrating the
cross-validation with the normal decision tree induction process.
We discuss how existing decision tree algorithms can be adapted to this
end, and provide an analysis of the speedups these adaptations
may yield. We identify a number of parameters that influence the obtainable
speedups, and validate and refine our analysis with experiments
on a variety of data sets with two different implementations. Besides
cross-validation, we also briefly explore the usefulness of these techniques
for bagging. We conclude with some guidelines concerning when these
optimizations should be considered.
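
To illustrate why integrating cross-validation into tree induction can save work, the following minimal Python sketch shows one kind of computation that can be shared across folds. It is an illustration under stated assumptions, not necessarily the algorithm the paper describes: it assumes that a split test is evaluated from class counts, and that the training set for fold i is the full data set minus fold i. The function names `fold_class_counts` and `training_set_counts` are hypothetical and introduced here only for the example.

```python
from collections import Counter

def fold_class_counts(labels, fold_of, k):
    """Count class labels per fold in a single pass over the data.

    labels  : class label of each example
    fold_of : fold index (0..k-1) assigned to each example
    """
    per_fold = [Counter() for _ in range(k)]
    for label, fold in zip(labels, fold_of):
        per_fold[fold][label] += 1
    return per_fold

def training_set_counts(per_fold):
    """Derive the class counts of every fold's training set.

    The training set of fold i is all data except fold i, so its counts
    are the overall counts minus the counts of fold i. No second pass
    over the examples is needed.
    """
    total = Counter()
    for counts in per_fold:
        total.update(counts)
    per_training_set = []
    for counts in per_fold:
        train = Counter(total)
        train.subtract(counts)
        per_training_set.append(+train)  # unary plus drops zero counts
    return total, per_training_set

# Toy data (assumed for illustration): six examples, three folds.
labels = ["a", "a", "b", "b", "a", "b"]
fold_of = [0, 1, 2, 0, 1, 2]
per_fold = fold_class_counts(labels, fold_of, k=3)
total, per_training_set = training_set_counts(per_fold)
```

A straightforward implementation would instead scan each of the k training sets separately, touching every example roughly k - 1 times for the same statistics; here each example is visited once and the per-fold results are combined by subtraction.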