
To Tune or Not to Tune the Number of Trees in Random Forest

Philipp Probst, Anne-Laure Boulesteix; 18(181):1−18, 2018.

Abstract

The number of trees $T$ in the random forest (RF) algorithm for supervised learning has to be set by the user. It is unclear whether $T$ should simply be set to the largest computationally manageable value or whether a smaller $T$ may be sufficient or in some cases even better. While the principle underlying bagging is that more trees are better, in practice the classification error rate sometimes reaches a minimum before increasing again as more trees are added. The goal of this paper is four-fold: (i) providing theoretical results showing that the expected error rate may be a non-monotonic function of the number of trees and explaining under which circumstances this happens; (ii) providing theoretical results showing that such non-monotonic patterns cannot be observed for other performance measures such as the Brier score and the logarithmic loss (for classification) and the mean squared error (for regression); (iii) illustrating the extent of the problem through an application to a large number (n = 306) of datasets from the public database OpenML; (iv) finally arguing in favor of setting $T$ to a computationally feasible large number as long as classical error measures based on average loss are considered.
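As a rough illustration of the question studied here (not code from the paper), the sketch below grows a random forest incrementally and records the test-set error rate and Brier score as the number of trees increases. It assumes scikit-learn and uses a synthetic dataset and arbitrary settings purely as placeholders; on some datasets the error rate curve may dip and rise again, while the Brier score typically keeps improving.

```python
# Minimal sketch: track error rate and Brier score as the number of trees T grows.
# Uses scikit-learn's warm_start to add trees to an existing forest instead of refitting.
# Dataset, grid of T values, and all parameters are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss, zero_one_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=10, warm_start=True, random_state=0)
for T in range(10, 501, 10):
    rf.set_params(n_estimators=T)   # keep already-fitted trees, fit only the new ones
    rf.fit(X_train, y_train)
    proba = rf.predict_proba(X_test)[:, 1]
    err = zero_one_loss(y_test, rf.predict(X_test))   # error rate: may be non-monotonic in T
    brier = brier_score_loss(y_test, proba)           # Brier score: expected to improve with T
    print(f"T={T:4d}  error rate={err:.4f}  Brier score={brier:.4f}")
```

Plotting both curves against T makes the contrast discussed in the abstract visible: the 0-1 loss can fluctuate or even worsen for small to moderate T, whereas loss measures based on predicted probabilities tend to decrease monotonically.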

© JMLR 2018.
