Understanding and Overcoming the Statistical Limitations of Decision Trees

December 1, 2023 11:00 AM Singapore


Decision trees are important both as interpretable models, amenable to high-stakes decision-making, and as building blocks of ensemble methods such as random forests and gradient boosting. Their statistical properties, however, are not yet well understood. In particular, it is unclear why there is a gap in prediction performance between them and powerful but uninterpretable machine learning methods. In this talk, we discuss how to partially bridge this gap via Hierarchical Shrinkage (HS), a post-hoc algorithm that regularizes the tree not by altering its structure, but by shrinking the prediction over each leaf toward the sample means over each of its ancestors. Furthermore, we discuss generalization lower bounds that reveal some of the inductive biases of tree-based methods, and how HS helps to overcome some of them.
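
To make the shrinkage step concrete, here is a minimal sketch in Python. It assumes the form of HS in which the prediction is written as a telescoping sum along the root-to-leaf path, and each step from a parent to its child is divided by 1 + lam / n(parent), where n(parent) is the number of training samples in the parent. The `Node` class, the function `hs_predict`, the parameter name `lam`, and the toy tree are hypothetical and used only for illustration; they are not taken from the talk.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Hypothetical node of an already-fitted regression tree."""
    mean: float                      # sample mean of the responses in this node
    n: int                           # number of training samples in this node
    feature: int = 0                 # split feature index (unused at leaves)
    threshold: float = 0.0           # split threshold (unused at leaves)
    left: Optional["Node"] = None    # child for x[feature] <= threshold
    right: Optional["Node"] = None   # child for x[feature] > threshold


def hs_predict(root: Node, x: Sequence[float], lam: float) -> float:
    """Predict with shrinkage applied along the root-to-leaf path.

    Assumed form: start from the root mean, then add each parent-to-child
    change in mean, damped by 1 / (1 + lam / n_parent). The tree structure
    itself is left untouched.
    """
    node = root
    pred = root.mean
    while node.left is not None and node.right is not None:
        child = node.left if x[node.feature] <= node.threshold else node.right
        pred += (child.mean - node.mean) / (1.0 + lam / node.n)
        node = child
    return pred


# Toy tree: 10 samples at the root, split into leaves of 4 and 6 samples.
# The root mean equals the weighted average of the leaf means.
root = Node(mean=5.0, n=10, feature=0, threshold=0.5,
            left=Node(mean=2.0, n=4),
            right=Node(mean=7.0, n=6))

unshrunk = hs_predict(root, [0.2], lam=0.0)   # plain leaf mean
shrunk = hs_predict(root, [0.2], lam=10.0)    # pulled toward the root mean
```

With `lam=0.0` the sketch returns the ordinary leaf mean, and as `lam` grows the prediction moves toward the root mean, which is the sense in which each leaf is shrunk toward its ancestors.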


  1. https://proceedings.mlr.press/v162/agarwal22b.html
  2. https://proceedings.mlr.press/v151/shuo-tan22a.html

About the Speaker

Yan Shuo Tan is an assistant professor at the Department of Statistics and Data Science at the National University of Singapore. He was previously a Neyman Visiting Assistant Professor at UC Berkeley’s Statistics Department, where he was advised by Bin Yu. He did his PhD in Mathematics at the University of Michigan, where he was advised by Roman Vershynin. His current research is in statistical machine learning, focusing on the theory, methodology and applications of modeling with decision trees and tree ensembles.

For more information about the ESD Seminar, please email esd_invite@sutd.edu.sg
