Recently, a number of research papers have detailed attempts to unseat XGBoost as the best model for tabular data. While none has succeeded, a useful byproduct of these papers is the XGBoost hyperparameter ranges their authors used to tune the baseline models for comparison. Those ranges are outlined below.

Tabular Data: Deep Learning is Not All You Need

The paper tuned XGBoost with Hyperopt over the following search space:

| Hyperparameter | Distribution | Range |
|-------------------|------------------|-------------------------------|
| eta | Log-uniform | [e^-7, 1] |
| max_depth | Discrete uniform | [1, 10] |
| subsample | Uniform | [0.2, 1] |
| colsample_bytree | Uniform | [0.2, 1] |
| colsample_bylevel | Uniform | [0.2, 1] |
| min_child_weight | Log-uniform | [e^-16, e^5] |
| alpha | Uniform choice | {0, log-uniform [e^-16, e^2]} |
| lambda | Uniform choice | {0, log-uniform [e^-16, e^2]} |
| gamma | Uniform choice | {0, log-uniform [e^-16, e^2]} |
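For reference, this search space translates naturally into Hyperopt's `hp` primitives. Below is a minimal sketch; the commented-out `fmin` call and the `objective_fn` it would optimize are hypothetical stand-ins for your own train-and-validate routine, not the paper's code.

```python
from hyperopt import fmin, hp, tpe
from hyperopt.pyll import scope

# Search space from the table above. hp.loguniform takes log-space bounds,
# so hp.loguniform('eta', -7, 0) samples from [e^-7, e^0] = [e^-7, 1].
space = {
    'eta': hp.loguniform('eta', -7, 0),
    'max_depth': scope.int(hp.quniform('max_depth', 1, 10, 1)),
    'subsample': hp.uniform('subsample', 0.2, 1.0),
    'colsample_bytree': hp.uniform('colsample_bytree', 0.2, 1.0),
    'colsample_bylevel': hp.uniform('colsample_bylevel', 0.2, 1.0),
    'min_child_weight': hp.loguniform('min_child_weight', -16, 5),
    # "Uniform choice {0, log-uniform}": either turn the regularizer off
    # entirely or sample its strength on a log scale.
    'alpha': hp.choice('alpha', [0.0, hp.loguniform('alpha_pos', -16, 2)]),
    'lambda': hp.choice('lambda', [0.0, hp.loguniform('lambda_pos', -16, 2)]),
    'gamma': hp.choice('gamma', [0.0, hp.loguniform('gamma_pos', -16, 2)]),
}

# objective_fn (hypothetical) would train XGBoost on a sampled config and
# return the validation loss:
# best = fmin(fn=objective_fn, space=space, algo=tpe.suggest, max_evals=100)
```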

Regularization is all you Need: Simple Neural Nets can Excel on Tabular Data

The paper tuned XGBoost with auto-sklearn, without one-hot encoding, over the following search space:

| Hyperparameter | Type | Range | Log scale |
|-------------------|------------|------------|-----------|
| eta | Continuous | [0.001, 1] | X |
| lambda | Continuous | [1e-10, 1] | X |
| alpha | Continuous | [1e-10, 1] | X |
| num_round | Integer | [1, 1000] | - |
| gamma | Continuous | [0.1, 1] | X |
| colsample_bylevel | Continuous | [0.1, 1] | - |
| colsample_bynode | Continuous | [0.1, 1] | - |
| colsample_bytree | Continuous | [0.5, 1] | - |
| max_depth | Integer | [1, 20] | - |
| max_delta_step | Integer | [0, 10] | - |
| min_child_weight | Continuous | [0.1, 20] | X |
| subsample | Continuous | [0.01, 1] | - |
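Auto-sklearn defines its search spaces with the ConfigSpace library, so the table above maps directly onto ConfigSpace hyperparameters. This is a minimal sketch of that mapping under that assumption, not the paper's actual configuration code:

```python
import ConfigSpace as CS
import ConfigSpace.hyperparameters as CSH

# Ranges and log-scale flags follow the table above.
cs = CS.ConfigurationSpace(seed=42)
cs.add_hyperparameters([
    CSH.UniformFloatHyperparameter('eta', 0.001, 1.0, log=True),
    CSH.UniformFloatHyperparameter('lambda', 1e-10, 1.0, log=True),
    CSH.UniformFloatHyperparameter('alpha', 1e-10, 1.0, log=True),
    CSH.UniformIntegerHyperparameter('num_round', 1, 1000),
    CSH.UniformFloatHyperparameter('gamma', 0.1, 1.0, log=True),
    CSH.UniformFloatHyperparameter('colsample_bylevel', 0.1, 1.0),
    CSH.UniformFloatHyperparameter('colsample_bynode', 0.1, 1.0),
    CSH.UniformFloatHyperparameter('colsample_bytree', 0.5, 1.0),
    CSH.UniformIntegerHyperparameter('max_depth', 1, 20),
    CSH.UniformIntegerHyperparameter('max_delta_step', 0, 10),
    CSH.UniformFloatHyperparameter('min_child_weight', 0.1, 20.0, log=True),
    CSH.UniformFloatHyperparameter('subsample', 0.01, 1.0),
])

config = cs.sample_configuration()   # one random draw from the space
params = config.get_dictionary()     # plain dict usable as XGBoost params
```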

AWS: Tune an XGBoost Model

Lastly, the recommended tuning ranges from the AWS SageMaker documentation.

| Parameter Name | Parameter Type | Recommended Ranges |
|-------------------|---------------------------|-----------------------------|
| alpha | ContinuousParameterRanges | MinValue: 0, MaxValue: 1000 |
| colsample_bylevel | ContinuousParameterRanges | MinValue: 0.1, MaxValue: 1 |
| colsample_bynode | ContinuousParameterRanges | MinValue: 0.1, MaxValue: 1 |
| colsample_bytree | ContinuousParameterRanges | MinValue: 0.5, MaxValue: 1 |
| eta | ContinuousParameterRanges | MinValue: 0.1, MaxValue: 0.5 |
| gamma | ContinuousParameterRanges | MinValue: 0, MaxValue: 5 |
| lambda | ContinuousParameterRanges | MinValue: 0, MaxValue: 1000 |
| max_delta_step | IntegerParameterRanges | [0, 10] |
| max_depth | IntegerParameterRanges | [0, 10] |
| min_child_weight | ContinuousParameterRanges | MinValue: 0, MaxValue: 120 |
| num_round | IntegerParameterRanges | [1, 4000] |
| subsample | ContinuousParameterRanges | MinValue: 0.5, MaxValue: 1 |
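With the SageMaker Python SDK, these recommendations map onto `ContinuousParameter` and `IntegerParameter` ranges passed to a `HyperparameterTuner`. The sketch below assumes SageMaker's built-in XGBoost container; the IAM role ARN, instance type, and objective metric are placeholders to adapt to your own setup:

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

session = sagemaker.Session()
xgb_estimator = Estimator(
    image_uri=sagemaker.image_uris.retrieve(
        'xgboost', session.boto_region_name, version='1.5-1'),
    role='arn:aws:iam::123456789012:role/SageMakerRole',  # placeholder IAM role
    instance_count=1,
    instance_type='ml.m5.xlarge',
    sagemaker_session=session,
)

# Ranges from the AWS recommendations above.
hyperparameter_ranges = {
    'alpha': ContinuousParameter(0, 1000),
    'colsample_bylevel': ContinuousParameter(0.1, 1),
    'colsample_bynode': ContinuousParameter(0.1, 1),
    'colsample_bytree': ContinuousParameter(0.5, 1),
    'eta': ContinuousParameter(0.1, 0.5),
    'gamma': ContinuousParameter(0, 5),
    'lambda': ContinuousParameter(0, 1000),
    'max_delta_step': IntegerParameter(0, 10),
    'max_depth': IntegerParameter(0, 10),
    'min_child_weight': ContinuousParameter(0, 120),
    'num_round': IntegerParameter(1, 4000),
    'subsample': ContinuousParameter(0.5, 1),
}

tuner = HyperparameterTuner(
    estimator=xgb_estimator,
    objective_metric_name='validation:auc',  # assumption; choose per task
    hyperparameter_ranges=hyperparameter_ranges,
    max_jobs=50,
    max_parallel_jobs=5,
)
# tuner.fit({'train': train_input, 'validation': validation_input})
```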

References:

- Tune an XGBoost Model. AWS SageMaker Developer Guide.
- Shwartz-Ziv, R. and Armon, A. (2021). Tabular Data: Deep Learning is Not All You Need. arXiv:2106.03253.
- Kadra, A., Lindauer, M., Hutter, F., and Grabocka, J. (2021). Regularization is all you Need: Simple Neural Nets can Excel on Tabular Data. arXiv:2106.11189.

@misc{leehanchung,
  author = {Lee, Hanchung},
  title = {XGBoost Hyperparameters Tuning: Research Paper Edition},
  year = {2021},
  howpublished = {Github Repo},
  url = {https://leehanchung.github.io/2021-07-17-xgboost-hyperparameter-tuning/}
}