Recently, quite a few research papers have attempted to unseat XGBoost as the best model for tabular data. While none has succeeded convincingly, one of their most useful byproducts is the set of XGBoost hyperparameter ranges they used to tune their baseline models for comparison. The ranges are outlined below.

Tabular Data: Deep Learning is Not All You Need

The paper tuned XGBoost with Hyperopt over the following search space.

| Hyperparameter | Distribution | Range |
| --- | --- | --- |
| eta | Log-uniform | [e^-7, 1] |
| max_depth | Discrete uniform | [1, 10] |
| subsample | Uniform | [0.2, 1] |
| colsample_bytree | Uniform | [0.2, 1] |
| colsample_bylevel | Uniform | [0.2, 1] |
| min_child_weight | Log-uniform | [e^-16, e^5] |
| alpha | Uniform choice | {0, log-uniform [e^-16, e^2]} |
| lambda | Uniform choice | {0, log-uniform [e^-16, e^2]} |
| gamma | Uniform choice | {0, log-uniform [e^-16, e^2]} |
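For convenience, here is a minimal sketch of this search space expressed in Hyperopt. Note that `hp.loguniform` takes its bounds in log space, the nested choice labels (`alpha_pos`, `lambda_pos`, `gamma_pos`) are names introduced here for illustration, and `objective` is a hypothetical placeholder for your own training and validation loop.

```python
# A minimal sketch of the paper's search space in Hyperopt.
from hyperopt import fmin, hp, tpe

space = {
    # hp.loguniform takes log-space bounds: exp(U(-7, 0)) spans [e^-7, 1].
    "eta": hp.loguniform("eta", -7, 0),
    "max_depth": hp.quniform("max_depth", 1, 10, 1),  # discrete uniform on [1, 10]
    "subsample": hp.uniform("subsample", 0.2, 1),
    "colsample_bytree": hp.uniform("colsample_bytree", 0.2, 1),
    "colsample_bylevel": hp.uniform("colsample_bylevel", 0.2, 1),
    "min_child_weight": hp.loguniform("min_child_weight", -16, 5),
    # A mixture of a point mass at 0 and a log-uniform on [e^-16, e^2].
    "alpha": hp.choice("alpha", [0.0, hp.loguniform("alpha_pos", -16, 2)]),
    "lambda": hp.choice("lambda", [0.0, hp.loguniform("lambda_pos", -16, 2)]),
    "gamma": hp.choice("gamma", [0.0, hp.loguniform("gamma_pos", -16, 2)]),
}

def objective(params):
    params = {**params, "max_depth": int(params["max_depth"])}  # quniform yields floats
    # Hypothetical placeholder: train XGBoost with `params` and return
    # the validation loss to be minimized.
    return 0.0

best = fmin(objective, space, algo=tpe.suggest, max_evals=100)
```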

Regularization is all you Need: Simple Neural Nets can Excel on Tabular Data

The paper tuned XGBoost with auto-sklearn, with no one-hot encoding, over the following search space.

| Hyperparameter | Type | Range | Log scale |
| --- | --- | --- | --- |
| eta | Continuous | [0.001, 1] | X |
| lambda | Continuous | [1e-10, 1] | X |
| alpha | Continuous | [1e-10, 1] | X |
| num_round | Integer | [1, 1000] | - |
| gamma | Continuous | [0.1, 1] | X |
| colsample_bylevel | Continuous | [0.1, 1] | - |
| colsample_bynode | Continuous | [0.1, 1] | - |
| colsample_bytree | Continuous | [0.5, 1] | - |
| max_depth | Integer | [1, 20] | - |
| max_delta_step | Integer | [0, 10] | - |
| min_child_weight | Continuous | [0.1, 20] | X |
| subsample | Continuous | [0.01, 1] | - |
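Auto-sklearn builds its search spaces on the ConfigSpace library, so the table above can be sketched roughly as follows. This is an illustration under that assumption, not the paper's actual code; the "Log scale" column maps to ConfigSpace's `log=True` flag.

```python
# A sketch of the auto-sklearn-style search space above, built with
# ConfigSpace (the library auto-sklearn uses for search space definitions).
from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import (
    UniformFloatHyperparameter,
    UniformIntegerHyperparameter,
)

cs = ConfigurationSpace(seed=42)
cs.add_hyperparameters([
    # Continuous hyperparameters; log=True matches the table's "Log scale" column.
    UniformFloatHyperparameter("eta", 0.001, 1, log=True),
    UniformFloatHyperparameter("lambda", 1e-10, 1, log=True),
    UniformFloatHyperparameter("alpha", 1e-10, 1, log=True),
    UniformFloatHyperparameter("gamma", 0.1, 1, log=True),
    UniformFloatHyperparameter("min_child_weight", 0.1, 20, log=True),
    UniformFloatHyperparameter("colsample_bylevel", 0.1, 1),
    UniformFloatHyperparameter("colsample_bynode", 0.1, 1),
    UniformFloatHyperparameter("colsample_bytree", 0.5, 1),
    UniformFloatHyperparameter("subsample", 0.01, 1),
    # Integer hyperparameters.
    UniformIntegerHyperparameter("num_round", 1, 1000),
    UniformIntegerHyperparameter("max_depth", 1, 20),
    UniformIntegerHyperparameter("max_delta_step", 0, 10),
])

# Draw one candidate configuration as a plain dict of XGBoost parameters.
params = cs.sample_configuration().get_dictionary()
```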

AWS: Tune an XGBoost Model

Lastly, here are the recommended ranges from AWS's documentation on tuning an XGBoost model.

| Parameter Name | Parameter Type | Recommended Range |
| --- | --- | --- |
| alpha | ContinuousParameterRanges | MinValue: 0, MaxValue: 1000 |
| colsample_bylevel | ContinuousParameterRanges | MinValue: 0.1, MaxValue: 1 |
| colsample_bynode | ContinuousParameterRanges | MinValue: 0.1, MaxValue: 1 |
| colsample_bytree | ContinuousParameterRanges | MinValue: 0.5, MaxValue: 1 |
| eta | ContinuousParameterRanges | MinValue: 0.1, MaxValue: 0.5 |
| gamma | ContinuousParameterRanges | MinValue: 0, MaxValue: 5 |
| lambda | ContinuousParameterRanges | MinValue: 0, MaxValue: 1000 |
| max_delta_step | IntegerParameterRanges | [0, 10] |
| max_depth | IntegerParameterRanges | [0, 10] |
| min_child_weight | ContinuousParameterRanges | MinValue: 0, MaxValue: 120 |
| num_round | IntegerParameterRanges | [1, 4000] |
| subsample | ContinuousParameterRanges | MinValue: 0.5, MaxValue: 1 |
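With the SageMaker Python SDK, these recommendations translate directly into a `HyperparameterTuner` range dictionary. The following is a sketch only: `xgb_estimator` stands in for an already-configured SageMaker XGBoost estimator, and the objective metric and job counts are illustrative assumptions.

```python
# A sketch of the AWS-recommended ranges using the SageMaker Python SDK.
from sagemaker.tuner import (
    ContinuousParameter,
    HyperparameterTuner,
    IntegerParameter,
)

hyperparameter_ranges = {
    "alpha": ContinuousParameter(0, 1000),
    "colsample_bylevel": ContinuousParameter(0.1, 1),
    "colsample_bynode": ContinuousParameter(0.1, 1),
    "colsample_bytree": ContinuousParameter(0.5, 1),
    "eta": ContinuousParameter(0.1, 0.5),
    "gamma": ContinuousParameter(0, 5),
    "lambda": ContinuousParameter(0, 1000),
    "max_delta_step": IntegerParameter(0, 10),
    "max_depth": IntegerParameter(0, 10),
    "min_child_weight": ContinuousParameter(0, 120),
    "num_round": IntegerParameter(1, 4000),
    "subsample": ContinuousParameter(0.5, 1),
}

tuner = HyperparameterTuner(
    estimator=xgb_estimator,  # hypothetical pre-built SageMaker XGBoost estimator
    objective_metric_name="validation:rmse",  # illustrative choice of metric
    hyperparameter_ranges=hyperparameter_ranges,
    max_jobs=100,
    max_parallel_jobs=10,
)
```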

References:

Tune an XGBoost Model

Tabular Data: Deep Learning is Not All You Need

Regularization is all you Need: Simple Neural Nets can Excel on Tabular Data


To cite this content, please use:

@misc{leehanchung,
    author = {Lee, Hanchung},
    title = {XGBoost Hyperparameters Tuning: Research Paper Edition},
    year = {2021},
    howpublished = {\url{https://leehanchung.github.io}},
    url = {https://leehanchung.github.io/2021-07-17-xgboost-hyperparameter-tuning/}
}