XGBoost (eXtreme Gradient Boosting) is a distributed, open-source machine learning library that uses gradient boosted decision trees, a supervised learning boosting algorithm that makes use of gradient descent. It is known for its speed, efficiency and ability to scale well with large datasets.
Developed by Tianqi Chen of the University of Washington, XGBoost is an advanced implementation of gradient boosting that follows the same general framework; that is, it combines weak learner trees into a strong learner by sequentially adding trees that correct the residual errors of the trees before them. The library is available for C++, Python, R, Java, Scala and Julia1.
Decision trees are used for classification or regression tasks in machine learning. They use a hierarchical tree structure in which each internal node represents a test on a feature, each branch represents a decision rule and each leaf node represents an outcome, such as a class label or a predicted value.
Because decision trees are prone to overfitting, ensemble methods like boosting can often be used to create more robust models. Boosting combines multiple individual weak trees (models that perform only slightly better than random chance) to form a strong learner. Each weak learner is trained sequentially to correct the errors made by the previous models. Over hundreds of iterations, the combination of weak learners becomes a strong learner.
Random forests and boosting algorithms are both popular ensemble learning techniques that use individual learner trees to improve predictive performance. Random forests are based on bagging (bootstrap aggregating): each tree is trained independently and their predictions are then combined. Boosting algorithms instead use an additive approach in which weak learners are trained sequentially, each correcting the previous models’ mistakes.
Gradient boosted decision trees are a type of boosting algorithm that uses gradient descent. Like other boosting methodologies, gradient boosting starts with a weak learner to make predictions. The first decision tree in gradient boosting is called the base learner. Next, new trees are created in an additive manner to correct the base learner’s mistakes. To measure those mistakes, the algorithm calculates the residuals of the model’s predictions, that is, the differences between the predicted and actual values. The residuals are then aggregated by a loss function to score the model.
In machine learning, loss functions are used to measure a model’s performance, and the gradient in gradient boosted decision trees refers to gradient descent, a popular optimization algorithm used to minimize the loss function. In other words, gradient descent is used to reduce the loss, and thereby improve the model’s performance, as new trees are trained. Examples of loss functions include mean squared error and mean absolute error for regression problems and cross-entropy loss for classification problems; custom loss functions can also be developed for a specific use case and dataset.
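To make the residual-fitting idea concrete, here is a minimal sketch (not XGBoost itself) of gradient boosting for regression with a squared-error loss, for which the negative gradient is simply the residual. The data, number of rounds and learning rate are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

n_rounds = 100        # number of weak learners to add
learning_rate = 0.1   # shrinkage applied to each tree's contribution

# Base learner: start from a constant prediction (the mean of y)
pred = np.full_like(y, y.mean())
trees = []

for _ in range(n_rounds):
    # For squared-error loss, the negative gradient is simply the residual
    residuals = y - pred
    tree = DecisionTreeRegressor(max_depth=2)  # shallow "weak" tree
    tree.fit(X, residuals)
    pred += learning_rate * tree.predict(X)    # additive update
    trees.append(tree)

# Loss after boosting: mean squared error between predictions and targets
mse = np.mean((y - pred) ** 2)
print(f"Training MSE after {n_rounds} rounds: {mse:.4f}")
```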
XGBoost offers several features in Python that make it stand out compared to the standard gradient boosting implementation in scikit-learn2, including regularized learning objectives, parallelized tree construction, native handling of missing values and built-in cross-validation.
In this section, we will go over how to use the XGBoost package, how to select hyperparameters for the XGBoost tree booster, how XGBoost compares to other boosting implementations and some of its use cases.
Assuming you’ve already performed an exploratory data analysis on your data, continue by splitting your data into a training dataset and a testing dataset. Next, convert your data into the DMatrix format that XGBoost expects3. DMatrix is XGBoost's internal data structure optimized for memory efficiency and training speed4.
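For illustration, here is a minimal sketch of this step; the Iris dataset from scikit-learn stands in for your own, already-explored data.

```python
import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Illustrative data: Iris stands in for your own dataset
X, y = load_iris(return_X_y=True)

# Split into a training dataset and a testing dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Convert to DMatrix, XGBoost's internal data structure optimized for
# memory efficiency and training speed
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
```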
Next, instantiate an XGBoost model and, depending on your use case, select which objective function you’d like to use via the “objective” hyperparameter. For example, if you have a multi-class classification task, you should set the objective to “multi:softmax”5. Alternatively, if you have a binary classification problem, you can use the logistic regression objective “binary:logistic”. Now you can use your training set to train the model and predict classifications for the data set aside as the test set. Assess the performance of the model by comparing the predicted values with the test set’s actual values. You can use metrics such as accuracy, precision, recall or F1 score to evaluate your model, and you may also want to visualize the true positives, true negatives, false positives and false negatives with a confusion matrix.
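Continuing the sketch above, the following illustrative example trains a multi-class model with the “multi:softmax” objective and evaluates it on the held-out test set; the parameter values are examples only.

```python
import xgboost as xgb
from sklearn.metrics import accuracy_score, confusion_matrix

# Continues the previous sketch (Iris has three classes, so num_class=3)
params = {
    "objective": "multi:softmax",  # for binary problems, use "binary:logistic"
    "num_class": 3,
    "max_depth": 4,
    "eta": 0.1,
}

# Train on the training DMatrix for 100 boosting rounds
model = xgb.train(params, dtrain, num_boost_round=100)

# "multi:softmax" returns predicted class labels directly
y_pred = model.predict(dtest)

# Compare predictions against the held-out test labels
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
```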
Next, you may want to iterate through a combination of hyperparameters to help improve the performance of your model. Hyperparameter tuning is the optimization process for a machine learning algorithm’s hyperparameters. The best hyperparameters can be found using grid search and cross-validation methods, which will iterate through a dictionary of possible hyperparameter combinations.
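One way to do this, sketched below, is to pair scikit-learn’s GridSearchCV with XGBoost’s scikit-learn-compatible XGBClassifier estimator; the grid values shown are illustrative, not tuned recommendations.

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Illustrative dictionary of hyperparameter combinations to search over
param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1, 0.3],
    "n_estimators": [100, 300],
    "subsample": [0.8, 1.0],
}

# XGBClassifier is XGBoost's scikit-learn-compatible estimator, so GridSearchCV
# can run a cross-validated search (here with 5 folds) over the grid.
grid_search = GridSearchCV(
    estimator=XGBClassifier(),
    param_grid=param_grid,
    scoring="accuracy",
    cv=5,
)
grid_search.fit(X_train, y_train)  # X_train and y_train from the earlier split

print("Best hyperparameters:", grid_search.best_params_)
print("Best cross-validated accuracy:", grid_search.best_score_)
```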
Some of the hyperparameters available to tune for gradient boosted trees in XGBoost include the learning rate (eta), the maximum tree depth (max_depth), the row and feature subsampling ratios (subsample and colsample_bytree), the minimum loss reduction required to make a split (gamma) and the L1 and L2 regularization terms on leaf weights (alpha and lambda)6.
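As a rough sketch of how these hyperparameters map onto the native API, the dictionary below passes them to XGBoost’s built-in cross-validation routine (xgb.cv); the values are illustrative and dtrain is the training DMatrix from the earlier example.

```python
import xgboost as xgb

# Illustrative tree booster hyperparameters (values are examples, not recommendations)
params = {
    "eta": 0.1,               # learning rate: shrinkage applied to each new tree
    "max_depth": 6,           # maximum depth of each tree
    "subsample": 0.8,         # fraction of training rows sampled per tree
    "colsample_bytree": 0.8,  # fraction of features sampled per tree
    "gamma": 0.1,             # minimum loss reduction required to make a split
    "lambda": 1.0,            # L2 regularization on leaf weights
    "alpha": 0.0,             # L1 regularization on leaf weights
    "objective": "multi:softmax",
    "num_class": 3,
}

# Built-in cross-validation: 5 folds over the training DMatrix
cv_results = xgb.cv(params, dtrain, num_boost_round=200, nfold=5)
print(cv_results.tail())
```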
XGBoost is one of many available open-source boosting algorithms. In this section, we’ll compare XGBoost to three other boosting frameworks.
AdaBoost is an early boosting algorithm invented by Yoav Freund and Robert Schapire in 19957. In AdaBoost, more emphasis is placed on incorrect predictions through a system of weights that gives harder-to-predict data points greater influence. First, each data point in the dataset is assigned a weight. When the weak learners correctly predict an example, the example’s weight is reduced; when they get an example wrong, the weight for that data point increases. As new trees are created, the updated weights steer them toward the examples that the previous learner trees misclassified. As the number of learners grows, samples that are easy to predict contribute less to training future learners, while harder-to-predict data points are weighted more prominently. Gradient boosting and XGBoost tend to be stronger alternatives to AdaBoost because of their accuracy and speed.
CatBoost is another gradient boosting framework. Developed by Yandex in 2017, it specializes in handling categorical features without any need for preprocessing and generally performs well out of the box without extensive hyperparameter tuning8. Like XGBoost, CatBoost has built-in support for handling missing data. CatBoost is especially useful for datasets with many categorical features. According to Yandex, the framework is used for search, recommendation systems, personal assistants, self-driving cars, weather prediction and other tasks.
LightGBM (Light Gradient Boosting Machine) is the final gradient boosting algorithm we will review. LightGBM was developed by Microsoft and first released in 20169. Where most decision tree learning algorithms grow trees depth-wise, LightGBM uses a leaf-wise tree growth strategy10. Like XGBoost, LightGBM offers fast training speed and high accuracy and performs well with large datasets.
XGBoost and gradient boosted decision trees are used across a variety of data science applications, including learning to rank11, click-through rate prediction12, sales forecasting13, malware detection14 and machine learning competition solutions15.
1 "Scalable and Flexible Gradient Boosting," https://xgboost.ai/.
2 Tianqi Chen and Carlos Guestrin, "XGBoost: A Scalable Tree Boosting System," University of Washington, 10 June 2016, https://arxiv.org/pdf/1603.02754.
3 "XGBoost Python Package Introduction, Data Interface," https://xgboost.readthedocs.io/en/stable/python/python_intro.html#data-interface.
4 "XGBoost API Reference, Core Data Structure," https://xgboost.readthedocs.io/en/stable/python/python_api.html#module-xgboost.core.
5 "XGBoost Parameters, Learning Task Parameters," https://xgboost.readthedocs.io/en/stable/parameter.html#learning-task-parameters.
6 "XGBoost Parameters for Tree Booster," https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster.
7 Yoav Freund and Robert E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, Vol. 55, pp. 119–139, August 1997.
8 "CatBoost is a high-performance open source library for gradient boosting on decision trees," https://catboost.ai/.
9 Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma and Tie-Yan Liu, "A Communication-Efficient Parallel Algorithm for Decision Tree," Peking University, Microsoft Research and Chinese Academy of Mathematics and Systems Science, 4 November 2016, https://arxiv.org/pdf/1611.01276.
10 "LightGBM Features, Leaf-wise (Best-first) Tree Growth," https://lightgbm.readthedocs.io/en/latest/Features.html#leaf-wise-best-first-tree-growth.
11 "XGBoost Tutorials, Learning to Rank Overview," https://xgboost.readthedocs.io/en/latest/tutorials/learning_to_rank.html#overview.
12 AlAli Moneera, AlQahtani Maram, AlJuried Azizah, Taghareed AlOnizan, Dalia Alboqaytah, Nida Aslam and Irfan Ullah Khan, "Click through Rate Effectiveness Prediction on Mobile Ads Using Extreme Gradient Boosting," College of Computer Science and Information Technology, Imam Abdulrahman bin Faisal University, 12 September 2020, https://www.techscience.com/cmc/v66n2/40673/html.
13 Yetunde Faith Akande, Joyce Idowu, Abhavya Gautam, Sanjay Misra, Oluwatobi Noah Akande and Ranjan Kumar Behera, "Application of Xgboost Algorithm for Sales Forecasting Using Walmart Dataset," Landmark University, Ladoke Akintola University of Technology, Brandan University, Covenant University and XIM University, June 2022, https://www.researchgate.net/publication/361549465_Application_of_XGBoost_Algorithm_for_Sales_Forecasting_Using_Walmart_Dataset.
14 Jakub Palša, Norbert Ádám, Ján Hurtuk, Eva Chovancová, Branislav Madoš, Martin Chovanec and Stanislav Kocan, "MLMD—A Malware-Detecting Antivirus Tool Based on the XGBoost Machine Learning Algorithm," MDPI.com Journal of Applied Sciences, Vol 12, 6672, 1 July 2022, https://www.mdpi.com/2076-3417/12/13/6672.
15 "Distributed (Deep) Machine Learning Community XGBoost Machine Learning Challenge Winning Solutions," https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions.