
Published: 9 May 2024
Contributors: Eda Kavlakoglu, Erika Russi

What is XGBoost?

XGBoost (eXtreme Gradient Boosting) is a distributed, open-source machine learning library that uses gradient boosted decision trees, a supervised learning boosting algorithm that makes use of gradient descent. It is known for its speed, efficiency and ability to scale well with large datasets.

Developed by Tianqi Chen at the University of Washington, XGBoost is an advanced implementation of gradient boosting that follows the same general framework: it combines weak learner trees into a strong learner by sequentially fitting new trees to the residuals of the previous ones. The library is available for C++, Python, R, Java, Scala and Julia1.

 

Decision trees vs. boosting

Decision trees are used for classification or regression tasks in machine learning. They use a hierarchical tree structure in which each internal node represents a feature, each branch represents a decision rule and each leaf node represents an outcome, such as a predicted class label or value.

Because decision trees are prone to overfitting, ensemble methods, like boosting, can often be used to create more robust models. Boosting combines multiple individual weak trees (models that perform only slightly better than random chance) to form a strong learner. Each weak learner is trained sequentially to correct the errors made by the previous models. After hundreds of iterations, the ensemble of weak learners is converted into a strong learner.

Random forests and boosting algorithms are both popular ensemble learning techniques that use individual learner trees to improve predictive performance. Random forests are based on the concept of bagging (bootstrap aggregating) and train each tree independently to combine their predictions, while boosting algorithms use an additive approach where weak learners are sequentially trained to correct the previous models’ mistakes.

Gradient boosted decision trees 

Gradient boosted decision trees are a type of boosting algorithm that uses gradient descent. Like other boosting methodologies, gradient boosting starts with a weak learner to make predictions. The first decision tree in gradient boosting is called the base learner. Next, new trees are created in an additive manner based on the model's mistakes. At each step, the algorithm calculates the residuals of the model's predictions to determine how far off those predictions were from the actual values. Residuals are the difference between the model's predicted and actual values. The next tree is fit to these residuals, and the model's overall performance is scored with a loss function.

In machine learning, loss functions are used to measure a model's performance. The gradient in gradient boosted decision trees refers to gradient descent, a popular optimization algorithm used to minimize the loss function; as each new tree is trained, gradient descent is used to reduce the loss (that is, to improve the model's performance). Examples of loss functions include mean squared error or mean absolute error for regression problems and cross-entropy loss for classification problems; custom loss functions can also be developed for a specific use case and dataset.
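To make the residual-fitting and gradient descent ideas concrete, here is a minimal from-scratch sketch of gradient boosting for regression with a squared-error loss, where the negative gradient of the loss is simply the residual. This illustrates the general technique, not XGBoost's actual implementation; the data and settings are invented.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data for illustration
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # base learner: predict the mean
trees = []

for _ in range(100):
    residuals = y - prediction                     # negative gradient of squared-error loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)  # additive update scaled by the learning rate
    trees.append(tree)

print("Training MSE:", np.mean((y - prediction) ** 2))
```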


Features of XGBoost

Below is a discussion of some features of XGBoost's Python library that make it stand out compared to the standard gradient boosting implementation in scikit-learn2:

  • Parallel and distributed computing: The library stores data in in-memory units called blocks. Separate blocks can be distributed across machines or stored in external memory using out-of-core computing. XGBoost also allows for more advanced use cases, such as distributed training across a cluster of computers to speed up computation. XGBoost can also run in distributed mode using tools like Apache Spark, Dask or Kubernetes.
  • Cache-aware prefetching algorithm: XGBoost uses a cache-aware prefetching algorithm that helps reduce runtime on large datasets. The library can run more than ten times faster than other existing frameworks on a single machine. Because of this speed, XGBoost can process billions of examples using fewer resources, making it a scalable tree boosting system.
  • Built-in regularization: XGBoost includes regularization as part of the learning objective, unlike regular gradient boosting. The model can also be further regularized through hyperparameter tuning. This built-in regularization is one reason the library often gives better results than the standard scikit-learn gradient boosting package.
  • Handling missing values: XGBoost uses a sparsity-aware algorithm for sparse data. When a value is missing in the dataset, the algorithm learns a default direction at each split and routes missing data points along it (see the sketch after this list).
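As a brief illustration of the last point, the sketch below trains on synthetic data containing missing values without any imputation; XGBoost routes the NaNs along a learned default direction at each split. The data and parameter values here are arbitrary.

```python
import numpy as np
import xgboost as xgb

# Synthetic binary classification data with ~20% missing entries
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.2] = np.nan  # introduce missing values after computing labels

# No imputation needed: DMatrix accepts NaN as the missing-value marker
dtrain = xgb.DMatrix(X, label=y, missing=np.nan)
booster = xgb.train({"objective": "binary:logistic", "max_depth": 3}, dtrain, num_boost_round=20)
print(booster.predict(dtrain)[:5])  # predicted probabilities
```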
How XGBoost works

In this section, we will go over how to use the XGBoost package, how to select hyperparameters for the XGBoost tree booster, how XGBoost compares to other boosting implementations and some of its use cases.

Splitting your data and converting to DMatrix format

Assuming you’ve already performed an exploratory data analysis on your data, continue by splitting your data into a training dataset and a testing dataset. Next, convert your data into the DMatrix format that XGBoost expects3. DMatrix is XGBoost's internal data structure optimized for memory efficiency and training speed4.
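A minimal sketch of this step is shown below. The Iris dataset, the 80/20 split and the variable names are illustrative choices, not requirements.

```python
import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Split the data into training and testing sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert to DMatrix, XGBoost's memory-efficient internal data structure
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
```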

Generate and evaluate the model

Next, instantiate an XGBoost model and, depending on your use case, select which objective function you’d like to use via the “objective” hyperparameter. For example, if you have a multi-class classification task, you should set the objective to “multi:softmax”5. Alternatively, if you have a binary classification problem, you can use the logistic regression objective “binary:logistic”. Now you can use your training set to train the model and predict classifications for the data set aside as the test set. Assess the performance of the model by comparing the predicted values with the test set’s actual values. You can use metrics such as accuracy, precision, recall or F1 score to evaluate your model. You may also want to visualize your true positives, true negatives, false positives and false negatives using a confusion matrix.
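Continuing the illustrative Iris sketch above, a multi-class model could be trained and evaluated roughly as follows; the parameter values are arbitrary starting points rather than recommendations.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Multi-class objective: Iris has 3 classes
params = {"objective": "multi:softmax", "num_class": 3, "max_depth": 4, "eta": 0.3}
model = xgb.train(params, dtrain, num_boost_round=50)

y_pred = model.predict(dtest)  # "multi:softmax" returns class labels directly
print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```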

Hyperparameter tuning

Next, you may want to iterate through a combination of hyperparameters to help improve the performance of your model. Hyperparameter tuning is the optimization process for a machine learning algorithm’s hyperparameters. The best hyperparameters can be found using grid search and cross-validation methods, which will iterate through a dictionary of possible hyperparameter combinations.
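One common way to do this in Python is to combine scikit-learn's GridSearchCV with XGBoost's scikit-learn wrapper, XGBClassifier. The grid below is a small, illustrative search space, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Dictionary of candidate hyperparameter values to search over
param_grid = {
    "max_depth": [3, 4, 6],
    "learning_rate": [0.05, 0.1, 0.3],
    "n_estimators": [100, 200],
}

# 5-fold cross-validation over every combination in the grid
search = GridSearchCV(XGBClassifier(), param_grid, scoring="accuracy", cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```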

Selected hyperparameters for gradient boosted trees in XGBoost

Below is an explanation of some of the hyperparameters available to tune for gradient boosted trees in XGBoost (a combined sketch setting them follows the list):

  • Learning rate (also known as the “step size” or the “shrinkage”) is the most important gradient boosting hyperparameter. In the XGBoost library, it is known as “eta”6; it should be a number between 0 and 1, and the default is 0.3. The learning rate determines the rate at which the boosting algorithm learns from each iteration. A lower value of eta means slower learning, as it scales down the contribution of each tree in the ensemble, thus helping to prevent overfitting. Conversely, a higher value of eta speeds up learning, but it may lead to overfitting if not carefully tuned.
  • The n_estimators hyperparameter specifies the number of trees to be built in the ensemble. Each boosting round adds a new tree to the ensemble, and the model slowly learns to correct the errors made by the previous trees. n_estimators controls the complexity of the model and influences both the training time and the model's ability to generalize to unseen data. Increasing the value of n_estimators typically increases the complexity of the model, as it allows the model to capture more intricate patterns in the data. However, adding too many trees can lead to overfitting. Generally speaking, as n_estimators goes up, the learning rate should go down.
  • Gamma (also known as the Lagrange multiplier or the minimum loss reduction parameter) controls the minimum amount of loss reduction required to make a further split on a leaf node of the tree. A lower value allows more splits, letting XGBoost capture finer patterns but with greater risk of overfitting; a higher value makes the algorithm more conservative, requiring a larger loss reduction before splitting and so producing simpler trees. There is no upper limit for gamma. The default in XGBoost is 0 and anything over 10 is considered high.
  • Max_depth represents how deeply each tree in the boosting process can grow during training. A tree's depth refers to the number of levels or splits it has from the root node to the leaf nodes. Increasing this value will make the model more complex and more likely to overfit. In XGBoost, the default max_depth is 6, which means that each tree in the model is allowed to grow to a maximum depth of 6 levels.
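The sketch below pulls these hyperparameters together on the native training API, reusing the Iris example; the values are illustrative, not recommendations. Note that with the native API, the number of trees (n_estimators in the scikit-learn wrapper) is set through num_boost_round.

```python
import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
dtrain = xgb.DMatrix(X_train, label=y_train)

# Illustrative settings for the hyperparameters described above
params = {
    "objective": "multi:softmax",
    "num_class": 3,
    "eta": 0.1,      # learning rate / shrinkage (XGBoost default: 0.3)
    "gamma": 1.0,    # minimum loss reduction required to split a leaf further
    "max_depth": 4,  # maximum tree depth (XGBoost default: 6)
}

# num_boost_round plays the role of n_estimators in the native API
model = xgb.train(params, dtrain, num_boost_round=200)
```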

Comparing XGBoost to other boosting algorithms

XGBoost is one of many available open-source boosting algorithms. In this section, we’ll compare XGBoost to three other boosting frameworks.

XGBoost vs. AdaBoost

AdaBoost is an early boosting algorithm invented by Yoav Freund and Robert Schapire in 19957. In AdaBoost, more emphasis is placed on incorrect predictions through a system of weights that gives harder-to-predict data points greater influence. First, each data point in the dataset is assigned a specific weight. When a weak learner correctly predicts an example, the example's weight is reduced; when it gets an example wrong, the weight for that data point increases. As new trees are created, their training is weighted according to the misclassifications of the previous learner trees. As the number of learners increases, the samples that are easy to predict are used less for future learners, while the data points that are harder to predict are weighted more prominently. Gradient boosting and XGBoost tend to be stronger alternatives to AdaBoost due to their accuracy and speed.
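As a rough comparison, the sketch below runs scikit-learn's AdaBoostClassifier and XGBoost's XGBClassifier on synthetic data; the dataset and settings are invented, and real results depend heavily on the data and tuning.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Synthetic binary classification problem for illustration
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

ada = AdaBoostClassifier(n_estimators=200, random_state=42)
xgb_clf = XGBClassifier(n_estimators=200, random_state=42)

print("AdaBoost CV accuracy:", cross_val_score(ada, X, y, cv=5).mean())
print("XGBoost CV accuracy:", cross_val_score(xgb_clf, X, y, cv=5).mean())
```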

XGBoost vs. CatBoost

CatBoost is another gradient boosting framework. Developed by Yandex in 2017, it specializes in handling categorical features without any need for preprocessing and generally performs well out of the box, without extensive hyperparameter tuning8. Like XGBoost, CatBoost has built-in support for handling missing data. CatBoost is especially useful for datasets with many categorical features. According to Yandex, the framework is used for search, recommendation systems, personal assistants, self-driving cars, weather prediction and other tasks.

XGBoost vs. LightGBM

LightGBM (Light Gradient Boosting Machine) is the final gradient boosting algorithm we will review. LightGBM was developed by Microsoft and first released in 20169. Where most decision tree learning algorithms grow trees depth-wise, LightGBM uses a leaf-wise tree growth strategy10. Like XGBoost, LightGBM exhibits fast model training speed and accuracy and performs well with large datasets.

Applications of XGBoost

XGBoost and gradient boosted decision trees are used across a variety of data science applications, including:

  • Learning to rank: One of the most popular use cases for the XGBoost algorithm is as a ranker. In information retrieval, the goal of learning to rank is to serve users content ordered by relevance. In XGBoost, the XGBRanker is based on the LambdaMART algorithm11 (see the sketch after this list).
  • Advertisement click-through rate prediction: Researchers used an XGBoost-trained model to determine how frequently online ads had been clicked over 10 days of click-through data. The goal of the research was to measure the effectiveness of online ads and pinpoint which ads work well12.
  • Store sales prediction: XGBoost may be used for predictive modeling, as demonstrated in this paper where sales from 45 Walmart stores were predicted using an XGBoost model13.
  • Malware classification: Using an XGBoost classifier, engineers at the Technical University of Košice were able to classify malware accurately, as shown in their paper14.
  • Kaggle competitions: XGBoost has been a popular winning algorithm in Kaggle competitions, as noted on the DMLC (Distributed (Deep) Machine Learning Community) page featuring a list of recent Kaggle competition winners who used XGBoost for their entries15.
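As a minimal sketch of the ranking use case via XGBRanker, using invented feature values, relevance labels and query ids (grouped and sorted by query):

```python
import numpy as np
import xgboost as xgb

# Invented relevance data: 2 queries with 4 candidate documents each
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))                # document features
y = rng.integers(0, 3, size=8)             # graded relevance labels (0-2)
qid = np.array([1, 1, 1, 1, 2, 2, 2, 2])   # query ids, sorted by query

ranker = xgb.XGBRanker(objective="rank:ndcg", n_estimators=50)
ranker.fit(X, y, qid=qid)
scores = ranker.predict(X)  # higher score means ranked higher within its query
print(scores)
```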
Footnotes

1 "Scalable and Flexible Gradient Boosting," https://xgboost.ai/ (link resides outside ibm.com).

2 Tianqi Chen and Carlos Guestrin, "XGBoost: A Scalable Tree Boosting System," University of Washington, 10 June 2016, https://arxiv.org/pdf/1603.02754 (link resides outside ibm.com).

3 "XGBoost Python Package Introduction, Data Interface," https://xgboost.readthedocs.io/en/stable/python/python_intro.html#data-interface (link resides outside ibm.com).

4 "XGBoost API Reference, Core Data Structure," https://xgboost.readthedocs.io/en/stable/python/python_api.html#module-xgboost.core (link resides outside ibm.com).

5 "XGBoost Parameters, Learning Task Parameters," https://xgboost.readthedocs.io/en/stable/parameter.html#learning-task-parameters (link resides outside ibm.com).

6 "XGBoost Parameters for Tree Booster," https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster (link resides outside ibm.com).

7 Yoav Freund and Robert E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, Vol. 55, pp. 119–139, August 1997.

8 "CatBoost is a high-performance open source library for gradient boosting on decision trees," https://catboost.ai/ (link resides outside ibm.com).

9 Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma and Tie-Yan Liu, "A Communication-Efficient Parallel Algorithm for Decision Tree," Peking University, Microsoft Research and Chinese Academy of Mathematics and Systems Science, 4 November 2016, https://arxiv.org/pdf/1611.01276 (link resides outside ibm.com).

10 "LightGBM Features, Leaf-wise (Best-first) Tree Growth," https://lightgbm.readthedocs.io/en/latest/Features.html#leaf-wise-best-first-tree-growth (link resides outside ibm.com).

11 "XGBoost Tutorials, Learning to Rank Overview," https://xgboost.readthedocs.io/en/latest/tutorials/learning_to_rank.html#overview (link resides outside ibm.com).

12 AlAli Moneera, AlQahtani Maram, AlJuried Azizah, Taghareed AlOnizan, Dalia Alboqaytah, Nida Aslam and Irfan Ullah Khan, "Click through Rate Effectiveness Prediction on Mobile Ads Using Extreme Gradient Boosting," College of Computer Science and Information Technology, Imam Abdulrahman bin Faisal University, 12 September 2020, https://www.techscience.com/cmc/v66n2/40673/html (link resides outside ibm.com).

13 Yetunde Faith Akande, Joyce Idowu, Abhavya Gautam, Sanjay Misra, Oluwatobi Noah Akande and Ranjan Kumar Behera, "Application of Xgboost Algorithm for Sales Forecasting Using Walmart Dataset," Landmark University, Ladoke Akintola University of Technology, Brandan University, Covenant University and XIM University, June 2022, https://www.researchgate.net/publication/361549465_Application_of_XGBoost_Algorithm_for_Sales_Forecasting_Using_Walmart_Dataset (link resides outside ibm.com).

14 Jakub Palša, Norbert Ádám, Ján Hurtuk, Eva Chovancová, Branislav Madoš, Martin Chovanec and Stanislav Kocan, "MLMD—A Malware-Detecting Antivirus Tool Based on the XGBoost Machine Learning Algorithm," MDPI.com Journal of Applied Sciences, Vol 12, 6672, 1 July 2022, https://www.mdpi.com/2076-3417/12/13/6672 (link resides outside ibm.com).

15 "Distributed (Deep) Machine Learning Community XGBoost Machine Learning Challenge Winning Solutions," https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions (link resides outside ibm.com).