XGBoost With Python
Discover The Algorithm That Is Winning Machine Learning Competitions
XGBoost is the dominant technique for predictive modeling on structured, tabular data.
The gradient boosting algorithm is a top performer across a wide range of predictive modeling problems, and XGBoost is its fastest implementation. When asked, the best machine learning competitors in the world recommend using XGBoost.
In this new Ebook written in the friendly Machine Learning Mastery style that you’re used to, learn exactly how to get started and bring XGBoost to your own machine learning projects. After purchasing you will get:
- 155 Page PDF Ebook.
- 30 Python Recipes.
- 15 Step-by-Step Tutorial Lessons.
Apply XGBoost To Your Projects Today!
Click to jump straight to the packages.
Why Is XGBoost So Powerful?
… the secret is its “speed” and “model performance”
The gradient boosting algorithm has been around since 1999. So why has it become so popular only now?
The reason is that we now have machines fast enough and enough data to really make this algorithm shine.
Academics and researchers knew it was a dominant algorithm, more powerful than random forest, but few people in industry knew about it.
This was due to two main reasons:
- The implementations of gradient boosting in R and Python were not built for performance, so even modest-sized models took a long time to train.
- Because of the lack of attention on the algorithm, there were few good heuristics on which parameters to tune and how to tune them.
Naive implementations are slow because the algorithm builds trees one at a time, each new tree attempting to correct the errors of all of the trees before it.
This sequential procedure results in models with really great predictive capability, but can be very slow to train when hundreds or thousands of trees need to be created from large datasets.
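This sequential residual-correcting loop can be sketched in a few lines of plain Python. The toy dataset, the depth-1 "stump" learner and the function names below are all made up for illustration; this is not XGBoost's implementation, just the idea behind it.

```python
# Toy sketch of sequential boosting (illustration only, not XGBoost):
# each new "tree" (here a depth-1 stump) is fit to the residual errors
# left by all of the trees before it.

def fit_stump(xs, residuals):
    """Find the single split on x that best reduces squared error."""
    best = None
    for split in xs:
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        lmean = sum(left) / len(left) if left else 0.0
        rmean = sum(right) / len(right) if right else 0.0
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, split, lmean, rmean)
    _, split, lmean, rmean = best
    return lambda x: lmean if x <= split else rmean

def boost(xs, ys, n_trees=20, learning_rate=0.5):
    """Add stumps one at a time, each correcting the current residuals."""
    model, preds = [], [0.0] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        model.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return model, preds

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.0, 3.2, 2.9]  # a roughly step-shaped target
model, preds = boost(xs, ys)
mse = sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)
```

Each round depends on the predictions of every round before it, which is exactly why naive implementations cannot be trivially parallelized across trees.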
XGBoost Changed Everything
XGBoost was developed by Tianqi Chen and collaborators for speed and performance.
Tianqi is a top machine learning researcher, so he knows deeply how the algorithm works. He is also a very good engineer, so he knows how to build high-quality software.
This combination allowed him to re-frame the internals of the gradient boosting algorithm in such a way that it can exploit the full potential of the memory and CPU cores of your hardware.
In XGBoost, individual trees are built using multiple CPU cores and the data is organized to minimize lookup times: all good computer science tips and tricks.
The result is an implementation of gradient boosting in the XGBoost library that can be configured to squeeze the best performance from your machine, whilst offering all of the knobs and dials to tune the behavior of the algorithm to your specific problem.
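Those knobs and dials appear as a simple parameter dictionary in XGBoost's native API. The parameter names below (objective, nthread, eta and so on) are real XGBoost parameters, but the values are illustrative settings, not recommendations, and the training call is shown commented out because it assumes the xgboost package is installed.

```python
# Example XGBoost parameter dictionary (values are illustrative only).
params = {
    "objective": "binary:logistic",  # the learning task
    "nthread": 4,             # CPU cores used when building each tree
    "max_depth": 6,           # size of each individual tree
    "eta": 0.3,               # learning rate (shrinkage applied per tree)
    "subsample": 0.8,         # fraction of rows sampled per tree
    "colsample_bytree": 0.8,  # fraction of columns sampled per tree
}

# With the xgboost package installed, training would look like:
# import xgboost as xgb
# dtrain = xgb.DMatrix(X, label=y)
# model = xgb.train(params, dtrain, num_boost_round=100)
```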
This Power Did Not Go Unnoticed
Soon after the release of XGBoost, top machine learning competitors started using it.
More than that, they started winning competitions on sites like Kaggle. And they were not shy about sharing the news about XGBoost.
For example, here are some quotes from top Kaggle competitors:
As the winner of an increasing amount of Kaggle competitions, XGBoost showed us again to be a great all-round algorithm worth having in your toolbox.
— Dato Winners’ Interview, Mad Professors
I only used XGBoost.
— Liberty Mutual Property Inspection Winner’s Interview, Qingchen Wang
In fact, Owen Zhang, formerly the #1-ranked Kaggle competitor in the world, strongly encourages the use of XGBoost:
When in doubt, use xgboost.
— Avito Winner’s Interview, Owen Zhang
XGBoost is a powerhouse when it comes to developing predictive models.
So how do you get started using it?
How Do You Get Started Using XGBoost?
…be systematic and develop a new core skill
The Slow Way
The way that most people get started with XGBoost is the slow way.
- First, they try to find and read all of the official documentation for the library.
- Next, they try to adapt demos and examples to their problem.
The problem is that they know little about the underlying algorithm that XGBoost implements, so they don't know which parameters to tune to best adapt it to their problem.
They most definitely don’t know about the full capabilities of the library.
This is the slow and frustrating way to get started with XGBoost, and sadly it is the most common.
The Fast Way
Knowing that things can be different, you can see the faster path:
- Learn something about the underlying algorithm so you know how to configure it.
- Learn about the suite of key features supported by the library.
- Practice using features of the library on small, well-understood problems.
- Get started applying XGBoost to your own problem.
This can cut the time it takes to go from beginner to proficient practitioner by a factor of two or four, if not more.
You also get the benefits of really knowing how to wield XGBoost in a range of different situations.
But you still have to find and gather all of the materials together yourself, and then study them.
The Best Way
There is an even faster way.
- Find an expert who has actually done all of the research and actually used XGBoost on real problems.
- Have them prepare the materials for you to study.
In addition to saving you a lot of wasted time researching algorithm and library details, this approach can speed up the learning process by giving you access to:
- Tips and tricks to get past roadblocks and get the most from the algorithm.
- Code examples that work, can be run immediately and can provide templates for your own problems.
- An expert who can answer questions and point you to the best results to learn more.
If you want to get started with XGBoost, then you are in the right place.
Introducing “XGBoost With Python”
…your ticket to developing and tuning XGBoost models
This book was designed for you, a developer, to rapidly get up to speed with applying gradient boosting in Python using the best-of-breed library, XGBoost.
The Ebook uses a step-by-step tutorial approach throughout to help you focus on getting results in your projects and delivering value.
The goal is to get you building your first gradient boosting model as quickly as possible, then guide you through the finer points of the library and of tuning your models.
This Ebook is your guide to developing and tuning XGBoost models on your own machine learning projects.
Let’s take a closer look at the breakdown of what you will discover inside this Ebook.
Everything You Need To Know to Develop XGBoost Models in Python
This Ebook is designed to get you up and running with XGBoost as fast as possible.
As such, a series of step-by-step, tutorial-based lessons was designed to lead you from XGBoost beginner to effective XGBoost practitioner.
Below is an overview of the step-by-step XGBoost lessons you will complete, divided into three parts:
Part 1: XGBoost Basics
- Lesson 01: A Gentle Introduction to Gradient Boosting.
- Lesson 02: A Gentle Introduction to XGBoost.
- Lesson 03: How to Develop your First XGBoost Model in Python.
- Lesson 04: How to Best Prepare Data For Use With XGBoost.
- Lesson 05: How to Evaluate the Performance of Models.
- Lesson 06: How to Visualize Individual Decision Trees in XGBoost.
Part 2: XGBoost Advanced
- Lesson 07: How to Save And Load XGBoost Models.
- Lesson 08: How to Review and Use Feature Importance.
- Lesson 09: How to Monitor Performance and Use Early Stopping.
- Lesson 10: How to Configure XGBoost for Multithreading.
- Lesson 11: How to Develop Large XGBoost models in the Cloud.
Part 3: XGBoost Tuning
- Lesson 12: Best Practices When Configuring XGBoost.
- Lesson 13: How to Tune the Number and Size of Decision Trees.
- Lesson 14: How to Tune Learning Rate and Number of Trees.
- Lesson 15: How to Tune Sampling in Stochastic Gradient Boosting.
Each lesson was designed to be completed in about 30 minutes by the average developer.
Here’s Everything You’ll Get…
in XGBoost With Python
A digital download that contains everything you need, including:
- Clear algorithm descriptions that help you to understand the principles that underlie the technique.
- Step-by-step XGBoost tutorials to show you exactly how to apply each method.
- Python source code recipes for every example in the book so that you can run the tutorial and project code in seconds.
- Digital Ebook in PDF format so that you can have the book open side-by-side with the code and see exactly how each example works.
The XGBoost basics to get you started and build a foundation, including:
- The gradient boosting algorithm description and the 4 extensions that improve performance.
- The XGBoost implementation of gradient boosting and the key differences that make it so fast.
- The application of XGBoost to a simple predictive modeling problem, step-by-step.
- The 2 important steps in data preparation you must know when using XGBoost with scikit-learn.
- The surprising automatic handling of missing values and how it compares to imputing values manually.
- The 2 ways to estimate model performance of XGBoost models with scikit-learn.
- The visualization of individual trees within a trained XGBoost model.
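As a taste of the data preparation point above: XGBoost consumes numeric matrices, so categorical inputs are typically integer- or one-hot encoded before training (in scikit-learn, with LabelEncoder and OneHotEncoder). Here is a minimal pure-Python sketch of what the one-hot transform produces; the function and data are made up for illustration only.

```python
# Minimal one-hot encoding sketch (pure Python, illustration only).
def one_hot(values):
    """Map each categorical value to a 0/1 indicator vector."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows

cats, encoded = one_hot(["red", "green", "red", "blue"])
# cats lists the categories in sorted order; each row has a single 1
```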
Advanced Usage and Tuning
The advanced XGBoost usage to speed-up your own projects, including:
- The 2 techniques to save a trained XGBoost model and later load it to make predictions on new data.
- The calculation of feature importance scores and the 2 ways to plot the results.
- The diagnostics of plotting learning curves from XGBoost models and how to stop training early.
- The multithreading support of XGBoost and how to best harness this feature when parallelizing models.
- The use of Amazon cloud computing to speed up the training of very large XGBoost models using lots of CPU cores.
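The early-stopping idea from the list above fits in a few lines: watch a validation metric during training and stop adding boosting rounds once it has not improved for a set number of rounds. The loss values here are made up for illustration; in XGBoost itself this idea is exposed as the early_stopping_rounds argument.

```python
# Early-stopping sketch (illustration only; the losses are made up).
def early_stop(val_losses, patience=3):
    """Return the boosting round training should roll back to."""
    best, best_round = float("inf"), 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best, best_round = loss, i
        elif i - best_round >= patience:
            break  # no improvement for `patience` rounds: stop training
    return best_round

# Validation loss bottoms out at round 3, then starts rising.
losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.58, 0.59, 0.60]
```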
The important XGBoost model tuning steps needed to get the best results, including:
- The expert best practices that you need to know when tuning gradient boosting models.
- The balance between the size and number of decision trees when tuning XGBoost models.
- The slowing of learning during training with the learning rate and its impact on the number of trees required.
- The careful use of random sampling of rows and columns in tree construction and how this affects the mean and variance of performance.
Resources you need to go deeper, when you need to, including:
- Top machine learning textbooks and the specific chapters that discuss gradient boosting to deepen your understanding, if you crave more.
- Seminal gradient boosting papers by the experts and links to download the PDF versions.
- The best places online where you can find more details about the XGBoost library.
What More Do You Need?
Take a Sneak Peek Inside The Ebook
BONUS: XGBoost Python Code Recipes
…you also get 30 fully working XGBoost scripts
Each recipe presented in the book is standalone, meaning that you can copy and paste it into your project and use it immediately.
- You get one Python script (.py) for each example provided in the book.
- You get the datasets used throughout the book.
Your XGBoost Code Recipe Library covers the following topics:
- Binary Classification
- Multiclass Classification
- One Hot Encoding
- k-fold Cross Validation
- Train-Test Splits
- Tree Visualization
- Model Serialization
- Feature Importance Scoring
- Feature Selection
- Early Stopping
- Multicore and Multithreaded Configuration
- Grid Search Hyperparameter Tuning
This means that you can follow along and compare your answers to a known working implementation of each algorithm in the provided Python files.
This helps a lot to speed up your progress when working through the details of a specific task.
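For instance, the k-fold cross validation recipe rests on one idea: split the data into k folds so that every sample is used for validation exactly once. Below is a pure-Python sketch of the index bookkeeping; the function name is made up, and in the book's recipes this is handled by scikit-learn's KFold.

```python
# k-fold index sketch (pure Python, illustration only).
def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists, one pair per fold."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j in range(k) if j != i for idx in folds[j]]
        yield train, val

splits = list(k_fold_indices(10, 5))  # 5 folds over 10 samples
```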
About The Author
Hi, I'm Jason Brownlee.
I live in Australia with my wife and son and love to write and code.
I have a computer science background as well as a Masters and Ph.D. degree in Artificial Intelligence.
I’ve written books on algorithms, won and ranked in the top 10% in machine learning competitions, consulted for startups and spent a long time working on systems for forecasting tropical cyclones. (Yes, I have written tons of code that runs operationally.)
I get a lot of satisfaction helping developers get started and get really good at machine learning.
I teach an unconventional top-down and results-first approach to machine learning where we start by working through tutorials and problems, then later wade into theory as we need it.
I'm here to help if you ever have any questions. I want you to be awesome at machine learning.
Get Your Sample Chapter
Want to take a look before you buy? Download a free sample chapter PDF.
Enter your email address and your sample chapter will be sent to your inbox.
Check Out What Customers Are Saying:
You're Not Alone in Choosing Machine Learning Mastery
Trusted by Over 10,000 Practitioners
...including employees from companies like:
...students and faculty from universities like:
and many thousands more...
Absolutely No Risk with...
100% Money Back Guarantee
Plus, as you should expect of any great product on the market, every Machine Learning Mastery Ebook
comes with the surest sign of confidence: my gold-standard 100% money-back guarantee.
100% Money-Back Guarantee
If you're not happy with your purchase of any of the Machine Learning Mastery Ebooks,
just email me within 90 days of buying, and I'll give you your money back ASAP.
No waiting. No questions asked. No risk.
Get Results With The Algorithm That Is
Winning Machine Learning Competitions
Choose Your Package:
You will get:
- XGBoost With Python
(including bonus source code)
Python Pro Bundle
You get the 3-book set:
- Machine Learning Mastery With Python
- Deep Learning With Python
- XGBoost With Python
(includes all bonus source code)
(save $37, like getting a book for free!)
You get the complete 7-book set:
- Master Machine Learning Algorithms
- ML Algorithms From Scratch
- Machine Learning Mastery With Weka
- Machine Learning Mastery With R
- Machine Learning Mastery With Python
- Deep Learning With Python
- XGBoost With Python
(includes all bonus source code)
(save a massive $72)
(1) Click the button. (2) Enter your details. (3) Download your package immediately.
Secure Payment Processing With SSL Encryption
Have more Questions?
Are you a Student?
Want it for the Team?
What Are Skills in Machine Learning Worth?
Your boss asks you:
Hey, can you build a predictive model for this?
Imagine you had the skills and confidence to say:
...and follow through.
I have been there. It feels great!
How much is that worth to you?
The industry is demanding skills in machine learning.
The market wants people that can deliver results, not write academic papers.
Business knows what these skills are worth and are paying sky-high starting salaries.
A Data Scientist's Salary Begins at:
$100,000 to $150,000.
A Machine Learning Engineer's Salary is Even Higher.
What Are Your Alternatives?
You made it this far.
You're ready to take action.
But, what are your alternatives? What options are there?
(1) A Theoretical Textbook for $100+
...it's boring, math-heavy and you'll probably never finish it.
(2) An On-site Boot Camp for $10,000+
...it's full of young kids, you must travel and it can take months.
(3) A Higher Degree for $100,000+
...it's expensive, takes years, and you'll be an academic.
For the Hands-On Skills You Get...
And the Speed of Results You See...
And the Low Price You Pay...
Machine Learning Mastery Ebooks are amazing value.
And they work. That's why I offer the money-back guarantee.
You're A Professional
The field moves quickly,
...how long can you wait?
You think you have all the time in the world, but...
- New methods are devised and algorithms change.
- New books get released and prices increase.
- New graduates come along and jobs get filled.
Right Now is the Best Time to make your start.
Bottom-up is Slow and Frustrating,
...don't you want a faster way?
Can you really go on another day, week or month...
- Scraping ideas and code from incomplete posts.
- Skimming theory and insight from short videos.
- Parsing Greek letters from academic textbooks.
Targeted Training is your Shortest Path to a result.
Professionals Use Training To Stay On Top Of Their Field
Get The Training You Need!
You don't want to fall behind or miss the opportunity.
Frequently Asked Questions
What programming language is used? All examples use the Python programming language version 2 or 3. It assumes you have a working SciPy environment with NumPy, pandas, matplotlib and scikit-learn installed.
Do I need to be a good programmer? Not at all. This Ebook requires that you have a programmer's mindset of thinking in procedures and learning by doing. You do not need to be an excellent programmer to read and learn about machine learning algorithms.
How much math do I need to know? No background in statistics, probability or linear algebra is required. We do not derive any equations.
How many pages is the Ebook? The Ebook is 155 pages.
How many example Python scripts are included? There are 30 Python scripts included.
Is there a hard copy physical book? Not at this stage. Ebook only.
Will I get updates? Yes. You will be notified about updates to the book and code that you can download for free.
How long will the Ebook take to complete? I recommend reading one chapter per day. With 15 tutorial lessons and moving fast through the intro and conclusions, you can finish in 2 weeks. On the other hand, if you are keen you could work through all of the material in a weekend.
What if I need help? The final chapter is titled “Getting More Help” and points to resources that you can use to get more help with gradient boosting and XGBoost in Python.
How much machine learning do I need to know? Only a little. You will be led step-by-step through the process of working through an XGBoost project. It would help if you were already familiar with concepts like cross-validation.
Are there any additional downloads? Yes. In addition to the download for the Ebook itself, you will have access to my personal library of Python XGBoost recipes.
What operating systems are supported? You can work through the book using Linux, Mac OS X and Windows.
Is there any digital rights management (DRM)? No, there is no DRM.