
Commit ee7fb66

minor tweaks to language for clarity.
1 parent b0fc33c commit ee7fb66

1 file changed: +18 -11 lines changed

_episodes/04-boosting.md

Lines changed: 18 additions & 11 deletions
@@ -15,18 +15,16 @@ keypoints:

## Boosting

In the previous episode, we demonstrated that decision trees may have high "variance": their performance can vary widely given different samples of data. An algorithm that performs somewhat poorly at a task - such as a simple decision tree - is sometimes referred to as a "weak learner".

The premise of boosting is the combination of many weak learners to form a single "strong" learner. In a nutshell, boosting involves building models iteratively. At each step we focus on the data on which we performed poorly.

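To make this concrete before we use scikit-learn's `AdaBoostClassifier` below, here is a minimal sketch of that loop. The toy dataset and the simple "double the weight of anything we got wrong" rule are assumptions made for illustration; they are not the exact re-weighting scheme AdaBoost uses, which `AdaBoostClassifier` will handle for us.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# toy data standing in for this lesson's x_train / y_train
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

weights = np.ones(len(y)) / len(y)   # start with all observations equally important
stumps = []

for step in range(6):
    # fit a "weak learner": a decision tree with a single split
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=weights)
    stumps.append(stump)

    # up-weight the observations we misclassified so the next stump focuses on them
    misclassified = stump.predict(X) != y
    weights[misclassified] *= 2.0
    weights = weights / weights.sum()
```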
In our context, the first step is to build a tree using the data. Next, we look at the data that we misclassified and re-weight it so that classifying those observations correctly becomes the priority, at the cost of perhaps getting some of the other data wrong this time. Let's see how this works in practice.

```python
from sklearn import ensemble, tree

# build models with a single split
clf = tree.DecisionTreeClassifier(max_depth=1)
mdl = ensemble.AdaBoostClassifier(base_estimator=clf, n_estimators=6)
mdl = mdl.fit(x_train.values, y_train.values)
@@ -41,15 +39,18 @@ for i, estimator in enumerate(mdl.estimators_):

![](../fig/section4-fig1.png){: width="900px"}

> ## Question
> A) Does the first tree in the collection (the one in the top left) look familiar to you? Why?
> > ## Answer
> > A) Yes, we have seen this tree before. It is the very first tree that we built, which makes sense: it is trained on the entire dataset with no special weighting.
> {: .solution}
{: .challenge}

In the second tree we can see the model shift. The first tree misclassified several observations in class 1, and these are now the most important observations. Consequently, the second tree picks a boundary that, while prioritizing the correct classification of those observations, still tries to classify the rest of the data as well as possible. The process continues iteratively, until later trees may be drawing boundaries that capture just one or two observations.

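We can also watch this refinement numerically rather than visually. One quick sketch (reusing `mdl`, `x_train`, and `y_train` from the code above) uses `staged_predict`, which yields the ensemble's predictions after each boosting iteration:

```python
import numpy as np

# accuracy of the boosted model as each new tree is added (on the training data)
for i, y_pred in enumerate(mdl.staged_predict(x_train.values)):
    acc = np.mean(y_pred == y_train.values)
    print(f"Trees used: {i + 1}, training accuracy: {acc:.3f}")
```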
One important point is that each tree is weighted by its global error. So, for example, Tree 6 would carry less weight in the final model. It is clear that we wouldn't want Tree 6 to carry the same importance as Tree 1, when Tree 1 is doing so much better overall. It turns out that weighting each tree by the inverse of its error is a pretty good way to do this.

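You can peek at these quantities on the fitted model: scikit-learn's `AdaBoostClassifier` exposes them as `estimator_errors_` and `estimator_weights_`. A quick check, again reusing `mdl` from above:

```python
# each tree's weighted error and the weight its vote carries in the final model;
# trees with a lower error get a larger say
for i, (err, w) in enumerate(zip(mdl.estimator_errors_, mdl.estimator_weights_)):
    print(f"Tree {i + 1}: error = {err:.3f}, weight = {w:.3f}")
```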
Let's look at the decision surface of the final ensemble.

```python
# plot the final prediction
@@ -60,7 +61,13 @@ glowyr.plot_model_pred_2d(mdl, x_train, y_train, title=txt)

![Boosted tree](../fig/section4-fig2.png){: width="900px"}

And that's AdaBoost! There are a few tricks we have glossed over here, but you understand the general principle. We modified the data to focus on hard-to-classify observations. We can imagine this as a form of data resampling for each new tree.
For example, say we have three observations: A, B, and C ([A, B, C]). If we correctly classify observations [A, B] but incorrectly classify C, then AdaBoost builds a new tree that focuses on C.
Equivalently, we could say AdaBoost builds a new tree using the dataset [A, B, C, C, C], where we have intentionally repeated observation C three times so that the algorithm treats it as three times as important as the other observations. Make sense?
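
Below is a toy sketch of that equivalence (the numbers and labels are made up for illustration; they are not data from this lesson): repeating C three times produces the same single-split tree as keeping [A, B, C] and giving C a sample weight of 3.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# three made-up observations A, B, C, each with one feature, and binary labels
X = np.array([[1.0], [2.0], [3.0]])   # A, B, C
y = np.array([0, 0, 1])

# option 1: build the new tree on [A, B, C, C, C]
X_rep = np.array([[1.0], [2.0], [3.0], [3.0], [3.0]])
y_rep = np.array([0, 0, 1, 1, 1])
tree_repeated = DecisionTreeClassifier(max_depth=1).fit(X_rep, y_rep)

# option 2: keep [A, B, C] but give C three times the weight
tree_weighted = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=[1, 1, 3])

# both trees choose the same split point (2.5)
print(tree_repeated.tree_.threshold[0], tree_weighted.tree_.threshold[0])
```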
Now we'll move on to a different approach that also involves manipulating data to build new trees.

{% include links.md %}
