
Commit 8ca2658

minor tweaks to language for clarity.
1 parent ee7fb66 commit 8ca2658

File tree

1 file changed: +8 −4 lines changed


_episodes/05-bagging.md

Lines changed: 8 additions & 4 deletions
@@ -16,11 +16,15 @@ keypoints:
 
 ## Bootstrap aggregation ("Bagging")
 
-Bootstrap aggregation, or "Bagging", is another form of ensemble learning where we aim to build a single good model by combining many models together. With AdaBoost, we modified the data to focus on hard to classify observations. We can imagine this as a form of resampling the data for each new tree. For example, say we have three observations: A, B, and C, [A, B, C]. If we correctly classify observations [A, B], but incorrectly classify C, then AdaBoost involves building a new tree that focuses on C. Equivalently, we could say AdaBoost builds a new tree using the dataset [A, B, C, C, C], where we have intentionally repeated observation C 3 times so that the algorithm thinks it is 3 times as important as the other observations. Makes sense?
+Bootstrap aggregation, or "Bagging", is another form of ensemble learning.
 
-Bagging involves the same approach, except we don't selectively choose which observations to focus on, but rather we randomly select subsets of data each time. As you can see, while this is a similar process to AdaBoost, the concept is quite different. Whereas before we aimed to iteratively improve our overall model with new trees, we now build trees on what we hope are independent datasets.
+With boosting, we iteratively changed the dataset to have new trees focus on the "difficult" observations. Bagging involves the same approach, except we don't selectively choose which observations to focus on, but rather we randomly select subsets of data each time.
 
-Let's take a step back, and think about a practical example. Say we wanted a good model of heart disease. If we saw researchers build a model from a dataset of patients from their hospital, we would be happy. If they then acquired a new dataset from new patients, and built a new model, we'd be inclined to feel that the combination of the two models would be better than any one individually. This exact scenario is what bagging aims to replicate, except instead of actually going out and collecting new datasets, we instead use bootstrapping to create new sets of data from our current dataset. If you are unfamiliar with bootstrapping, you can treat it as "magic" for now (and if you are familiar with the bootstrap, you already know that it is magic).
+Boosting aimed to iteratively improve our overall model with new trees. With bagging, we now build trees on what we hope are independent datasets.
+
+Let's take a step back, and think about a practical example. Say we wanted a good model of heart disease. If we saw researchers build a model from a dataset of patients from their hospital, we might think this would be sufficient. If the researchers were able to acquire a new dataset from new patients, and built a new model, we'd be inclined to feel that the combination of the two models would be better than any one individually.
+
+This is the scenario that bagging aims to replicate, except instead of actually going out and collecting new datasets, we instead use "bootstrapping" to create new sets of data from our current dataset. If you are unfamiliar with bootstrapping, you can treat it as magic for now (and if you are familiar with the bootstrap, you already know that it is magic).
 
 Let's take a look at a simple bootstrap model.

@@ -39,7 +43,7 @@ for i, estimator in enumerate(mdl.estimators_):
 
 ![](../fig/section5-fig1.png){: width="900px"}
 
-We can see that each individual tree is quite variable. This is a result of using a random set of data to train the classifier.
+We can see that each individual tree varies considerably. This is a result of using a random set of data to train the classifier.
 
 ```python
 # plot the final prediction

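The hunk context above references `mdl.estimators_`, the individual trees of the episode's bagging model. For orientation, here is a minimal sketch of such a model, assuming scikit-learn's `BaggingClassifier` over decision trees and a synthetic stand-in dataset from `make_classification` (the episode's actual data loading and plotting code is not part of this diff):

```python
# A minimal sketch, not the episode's exact code: fit a bagged ensemble of
# decision trees, where each tree trains on a bootstrap resample of the data.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the episode's dataset (an assumption for this sketch).
X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)

# bootstrap=True (the default) draws each tree's training set with
# replacement, i.e. the "bootstrapping" described in the prose above.
mdl = BaggingClassifier(DecisionTreeClassifier(max_depth=5),
                        n_estimators=6, bootstrap=True, random_state=42)
mdl.fit(X, y)

# Each fitted tree saw a different bootstrap sample, so individual trees
# (and their decision boundaries) vary considerably, as the figure shows.
for i, estimator in enumerate(mdl.estimators_):
    print(f"tree {i} accuracy on the full data: {estimator.score(X, y):.2f}")
```

Averaging across many such high-variance trees is what makes the combined bagged prediction more stable than any single tree.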