_episodes/03-variance.md
## Overfitting
Looking at the tree, we can see that there are some very specific rules.
> ## Question
> a) Consider a patient aged 45 years with an acute physiology score of 100. Using the image of the tree, work through the nodes until you can make a prediction. What outcome does your model predict?
> b) What is the gini impurity of the final node, and why?
> c) Does the decision that led to this final node seem sensible to you? Why?
> > ## Answer
> > a) From the top of the tree, we would work our way down:
> >
> > - acutePhysiologyScore <= 78.5? No.
> > - acutePhysiologyScore <= 104.5? Yes.
> > - age <= 76.5? Yes.
> > - age <= 55.5? Yes.
> > - acutePhysiologyScore <= 96.5? No.
> >
> > b) This leads us to our single node with a gini impurity of 0. The node contains a single class (i.e. it is completely "pure"). A short sketch of the gini calculation follows this exercise.
> > c) Having an entire rule based upon this single observation seems silly, but it is perfectly logical to the algorithm: the only objective it cares about is minimizing the gini impurity.
> {: .solution}
{: .challenge}
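As a quick check of part (b) above, here is a minimal sketch of the gini impurity calculation: one minus the sum of squared class proportions. The class counts below are made up purely for illustration.

```python
# Minimal sketch of gini impurity: 1 - sum of squared class proportions.
# The class counts are made up for illustration.
def gini_impurity(class_counts):
    total = sum(class_counts)
    return 1 - sum((count / total) ** 2 for count in class_counts)

print(gini_impurity([5, 0]))  # a "pure" node containing a single class -> 0.0
print(gini_impurity([3, 3]))  # an evenly mixed node -> 0.5
```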
Overfitting is a problem that occurs when our model is fitted too closely to the training data. The result is that the model may not generalise well to "unseen" data, such as observations for new patients entering a critical care unit. This is where "pruning" comes in.
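As a rough sketch of what pruning can look like in code, the snippet below caps the depth of a scikit-learn tree with `max_depth`. The data is synthetic (stand-ins for `age` and `acutePhysiologyScore`), and `max_depth` is just one simple way to constrain a tree; the lesson's own pruning step may use different settings.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-ins for the lesson's features (age, acutePhysiologyScore).
rng = np.random.default_rng(0)
X = rng.uniform([18, 0], [90, 200], size=(200, 2))
y = (X[:, 1] + rng.normal(0, 20, 200) > 100).astype(int)  # noisy outcome

# An unconstrained tree keeps splitting until every leaf is pure (it overfits).
deep_tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Capping the depth is a simple form of pruning: rules built around a
# handful of patients are no longer allowed.
pruned_tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)

print(deep_tree.get_depth(), pruned_tree.get_depth())
```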
Let's prune the model and look again.
Above we can see that when we use random subsets of the data, our decision boundary can change quite a bit. As you might guess, we don't really want a model that works well or poorly depending on which random subset it happened to see.
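To see this variance in a few lines of code, here is a small sketch (again with synthetic data rather than the lesson's dataset): two unpruned trees fitted to different random halves of the same data will often disagree on the same test points.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-ins for the lesson's features (age, acutePhysiologyScore).
rng = np.random.default_rng(1)
X = rng.uniform([18, 0], [90, 200], size=(300, 2))
y = (X[:, 1] + rng.normal(0, 25, 300) > 100).astype(int)

# Fit unpruned trees to two different random subsets of the same data.
subset_a = rng.choice(len(X), size=150, replace=False)
subset_b = rng.choice(len(X), size=150, replace=False)
tree_a = DecisionTreeClassifier(random_state=0).fit(X[subset_a], y[subset_a])
tree_b = DecisionTreeClassifier(random_state=0).fit(X[subset_b], y[subset_b])

# The two trees often disagree on the same new patients: high variance.
X_new = rng.uniform([18, 0], [90, 200], size=(20, 2))
disagreements = (tree_a.predict(X_new) != tree_b.predict(X_new)).sum()
print(f"{disagreements} disagreements out of 20 predictions")
```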
There is an old joke: two farmers and a statistician go hunting. They see a deer: the first farmer shoots, and misses to the left. The next farmer shoots, and misses to the right. The statistician yells "We got it!!".
While the joke doesn't quite hold up in real life, it turns out that this principle does hold for decision trees: combining them in the right way ends up building powerful models.
> ## Question
> a) Why are decision trees considered to have high variance?
> b) An "ensemble" is the name used for a machine learning model that aggregates the decisions of multiple sub-models. Why might creating ensembles of decision trees be a good idea?
> > ## Answer
> > a) Minor changes in the data used to train a decision tree can lead to a very different tree, and therefore to very different predictions.
> > b) By combining many instances of "high variance" classifiers (decision trees), we can end up with a single classifier with low variance (a rough sketch follows this exercise).
> {: .solution}
{: .challenge}
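As a rough illustration of the answer above, here is a sketch of an ensemble built by averaging many trees fitted to random subsets of the data, using scikit-learn's `BaggingClassifier` on synthetic data; the lesson's own ensemble methods may differ.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-ins for the lesson's features (age, acutePhysiologyScore).
rng = np.random.default_rng(2)
X = rng.uniform([18, 0], [90, 200], size=(300, 2))
y = (X[:, 1] + rng.normal(0, 25, 300) > 100).astype(int)

single_tree = DecisionTreeClassifier(random_state=0)
bagged_trees = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                                 n_estimators=100, random_state=0)

# Averaging many high-variance trees typically gives a more stable classifier.
print("single tree:  ", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees: ", cross_val_score(bagged_trees, X, y, cv=5).mean())
```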