---
tags:
  - OMSCS
  - ML
---
# SL04 - Instance Based Learning

- Other supervised learning algorithms train a model based on the data and then throw the data away.
- In Instance Based Learning (IBL), we put the data into a database and look it up whenever we need to make a prediction for a new datapoint.

- Simple Database
    - Just store the examples, look them up when asked
    - Advantages
        - Reliable / dependable ((X, Y) -> DB, Lookup(X) -> Y)
        - Fast to "train" (there's basically no training)
        - Simple
    - Disadvantages
        - No generalization
        - Overfitting (a query that matches a stored datapoint whose label was a mistake always gets back the same mistake)

## Cost of a House
- Have a DB of house costs, with features such as:
    - size
    - date sold
    - price of property when sold
    - location
    - zip code
- Nearest Neighbor
    - Find the nearest existing datapoint and use its cost
    - Falls apart if the query point is too far from any stored neighbor
- K Nearest Neighbor (KNN)
    - Take the $K$ nearest existing datapoints
    - Take the average of their costs
    - Can (should?) be a weighted average based on "distance"/"similarity"; see the sketch after this list
        - The weighting function used is a "hyperparameter" of KNN
        - The weighting function can be as simple as $1/K$ (unweighted)

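Below is a minimal sketch (mine, not from the lectures) of the difference between an unweighted and a distance-weighted average; the `knn_predict` helper and the toy house data are invented for illustration.

```python
import numpy as np

def knn_predict(X, y, query, k=3, weighted=False):
    """Toy KNN regression: average the K nearest labels, optionally weighting by 1/distance."""
    X, y, query = np.asarray(X, float), np.asarray(y, float), np.asarray(query, float)
    dists = np.linalg.norm(X - query, axis=1)   # Euclidean distance to every stored point
    nearest = np.argsort(dists)[:k]             # indices of the K closest datapoints
    if not weighted:
        return y[nearest].mean()                # unweighted: every neighbor counts 1/K
    weights = 1.0 / (dists[nearest] + 1e-9)     # closer neighbors get larger weights
    return np.average(y[nearest], weights=weights)

# Invented data: (size, age) -> price.
X = [[1500, 10], [1700, 3], [900, 30], [2200, 1]]
y = [300, 380, 150, 500]
print(knn_predict(X, y, [1600, 5], weighted=False))  # plain mean of the 3 nearest prices
print(knn_predict(X, y, [1600, 5], weighted=True))   # nearer houses pull the estimate harder
```
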
## Comparison

![[Pasted image 20250128103204.png]]

- Do all the work upfront? (eager learner)
- Do all the work on the backend, at query time? (lazy learner)
- Combination of approaches? There's no reason why you can't "cache" the lazy learner's answers via a linear regression; see the sketch after this list.

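One way to read that last bullet, as a sketch rather than lecture code: query the lazy learner on a grid of points and fit an eager linear model to its answers, so later queries never touch the stored data. The data here is invented.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Invented 1-D data: y is roughly linear in x with a bit of noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 3 * X[:, 0] + rng.normal(0, 1, size=50)

# Lazy learner: stores the data and does its work at query time.
knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)

# "Cache" it: ask the lazy learner about a grid of points, then fit an
# eager model to those answers once, upfront.
grid = np.linspace(0, 10, 200).reshape(-1, 1)
cached = LinearRegression().fit(grid, knn.predict(grid))

print(knn.predict([[4.2]]), cached.predict([[4.2]]))  # the two should roughly agree
```
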
## KNN Example

![[Pasted image 20250128103648.png]]

```python
from sklearn.neighbors import KNeighborsRegressor

X = [[1, 6], [2, 4], [3, 7], [6, 8], [7, 1], [8, 4]]
Y = [7, 8, 13, 44, 50, 68]
Q = [[4, 2]]

# We need KNeighborsRegressor here, not KNeighborsClassifier.
# KNeighborsClassifier takes a (weighted) majority vote over the neighbors' labels;
# KNeighborsRegressor returns the (weighted) average of the neighbors' values.

KNeighborsRegressor(n_neighbors=1, metric='euclidean').fit(X, Y).predict(Q)
# 8

KNeighborsRegressor(n_neighbors=3, metric='euclidean').fit(X, Y).predict(Q)
# 42

KNeighborsRegressor(n_neighbors=1, metric='manhattan').fit(X, Y).predict(Q)
# 8

KNeighborsRegressor(n_neighbors=3, metric='manhattan').fit(X, Y).predict(Q)
# 23.66666667
```

| $d()$     | $K$ | `sklearn`        | Empirical | Notes                                   |
| --------- | --- | ---------------- | --------- | --------------------------------------- |
| Euclidean | 1   | $8$              |           |                                         |
| Euclidean | 3   | $42$             |           |                                         |
| Manhattan | 1   | $8$              | 29        | (4,2) is equidistant to (2,4) and (7,1) |
| Manhattan | 3   | $23 \frac{2}{3}$ | 35.5      | (4,2) is equidistant to (3,7) and (8,4) |
> Regarding the Nearest Neighbors algorithms, if it is found that two neighbors, neighbor `k+1` and `k`, have identical distances but different labels, the results will depend on the ordering of the training data.

![[Pasted image 20250128110752.png]]

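A quick sanity check on the ties mentioned in the table and the quote above (my own snippet, not lecture code): sort the Manhattan distances from the query by hand.

```python
X = [(1, 6), (2, 4), (3, 7), (6, 8), (7, 1), (8, 4)]
Y = [7, 8, 13, 44, 50, 68]
q = (4, 2)

# (Manhattan distance to q, label) for every stored point, nearest first.
manhattan = sorted((abs(a - q[0]) + abs(b - q[1]), y) for (a, b), y in zip(X, Y))
print(manhattan)
# [(4, 8), (4, 50), (6, 13), (6, 68), (7, 7), (8, 44)]
# Ties at distance 4 (labels 8 and 50) and at distance 6 (labels 13 and 68):
# sklearn breaks them by training-data order, which is why its Manhattan answers
# differ from the hand-computed ("Empirical") ones above.
```
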
## KNN Bias
Preference bias:
- Locality -> near points are similar
- Smoothness -> averaging
- **All features matter equally**

## Curse of Dimensionality
> As the number of features or dimensions grows, the amount of data we need to generalize accurately grows exponentially.

Intuition says "let's add more features, that'll help it classify better". In reality, that makes the problem worse unless you also add (a lot) more data.

With one dimension and $N$ datapoints, each point "covers" roughly $1/N$ of the space. Add a second dimension and you need $N^2$ datapoints to achieve the same coverage; add a third and you need $N^3$. In general, keeping coverage constant requires $N^d$ datapoints for $d$ dimensions, i.e. exponential growth in the number of features.

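A back-of-the-envelope illustration of that growth (the choice of 10 points per axis is arbitrary, just to make the arithmetic concrete):

```python
# If 10 datapoints give acceptable coverage of one dimension, keeping the same
# per-point coverage in d dimensions takes 10**d datapoints.
points_per_axis = 10
for d in range(1, 6):
    print(f"{d} dimension(s): {points_per_axis ** d:,} datapoints")
# 1 dimension(s): 10 datapoints
# 2 dimension(s): 100 datapoints
# ...
# 5 dimension(s): 100,000 datapoints
```
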
![[Pasted image 20250128204050.png]]

Weighting different dimensions differently can help with the curse of dimensionality; the sketch below shows why the scale of a feature matters.

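A sketch of that point with invented numbers: when one feature sits on a much larger scale, it dominates the distance and effectively picks the neighbors by itself; down-weighting (or rescaling) it changes the answer.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Invented data: feature 0 is size in square feet (large scale),
# feature 1 is number of bedrooms (small scale).
X = np.array([[1512.0, 5], [1530.0, 2], [2400.0, 3]])
y = np.array([320.0, 200.0, 410.0])
q = np.array([[1510.0, 2]])

raw = KNeighborsRegressor(n_neighbors=1).fit(X, y)
print(raw.predict(q))  # [320.] -- size dominates, so [1512, 5] wins despite the bedroom mismatch

# Down-weight the size dimension (equivalently, rescale the features);
# now bedrooms get a real say in who the nearest neighbor is.
w = np.array([1 / 1000.0, 1.0])
scaled = KNeighborsRegressor(n_neighbors=1).fit(X * w, y)
print(scaled.predict(q * w))  # [200.] -- the nearest neighbor is now [1530, 2]
```
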
## Other Stuff
- Distance functions
    - Euclidean
    - Manhattan
    - Hamming
- Weighted vs unweighted distances
- What's the best value for $K$?
- Weighted vs unweighted average
- Locally weighted regression (see the sketch after this list)
- Locally weighted linear regression
- Locally weighted quadratic regression
- Locally weighted *whatever* regression

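A minimal sketch of the locally-weighted-linear-regression idea (my own illustration, with an invented `tau` bandwidth): for each query, fit a small linear regression in which nearby points get large weights and far-away points get negligible ones, then read the local line off at the query.

```python
import numpy as np

def locally_weighted_linear(X, y, x_query, tau):
    """Weighted least-squares line around x_query, evaluated at x_query.

    tau controls how quickly a training point's influence falls off with distance.
    """
    X, y = np.asarray(X, float), np.asarray(y, float)
    A = np.column_stack([np.ones(len(X)), X])            # design matrix with an intercept column
    w = np.exp(-((X - x_query) ** 2) / (2 * tau ** 2))   # Gaussian kernel weight per training point
    W = np.diag(w)
    theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)    # weighted normal equations
    return theta[0] + theta[1] * x_query

# Invented 1-D data: y = x^2, which one global line fits poorly but many local lines fit well.
X = np.arange(0.0, 10.0, 0.5)
y = X ** 2
print(locally_weighted_linear(X, y, x_query=3.0, tau=0.5))  # ~9.25, close to the true 9
```

Swapping the straight-line design matrix for quadratic (or whatever) features gives the other variants in the list above.
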
## Summary
- Lazy vs eager learning
- KNN
- Similarity = distance
- Classification vs regression
- Averaging
- Domain knowledge matters