13 changes: 9 additions & 4 deletions readme.md
@@ -229,22 +229,24 @@ plt.close()
```

Your results should look like the following:
In these plots, each point's score is conveyed by its color.

**LoOP Scores without Clustering**
![LoOP Scores without Clustering](https://github.com/vc1492a/PyNomaly/blob/main/images/scores.png)

**LoOP Scores with Clustering**
![LoOP Scores with Clustering](https://github.com/vc1492a/PyNomaly/blob/main/images/scores_clust.png)

**DBSCAN Cluster Assignments**
![DBSCAN Cluster Assignments](https://github.com/vc1492a/PyNomaly/blob/main/images/cluster_assignments.png)


Note the differences between using LocalOutlierProbability with and without clustering. In the example without clustering, samples are scored against the distribution of the entire data set; in the example with clustering, each sample is scored against the distribution of its own cluster. Which approach is suitable depends on the use case.

**NOTE**: Data was not normalized in this example, but it's probably a good idea to do so in practice.
> **Reviewer:** Why?
>
> **Owner:** The main reason is not to give extra weight to any particular column of data. Normalization ensures all features in a dataset are on a similar scale, preventing features with larger values from disproportionately influencing the algorithm, which improves model performance.

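A minimal sketch of min-max normalization before scoring (the two-column data here is illustrative, chosen so the feature scales differ by orders of magnitude):

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data: a large-scale feature next to a small-scale one.
data = np.hstack([
    rng.random((100, 1)) * 1000.0,  # values in [0, 1000)
    rng.random((100, 1)),           # values in [0, 1)
])

# Min-max normalization rescales each column to [0, 1], so no single
# feature dominates the distance computations.
normalized = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))

# The normalized array can then be scored as before, e.g.:
# scores = loop.LocalOutlierProbability(normalized, n_neighbors=10).fit().local_outlier_probabilities
```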

## Using Numpy

@@ -264,6 +266,7 @@ scores = loop.LocalOutlierProbability(data, n_neighbors=3).fit().local_outlier_p
print(scores)

```
-- I'll insert a table here

> **Owner:** Sounds good ✅


The shape of the input array corresponds to the rows (observations) and columns (features) in the data:

@@ -279,7 +282,7 @@ data = np.random.rand(100, 5)
scores = loop.LocalOutlierProbability(data).fit().local_outlier_probabilities
print(scores)
```

-- I'll insert a table of the scores here

> **Owner:** Sounds good ✅

## Specifying a Distance Matrix

PyNomaly provides the ability to specify a distance matrix so that any
@@ -317,6 +320,8 @@ distances = np.delete(distances, 0, 1)
m = loop.LocalOutlierProbability(distance_matrix=d, neighbor_matrix=idx, n_neighbors=n_neighbors+1).fit()
scores = m.local_outlier_probabilities
```
-- insert a table of the results

> **Owner:** A full table of the values may be too large, but showing a truncated view would be helpful. What are the results telling us?

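The distance and neighbor matrices passed to `distance_matrix` and `neighbor_matrix` can be built by hand. Here is a numpy-only sketch under illustrative random data; scikit-learn's `NearestNeighbors.kneighbors` would serve equally well and is what the fuller example above uses:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((50, 3))
n_neighbors = 10

# Full pairwise Euclidean distance matrix.
pairwise = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# Sort each row by distance. Column 0 is the point itself (distance 0),
# so it is dropped, mirroring np.delete(distances, 0, 1) above; keeping
# n_neighbors + 1 true neighbors matches the n_neighbors=n_neighbors+1 call.
order = np.argsort(pairwise, axis=1)
idx = order[:, 1:n_neighbors + 2]
d = np.take_along_axis(pairwise, idx, axis=1)
```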
The visualization below shows the results for a few well-known distance metrics:

@@ -375,7 +380,7 @@ print(rmse)
```

The root mean squared error (RMSE) between the two approaches is approximately 0.199 (your scores will vary depending on the data and specification).
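The comparison can be sketched as follows; the two score vectors here are random stand-ins for the batch and stream scores computed above, so the printed value will differ from 0.199:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for the two score vectors from the batch and stream runs.
scores_batch = rng.random(100)
scores_stream = rng.random(100)

# Root mean squared error between the two scoring approaches.
rmse = np.sqrt(np.mean((scores_batch - scores_stream) ** 2))
```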
The plot below shows the scores from the stream approach as a colormap on the figures.

```python
fig = plt.figure(figsize=(7, 7))