Added DBSCAN algorithm #99
Open: ken1000minus7 wants to merge 2 commits into main from ken
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,52 @@ | ||
| # DBSCAN | ||
|
|
||
| DBSCAN stands for **D**ensity **B**ased **S**patial **C**lustering of **A**pplications with **N**oise | ||
|
|
||
| The model clusters the given training set by the density of its data points: a point belongs to a cluster when it lies within `eps` of one of that cluster's core points. The model can find arbitrarily shaped clusters and identifies outliers, which are labelled `-1`. | ||
|
|
||
| ## Parameters | ||
|
|
||
| | Name | Definition | Default | Type | | ||
| |-------------| ------------------------------------------------------------------------------------------- |----------|---------------| | ||
| | `eps` | Maximum distance between two points for one to be considered in the vicinity of the other | 0.5 | `long double` | | ||
| | `minSamples` | Minimum number of other points that must lie in the vicinity of a point for it to be considered a core point | 5 | `int` | | ||
|
|
||
| ## Attributes | ||
|
|
||
| | Name | Definition | Shape | | ||
| |--------|------------------------------------------------------------------------------------|-----------------------------------| | ||
| | `labels` | Labels assigned to each data point of the training set fitted into the model | Number of data points in the training set | | ||
|
|
||
| ## Methods | ||
|
|
||
| | Name | Definition | Return value | | ||
| |--------------------------------------|----------------------------------------|---------------| | ||
| | `fit(std::vector<std::vector<T>> x)` | Fits and clusters the given training set | `void` | | ||
| | `fitPredict(std::vector<std::vector<T>> x)` | Fits and clusters the given training set and returns the labels assigned to each data point | `std::vector<int>` | | ||
| | `getLabels()` | Returns the labels assigned to each data point of the training set fitted into the model | `std::vector<int>` | | ||
|
|
||
| ## Example | ||
|
|
||
| ```cpp | ||
| DBSCAN<double> db(0.6, 4); | ||
| std::vector<std::vector<double>> x = { | ||
| {1, 2}, | ||
| {3, 4}, | ||
| {2.5, 4}, | ||
| {1.5, 2.5}, | ||
| {3, 5}, | ||
| {2.8, 4.5}, | ||
| {2.5, 4.5}, | ||
| {1.2, 2.5}, | ||
| {1, 3}, | ||
| {1, 5}, | ||
| {1, 2.5}, | ||
| {5, 6}, | ||
| {4, 3} | ||
| }; | ||
| std::vector<int> labels = db.fitPredict(x); | ||
| std::cout << "X Y Cluster\n"; | ||
| for(std::size_t i = 0; i < x.size(); i++) { | ||
| std::cout << x[i][0] << " " << x[i][1] << " " << labels[i] << "\n"; | ||
| } | ||
| ``` | ||
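
The core-point rule described in the Parameters table can be tried out in isolation. The following is a minimal sketch, not code from this PR: `neighbourCounts` and `corePoints` are hypothetical helper names, and the sketch uses `double` rather than the PR's `long double`. It assumes, as the PR's `fit` does, that a point is not counted as its own neighbour.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the PR): for every point, count how many
// *other* points lie within Euclidean distance `eps` of it.
std::vector<int> neighbourCounts(const std::vector<std::vector<double>> &x,
                                 double eps)
{
  std::vector<int> counts(x.size(), 0);
  for (std::size_t i = 0; i < x.size(); i++) {
    for (std::size_t j = 0; j < x.size(); j++) {
      if (i == j) continue; // a point is not its own neighbour
      assert(x[i].size() == x[j].size()); // feature vectors must match in size
      double squared = 0.0;
      for (std::size_t k = 0; k < x[i].size(); k++) {
        const double d = x[i][k] - x[j][k];
        squared += d * d;
      }
      if (std::sqrt(squared) <= eps) counts[i]++;
    }
  }
  return counts;
}

// Hypothetical helper: indices of the points that have at least
// `minSamples` neighbours, i.e. the core points.
std::vector<int> corePoints(const std::vector<std::vector<double>> &x,
                            double eps, int minSamples)
{
  const std::vector<int> counts = neighbourCounts(x, eps);
  std::vector<int> core;
  for (std::size_t i = 0; i < counts.size(); i++) {
    if (counts[i] >= minSamples) core.push_back(static_cast<int>(i));
  }
  return core;
}
```

Applied to the example data above with `eps = 0.6` and `minSamples = 4`, this rule marks the points at (zero-based) indices 5, 7 and 10 as core points.
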
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,26 @@ | ||
| //#include "../../src/slowmokit/methods/cluster/DBSCAN/DBSCAN.cpp" | ||
| // | ||
| //int main() { | ||
| // DBSCAN<double> db(0.6, 4); | ||
| // std::vector<std::vector<double>> x = { | ||
| // {1, 2}, | ||
| // {3, 4}, | ||
| // {2.5, 4}, | ||
| // {1.5, 2.5}, | ||
| // {3, 5}, | ||
| // {2.8, 4.5}, | ||
| // {2.5, 4.5}, | ||
| // {1.2, 2.5}, | ||
| // {1, 3}, | ||
| // {1, 5}, | ||
| // {1, 2.5}, | ||
| // {5, 6}, | ||
| // {4, 3} | ||
| // }; | ||
| // std::vector<int> labels = db.fitPredict(x); | ||
| // std::cout << "X Y Cluster\n"; | ||
| // for(int i = 0; i < x.size(); i++) { | ||
| // std::cout << x[i][0] << " " << x[i][1] << " " << labels[i] << "\n"; | ||
| // } | ||
| // return 0; | ||
| //} |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| /** | ||
| * @file methods/cluster/DBSCAN.hpp | ||
| * | ||
| * Easy include for DBSCAN algorithm | ||
| */ | ||
|
|
||
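| #ifndef SLOWMOKIT_CLUSTER_DBSCAN_HPP | ||
| #define SLOWMOKIT_CLUSTER_DBSCAN_HPP | ||
|
|
||
| // Guard renamed (the new name is only a suggestion): DBSCAN/DBSCAN.hpp uses SLOWMOKIT_DBSCAN_HPP, and sharing it here would skip that header's contents. | ||
| #include "DBSCAN/DBSCAN.hpp" | ||
|
|
||
| #endif //SLOWMOKIT_CLUSTER_DBSCAN_HPP |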
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,88 @@ | ||
| /** | ||
| * @file methods/cluster/DBSCAN/DBSCAN.cpp | ||
| * | ||
| * Implementation of the DBSCAN class | ||
| */ | ||
|
|
||
| #include "DBSCAN.hpp" | ||
|
|
||
| template<class T> | ||
| DBSCAN<T>::DBSCAN(long double eps, int minSamples) { | ||
| if(eps < 0 || minSamples < 0) { | ||
| throw std::invalid_argument("Values can't be negative"); | ||
| } | ||
| this->eps = eps; | ||
| this->minSamples = minSamples; | ||
| } | ||
|
|
||
| template<class T> | ||
| long double DBSCAN<T>::euclideanDistance(std::vector<T> p1, std::vector<T> p2) { | ||
| long double distance = 0.0; | ||
| if(p1.size() != p2.size()) { | ||
| throw std::invalid_argument("Feature vectors are unequal in size"); | ||
| } | ||
| int n = p1.size(); | ||
| for(int i = 0; i < n; i++) { | ||
| distance += (long double) (p1[i] - p2[i]) * (p1[i] - p2[i]); | ||
| } | ||
| return sqrtl(distance); | ||
| } | ||
|
|
||
| template<class T> | ||
| void DBSCAN<T>::cluster(int i, std::vector<int> &core, std::vector<std::vector<int>> &neighbours, int &label) { | ||
| if(labels[i] != -1) { | ||
| return; | ||
| } | ||
| labels[i] = label; | ||
| if(core[i] != 0) { | ||
| for(int j : neighbours[i]) { | ||
| cluster(j, core, neighbours, label); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| template<class T> | ||
| void DBSCAN<T>::fit(std::vector<std::vector<T>> x) { | ||
| int n = x.size(); | ||
|
|
||
| std::vector<int> core(n); | ||
| std::vector<std::vector<int>> neighbours(n, std::vector<int>()); | ||
|
|
||
| labels = std::vector<int>(n, -1); | ||
|
|
||
| for(int i = 0; i < n; i++) { | ||
| std::vector<int> neighbourIndices; | ||
| for(int j = 0; j < n; j++) { | ||
| if(i == j) { | ||
| continue; | ||
| } | ||
| if(euclideanDistance(x[i], x[j]) <= eps) { | ||
| neighbourIndices.push_back(j); | ||
| } | ||
| } | ||
| int const samples = neighbourIndices.size(); | ||
| if(samples >= minSamples) { | ||
| core[i]++; | ||
| neighbours[i] = neighbourIndices; | ||
| } | ||
| } | ||
| int clusters = 0; | ||
| for(int i = 0; i < n; i++) { | ||
| if(core[i] == 0 || labels[i] != -1) { | ||
| continue; | ||
| } | ||
| cluster(i, core, neighbours, clusters); | ||
| clusters++; | ||
| } | ||
| } | ||
|
|
||
| template<class T> | ||
| std::vector<int> DBSCAN<T>::fitPredict(std::vector<std::vector<T>> x) { | ||
| fit(x); | ||
| return labels; | ||
| } | ||
|
|
||
| template<class T> | ||
| std::vector<int> DBSCAN<T>::getLabels() { | ||
| return labels; | ||
| } |
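
The recursive `cluster` helper above descends one call per reachable point, so a very large cluster could in principle exhaust the call stack. One possible alternative is an explicit stack. The sketch below is a hypothetical free function (`expandCluster` is not part of the PR) that takes the same `core` and `neighbours` vectors and assigns the same labels as the recursive version, since every point reached in one expansion receives the same label regardless of visiting order.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <vector>

// Hypothetical iterative counterpart of the PR's recursive cluster() helper.
// `labels` uses -1 for "not yet assigned", as in the PR.
void expandCluster(int start, const std::vector<int> &core,
                   const std::vector<std::vector<int>> &neighbours,
                   int label, std::vector<int> &labels)
{
  assert(core.size() == labels.size());
  assert(neighbours.size() == labels.size());

  std::stack<int> pending;
  pending.push(start);
  while (!pending.empty()) {
    const int i = pending.top();
    pending.pop();
    if (labels[i] != -1) continue; // already assigned to some cluster
    labels[i] = label;
    if (core[i] != 0) {
      // Only core points propagate the cluster to their neighbours.
      for (int j : neighbours[i]) {
        if (labels[j] == -1) pending.push(j);
      }
    }
  }
}
```

In `fit`, such a function would be called once per unlabelled core point, in place of the call to `cluster`, with the cluster counter passed by value.
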
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,82 @@ | ||
| /** | ||
| * @file methods/cluster/DBSCAN/DBSCAN.hpp | ||
| * | ||
| * The header file for DBSCAN | ||
| */ | ||
| #ifndef SLOWMOKIT_DBSCAN_HPP | ||
| #define SLOWMOKIT_DBSCAN_HPP | ||
|
|
||
| #include "core.hpp" | ||
| /** | ||
| * Class carrying implementation of the DBSCAN clustering algorithm | ||
| * @tparam T type of the data to be clustered | ||
| */ | ||
| template<class T> | ||
| class DBSCAN | ||
| { | ||
| private: | ||
|
|
||
| /** | ||
| * Measure of how close a point should be to be considered in the vicinity of another point, default value is 0.5 | ||
| */ | ||
| long double eps; | ||
|
|
||
| /** | ||
| * Minimum number of points that should lie in the vicinity of a point for it to be considered a core point, default value is 5 | ||
| */ | ||
| int minSamples; | ||
|
|
||
| /** | ||
| * Labels assigned to each data point after fitting, the values range from 0 to clusters - 1, outliers are assigned -1 | ||
| */ | ||
| std::vector<int> labels; | ||
|
|
||
| /** | ||
| * Evaluates the euclidean distance between two feature vectors | ||
| * @param p1 the first feature vector | ||
| * @param p2 the second feature vector | ||
| * @return the euclidean distance between the two vectors | ||
| * @throws invalid_argument exception when the feature vectors are unequal in size | ||
| */ | ||
| long double euclideanDistance(std::vector<T> p1, std::vector<T> p2); | ||
|
|
||
| /** | ||
| * Helper function for recursively clustering the points using DBSCAN | ||
| * @param i index of the point that is to be assigned a cluster | ||
| * @param core flag vector indicating whether a point is a core point (non-zero) or not (zero) | ||
| * @param neighbours 2D vector carrying neighbours of each of the core points | ||
| * @param label label of the cluster to be assigned to this point | ||
| */ | ||
| void cluster(int i, std::vector<int> &core, std::vector<std::vector<int>> &neighbours, int &label); | ||
|
|
||
| public: | ||
|
|
||
| /** | ||
| * Constructor for creating an instance of the DBSCAN class | ||
| * @param eps measure of how close a point should be to be considered in the vicinity of another point, default is 0.5 | ||
| * @param minSamples minimum number of points that should lie in the vicinity of a point for it to be considered a core point, default is 5 | ||
| * @throws invalid_argument exception when eps or minSamples is less than 0 | ||
| */ | ||
| DBSCAN(long double = 0.5, int = 5); | ||
|
|
||
| /** | ||
| * Fits and clusters the given training set | ||
| * @param x list of feature vectors to be clustered | ||
| */ | ||
| void fit(std::vector<std::vector<T>>); | ||
|
|
||
| /** | ||
| * Fits and clusters the given training set and returns the labels assigned to each data point | ||
| * @param x list of feature vectors | ||
| * @return vector of labels assigned to each data point | ||
| */ | ||
| std::vector<int> fitPredict(std::vector<std::vector<T>>); | ||
|
|
||
| /** | ||
| * Returns the labels assigned to each data point of the training set fitted into the model | ||
| * @return vector of labels assigned to each data point | ||
| */ | ||
| std::vector<int> getLabels(); | ||
| }; | ||
|
|
||
| #endif //SLOWMOKIT_DBSCAN_HPP |
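
Because the header documents labels as ranging from `0` to `clusters - 1` with `-1` for outliers, the vector returned by `getLabels()` or `fitPredict()` can be summarised without access to the class internals. The sketch below is illustrative only; `summariseLabels` is a hypothetical helper, not part of the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical helper: returns {number of distinct clusters, number of
// outliers} for a label vector that follows the convention documented in
// the header (-1 marks an outlier, cluster labels start at 0).
std::pair<std::size_t, std::size_t>
summariseLabels(const std::vector<int> &labels)
{
  std::set<int> clusters;
  std::size_t outliers = 0;
  for (int label : labels) {
    assert(label >= -1); // anything below -1 violates the convention
    if (label == -1) {
      outliers++;
    } else {
      clusters.insert(label);
    }
  }
  return {clusters.size(), outliers};
}
```

A caller could, for instance, pass `db.getLabels()` after `fit` and read the cluster count from `.first` and the outlier count from `.second`.
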