Knn and nystrom #143
Open: huddyyeo wants to merge 116 commits into getkeops:main from huddyyeo:knn_and_convos
Commits
0138a99 Create Nystrom.py (huddyyeo)
d73a55b adding code and unit test for nystrom (hl-anna)
ef95f4c added ivf_np tests (Gantrithor-AI)
4c4cf47 added tests for ivf (Gantrithor-AI)
dad6759 added ivf numpy tests (Gantrithor-AI)
de88326 added tests for ivf_pytorch (Gantrithor-AI)
21207c5 final edit (huddyyeo)
9f3b308 add empty init files (huddyyeo)
bbf9876 make lint happy (huddyyeo)
b685143 changed default use_gpu setting to false (huddyyeo)
9da09b9 added unit tests for nystrom (hl-anna)
f0da4b1 linter (huddyyeo)
8c6124b removing sklearn function (hl-anna)
e7e980c applied black linting (hl-anna)
e4be739 applied black linting (hl-anna)
73a58c3 minor changes and black linting (hl-anna)
70c6718 changing maximum -> max for older torch (hl-anna)
fab4ae3 updated exp kernel (hl-anna)
50aa46d updated exp kernel (hl-anna)
17cd242 add IVF superclass (huddyyeo)
d541aeb typo correction (huddyyeo)
e311bab Revert "typo correction" (huddyyeo)
fc60335 changing tests (huddyyeo)
3686d61 import utils correctly (huddyyeo)
63d1782 add lazytensor import to base ivf class (huddyyeo)
86c2380 black (huddyyeo)
8765257 add clustering functions as input (huddyyeo)
838f68a added unused device to np utils zeros (huddyyeo)
96536b3 updated utils (hl-anna)
f627cf8 added utils (hl-anna)
14774a1 added numpy utils (hl-anna)
5e8b39e testing rearranging np utils (huddyyeo)
6e80436 added numpy utils (hl-anna)
32a3fe7 remove 1 space (huddyyeo)
e83034e Merge branch 'knn_and_convos' of https://github.com/huddyyeo/keops in… (hl-anna)
83d2e64 removed LazyTensor from utils (hl-anna)
5c00d65 update to add kmeans optimisation approximation (huddyyeo)
73a6b5a Merge branch 'knn_and_convos' of https://github.com/huddyyeo/keops in… (huddyyeo)
46ba1fc changing kmeans inputs (huddyyeo)
9ea0af6 typo (huddyyeo)
0337e7f edit spacing to match (huddyyeo)
8fcdade change tab to space (huddyyeo)
b6a4c68 add dummy inputs to np kmeans (huddyyeo)
ad342f4 remove normalising in kmeans (huddyyeo)
8d23e6c update var name (huddyyeo)
3fa2782 correction (huddyyeo)
9429712 change angular to negative dot product (huddyyeo)
085f071 add import ivf to init files (huddyyeo)
785b038 trying to resolve merge conflict (huddyyeo)
cc5c84d moving around code (huddyyeo)
17435c8 rearrange torch init (huddyyeo)
765279b removing space (huddyyeo)
a4e6c9b Revert "removing space" (huddyyeo)
4bea5b2 add space (huddyyeo)
41ddcba moving code around (huddyyeo)
c1a79f0 running black (huddyyeo)
03952a7 test (huddyyeo)
9de2b85 changed import structure (huddyyeo)
ac33af7 changed import structure again (huddyyeo)
8e1b404 adding angular full metric (huddyyeo)
020d468 added angular, manhattan metrics to numpy test (Gantrithor-AI)
2a929a5 added metrics to torch unit test (ivf) (Gantrithor-AI)
d407a62 calc angular distances without torch.linalg - test (Gantrithor-AI)
fb6b5fb delete normalise (huddyyeo)
f3f57a5 Merge branch 'knn_and_convos' of https://github.com/huddyyeo/keops in… (huddyyeo)
7a6dcc7 black (huddyyeo)
d8deff2 add docstrings + NND (huddyyeo)
73ff718 black (huddyyeo)
d4b2ca3 add imports for NND (huddyyeo)
403be68 fixed euclidean typo (Gantrithor-AI)
6b7e77a typo + changed leaf_multiplier default (Gantrithor-AI)
330645a Add files via upload (Gantrithor-AI)
66e500c Add files via upload (Gantrithor-AI)
df92907 add ivf torch tut (huddyyeo)
a1a3b0f rearranging code to avoid conflict (huddyyeo)
9ccc927 add np tutorial for ivf (huddyyeo)
6df8737 Merge branch 'master' into knn_and_convos (huddyyeo)
59a7deb add spaces (huddyyeo)
d3cf556 adding back new code for knn benchmark (huddyyeo)
47da991 NNDescent version with clusters (Gantrithor-AI)
58a2098 Merge branch 'master' into knn_and_convos (jeanfeydy)
a495a26 requested edits 1 (huddyyeo)
af1d29b edit tests to reflect correct import structure (huddyyeo)
3c9d184 full stops on generic ivf class (huddyyeo)
dd0d702 change doc strings for parent classes (huddyyeo)
6982e8a change numpy ivf approximation error message (huddyyeo)
8fc2640 update utils to add comments (huddyyeo)
301e807 nn descent code update (huddyyeo)
1ec3c02 updated as per jean's comments (Gantrithor-AI)
95378e9 black (huddyyeo)
8307208 updated tutorials (huddyyeo)
4a73bfd added nystrom scripts and unit tests (hl-anna)
13d52f7 updated imports in unit tests (hl-anna)
c2b12d6 updated imports and added note to kmeans (hl-anna)
3d204ba changed torch init (huddyyeo)
5d7f1bb changed capitalisation (huddyyeo)
4621fd0 updated nystrom (huddyyeo)
b6962bc testing updated import structure (huddyyeo)
d30e00f moved imports around again (huddyyeo)
129d216 Update plot_nnd_torch.py (Gantrithor-AI)
ddbdefe Update utils.py (Gantrithor-AI)
02f0fee Create utils.py (Gantrithor-AI)
7c793e8 Update plot_nnd_torch.py (Gantrithor-AI)
48959da Update plot_ivf_torch.py (Gantrithor-AI)
b1c3095 Update plot_nnd_torch.py (Gantrithor-AI)
f3ccd59 reorganised accuracy computations (huddyyeo)
5390fcc added in updates for Nystroem (hl-anna)
1e23100 Delete nystrom_generic.py (hl-anna)
ce55836 Delete nystrom.py (hl-anna)
e3b2f07 Delete Nystrom.py (hl-anna)
0d2eba7 Rename nystroem.py to nystrom.py (hl-anna)
cd3fb2d Rename nystroem.py to nystrom.py (hl-anna)
8c4e079 Rename nystroem_generic.py to nystrom_generic.py (hl-anna)
bd4bf43 rename (huddyyeo)
8778cbe shifting import (huddyyeo)
ce36828 add packages (huddyyeo)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,228 @@ | ||
| class GenericIVF: | ||
| """Abstract class to compute IVF functions. | ||
|
|
||
| End-users should use 'pykeops.numpy.ivf' or 'pykeops.torch.ivf'. | ||
|
|
||
| """ | ||
|
|
||
| def __init__(self, k, metric, normalise, lazytensor): | ||
|
|
||
| self.__k = k | ||
| self.__normalise = normalise | ||
| self.__update_metric(metric) | ||
| self.__LazyTensor = lazytensor | ||
| self.__c = None | ||
| self.__x = None  # set by _fit(); lets _kneighbors() detect an unfitted index | ||
|
|
||
| def __update_metric(self, metric): | ||
| """ | ||
| Update the metric used in the class. | ||
| """ | ||
| if isinstance(metric, str): | ||
| self.__distance = self.tools.distance_function(metric) | ||
| self.__metric = metric | ||
| elif callable(metric): | ||
| self.__distance = metric | ||
| self.__metric = "custom" | ||
| else: | ||
| raise ValueError( | ||
| f"The 'metric' argument has type {type(metric)}, but only strings and functions are supported." | ||
| ) | ||
|
|
||
| @property | ||
| def metric(self): | ||
| """Returns the metric used in the search.""" | ||
| return self.__metric | ||
|
|
||
| @property | ||
| def clusters(self): | ||
| """Returns the clusters obtained through K-Means.""" | ||
| if self.__c is not None: | ||
| return self.__c | ||
| else: | ||
| raise NotImplementedError("Run .fit() first!") | ||
|
|
||
| def __get_tools(self): | ||
| pass | ||
|
||
|
|
||
| def __k_argmin(self, x, y, k=1): | ||
|
||
| """ | ||
| For each point in x, compute the indices of its k nearest neighbors in y. | ||
| """ | ||
| x_i = self.__LazyTensor( | ||
| self.tools.to(self.tools.unsqueeze(x, 1), self.__device) | ||
| ) | ||
| y_j = self.__LazyTensor( | ||
| self.tools.to(self.tools.unsqueeze(y, 0), self.__device) | ||
| ) | ||
|
|
||
| D_ij = self.__distance(x_i, y_j) | ||
| if not self.tools.is_tensor(x): | ||
| if self.__backend: | ||
| D_ij.backend = self.__backend | ||
|
|
||
| if k == 1: | ||
| return self.tools.view(self.tools.long(D_ij.argmin(dim=1)), -1) | ||
| else: | ||
| return self.tools.long(D_ij.argKmin(K=k, dim=1)) | ||
|
|
||
| def __sort_clusters(self, x, lab, store_x=True): | ||
|
||
| """ | ||
| Takes in a dataset and sorts according to its labels. | ||
|
|
||
| Args: | ||
| x ((N, D) array): Input dataset of N points in dimension D. | ||
| lab ((N) array): Labels for each point in x. | ||
| store_x (bool): Store the sort permutations for use later. | ||
| """ | ||
| lab, perm = self.tools.sort(self.tools.view(lab, -1)) | ||
| if store_x: | ||
| self.__x_perm = perm | ||
| else: | ||
| self.__y_perm = perm | ||
| return x[perm], lab | ||
|
|
||
| def __unsort(self, indices): | ||
| """ | ||
| Given input indices, undo any prior sorting operations. | ||
| First, select the true x indices with __x_perm[indices]. | ||
| Then, use index_select to choose the indices in true x, for each true y. | ||
| """ | ||
| return self.tools.index_select( | ||
| self.__x_perm[indices], 0, self.__y_perm.argsort() | ||
| ) | ||
|
|
||
| def _fit( | ||
| self, | ||
| x, | ||
| clusters=50, | ||
| a=5, | ||
| Niter=15, | ||
| device=None, | ||
| backend=None, | ||
| approx=False, | ||
| n=50, | ||
| ): | ||
| """ | ||
| Fits the main dataset: clusters x with K-Means and records which cluster pairs to search. | ||
| """ | ||
|
|
||
| # basic checks that the hyperparameters are as expected | ||
| if not isinstance(clusters, int): | ||
| raise ValueError("Clusters must be an integer") | ||
| if clusters >= len(x): | ||
| raise ValueError("Number of clusters must be less than length of dataset") | ||
| if not isinstance(a, int): | ||
| raise ValueError("Number of clusters to search over must be an integer") | ||
| if a > clusters: | ||
| raise ValueError( | ||
| "Number of clusters to search over must not exceed the total number of clusters" | ||
| ) | ||
| if len(x.shape) != 2: | ||
| raise ValueError("Input must be a 2D array") | ||
| # normalise the input if selected | ||
| if self.__normalise: | ||
| x = x / self.tools.repeat(self.tools.norm(x, 2, -1), x.shape[1]).reshape( | ||
| -1, x.shape[1] | ||
| ) | ||
|
|
||
| # if we want to use the approximation in Kmeans, and our metric is angular, switch to full angular metric | ||
| if approx and self.__metric == "angular": | ||
| self.__update_metric("angular_full") | ||
|
|
||
| x = self.tools.contiguous(x) | ||
| self.__device = device | ||
| self.__backend = backend | ||
|
|
||
| # perform K-Means | ||
| cl, c = self.tools.kmeans( | ||
| x, | ||
| self.__distance, | ||
| clusters, | ||
| Niter=Niter, | ||
| device=self.__device, | ||
| approx=approx, | ||
| n=n, | ||
| ) | ||
|
|
||
| self.__c = c | ||
| # perform one final cluster assignment, since K-Means ends on cluster update step | ||
| cl = self.__assign(x) | ||
|
|
||
| # obtain the nearest clusters to each cluster | ||
| ncl = self.__k_argmin(c, c, k=a) | ||
| self.__x_ranges, _, _ = self.tools.cluster_ranges_centroids(x, cl) | ||
|
|
||
| x, x_labels = self.__sort_clusters(x, cl, store_x=True) | ||
| self.__x = x | ||
| r = self.tools.repeat(self.tools.arange(clusters, device=self.__device), a) | ||
| # create a [clusters, clusters] sized boolean matrix | ||
| self.__keep = self.tools.to( | ||
| self.tools.zeros([clusters, clusters], dtype=bool), self.__device | ||
| ) | ||
| # set the indices of the nearest clusters to each cluster to True | ||
| self.__keep[r, ncl.flatten()] = True | ||
|
|
||
| return self | ||
|
|
||
| def __assign(self, x, c=None): | ||
| """ | ||
| Assigns nearest clusters to a dataset. | ||
| If no clusters are given, uses the clusters found through K-Means. | ||
|
|
||
| Args: | ||
| x ((N, D) array): Input dataset of N points in dimension D. | ||
| c ((M, D) array): Cluster locations of M points in dimension D. | ||
| """ | ||
| if c is None: | ||
| c = self.__c | ||
| return self.__k_argmin(x, c) | ||
|
|
||
| def _kneighbors(self, y): | ||
| """ | ||
| Obtain the k nearest neighbors of the query dataset y. | ||
| """ | ||
| if self.__x is None: | ||
| raise ValueError("Input dataset not fitted yet! Call .fit() first!") | ||
| if self.__device and self.tools.device(y) != self.__device: | ||
| raise ValueError("Input dataset and query dataset must be on same device") | ||
| if len(y.shape) != 2: | ||
| raise ValueError("Query dataset must be a 2D tensor") | ||
| if self.__x.shape[-1] != y.shape[-1]: | ||
| raise ValueError("Query and dataset must have same dimensions") | ||
| if self.__normalise: | ||
| y = y / self.tools.repeat(self.tools.norm(y, 2, -1), y.shape[1]).reshape( | ||
| -1, y.shape[1] | ||
| ) | ||
| y = self.tools.contiguous(y) | ||
| # assign y to the previously found clusters and get labels | ||
| y_labels = self.__assign(y) | ||
|
|
||
| # obtain y_ranges | ||
| y_ranges, _, _ = self.tools.cluster_ranges_centroids(y, y_labels) | ||
| self.__y_ranges = y_ranges | ||
|
|
||
| # sort y so that points in the same cluster are contiguous | ||
| y, y_labels = self.__sort_clusters(y, y_labels, store_x=False) | ||
|
|
||
| # perform actual knn computation | ||
| x_i = self.__LazyTensor(self.tools.unsqueeze(self.__x, 0)) | ||
| y_j = self.__LazyTensor(self.tools.unsqueeze(y, 1)) | ||
| D_ij = self.__distance(y_j, x_i) | ||
| ranges_ij = self.tools.from_matrix(y_ranges, self.__x_ranges, self.__keep) | ||
| D_ij.ranges = ranges_ij | ||
| indices = D_ij.argKmin(K=self.__k, axis=1) | ||
| return self.__unsort(indices) | ||
|
|
||
| def brute_force(self, x, y, k=5): | ||
| """Performs a brute force search with KeOps | ||
|
|
||
| Args: | ||
| x ((N, D) array): Input dataset of N points in dimension D. | ||
| y ((M, D) array): Query dataset of M points in dimension D. | ||
| k (int): Number of nearest neighbors to obtain. | ||
|
|
||
| """ | ||
| x_LT = self.__LazyTensor(self.tools.unsqueeze(x, 0)) | ||
| y_LT = self.__LazyTensor(self.tools.unsqueeze(y, 1)) | ||
| D_ij = self.__distance(y_LT, x_LT) | ||
| return D_ij.argKmin(K=k, axis=1) | ||
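
For readers following the diff, the cluster-restricted search done by `_fit` and `_kneighbors` can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the code in this PR: `ivf_knn_sketch` and its `centroids` argument are hypothetical names, the sketch takes precomputed centroids instead of running K-Means, it uses squared Euclidean distances, and it masks a dense distance matrix where the PR uses KeOps LazyTensors with block-sparse `ranges`.

```python
import numpy as np


def ivf_knn_sketch(x, y, centroids, k=3, a=2):
    """Approximate k-NN of each row of y among the rows of x.

    Hypothetical helper for illustration only; it mirrors the steps of
    GenericIVF (assign, sort, keep-mask, search, unsort) with dense arrays.
    """

    def sqdist(p, q):
        # (P, Q) matrix of squared Euclidean distances.
        return ((p[:, None, :] - q[None, :, :]) ** 2).sum(-1)

    n_clusters = len(centroids)

    # Assign every point to its nearest centroid.
    x_lab = sqdist(x, centroids).argmin(1)
    y_lab = sqdist(y, centroids).argmin(1)

    # Sort both datasets by label so that each cluster is a contiguous slice.
    x_perm = np.argsort(x_lab, kind="stable")
    y_perm = np.argsort(y_lab, kind="stable")
    x_sorted, x_lab_sorted = x[x_perm], x_lab[x_perm]
    y_sorted, y_lab_sorted = y[y_perm], y_lab[y_perm]

    # keep[i, j] is True when cluster j is among the `a` clusters nearest to cluster i.
    ncl = np.argsort(sqdist(centroids, centroids), axis=1)[:, :a]
    keep = np.zeros((n_clusters, n_clusters), dtype=bool)
    keep[np.repeat(np.arange(n_clusters), a), ncl.ravel()] = True

    # Discard every (query, point) pair whose clusters are not linked in `keep`.
    d = sqdist(y_sorted, x_sorted)
    d[~keep[y_lab_sorted][:, x_lab_sorted]] = np.inf
    # Indices into x_sorted; if fewer than k candidates survive the mask,
    # the remaining slots point at masked (infinite-distance) entries.
    idx_sorted = np.argsort(d, axis=1)[:, :k]

    # Undo both sorts: map back to original x indices, then restore y's row order.
    return x_perm[idx_sorted][np.argsort(y_perm)]
```

When `a` equals the number of centroids the mask is all `True`, so the result should match a brute-force search; smaller values of `a` trade accuracy for fewer distance comparisons. The last line corresponds to `__unsort` in the diff: `x_perm[indices]` recovers the original indices in `x`, and indexing with `argsort(y_perm)` applies the inverse permutation of the query sort.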