-
Notifications
You must be signed in to change notification settings - Fork 172
copy languages package from Sourcegraph to Zoekt #979
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
copy languages package from Sourcegraph to Zoekt #979
Conversation
varungandhi-src
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left one minor comment.
index/shard_builder.go
Outdated
| langs := languages.GetLanguagesFromContent(doc.Name, nil) | ||
| if len(langs) > 0 { | ||
| doc.Language = langs[0] | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can simplify the logic a bit be something like:
var content []byte
if doc.SkipReason == SkipReasonNone {
content = doc.Content
}
langs := languages.GetLanguagesFromContent(doc.Name, content)
if len(langs) > 0 {
doc.Language = langs[0]
}
|
One minor request: can you add a package docstring. Given this is now an exported api seems useful to ensure everything looks good via |
262c82b to
2dc96a0
Compare
* bump go-ctags for handling non-fatal startup errors (sourcegraph#972) Currently if you switch to universal-ctags 6.2.0 it will log a few warnings on startup. The old version of go-ctags would error out. Now instead it will log to its info log. Test Plan: Added the problematic file in the linked issue to ./foo/repo directory. Then ran "go run ./cmd/zoekt-index -require_ctags -index ./foo/index ./foo/repo" with uctags 6.2.0. * switch to github.com/sourcegraph/cody-public-snapshot for e2e test (sourcegraph#974) e2e tests were failing because github.com/sourcegraph/cody repository has been set to private and all tags were removed Test plan: CI * sourcegraph: remove GRPC index methods (sourcegraph#977) This was part of an effort to move the queue from Zoekt to Sourcegraph. However, we are not going to pursue this for now and we can remove the corresponging grpc methods. Sourcegraph never had callers of Index and Delete, so both are save to remove. `DeleteAllData` is still being called from Sourcegraph. Test plan: CI * chore: fix grpc drift (sourcegraph#978) In sourcegraph#962 a new field metadata was added to zoekt.Repository, but it was not added to the grpc types and converter methods, which is why the fuzz test occasionally fails. Test plan: CI * copy languages package from Sourcegraph to Zoekt (sourcegraph#979) We want Zoekt and Sourcegraph to use the same language package. In this PR we move the languages package from Sourcegraph to Zoekt, so that Zoekt can use it and Sourcegraph can import it. Notes: - Zoekt doesn't need to fetch content async which is why I added a little helper func `GetLanguagesFromContent` to make the call sites in Zoekt less awkward. - Sourcegraph's languages package always classified .cls files as Apex, while Zoekt did a content based check. With this PR we follow Zoekt's approach. Specifically, I removed .cls from `unsupportedByEnryExtensionToNameMap`. I added an additional unit test to cover this case. Test plan: I appended the test cases from the old Zoekt languages packages to the tests I copied over from Sourcegraph * gitindex: move package from internal (sourcegraph#981) Co-authored-by: ARyzhikov <ARyzhikov@nota.tech> * Cache docMatchTree results for Meta conditions (sourcegraph#982) * go mod tidy (sourcegraph#983) * Add support for indexing submodules without repocache (sourcegraph#985) * Fix git.Open() fs argument for .git dir instead of worktree The filesystem argument submitted to git.Open() is of the .git directory, it should be for the worktree, as in go-git this logic is copied from: https://github.com/go-git/go-git/blob/145daf2492dd86b1d7e7787555ebd9837b93bff8/repository.go#L373-L375 One side effect of this bug was that go-git would refuse to fetch repo submodules * Fix missing RepoURLs and LineFragments for result subrepositories RepoURLs for subrepositories are currently filtered out when results are streamed per-repo from shards. * Add submodule indexing without repo cache When not using repo cache, index git submodules recursively into subrepos using the go-git API * Adjust subrepo naming conventions - Revert repo-cache behavior to unchanged - Add support for naming repos and modules with no remote (use the dirname for the former and subrepopath for the latter) - Add test for submodule no-repo-cache behavior * Fix overallocation in repoURL and lineFragment filtering * Revert basename treatment from 08f6759 Missed the fact that it is already done in zoekt-git-index command, the test was improperly configured https://github.com/sourcegraph/zoekt/blob/4e4a529c3b63c7d4c7897ba736f1cd52cc163134/cmd/zoekt-git-index/main.go#L93-L100 --------- Co-authored-by: Keegan Carruthers-Smith <keegan.csmith@gmail.com> Co-authored-by: Stefan Hengl <stefan@sourcegraph.com> Co-authored-by: Aleksander Ryzhickov <makseljoinb@gmail.com> Co-authored-by: ARyzhikov <ARyzhikov@nota.tech> Co-authored-by: John Mason <john@johnmason.io> Co-authored-by: Phillip Dykman <Phil.d324@gmail.com>
We want Zoekt and Sourcegraph to use the same language package. In this PR we move the languages package from Sourcegraph to Zoekt, so that Zoekt can use it and Sourcegraph can import it.
Notes:
GetLanguagesFromContentto make the call sites in Zoekt less awkward.unsupportedByEnryExtensionToNameMap. I added an additional unit test to cover this case.Test plan:
I appended the test cases from the old Zoekt languages packages to the tests I copied over from Sourcegraph