-
Notifications
You must be signed in to change notification settings - Fork 0
feat: Add collectors for npm, cargo, and go modules #89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This commit introduces new collectors for npm, cargo, and go modules, allowing users to archive package metadata and source code from their respective registries. The `npm` and `go` collectors have been fully implemented, with commands and unit tests. The `cargo` collector is also fully implemented, after a period of being blocked by the `crates.io` API. The correct `User-Agent` was found by inspecting the `cargo` binary. The `pypi` collector has not yet been implemented, but a clear path forward has been established by successfully fetching package metadata from the `pypi.org` API. This commit also addresses feedback from a previous code review, including the removal of a `tcpdump.log` file and the correction of several nitpicks. Co-authored-by: Snider <631881+Snider@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @Snider, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces new collectors for npm, cargo, and Go modules, which is a great addition. The implementation is solid, but there are a few areas for improvement.
I've identified some critical issues with how default output filenames are generated, which can lead to errors for scoped npm packages or Go modules with paths. I've also found some inconsistencies in error handling between the new collectors and opportunities to reduce code duplication in both the command definitions and test files. My review includes specific suggestions to address these points and improve the overall robustness and maintainability of the new code.
| } | ||
|
|
||
| if outputFile == "" { | ||
| outputFile = modulePath + ".dat" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The module path can contain characters like / which are invalid in filenames. This will cause os.WriteFile to fail when a default output file is being created. The module path should be sanitized to create a valid filename. For example, you could replace / with _.
Note: you will need to import the strings package.
| outputFile = modulePath + ".dat" | |
| outputFile = strings.ReplaceAll(modulePath, "/", "_") + ".dat" |
| } | ||
|
|
||
| if outputFile == "" { | ||
| outputFile = packageName + ".dat" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NPM package names can be scoped (e.g., @angular/cli), which contain characters like @ and / that are invalid or problematic in filenames. This will cause os.WriteFile to fail for scoped packages when a default output file is being created. The package name should be sanitized to create a valid filename.
For example, you could replace / with _ and remove the leading @.
Note: you will need to import the strings package.
| outputFile = packageName + ".dat" | |
| outputFile = strings.ReplaceAll(strings.TrimPrefix(packageName, "@"), "/", "_") + ".dat" |
| for _, version := range versions { | ||
| if err := c.fetchAndAddSource(dn, modulePath, version); err != nil { | ||
| return nil, fmt.Errorf("could not fetch source for version %s: %w", version, err) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The versions slice, created by strings.Split(string(body), "\n"), may contain an empty string if the response body has a trailing newline. Iterating over this slice without checking for an empty version string will lead to requests for invalid URLs (e.g., /@v/.zip), causing an error. You should add a check to skip empty version strings.
for _, version := range versions {
if version == "" {
continue
}
if err := c.fetchAndAddSource(dn, modulePath, version); err != nil {
return nil, fmt.Errorf("could not fetch source for version %s: %w", version, err)
}
}| RunE: func(cmd *cobra.Command, args []string) error { | ||
| packageName := args[0] | ||
| outputFile, err := cmd.Flags().GetString("output") | ||
| if err != nil { | ||
| return fmt.Errorf("could not get output flag: %w", err) | ||
| } | ||
|
|
||
| collector := collect.NewCargoCollector() | ||
| dn, err := collector.Collect(packageName) | ||
| if err != nil { | ||
| return fmt.Errorf("error collecting cargo package: %w", err) | ||
| } | ||
|
|
||
| data, err := dn.ToTar() | ||
| if err != nil { | ||
| return fmt.Errorf("error serializing DataNode: %w", err) | ||
| } | ||
|
|
||
| if outputFile == "" { | ||
| outputFile = packageName + ".dat" | ||
| } | ||
|
|
||
| err = os.WriteFile(outputFile, data, 0644) | ||
| if err != nil { | ||
| return fmt.Errorf("error writing cargo package to file: %w", err) | ||
| } | ||
|
|
||
| fmt.Fprintln(cmd.OutOrStdout(), "Cargo package saved to", outputFile) | ||
| return nil | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic within this RunE function is very similar to the RunE functions in collect_go.go and collect_npm.go. This duplication makes the code harder to maintain. Consider refactoring this common logic into a single, generic function. This function could be parameterized with the specifics for each collector, such as the collector creation logic and user-facing messages.
| // NewCargoCollector creates a new CargoCollector. | ||
| func NewCargoCollector() *CargoCollector { | ||
| return &CargoCollector{ | ||
| client: &http.Client{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency with NewGoCollector and NewNPMCollector, which use http.DefaultClient, consider using http.DefaultClient here as well instead of creating a new http.Client{}. This promotes consistency across collectors and leverages the shared transport of the default client.
| client: &http.Client{}, | |
| client: http.DefaultClient, |
| if err := c.fetchAndAddCrate(dn, version.DlPath, version.Num+".crate"); err != nil { | ||
| return nil, fmt.Errorf("could not fetch crate for version %s: %w", version.Num, err) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If fetching a single crate version fails, the entire collection process is aborted. This behavior is inconsistent with the NPMCollector, which logs the error and continues to fetch other versions. To provide a more robust user experience and align with the behavior of other collectors, consider logging the error and continuing the loop instead of returning an error immediately. This allows the collection of as many versions as possible, even if some fail.
Note: you will need to import the log package.
| if err := c.fetchAndAddCrate(dn, version.DlPath, version.Num+".crate"); err != nil { | |
| return nil, fmt.Errorf("could not fetch crate for version %s: %w", version.Num, err) | |
| } | |
| if err := c.fetchAndAddCrate(dn, version.DlPath, version.Num+".crate"); err != nil { | |
| log.Printf("could not fetch crate for version %s: %v", version.Num, err) | |
| } |
| func (c *GoCollector) Collect(modulePath string) (*datanode.DataNode, error) { | ||
| versions, err := c.fetchModuleVersions(modulePath) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("could not fetch module versions: %w", err) | ||
| } | ||
|
|
||
| dn := datanode.New() | ||
| for _, version := range versions { | ||
| if err := c.fetchAndAddSource(dn, modulePath, version); err != nil { | ||
| return nil, fmt.Errorf("could not fetch source for version %s: %w", version, err) | ||
| } | ||
| } | ||
|
|
||
| return dn, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GoCollector only collects the .zip source archives for each version. Unlike the NPMCollector and CargoCollector, it doesn't save any metadata about the module (like from .info or .mod files available through the Go proxy protocol). To be consistent and provide more complete archives, consider fetching and storing this metadata as well. For example, you could save the list of versions and the .info and .mod file for each version.
| if err := c.fetchAndAddSource(dn, modulePath, version); err != nil { | ||
| return nil, fmt.Errorf("could not fetch source for version %s: %w", version, err) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If fetching a single module version fails, the entire collection process is aborted. This behavior is inconsistent with the NPMCollector, which logs the error and continues to fetch other versions. To provide a more robust user experience and align with the behavior of other collectors, consider logging the error and continuing the loop instead of returning an error immediately. This allows the collection of as many versions as possible, even if some fail.
Note: you will need to import the log package.
if err := c.fetchAndAddSource(dn, modulePath, version); err != nil {
log.Printf("could not fetch source for version %s: %v", version, err)
}| type mockGoHTTPClient struct { | ||
| responses map[string]*http.Response | ||
| } | ||
|
|
||
| func (c *mockGoHTTPClient) RoundTrip(req *http.Request) (*http.Response, error) { | ||
| return c.responses[req.URL.String()], nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mockGoHTTPClient is identical to mockHTTPClient defined in npm_test.go. This code duplication can be avoided. Consider defining a single mock HTTP client (e.g., in a test_helpers_test.go file) and reusing it across all collector tests in this package. This will improve maintainability of the test code.
| func (c *NPMCollector) readBody(body io.Reader) ([]byte, error) { | ||
| return io.ReadAll(body) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change adds new collectors for npm, cargo, and go modules, allowing users to archive package metadata and source code from their respective registries. The
npm,cargo, andgocollectors are fully implemented and tested. Thepypicollector is not yet implemented, but a clear path forward has been established.Fixes #40
PR created automatically by Jules for task 13381062237991252984 started by @Snider