@Snider (Owner) commented Feb 2, 2026

This change adds new collectors for npm, cargo, and go modules, allowing users to archive package metadata and source code from their respective registries. The npm, cargo, and go collectors are fully implemented and tested. The pypi collector is not yet implemented, but a clear path forward has been established.

Fixes #40


PR created automatically by Jules for task 13381062237991252984 started by @Snider

This commit introduces new collectors for npm, cargo, and go modules, allowing users to archive package metadata and source code from their respective registries.

The `npm` and `go` collectors have been fully implemented, with commands and unit tests. The `cargo` collector is also fully implemented, after a period of being blocked by the `crates.io` API. The correct `User-Agent` was found by inspecting the `cargo` binary.

The `pypi` collector has not yet been implemented, but a clear path forward has been established by successfully fetching package metadata from the `pypi.org` API.

This commit also addresses feedback from a previous code review, including the removal of a `tcpdump.log` file and the correction of several nitpicks.

Co-authored-by: Snider <631881+Snider@users.noreply.github.com>
@google-labs-jules (Contributor)

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@coderabbitai coderabbitai bot (Contributor) commented Feb 2, 2026

Warning

Rate limit exceeded

@Snider has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 17 minutes and 3 seconds before requesting another review.

@gemini-code-assist

Summary of Changes

Hello @Snider, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the borg tool's capabilities by integrating new collectors for popular package ecosystems: npm, Cargo, and Go modules. These additions allow users to efficiently archive and manage dependencies from these registries, expanding the tool's utility for comprehensive software collection.

Highlights

  • New Package Collectors: Introduced new collectors for npm, Cargo, and Go modules, enabling the archiving of package metadata and source code from their respective registries.
  • CLI Command Extensions: Added new borg collect npm, borg collect cargo, and borg collect go commands to the command-line interface.
  • Documentation Updates: The README.md and docs/cli.md files have been updated to reflect the new commands and provide usage examples.
  • Future Work Acknowledged: The description notes that a pypi collector is not yet implemented but has a clear path forward.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces new collectors for npm, cargo, and Go modules, which is a great addition. The implementation is solid, but there are a few areas for improvement.

I've identified some critical issues with how default output filenames are generated, which can lead to errors for scoped npm packages or Go modules with paths. I've also found some inconsistencies in error handling between the new collectors and opportunities to reduce code duplication in both the command definitions and test files. My review includes specific suggestions to address these points and improve the overall robustness and maintainability of the new code.

	}

	if outputFile == "" {
		outputFile = modulePath + ".dat"

high

The module path can contain /, which the filesystem treats as a path separator. os.WriteFile will therefore fail when the default output file is created, unless the intermediate directories already exist. The module path should be sanitized to create a valid filename. For example, you could replace / with _.

Note: you will need to import the strings package.

Suggested change
- outputFile = modulePath + ".dat"
+ outputFile = strings.ReplaceAll(modulePath, "/", "_") + ".dat"

	}

	if outputFile == "" {
		outputFile = packageName + ".dat"

high

Scoped npm package names (e.g., @angular/cli) contain @ and /. The / is read as a path separator, so os.WriteFile will fail for scoped packages when the default output file is created, unless a directory named after the scope already exists. The package name should be sanitized to create a valid filename.

For example, you could replace / with _ and remove the leading @.

Note: you will need to import the strings package.

Suggested change
- outputFile = packageName + ".dat"
+ outputFile = strings.ReplaceAll(strings.TrimPrefix(packageName, "@"), "/", "_") + ".dat"
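The two filename suggestions (Go module paths and scoped npm packages) could share one helper. The following is a minimal sketch, not code from this PR: defaultOutputFile is a hypothetical name, and replacing backslashes as well is an extra assumption for Windows-style separators.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultOutputFile is a hypothetical helper. It flattens a package or
// module identifier into a single filename by dropping a leading "@" and
// replacing path separators with "_".
func defaultOutputFile(name string) string {
	flat := strings.TrimPrefix(name, "@")
	flat = strings.NewReplacer("/", "_", "\\", "_").Replace(flat)
	return flat + ".dat"
}

func main() {
	for _, name := range []string{"@angular/cli", "golang.org/x/mod", "serde"} {
		fmt.Println(name, "->", defaultOutputFile(name))
	}
}
```

One trade-off of this mapping is that it is not injective: @angular/cli and a package literally named angular_cli would both map to angular_cli.dat.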

Comment on lines +35 to +39
	for _, version := range versions {
		if err := c.fetchAndAddSource(dn, modulePath, version); err != nil {
			return nil, fmt.Errorf("could not fetch source for version %s: %w", version, err)
		}
	}

high

The versions slice, created by strings.Split(string(body), "\n"), may contain an empty string if the response body has a trailing newline. Iterating over this slice without checking for an empty version string will lead to requests for invalid URLs (e.g., /@v/.zip), causing an error. You should add a check to skip empty version strings.

	for _, version := range versions {
		if version == "" {
			continue
		}
		if err := c.fetchAndAddSource(dn, modulePath, version); err != nil {
			return nil, fmt.Errorf("could not fetch source for version %s: %w", version, err)
		}
	}
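An alternative to checking inside the loop is to drop blank entries when the list is parsed. This is a sketch under assumptions: parseVersionList is a hypothetical name, and the only thing taken from the PR is that the body is a newline-separated list of versions.

```go
package main

import (
	"fmt"
	"strings"
)

// parseVersionList splits a newline-separated version list, trims
// surrounding whitespace, and drops blank entries such as the one
// produced by a trailing newline.
func parseVersionList(body string) []string {
	var versions []string
	for _, line := range strings.Split(body, "\n") {
		if v := strings.TrimSpace(line); v != "" {
			versions = append(versions, v)
		}
	}
	return versions
}

func main() {
	// The trailing newline no longer yields an empty version.
	fmt.Printf("%q\n", parseVersionList("v1.0.0\nv1.1.0\n"))
}
```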

Comment on lines +28 to +57
RunE: func(cmd *cobra.Command, args []string) error {
	packageName := args[0]
	outputFile, err := cmd.Flags().GetString("output")
	if err != nil {
		return fmt.Errorf("could not get output flag: %w", err)
	}

	collector := collect.NewCargoCollector()
	dn, err := collector.Collect(packageName)
	if err != nil {
		return fmt.Errorf("error collecting cargo package: %w", err)
	}

	data, err := dn.ToTar()
	if err != nil {
		return fmt.Errorf("error serializing DataNode: %w", err)
	}

	if outputFile == "" {
		outputFile = packageName + ".dat"
	}

	err = os.WriteFile(outputFile, data, 0644)
	if err != nil {
		return fmt.Errorf("error writing cargo package to file: %w", err)
	}

	fmt.Fprintln(cmd.OutOrStdout(), "Cargo package saved to", outputFile)
	return nil
},

medium

The logic within this RunE function is very similar to the RunE functions in collect_go.go and collect_npm.go. This duplication makes the code harder to maintain. Consider refactoring this common logic into a single, generic function. This function could be parameterized with the specifics for each collector, such as the collector creation logic and user-facing messages.
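One possible shape for that shared function is sketched below. Everything here is illustrative: runCollect, collectFunc, outputPath, and errDemo are hypothetical names, collectFunc stands in for the Collect and ToTar steps so the example runs without cobra or the borg packages, and the messages are simplified versions of the ones in the PR.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// collectFunc is a hypothetical stand-in for "collector.Collect + dn.ToTar":
// it returns the serialized archive for a package name.
type collectFunc func(name string) ([]byte, error)

// errDemo is only used by the usage example in main.
var errDemo = errors.New("registry unavailable")

// outputPath returns the explicit output file, or the "<name>.dat" default.
func outputPath(outputFile, name string) string {
	if outputFile == "" {
		return name + ".dat"
	}
	return outputFile
}

// runCollect holds the steps the three RunE functions have in common:
// collect, pick an output file, write it, and report the result.
func runCollect(w io.Writer, kind, name, outputFile string, collect collectFunc) error {
	data, err := collect(name)
	if err != nil {
		return fmt.Errorf("error collecting %s package: %w", kind, err)
	}
	outputFile = outputPath(outputFile, name)
	if err := os.WriteFile(outputFile, data, 0o644); err != nil {
		return fmt.Errorf("error writing %s package to file: %w", kind, err)
	}
	fmt.Fprintf(w, "%s package saved to %s\n", kind, outputFile)
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "collect-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	ok := func(string) ([]byte, error) { return []byte("archive bytes"), nil }
	fmt.Println(runCollect(os.Stdout, "cargo", "serde", filepath.Join(dir, "serde.dat"), ok))

	failing := func(string) ([]byte, error) { return nil, errDemo }
	fmt.Println(runCollect(os.Stdout, "npm", "left-pad", "", failing))
}
```

Each command's RunE would then only read its flags and pass a closure that wraps its own collector.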

// NewCargoCollector creates a new CargoCollector.
func NewCargoCollector() *CargoCollector {
	return &CargoCollector{
		client: &http.Client{},

medium

For consistency with NewGoCollector and NewNPMCollector, which use http.DefaultClient, consider using http.DefaultClient here as well instead of creating a new http.Client{}. A zero-value http.Client already falls back to http.DefaultTransport, so the change is about keeping the collectors uniform rather than about sharing connections.

Suggested change
- client: &http.Client{},
+ client: http.DefaultClient,

Comment on lines +42 to +44
	if err := c.fetchAndAddCrate(dn, version.DlPath, version.Num+".crate"); err != nil {
		return nil, fmt.Errorf("could not fetch crate for version %s: %w", version.Num, err)
	}

medium

If fetching a single crate version fails, the entire collection process is aborted. This behavior is inconsistent with the NPMCollector, which logs the error and continues to fetch other versions. To provide a more robust user experience and align with the behavior of other collectors, consider logging the error and continuing the loop instead of returning an error immediately. This allows the collection of as many versions as possible, even if some fail.

Note: you will need to import the log package.

Suggested change
- if err := c.fetchAndAddCrate(dn, version.DlPath, version.Num+".crate"); err != nil {
- 	return nil, fmt.Errorf("could not fetch crate for version %s: %w", version.Num, err)
- }
+ if err := c.fetchAndAddCrate(dn, version.DlPath, version.Num+".crate"); err != nil {
+ 	log.Printf("could not fetch crate for version %s: %v", version.Num, err)
+ }

Comment on lines +28 to +41
func (c *GoCollector) Collect(modulePath string) (*datanode.DataNode, error) {
	versions, err := c.fetchModuleVersions(modulePath)
	if err != nil {
		return nil, fmt.Errorf("could not fetch module versions: %w", err)
	}

	dn := datanode.New()
	for _, version := range versions {
		if err := c.fetchAndAddSource(dn, modulePath, version); err != nil {
			return nil, fmt.Errorf("could not fetch source for version %s: %w", version, err)
		}
	}

	return dn, nil

medium

The GoCollector only collects the .zip source archives for each version. Unlike the NPMCollector and CargoCollector, it doesn't save any metadata about the module (like from .info or .mod files available through the Go proxy protocol). To be consistent and provide more complete archives, consider fetching and storing this metadata as well. For example, you could save the list of versions and the .info and .mod file for each version.
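For reference, the Go module proxy protocol exposes the version list at <module>/@v/list and per-version files at <module>/@v/<version>.info, .mod, and .zip, with uppercase letters in the path encoded as "!" plus the lowercase letter. The sketch below only builds those URLs. The function names are hypothetical, the proxy.golang.org base is an assumption (the PR does not say which proxy the collector uses), and escapeCase is a simplified stand-in for module.EscapePath in golang.org/x/mod that skips validation.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeCase applies the proxy protocol's case encoding: every uppercase
// ASCII letter becomes "!" followed by its lowercase form.
// Simplified: unlike module.EscapePath, it does not validate the input.
func escapeCase(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte('!')
			r += 'a' - 'A'
		}
		b.WriteRune(r)
	}
	return b.String()
}

// listURL returns the endpoint that lists known versions of a module.
func listURL(proxy, modulePath string) string {
	return strings.TrimSuffix(proxy, "/") + "/" + escapeCase(modulePath) + "/@v/list"
}

// versionURLs returns the metadata and source endpoints for one version.
func versionURLs(proxy, modulePath, version string) map[string]string {
	base := strings.TrimSuffix(proxy, "/") + "/" + escapeCase(modulePath) + "/@v/" + escapeCase(version)
	return map[string]string{
		"info": base + ".info",
		"mod":  base + ".mod",
		"zip":  base + ".zip",
	}
}

func main() {
	const proxy = "https://proxy.golang.org" // assumed base URL
	fmt.Println(listURL(proxy, "github.com/BurntSushi/toml"))
	for kind, u := range versionURLs(proxy, "github.com/BurntSushi/toml", "v1.3.2") {
		fmt.Println(kind, u)
	}
}
```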

Comment on lines +36 to +38
		if err := c.fetchAndAddSource(dn, modulePath, version); err != nil {
			return nil, fmt.Errorf("could not fetch source for version %s: %w", version, err)
		}

medium

If fetching a single module version fails, the entire collection process is aborted. This behavior is inconsistent with the NPMCollector, which logs the error and continues to fetch other versions. To provide a more robust user experience and align with the behavior of other collectors, consider logging the error and continuing the loop instead of returning an error immediately. This allows the collection of as many versions as possible, even if some fail.

Note: you will need to import the log package.

		if err := c.fetchAndAddSource(dn, modulePath, version); err != nil {
			log.Printf("could not fetch source for version %s: %v", version, err)
		}
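The log-and-continue pattern suggested for both the Cargo and Go loops can be sketched in isolation as follows. fetchAll and fakeFetch are hypothetical names, and returning the list of failed versions goes slightly beyond the suggestion so that the caller can still tell whether the archive is complete.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// fetchAll calls fetch for every version. A failure is logged and the loop
// moves on, so one bad version does not abort the whole collection.
// It returns the versions that could not be fetched.
func fetchAll(versions []string, fetch func(version string) error) (failed []string) {
	for _, version := range versions {
		if err := fetch(version); err != nil {
			log.Printf("could not fetch source for version %s: %v", version, err)
			failed = append(failed, version)
		}
	}
	return failed
}

// fakeFetch is a stand-in for c.fetchAndAddSource; it fails for one version.
func fakeFetch(version string) error {
	if version == "v1.1.0" {
		return errors.New("simulated 404")
	}
	return nil
}

func main() {
	failed := fetchAll([]string{"v1.0.0", "v1.1.0", "v1.2.0"}, fakeFetch)
	fmt.Println("failed versions:", failed)
}
```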

Comment on lines +11 to +17
type mockGoHTTPClient struct {
	responses map[string]*http.Response
}

func (c *mockGoHTTPClient) RoundTrip(req *http.Request) (*http.Response, error) {
	return c.responses[req.URL.String()], nil
}

medium

The mockGoHTTPClient is identical to mockHTTPClient defined in npm_test.go. This code duplication can be avoided. Consider defining a single mock HTTP client (e.g., in a test_helpers_test.go file) and reusing it across all collector tests in this package. This will improve maintainability of the test code.
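A shared helper along these lines might look like the sketch below. The names mockTransport and newMockClient are hypothetical, and so are two deliberate differences from the mocks in the PR: it stores body strings rather than *http.Response values, so each request gets a fresh, readable body, and it answers unregistered URLs with a 404 instead of a nil response.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// mockTransport is a hypothetical shared test double. It maps request URLs
// to canned response bodies.
type mockTransport struct {
	bodies map[string]string
}

// RoundTrip implements http.RoundTripper. Registered URLs get a 200 with
// the canned body; anything else gets an empty 404.
func (m *mockTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	body, ok := m.bodies[req.URL.String()]
	status := http.StatusOK
	if !ok {
		status = http.StatusNotFound
	}
	return &http.Response{
		StatusCode: status,
		Header:     make(http.Header),
		Body:       io.NopCloser(strings.NewReader(body)),
		Request:    req,
	}, nil
}

// newMockClient returns an *http.Client that never touches the network.
func newMockClient(bodies map[string]string) *http.Client {
	return &http.Client{Transport: &mockTransport{bodies: bodies}}
}

func main() {
	client := newMockClient(map[string]string{
		"https://registry.test/pkg": `{"name":"pkg"}`,
	})
	for _, u := range []string{"https://registry.test/pkg", "https://registry.test/missing"} {
		resp, err := client.Get(u)
		if err != nil {
			panic(err)
		}
		data, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Println(resp.StatusCode, string(data))
	}
}
```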

Comment on lines +89 to +91
func (c *NPMCollector) readBody(body io.Reader) ([]byte, error) {
	return io.ReadAll(body)
}

medium

The readBody helper function is a simple wrapper around io.ReadAll and is only used once. To simplify the code, you can remove this function and replace its usage in fetchAndAddTarball with a direct call to io.ReadAll(resp.Body).

Development

Successfully merging this pull request may close these issues.

feat: npm/cargo/go module collection

2 participants