
Failure Reporting and Dead Letter Queue #87

Open

Snider wants to merge 1 commit into main from feat/failure-reporting-8552583492686157106

Conversation

@Snider (Owner) commented Feb 2, 2026

This change implements a new failure reporting and dead letter queue feature. It adds a new `failures` package, new `failures` and `retry` commands, and integration with the `collect` commands. It also refactors the single-repository cloning logic and introduces failure categorization and attempt tracking.

Fixes #55


PR created automatically by Jules for task 8552583492686157106 started by @Snider

This change introduces a new failure handling system for collection tasks.

- Created a new package `pkg/failures` to manage failure reporting, including a `Manager` to handle the lifecycle of a failure report, and `Failure` and `FailureReport` structs for storing failure data. The manager creates a `.borg-failures/<timestamp>` directory for each run, containing a `failures.json` report and a `retry.sh` script (a rough sketch of the implied data model follows this list).
- Added a `borg failures` command with `show` and `clear` subcommands to manage failure reports.
- Added a `borg retry` command to retry failed collections.
- Added `--on-failure` and `--failures-dir` flags to the `collect` command.
- Refactored the `collect github repo` command to make the single-repository cloning logic reusable.
- Updated the `collect github repos` command to use the reusable cloning function and implement failure handling, including the `--on-failure=stop` and `--on-failure=prompt` options.
- Implemented failure categorization to distinguish between retryable and permanent failures.
- Implemented tracking of the number of attempts for each failed item.
- Created a placeholder file for a missing asset to fix the build.
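
For orientation, here is a minimal sketch of the data model implied by this description. Apart from the URL, the retryable/permanent categorization, and the attempt count mentioned above, the field names and JSON tags are assumptions, not the actual implementation.

package failures

import "time"

// Failure records one failed collection item (sketch; field names assumed).
type Failure struct {
	URL       string `json:"url"`       // item that failed, e.g. a repository URL
	Error     string `json:"error"`     // message from the last failed attempt
	Retryable bool   `json:"retryable"` // result of failure categorization
	Attempts  int    `json:"attempts"`  // number of attempts made so far
}

// FailureReport is what a run writes to .borg-failures/<timestamp>/failures.json.
type FailureReport struct {
	CreatedAt time.Time `json:"created_at"`
	Failures  []Failure `json:"failures"`
}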

Co-authored-by: Snider <631881+Snider@users.noreply.github.com>
@google-labs-jules (Contributor)

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.


coderabbitai bot commented Feb 2, 2026

Warning

Rate limit exceeded

@Snider has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 17 minutes and 15 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist

Summary of Changes

Hello @Snider, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the application's resilience and error handling capabilities by introducing a robust failure reporting and dead letter queue system. It provides users with greater control over how collection failures are managed, offering options to continue, stop, or interactively prompt for retries. The new failures and retry commands empower users to inspect past failures and easily re-execute failed tasks, ultimately improving the reliability and user experience of data collection operations.

Highlights

  • New Failure Reporting System: Introduced a comprehensive failure reporting and dead letter queue mechanism, including a new failures package to manage collection errors.
  • Enhanced collect Command Behavior: The collect commands now support --on-failure and --failures-dir flags, allowing users to define how failures are handled (continue, stop, prompt) and where reports are stored.
  • New failures Command: Added a new top-level failures command with subcommands show to view detailed failure reports and clear to remove old reports.
  • New retry Command: Implemented a retry command that allows users to re-attempt previously failed collection tasks, with an option to retry only 'retryable' failures.
  • Code Refactoring: Refactored the single-repository cloning logic in collect_github_repo.go into a reusable function, improving modularity and maintainability.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist bot left a comment


Code Review

This pull request introduces a valuable failure reporting and dead letter queue feature, along with new failures and retry commands. The refactoring of the repository collection logic is a good step towards better code structure.

My review has identified a critical bug where collecting multiple repositories results in files being overwritten. I've also provided several suggestions to improve the robustness and maintainability of the new feature, such as using more reliable error handling, improving user input prompts, and making the failure reporting and retry mechanisms more generic to support future collection types. Additionally, I've noted an inconsistency where failure reporting is not applied to the single repository collection command.

Comment on lines +112 to +117
if outputFile == "" {
	outputFile = "repo." + format
	if compression != "none" {
		outputFile += "." + compression
	}
}


critical

The current implementation for generating a default output filename will cause issues when collecting multiple repositories, as each collected repository will overwrite the previous one. The filename repo.<format> is not unique. The default filename should be derived from the repository URL to ensure uniqueness.

Note: This change requires importing the path and strings packages.

if outputFile == "" {
	// Generate a unique filename from the repo URL to avoid overwriting files.
	// e.g., https://github.com/owner/name.git -> name.<format>
	base := path.Base(repoURL)
	repoName := strings.TrimSuffix(base, path.Ext(base))
	outputFile = repoName + "." + format
	if compression != "none" {
		outputFile += "." + compression
	}
}


-	fmt.Fprintln(cmd.OutOrStdout(), "Repository saved to", outputFile)
-	return nil
+	return collectRepo(repoURL, outputFile, format, compression, password, cmd)


medium

For consistency across the tool, the collect-github-repo command should also implement failure reporting, similar to how it's done for collect-github-repos. Currently, if this command fails, no failure report is generated, which might be unexpected for users given the new --on-failure and --failures-dir flags on the parent collect command.

fmt.Fprintln(cmd.OutOrStdout(), "Collecting", repo)
err := collectRepo(repo, "", "datanode", "none", "", cmd)
if err != nil {
	retryable := !strings.Contains(err.Error(), "not found")


medium

Determining if an error is retryable by checking for the substring "not found" in the error message is brittle and not reliable. A better approach would be to use typed errors. The function that performs the cloning (GitCloner.CloneGitRepository) could return a specific error type (e.g., vcs.ErrRepositoryNotFound) when a repository is not found. This would allow for a more robust check using errors.Is.
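
A minimal sketch of that typed-error approach, assuming the error value lives in a vcs package as the comment suggests (the name ErrRepositoryNotFound is the reviewer's proposal, not an existing identifier):

package vcs

import "errors"

// ErrRepositoryNotFound marks a permanent, non-retryable clone failure.
var ErrRepositoryNotFound = errors.New("repository not found")

// CloneGitRepository would wrap this value into the errors it returns, e.g.
// fmt.Errorf("cloning %s: %w", url, ErrRepositoryNotFound), so callers can
// categorize without string matching:
//
//	retryable := !errors.Is(err, vcs.ErrRepositoryNotFound)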

Comment on lines +58 to +59
var response string
fmt.Scanln(&response)


medium

Using fmt.Scanln for user input can be fragile. It doesn't handle errors and can leave unread characters in the input buffer if the user enters more than one word, which might affect subsequent operations. For more robust interactive prompts, consider using bufio.NewScanner(os.Stdin) to read a full line of input and then process it.

Note: This change requires importing the bufio and os packages.

var response string
scanner := bufio.NewScanner(os.Stdin)
if scanner.Scan() {
	response = scanner.Text()
}

Comment on lines +60 to +62
if failuresDir == "" {
	failuresDir = ".borg-failures"
}


medium

The default value for the failures directory, ".borg-failures", is hardcoded here and in several other places (cmd/collect.go, pkg/failures/manager.go). This makes it difficult to update the default value consistently. It would be better to define this as a public constant in the pkg/failures package and reuse it across the application.

Suggested change
- if failuresDir == "" {
- 	failuresDir = ".borg-failures"
- }
+ if failuresDir == "" {
+ 	failuresDir = failures.DefaultFailuresDir // Assuming DefaultFailuresDir is the new constant
+ }
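
If this route is taken, the constant itself would be a one-liner in pkg/failures; a minimal sketch of the assumed definition:

// pkg/failures (sketch): single source of truth for the default report location.
const DefaultFailuresDir = ".borg-failures"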

}

fmt.Printf("Retrying %s...\n", failure.URL)
retryCmd := exec.Command("borg", "collect", "github", "repo", failure.URL)


medium

The retry command is hardcoded to execute borg collect github repo. This limits its utility to only one type of collection. To make this feature more scalable, the command for retrying should be constructed dynamically. The FailureReport could be extended to include the necessary information to reconstruct the original failed command.
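
One way to make this generic, offered here as a sketch under assumptions rather than the PR's actual design: record the arguments of the failed invocation on the failure itself, so the retry command (and the generated retry.sh) can replay any collection type. The Command field below is hypothetical and does not exist in this change.

// Hypothetical extension of the Failure record in pkg/failures.
type Failure struct {
	URL     string   `json:"url"`
	Command []string `json:"command"` // e.g. []string{"collect", "github", "repo", url}
	// ... existing fields such as error, retryable, attempts ...
}

// The retry command could then rebuild the invocation generically:
//
//	retryCmd := exec.Command("borg", failure.Command...)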

Comment on lines +72 to +74
for _, failure := range m.report.Failures {
	retryScript.WriteString(fmt.Sprintf("borg collect github repo %s\n", failure.URL))
}


medium

The generated retry.sh script hardcodes the borg collect github repo command. This couples the failure manager to a specific collection command and limits the utility of the generated script. The script should be generated based on the actual command that failed, making the failure reporting mechanism more generic and scalable.
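
Building on the same idea, and assuming the hypothetical Command field sketched in the previous comment plus an imported strings package, the script generation loop could become collection-agnostic:

for _, failure := range m.report.Failures {
	// Replay whatever invocation originally failed instead of a fixed template.
	retryScript.WriteString("borg " + strings.Join(failure.Command, " ") + "\n")
}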

