feat: Parallel/concurrent collection #28

@Snider

Summary

Add support for parallel downloads to speed up large collections.

Use Case

Collecting 50+ repos or crawling a large website is slow when every item is fetched one after another. Running several downloads concurrently can cut the total collection time considerably.

Commands

# Parallel GitHub collection
borg collect github repos LetheanNetwork --parallel 5

# Parallel website crawl
borg collect website https://docs.example.com --parallel 3

# Global setting
borg config set parallelism 4

Implementation

  • Worker pool pattern
  • Per-domain rate limiting, enforced across all workers so parallelism does not multiply the request rate to any one domain
  • Progress bar showing all workers
  • Graceful shutdown on interrupt
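
The worker pool and graceful shutdown points could look roughly like the sketch below. It is written in Python purely for illustration and does not claim to match borg's codebase; `collect_all`, `fetch`, and `stop_event` are hypothetical names. Failures are recorded per item instead of aborting the whole run, and an interrupt cancels queued work while letting in-flight items finish.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def collect_all(items, fetch, parallel=1, stop_event=None):
    """Run fetch(item) on `parallel` workers; return (results, errors) dicts.

    Hypothetical helper: `fetch` stands in for whatever downloads one repo
    or page. Items must be hashable (e.g. repo names or URLs).
    """
    if stop_event is None:
        stop_event = threading.Event()
    results, errors = {}, {}

    def run(item):
        # Check the stop flag before starting, so no new work begins
        # once a shutdown has been requested.
        if stop_event.is_set():
            raise RuntimeError("cancelled before start")
        return fetch(item)

    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = {pool.submit(run, item): item for item in items}
        try:
            for fut in as_completed(futures):
                item = futures[fut]
                try:
                    results[item] = fut.result()
                except Exception as exc:
                    # One failed item does not take down the other workers.
                    errors[item] = exc
        except KeyboardInterrupt:
            # Graceful shutdown: stop handing out work, drop queued items.
            # Leaving the `with` block waits for in-flight items to finish;
            # their results are not recorded in this sketch.
            stop_event.set()
            for fut in futures:
                fut.cancel()
    return results, errors
```

Calling `collect_all(repo_names, clone_repo, parallel=5)` would correspond to `--parallel 5`, with `clone_repo` being whatever function collects a single repository.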

Options

| Flag | Default | Description |
|------|---------|-------------|
| --parallel N | 1 | Number of concurrent workers |
| --rate-limit | none | Max requests per second per domain |
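
One way `--rate-limit` could be enforced while several workers share a domain is a small limiter that hands out time slots per host. This is a Python sketch under assumptions, not borg's implementation: `DomainRateLimiter` and its `clock`/`sleep` parameters are hypothetical, and `rate=None` is taken to mean the `none` default (no limit).

```python
import threading
import time
from urllib.parse import urlsplit


class DomainRateLimiter:
    """Space out requests so each domain sees at most `rate` per second."""

    def __init__(self, rate=None, clock=time.monotonic, sleep=time.sleep):
        # rate=None models the "none" default: no spacing between requests.
        self.interval = 0.0 if rate is None else 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = {}  # domain -> earliest time the next request may start

    def wait(self, url):
        """Block until `url`'s domain may be hit again; return seconds waited."""
        domain = urlsplit(url).hostname or ""
        with self._lock:
            now = self._clock()
            # Reserve the next free slot for this domain while holding the
            # lock, so two workers can never claim the same slot.
            slot = max(now, self._next_slot.get(domain, now))
            self._next_slot[domain] = slot + self.interval
        delay = slot - now
        if delay > 0:
            # Sleep outside the lock so other domains are not held up.
            self._sleep(delay)
        return delay
```

Each worker would call `limiter.wait(url)` just before issuing a request. Because slots are tracked per domain, five workers on one host still respect the limit, while requests to different hosts proceed independently.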

Considerations

  • Respect robots.txt crawl-delay
  • Don't hammer single domains
  • Handle worker failures gracefully
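
For the robots.txt point, the crawl delay could be combined with `--rate-limit` by taking whichever spacing is stricter. The Python sketch below uses the standard library's `urllib.robotparser`; `effective_interval` is a hypothetical helper and the `"borg"` user agent string is an assumption, not something the project has defined.

```python
from urllib.robotparser import RobotFileParser


def effective_interval(robots_txt, user_agent, rate_limit=None):
    """Return the minimum seconds between requests to one domain.

    robots_txt: the text of the site's robots.txt (already downloaded).
    rate_limit: requests per second from --rate-limit, or None for no limit.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # crawl_delay() returns None when no Crawl-delay applies to this agent.
    crawl_delay = parser.crawl_delay(user_agent) or 0
    configured = 1.0 / rate_limit if rate_limit else 0.0
    # The stricter (larger) spacing wins, so a user-supplied rate limit
    # can slow the crawl down further but never override robots.txt.
    return max(float(crawl_delay), configured)
```

With a robots.txt containing `Crawl-delay: 2`, a `--rate-limit` of 5 requests per second would still result in a 2 second gap between requests to that domain.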

Acceptance Criteria

  • Worker pool implementation
  • Per-domain rate limiting
  • Progress reporting for parallel tasks
  • Graceful error handling
  • Memory-efficient for large queues
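
The "memory-efficient for large queues" criterion could be met with a bounded queue: the producer blocks once `max_pending` items are waiting, so a crawl that discovers millions of URLs never holds them all in memory at once. The following Python sketch is illustrative only; `run_bounded`, `handle`, and `max_pending` are hypothetical names.

```python
import queue
import threading

_DONE = object()  # sentinel telling a worker to exit


def run_bounded(items, handle, parallel=1, max_pending=100):
    """Feed `items` (any iterable, possibly lazy) to workers through a
    bounded queue. Returns (succeeded, failed) counts."""
    tasks = queue.Queue(maxsize=max_pending)
    counts = {"ok": 0, "failed": 0}
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is _DONE:
                return
            try:
                handle(item)
                key = "ok"
            except Exception:
                # A failing item is counted, and the worker keeps going.
                key = "failed"
            with lock:
                counts[key] += 1

    threads = [threading.Thread(target=worker) for _ in range(parallel)]
    for t in threads:
        t.start()
    for item in items:
        # put() blocks while the queue is full, which is the backpressure
        # that keeps memory use bounded by max_pending.
        tasks.put(item)
    for _ in threads:
        tasks.put(_DONE)  # one sentinel per worker
    for t in threads:
        t.join()
    return counts["ok"], counts["failed"]
```

Passing a generator as `items` means work is produced only as fast as workers consume it, in contrast to submitting every item up front.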

Labels: jules (For Jules AI to work on)