Skip to content

Conversation

@elangelo
Copy link

@elangelo elangelo commented Jan 6, 2026

Description

This PR ensures multiple threads are used to create backups and to restore backups. This ensures a considerate speedup when using cloud storage such as S3.
For comparison a backup to s3 of 1.8TiB takes roughly 16 minutes with this code. a 340GiB collection on the old code takes roughly 50 minutes.
Restoring the same collection took 7 minutes instead of 1 hour and 20 minutes (on a 6 node cluster)

Solution

As the previous implementation already had a loop over all files that needed to be backed up to the backup repository I simply wrapped that in a ThreadPool Executor

Tests

I have ran this code locally on a solrcloud cluster

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • The current tests cover the improvement
  • I have added documentation for the Reference Guide
  • I have added a changelog entry for my change

@elangelo elangelo marked this pull request as draft January 6, 2026 16:20
…to 1 thread, allow overriding with system properties
@elangelo elangelo marked this pull request as ready for review January 6, 2026 17:31
Copy link
Contributor

@epugh epugh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have the multi thread chops to approve this, but reaidn through it looks good. I wanted a change to the variable name. Do we need any new tests for this capablity, or do the existing ones cover it well enough?

* SOLR_BACKUP_MAX_PARALLEL_UPLOADS}.
*/
private static final int DEFAULT_MAX_PARALLEL_UPLOADS =
EnvUtils.getPropertyAsInteger("solr.backup.maxParallelUploads", 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pattern we are using now is dot cased, so solr.backup.maxparalleluploads, or maybe if we had mulitple properties solr.backup.parraleluploads.max....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good use of EnvUtils, we need them everywhere.

Copy link
Author

@elangelo elangelo Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this.

Backup and restore operations can transfer multiple index files in parallel to improve throughput, especially when using cloud storage repositories like S3 or GCS where latency is higher.
The parallelism is controlled via system properties or environment variables:

`solr.backup.maxParallelUploads`::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

solr.backup.maxparalleluploads ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

@elangelo
Copy link
Author

elangelo commented Jan 7, 2026

I don't have the multi thread chops to approve this, but reaidn through it looks good. I wanted a change to the variable name. Do we need any new tests for this capablity, or do the existing ones cover it well enough?

I think the current tests actually cover everything already. Mind that I did change the gcsrepository and s3repository tests to have some parallelism. Unfortunately I was limited to only 2 threads as with more I got an OutOfMemoryException. But I think it still covers what needs covering.

Copy link
Contributor

@epugh epugh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'd love another committer who is more comfortable with this code base and especially the multithreaded nature of it to review as well.

…was referred to by the non-canonical name `ExecutorUtil.MDCAwareThreadPoolExecutor.CallerRunsPolicy`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants