@nalinigans nalinigans commented Jan 13, 2025

Primarily, allow the samples in the callset to be processed in parallel, split into chunks based on the chunk size, when no samples are specified to genomicsdb_query.

Additionally, add CLI support for specifying the chunk size, add bypass-intersecting-intervals-phase, and allow a dry run that shows what would be processed without actually executing the query.
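
As a rough illustration of the chunking described above, here is a minimal sketch that splits a callset's row range into chunks that could each be handed to a worker. The function name `chunk_rows`, the use of inclusive `(start, end)` row tuples, and the zero-based row numbering are assumptions made for this example, not the code in this PR.

```python
def chunk_rows(num_rows, chunk_size):
    """Split rows 0..num_rows-1 into inclusive (start, end) tuples.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        # The last chunk is clipped so it never runs past the final row.
        (start, min(start + chunk_size, num_rows) - 1)
        for start in range(0, num_rows, chunk_size)
    ]


def describe_chunks(num_rows, chunk_size):
    """Dry-run style report: list what would be queried, query nothing."""
    return [
        f"would query rows {start}-{end}"
        for start, end in chunk_rows(num_rows, chunk_size)
    ]
```

With 10 rows and a chunk size of 4, `chunk_rows(10, 4)` yields three chunks, the last one shorter than the others, and `describe_chunks` only reports them, which is the idea behind a dry run.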

@nalinigans nalinigans changed the title from "Split callset into chunk for processing and add additional cli args" to "Split callset into chunks for processing and add additional cli args" Jan 13, 2025
@nalinigans nalinigans requested a review from mlathara January 13, 2025 19:42
@mlathara mlathara left a comment

one question: otherwise lgtm

# Check if there is room for row_tuples to be parallelized
chunk_size = int(args.chunk_size)
if len(configs) < args.nproc and chunk_size > 1:
    if row_tuples is None:
Should we consider splitting by row tuples only if all samples are being queried? I'm worried about the case where only a few random (but disjoint) samples are queried: that could lead to us creating a whole bunch of configs, each querying only a single sample.
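
To make the concern concrete, here is a small sketch of how a handful of disjoint sample rows collapses into row tuples. It assumes row tuples are inclusive `(start, end)` ranges of contiguous rows; `rows_to_tuples` is a hypothetical helper written for this example and is not taken from the PR.

```python
def rows_to_tuples(rows):
    """Collapse row indices into inclusive (start, end) ranges.

    Hypothetical helper for illustration only.
    """
    tuples = []
    for row in sorted(set(rows)):
        if tuples and row == tuples[-1][1] + 1:
            # Row is adjacent to the current range, so extend that range.
            tuples[-1] = (tuples[-1][0], row)
        else:
            # Row is not adjacent, so it starts a new range.
            tuples.append((row, row))
    return tuples
```

Contiguous rows such as `[0, 1, 2, 3]` collapse into a single tuple, while scattered rows such as `[3, 250, 9001]` produce one tuple per sample. If each tuple became its own config, the scattered case would create as many configs as there are samples.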

Sure, you have a point and we could do the splitting only if all samples are being queried.
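
A minimal sketch of the guard being agreed on here, assuming that `row_tuples is None` means no samples were specified (so all samples are being queried). The function name `should_split_callset` and its parameters are hypothetical; only the individual conditions mirror the snippet under review.

```python
def should_split_callset(num_configs, nproc, chunk_size, row_tuples):
    """Decide whether to split the callset into chunks.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    # Splitting only helps when some workers would otherwise sit idle.
    has_spare_workers = num_configs < nproc
    # Assumption: None means no explicit samples, i.e. query everything.
    all_samples_queried = row_tuples is None
    return has_spare_workers and chunk_size > 1 and all_samples_queried
```

Under this sketch, a query for a few explicit samples keeps its existing configs untouched, which avoids the one-config-per-sample case raised in the review.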

@nalinigans nalinigans requested a review from mlathara January 14, 2025 23:46
@mlathara mlathara left a comment

lgtm

@nalinigans nalinigans merged commit 83161e0 into develop Jan 15, 2025
8 checks passed
@nalinigans nalinigans deleted the ng_split_callset branch January 15, 2025 00:06