Split callset into chunks for processing and add additional cli args #72
Conversation
mlathara
left a comment
one question: otherwise lgtm
examples/genomicsdb_query
Outdated
# Check if there is room for row_tuples to be parallelized
chunk_size = int(args.chunk_size)
if len(configs) < args.nproc and chunk_size > 1:
    if row_tuples is None:
Should we consider splitting by row tuples only if all samples are being queried? I'm worried about the case where only a few random (but disjoint) samples are queried, which could lead to us creating a whole bunch of configs where each config queries only a single sample.
Sure, you have a point and we could do the splitting only if all samples are being queried.
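For illustration, the agreed behaviour could be sketched roughly as below. This is not the code in the PR: `split_row_ranges`, `total_rows`, and `requested_samples` are hypothetical names, and the sketch assumes rows are numbered contiguously from 0 and that "all samples are being queried" means no sample list was passed. Only the `chunk_size > 1` guard is taken from the diff above.

```python
from typing import List, Optional, Tuple

RowRange = Tuple[int, int]  # inclusive (first_row, last_row)


def split_row_ranges(
    total_rows: int,
    chunk_size: int,
    requested_samples: Optional[List[str]] = None,
) -> Optional[List[RowRange]]:
    """Return inclusive row ranges of at most chunk_size rows each.

    Returns None when splitting should be skipped, so the caller can
    fall back to a single unsplit query.
    """
    # Split only when every sample is queried (assumption: no explicit
    # sample list means "all samples"). A sparse, disjoint selection
    # would otherwise produce many configs that each cover one sample.
    if requested_samples is not None:
        return None
    # Mirrors the chunk_size > 1 guard in the snippet under review.
    if chunk_size <= 1 or total_rows <= 0:
        return None
    return [
        (start, min(start + chunk_size, total_rows) - 1)
        for start in range(0, total_rows, chunk_size)
    ]
```

With 10 rows and a chunk size of 4, this gives the ranges (0, 3), (4, 7) and (8, 9); passing any explicit sample list returns None and leaves the query unsplit.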
mlathara
left a comment
lgtm
Mainly allow for multiprocessing of samples in callset based on chunk size if no samples were specified to genomicsdb_query.
Additionally add cli support to specify chunk size, add bypass-intersecting-intervals-phase and allow a dryrun to see what may be processed without actually executing the query.