FEAT: Improve interactions of user pools #1009
Conversation
mj-will left a comment
Looks great, I look forward to trying it! I've added some initial comments but might need to have a second look.
```python
result = apply_conversion_function(
    result=result,
    likelihood=likelihood,
    conversion_function=conversion_function,
    npool=npool,
    pool=pool,
)
```
Will this always reapply the conversion function? If so, I'm unsure whether this is desirable: rather than bilby quickly exiting when a run is already done, it will spend time redoing the conversion. That said, I don't feel strongly about this, so I'm happy to keep it. Thoughts?
It will reapply it, this maintains the current behaviour. I'd be open to changing that behaviour, maybe by adding a flag to the result file to say if the conversion has been applied, but I would say to do that as a separate change.
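A minimal sketch of that flag-based approach (the `conversion_applied` attribute, the helper name, and the stand-in `Result` class are all hypothetical, not part of bilby's API):

```python
class Result:
    """Stand-in for a result object; only the hypothetical flag matters."""
    conversion_applied = False

def maybe_apply_conversion(result, conversion_function):
    # Skip re-running the conversion on a finished run if the result
    # already records that it was applied (flag name is an assumption).
    if getattr(result, "conversion_applied", False):
        return result
    result = conversion_function(result)
    result.conversion_applied = True  # would be persisted with the result file
    return result
```

Calling this twice would then run the conversion only once, so resuming an already-finished run exits quickly.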
```python
    parameters=priors.sample(),
) as _pool:
    start_time = datetime.datetime.now()
    sampler.pool = _pool
```
Is this safe to do? Depending on how the sampler uses the pool, are there settings where the pool the sampler has stored is not updated?
For example, if the sampler has constructed a likelihood using the pool.map from the initial input pool will this break?
I'm trying to work through what this would look like.
If the initial input pool is not None, all of the pool objects referenced here should be the same, so I don't think that will make a difference.
One potential issue: if a specific sampler implementation handles the pool internally by itself, this will create two pools, and then, in the best case, we have a bunch of extra processes we don't need. I think this is what nessai does, so maybe we should work through that specific case.
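One possible guard, sketched under the assumption that such samplers could advertise this behaviour (the `manages_own_pool` attribute and `setup_pool` helper are hypothetical):

```python
from multiprocessing import Pool

def setup_pool(sampler, npool):
    # Avoid creating a second, redundant pool when the sampler handles
    # parallelism internally (as nessai reportedly does), or when no
    # parallelism was requested.
    if getattr(sampler, "manages_own_pool", False) or npool in (None, 0, 1):
        return None
    return Pool(npool)
```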
```python
if (
    os.path.isfile(self.resume_file)
    and os.path.getsize(self.resume_file)
    and os.path.getsize(self.resume_file) > 5
```
Why is this needed?
Empty pickle files have nonzero size (5 B), so currently the code treats them as valid resume files and fails. This should probably be fixed on main too.
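The few-byte floor can be seen by pickling a trivial object, which is why a small size threshold works where a bare nonzero check does not:

```python
import io
import pickle

# A pickle stream for even a trivial object is a few bytes (protocol
# header, payload, and stop opcode), so "size > 0" cannot reject a
# resume file containing nothing useful; a threshold like "> 5" can.
buf = io.BytesIO()
pickle.dump(None, buf)
size = buf.getbuffer().nbytes  # a handful of bytes, never zero
```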
```python
    npool=None,
    pool=None,
    parameters=None,
):
```
I think this would benefit from a docstring with example usage.
```python
    npool=None,
    pool=None,
    parameters=None,
):
```
This would benefit from a docstring. In particular, I think we should make clear what happens if npool is None or 1.
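As a starting point, here is a sketch of the behaviour such a docstring could pin down; the early-return semantics below are my assumption, not taken from the diff:

```python
from multiprocessing import Pool

def create_pool(likelihood=None, npool=None, pool=None, parameters=None):
    """Return a worker pool, or None for serial execution.

    Assumed semantics worth documenting explicitly:
    - if ``pool`` is supplied, it is reused as-is and ``npool`` is ignored;
    - if ``npool`` is None or 1, no pool is created and None is returned,
      so callers fall back to the built-in serial ``map``.
    """
    if pool is not None:
        return pool
    if npool is None or npool <= 1:
        return None
    return Pool(npool)
```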
```python
my_pool = create_pool(likelihood=this_logl, npool=npool)
if my_pool is None:
    map_fn = map
else:
    map_fn = partial(my_pool.imap, chunksize=chunksize)
likelihood_fn = partial(_safe_likelihood_call, this_logl)

log_l = list(tqdm(
    map_fn(likelihood_fn, dict_samples[starting_index:]),
    desc='Computing likelihoods',
    total=n,
))
close_pool(my_pool)
```
Any reason to use open/close over the context manager?
It seems not.
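For reference, the stdlib pools already support the context-manager protocol, so whether this works for the new helpers depends only on their API. Shown with the thread-backed `multiprocessing.dummy.Pool`, which shares the `Pool` interface:

```python
from multiprocessing.dummy import Pool  # thread-backed, same API as Pool

def square(x):
    return x * x

# __exit__ terminates the workers, so no separate close/cleanup call
# is needed after the block
with Pool(2) as pool:
    squares = pool.map(square, range(4))
```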
```python
if _pool is not None:
    subset_samples = _pool.map(
        _compute_per_detector_log_likelihoods,
        fill_args[ii: ii + block]
    )
else:
    from ..core.sampler.base_sampler import _sampling_convenience_dump
    _sampling_convenience_dump.likelihood = likelihood
    subset_samples = [
        list(_compute_per_detector_log_likelihoods(xx))
        for xx in fill_args[ii: ii + block]
    ]
```
Is it worth simplifying this with something like this?
```python
if _pool is None:
    map_fn = map
    _sampling_convenience_dump.likelihood = likelihood
else:
    map_fn = _pool.map
subset_samples = map_fn(
    _compute_per_detector_log_likelihoods,
    fill_args[ii: ii + block]
)
```
Sure
```python
if _pool is not None:
    subset_samples = _pool.map(fill_sample, fill_args[ii: ii + block])
else:
    subset_samples = [list(fill_sample(xx)) for xx in fill_args[ii: ii + block]]
```
Same comment about using `map` here, but I'm not sure it's worth it. Thoughts?
I modified it to look closer, but I agree it probably isn't worth it in this case.
ColmTalbot left a comment
I don't remember exactly what the behaviour for some of these questions is, so I'll go back and check, and ideally write docstrings about them.
I noticed that it was difficult to pass a user-specified pool through run_sampler, and it isn't used at all in the post-processing. This PR:
- … `with multiprocessing.pool()` …
- … `Sampler._setup_pool` and `Sampler._close_pool` …
- … `parallel_bilby` … moot.