Skip to content

New multithreaded API#43

Open
projekter wants to merge 9 commits into
tpapp:masterfrom
projekter:master
Open

New multithreaded API#43
projekter wants to merge 9 commits into
tpapp:masterfrom
projekter:master

Conversation

@projekter
Copy link
Copy Markdown

This is my go for implementing a multithreaded API. Initially, I tried what you suggested in #36 and used Folds.jl; however, this turned out to be very impractical. With Folds.jl, you have to specify a basesize if you somehow want to control how many tasks to spawn. But a static division strategy does not really make a lot of sense for optimizations, where the landscape could be wildly different and therefore the optimization times might vary a great deal depending on the initial point. So instead, I moved to OhMyThreads.jl and use two different strategies:

  • To generate the initial points, we'll have to evaluate the objective function at each initial point candidate once, which should always take pretty much the same time, so I'm using a StaticScheduler.
  • To optimize locally, I'm using a GreedyScheduler with fixed number of tasks in order to keep the computer busy.
    Of course, this is just the default strategy, the custom options or scheduler from OhMyThreads may be used to provide an own strategy.

What I don't like yet, but don't see a good alternative: The possible keyword arguments are currently named explicitly. This means that once OhMyThread.jl changes/adds/removes a keyword argument, the list here has to be synchronized. However, given that we have just one function but need to construct two different objects, what would be the alternative?

  • Dropping keyword argument support completely and requiring a scheduler_initpoints and a scheduler_localopt instead?
  • Using Base.kwarg_decl and which to dynamically generate the possible keyword arguments and intersect with a kwargs..., plus raising an error for unused kwargs? Actually, this would work rather decently when converting the multistart_minimization to a generated function.
  • Of course, your ignoring unused keyword arguments JuliaLang/julia#10279 would be great, which just got closed.

How relevant would this be: I hope that my PR to OhMyThreads.jl will receive some attention sooner or later, which would add a migration keyword argument to the GreedyScheduler, which should also be passed through here.

What I discovered while developing this: MultistartOptimization.jl is terribly type-unstable, which is Sobol.jl's fault. I just filed a PR to fix this, after which everything, including the new parallelized code, should be statically inferrable.

Closes #36

Copy link
Copy Markdown
Owner

@tpapp tpapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making this contribution. I think it will improve the package a lot but I have a few questions and suggestions.

Comment thread src/tiktak.jl
s = SobolSeq(lower_bounds, upper_bounds)
skip(s, N) # better uniformity
points = Iterators.take(s, N)
points = collect(Iterators.take(s, N))
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the source (and the fix) for the type instability? I am OK with this workaround but please leave a comment, eg

Suggested change
points = collect(Iterators.take(s, N))
# temporary workaround for type instability in Sobol.jl, cf
# https://github.com/JuliaMath/Sobol.jl/pull/42
points = collect(Iterators.take(s, N))

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this has nothing to do with type instability. I did not introduce any kind of workaround, I just made sure that I don't introduce any new instabilities unrelated to the Sobol issue.
This is related to the way OhMyThreads does multithreading. Previously, the multithreaded iteration was

map(fetch, map(x -> @spawn(_initial(x)), points))

so that the iteration over the points is done in the main thread, which then spawns starting from the point that we already got. OMT instead might wants to index (with ranges) into the points, depending on the scheduler. So basically, we need something that
a) supports an AbstractArray interface - which Iterators.take doesn't do, but this could easily be mitigated
b) can be indexed randomly - but the s is an iterator and the next points depends on the previous one. So I don't really see a good way around this pre-allocation of the points. It could be made a bit more efficiently by pre-allocating a matrix and then iterating into its columns instead of having a vector of vectors, but that's the best I can currently imagine.

Comment thread src/tiktak.jl Outdated
minchunksize::Union{Integer,Nothing}=nothing,
chunking::Union{Bool,Nothing}=nothing,
nchunks::Union{Integer,Nothing}=nothing,
split::Union{OhMyThreads.Split,Symbol,Nothing}=nothing)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be fine to just deprecate use_threads (issue a deprecation warning) so it can be removed in the next major release.

Comment thread src/tiktak.jl Outdated
Comment thread src/tiktak.jl Outdated
Comment thread src/tiktak.jl Outdated
Comment thread src/tiktak.jl
@projekter
Copy link
Copy Markdown
Author

projekter commented Apr 8, 2026

I now made a couple of changes that should implement some or all of your comments:

  • Moved from keyword arguments on the main optimization function to just passing schedulers, define default functions for both schedulers, don't export but make them public. And I don't think any more that the use_threads should necessarily be deprecated.
  • The whole index and reduction stuff was not very easy to understand, in particular in connection with the index - and you also asked about it. So I split the function into two, one for multithreading, the other for single-threading. The latter is basically what you had before module one allocation optimization, the multithreaded one now uses foreach, which is much easier to understand. However, a tiny bit of synchronization is needed which I don't think is very harmful, considering we only need to do it when an actual improvement is found.1 But this is the price of it now being easier to understand - do you like this better or should I revert to the reduction?
  • I also added a short comment on why I define the EnumeratedCatVector. The "cat" part is not really necessary, we could simply use the vcat from before (just skip it if there are no prepend points), but it is an unnecessary (potentially larger) allocation plus copying stuff around.
  • And I added documentation about the non-determinism. I think that with the round-robing splitting strategy, we come closest to the serial implementation (since all the "best"/lowest initial points are tried first), but of course it is still not deterministic at all.

1 This could be superficially avoided by also storing the local results in the task-local storage and moving this storage into a Channel, so that in the end we can retrieve the best values for all tasks and take the minimum. Since internally the channels also have to synchronize, so I don't think this would give any kind of benefit.

…tion

Be careful with mutating optimizers, don't let them invalidate previous results
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

any guidance on parallization

2 participants