Conversation
adamant-pwn
left a comment
There was a problem hiding this comment.
Thanks for the PR, it's nice to make it faster! Also sorry for the delay with review, the PR is quite large 👀
|
|
||
| template <typename T> | ||
| std::vector<T> | ||
| BinaryMatrix::get_rows_parallel(const std::vector<Row> &rows, size_t num_threads, |
There was a problem hiding this comment.
I feel like the naming is somewhat confusing here. If it just get_rows_..., shouldn't it be a (non-static) member functions that calls get_rows directly?
But it looks like you also use this function with some other callers, where get_rows is actually something else, like get_row_values or get_column_ranks.
Maybe name it get_row_data_parallel, or similar?
| size_t max_chunk_size, | ||
| size_t num_threads, | ||
| bool one_chunk_if_single_thread = true) { | ||
| num_threads = std::max<size_t>(1, num_threads); |
There was a problem hiding this comment.
Why and when do we ever have num_threads = 0?
| size_t num_threads, | ||
| bool one_chunk_if_single_thread = true) { | ||
| num_threads = std::max<size_t>(1, num_threads); | ||
| return (one_chunk_if_single_thread && num_threads < 2) |
There was a problem hiding this comment.
| return (one_chunk_if_single_thread && num_threads < 2) | |
| return (one_chunk_if_single_thread && num_threads == 1) |
|
|
||
| size_t batch_size = std::min(kRowBatchSize, | ||
| (codes.size() + num_threads - 1) / num_threads); | ||
| const size_t batch_size = std::max<size_t>(1, std::min(kRowBatchSize, codes.size() / num_threads)); |
There was a problem hiding this comment.
Why not call get_chunk_size?
| } | ||
|
|
||
| std::vector<BinaryMatrix::SetBitPositions> | ||
| RowMajor::get_rows(const std::vector<Row> &row_ids, size_t num_threads) const { |
There was a problem hiding this comment.
Why this version doesn't use get_rows_parallel?
| rd_ids = std::vector<Row>(); | ||
| std::vector<BinaryMatrix::Row>().swap(rd_ids); | ||
|
|
||
| #pragma omp parallel for num_threads(num_threads) schedule(dynamic, 1000) |
| std::atomic<uint64_t> num_found_kmers = 0; | ||
| #pragma omp parallel for num_threads(num_threads) schedule(dynamic, 100) | ||
| assert(num_threads); | ||
| size_t chunk_size = min<size_t>(max<size_t>(1, contigs.size() / (num_threads * 10)), 100); |
There was a problem hiding this comment.
Let's make 10 and 100 named constants?
| tsl::hopscotch_map<std::string, std::vector<std::pair<std::string, std::string>>> indexes; | ||
|
|
||
| ThreadPool graphs_pool(get_num_threads()); | ||
| ThreadPool graphs_pool(get_num_threads(), 1000); |
There was a problem hiding this comment.
| ThreadPool graphs_pool(get_num_threads(), 1000); | |
| ThreadPool graphs_pool(get_num_threads(), /*max_num_tasks*/1000); |
| if(APPLE) | ||
| if(${CMAKE_OSX_ARCHITECTURES} MATCHES "arm64") | ||
| set(HOMEBREW_DIR /opt/homebrew) | ||
| else() | ||
| set(HOMEBREW_DIR /usr/local) | ||
| endif() | ||
| endif() |
There was a problem hiding this comment.
| if(APPLE) | |
| if(${CMAKE_OSX_ARCHITECTURES} MATCHES "arm64") | |
| set(HOMEBREW_DIR /opt/homebrew) | |
| else() | |
| set(HOMEBREW_DIR /usr/local) | |
| endif() | |
| endif() | |
| if(APPLE) | |
| execute_process(COMMAND brew --prefix | |
| OUTPUT_VARIABLE HOMEBREW_DIR | |
| OUTPUT_STRIP_TRAILING_WHITESPACE) | |
| endif() |
Would something like this work? I feel like it's semantically better approach than guessing from architecture.
| elseif(${CMAKE_SYSTEM_PROCESSOR} MATCHES "x86_64") | ||
| set(HOMEBREW_DIR /usr/local) | ||
| else() | ||
| set(HOMEBREW_DIR ~/.linuxbrew) |
There was a problem hiding this comment.
See the comment in main CMakeLists.txt file.
Status of parallelization for different query setups with batch query: