Skip to content

Parallel Multi-BRWT query with one traversal#559

Open
karasikov wants to merge 69 commits intomasterfrom
mk/brwt
Open

Parallel Multi-BRWT query with one traversal#559
karasikov wants to merge 69 commits intomasterfrom
mk/brwt

Conversation

@karasikov
Copy link
Member

@karasikov karasikov commented Nov 3, 2025

  • parallel querying to BRWT
  • enabled parallel batch query of annotations with counts
  • refactoring and other improvements

Status of parallelization for different query setups with batch query:

Annotation type \ Query type matches counts coords
basic na na
with counts ❌ -> ✅ na
with coords ❌ -> ✅ - (always non-batch)

Copy link
Contributor

@adamant-pwn adamant-pwn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, it's nice to make it faster! Also sorry for the delay with review, the PR is quite large 👀


template <typename T>
std::vector<T>
BinaryMatrix::get_rows_parallel(const std::vector<Row> &rows, size_t num_threads,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the naming is somewhat confusing here. If it just get_rows_..., shouldn't it be a (non-static) member functions that calls get_rows directly?

But it looks like you also use this function with some other callers, where get_rows is actually something else, like get_row_values or get_column_ranks.

Maybe name it get_row_data_parallel, or similar?

size_t max_chunk_size,
size_t num_threads,
bool one_chunk_if_single_thread = true) {
num_threads = std::max<size_t>(1, num_threads);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why and when do we ever have num_threads = 0?

size_t num_threads,
bool one_chunk_if_single_thread = true) {
num_threads = std::max<size_t>(1, num_threads);
return (one_chunk_if_single_thread && num_threads < 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return (one_chunk_if_single_thread && num_threads < 2)
return (one_chunk_if_single_thread && num_threads == 1)


size_t batch_size = std::min(kRowBatchSize,
(codes.size() + num_threads - 1) / num_threads);
const size_t batch_size = std::max<size_t>(1, std::min(kRowBatchSize, codes.size() / num_threads));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not call get_chunk_size?

}

std::vector<BinaryMatrix::SetBitPositions>
RowMajor::get_rows(const std::vector<Row> &row_ids, size_t num_threads) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this version doesn't use get_rows_parallel?

rd_ids = std::vector<Row>();
std::vector<BinaryMatrix::Row>().swap(rd_ids);

#pragma omp parallel for num_threads(num_threads) schedule(dynamic, 1000)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should 1000 be named?

std::atomic<uint64_t> num_found_kmers = 0;
#pragma omp parallel for num_threads(num_threads) schedule(dynamic, 100)
assert(num_threads);
size_t chunk_size = min<size_t>(max<size_t>(1, contigs.size() / (num_threads * 10)), 100);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make 10 and 100 named constants?

tsl::hopscotch_map<std::string, std::vector<std::pair<std::string, std::string>>> indexes;

ThreadPool graphs_pool(get_num_threads());
ThreadPool graphs_pool(get_num_threads(), 1000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ThreadPool graphs_pool(get_num_threads(), 1000);
ThreadPool graphs_pool(get_num_threads(), /*max_num_tasks*/1000);

Comment on lines +308 to +314
if(APPLE)
if(${CMAKE_OSX_ARCHITECTURES} MATCHES "arm64")
set(HOMEBREW_DIR /opt/homebrew)
else()
set(HOMEBREW_DIR /usr/local)
endif()
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(APPLE)
if(${CMAKE_OSX_ARCHITECTURES} MATCHES "arm64")
set(HOMEBREW_DIR /opt/homebrew)
else()
set(HOMEBREW_DIR /usr/local)
endif()
endif()
if(APPLE)
execute_process(COMMAND brew --prefix
OUTPUT_VARIABLE HOMEBREW_DIR
OUTPUT_STRIP_TRAILING_WHITESPACE)
endif()

Would something like this work? I feel like it's semantically better approach than guessing from architecture.

elseif(${CMAKE_SYSTEM_PROCESSOR} MATCHES "x86_64")
set(HOMEBREW_DIR /usr/local)
else()
set(HOMEBREW_DIR ~/.linuxbrew)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comment in main CMakeLists.txt file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants