
scheduler remove lock for scheduled|resumed tasks#444

Merged
jbaldwin merged 9 commits into main from issue-404/scheduler-atomic-waiter-list
Mar 5, 2026

Conversation

@jbaldwin
Owner

@jbaldwin jbaldwin commented Feb 6, 2026

* Use an atomic waiter list instead of std::mutex + std::vector to pass scheduled tasks through to the coro::scheduler.
* Uses the atomic waiter list to enqueue schedule_operations.
* The list is reversed prior to execution so tasks resume in the order they were scheduled.
* Normally co_await'ed tasks do not allocate.
* std::coroutine_handle<>s that are directly resumed require a schedule_operation to be allocated and deleted upon completion.
* Consider adding a pool of pre-allocated schedule_operations to reuse for direct coroutine handle resumes.
@jbaldwin jbaldwin force-pushed the issue-404/scheduler-atomic-waiter-list branch 2 times, most recently from c37e7f9 to 4356df0 Compare February 22, 2026 17:59
@jbaldwin jbaldwin force-pushed the issue-404/scheduler-atomic-waiter-list branch from 4356df0 to 4b9c114 Compare February 22, 2026 18:13
@jbaldwin jbaldwin marked this pull request as ready for review March 1, 2026 17:16
@jbaldwin
Owner Author

jbaldwin commented Mar 1, 2026

@PyXiion I've been toying around with solutions for #404 and #446. This by no means resolves both issues, but in performance testing I found it's somewhere between 5% and 10% quicker for scheduling tasks on a scheduler using inline.

Comment thread include/coro/scheduler.hpp
Comment thread src/scheduler.cpp
else
{
m_thread_pool->resume(m_handles_to_resume);
m_size.fetch_sub(m_handles_to_resume.size(), std::memory_order::release);
Owner Author

@jbaldwin jbaldwin Mar 1, 2026

I simplified m_size so it is decremented only here, by the process calls for inline and thread pool. It was difficult to merge the two patterns until I realized the other decrements should have just been here all along.

Comment thread src/scheduler.cpp
Comment thread src/scheduler.cpp Outdated
Comment thread test/bench.cpp
const constexpr std::size_t connections_per_client = 10;
// const constexpr std::size_t connections_per_client = 2;
// const constexpr std::size_t messages_per_connection = 2;
const constexpr std::size_t messages_per_connection = 10'000;
Owner Author

I updated this test to actually have some throughput for benchmarking, and removed most of the REQUIRE_THREAD_SAFE checks since they take a lock and affect benchmarking.

{
if constexpr (std::is_void_v<return_type>)
{
co_await schedule(std::move(task));
Owner Author

This was a bug; added a test to cover it as well.

Comment thread src/scheduler.cpp
{
delete op;
}
}
Contributor

What if a partial read happens? write is guaranteed to be atomic (if the size of the data written is <= PIPE_BUF), but read is not

Owner Author

Yeah, good point; the amount read needs to divide evenly by the sizeof. In testing this doesn't seem to happen, but that doesn't mean it won't. Caching a partial read sucks too. 😬

Owner Author

Yeah, maybe this change isn't really worth it. There are some nice cleanups around these data structures, but that is a real problem.

Contributor

You can do something similar to read_exact in TCP, but read_aligned

auto pipe_t::read_aligned(void* buffer, std::size_t n, std::size_t align) -> ssize_t
{
    if (n == 0 || align == 0) return 0;

    std::size_t total_read = 0;
    char* ptr = static_cast<char*>(buffer);

    while (total_read < n) {
        ssize_t result = ::read(read_fd(), ptr + total_read, n - total_read);

        if (result < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        // EOF
        if (result == 0) break;

        total_read += static_cast<std::size_t>(result);

        // Stop as soon as we hold a whole number of records.
        if (total_read % align == 0) {
            return static_cast<ssize_t>(total_read);
        }
    }

    if (total_read % align != 0) {
        // If we hit EOF but aren't aligned, we have data corruption.
        std::cerr << "cry in cerr\n";
        return -1;
    }

    return static_cast<ssize_t>(total_read);
}

But it looks a bit too much

Contributor

Although it won't save us if an error happens mid-pointer ._.

Contributor

Also, I think this can completely break if we have multiple threads:

Thread A calls read and the kernel gives it 12 bytes.
Thread B calls read and the kernel gives it the remaining 4 bytes plus whatever comes next.

Now they have 1.5 and 0.5 pointers and corrupted data.

Owner Author

Yeah, all good points. For the scheduler the event loop is currently single threaded, but this would limit us to always being single threaded on it. I'll update this PR to revert passing the schedule_operation* through the pipe; I think there are some other nice changes worth keeping here though.

Owner Author

Ok, I've reverted to using an atomic waiter list (still dropping the lock). In theory I believe this should work, but it possibly has an ABA problem?

jbaldwin added 3 commits March 1, 2026 11:03
* Code review noted that while writes to the pipe are atomic, reads are not, so you are not guaranteed to get a full pointer read. In testing this didn't happen, but in reality it is a failure point. This adds too much complexity.
* Attempting to revert to using the atomic waiter list; this possibly has the ABA problem and needs to be worked through.
@jbaldwin
Owner Author

jbaldwin commented Mar 1, 2026

I think this works to get rid of one lock, but getting rid of the timer lock could help a lot too. We'd probably need to create the poll info on the schedule op and have the process-events thread use the poll map exclusively. This could introduce some latency to the timers though.

@jbaldwin
Owner Author

jbaldwin commented Mar 5, 2026

I'm gonna give this a merge to move forward; if we start seeing failures we can revert.

@jbaldwin jbaldwin merged commit a1aeb6c into main Mar 5, 2026
61 checks passed
@jbaldwin jbaldwin deleted the issue-404/scheduler-atomic-waiter-list branch March 5, 2026 19:29