Fix Q-Block re-entrancy in rate-limiting pacing#1973
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses a re-entrancy corner case where Q-Block retransmissions (triggered by 4.08 / 2.31 handling) can occur while NON pacing is already active, causing retransmitted blocks to bypass pacing and/or overlap other Q-Block send flows.
Changes:
- Adds a per-session
is_doing_q_blockflag to mark “Q-Block send/handler in progress” conditions. - Updates NON pacing gating in
coap_send_internal()to allow pacing to run even during rate-limited re-entrant paths when Q-Block processing is active. - Prevents overlapping timer-driven Q-Block “next MAX_PAYLOADS set” pushes and ensures nested Q-Block retransmit handlers save/restore the flag correctly.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
src/coap_net.c |
Alters the NON pacing gate in coap_send_internal() to treat Q-Block re-entrancy specially. |
src/coap_block.c |
Adds is_doing_q_block guards/set-reset around timer-driven Q-Block sends and 4.08 retransmit handling to avoid overlaps and preserve state in nested paths. |
include/coap3/coap_session_internal.h |
Introduces the new coap_session_t::is_doing_q_block state flag and documents its intent. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…cy issue When coap_send_internal() is called recursively from a Q-Block response handler (4.08, 2.31), the inner pacing loop would clear is_rate_limiting to 0 on exit, clobbering the outer frame's state. Save the previous value at entry and restore it at exit so nested invocations don't corrupt the outer wait loop's guard. Addresses Copilot review feedback on PR obgm#1973.
|
I'm doing code testing under different failure scenarios which has highlighted a couple of things not affected by this PR that could be handled better - in particular the effects of different timer timeouts when rate limiting. |
|
Thanks for the heads-up — let me know if anything you find ends up overlapping with this PR or if you'd like me to look into a specific timer interaction. |
|
I have come up with an alternative solution in #1976 which gives better recovery and only transmitting packets once unless the packet is reported as missing. It should solve the recursive issues and all transmitted packets are sent at the appropriate rate. Essentially it sets up a list of blocks to be sent, with remaining blocks updated whenever there is a successful transmission. If a packet is reporting as missing, the remaining blocks gets updated unless the block already exists in the list. The next packet to send is always the first in the remaining blocks list. Please let me know if #1976 work for you. I have done a reasonable amount of testing, but it is possible that there are still some corner cases that are not handled. |
Problem
When a 4.08 (Request Entity Incomplete) or 2.31 (Continue) response triggers retransmission of Q-Block packets, the response handler calls
coap_send_internal()re-entrantly from inside an outer pacing wait loop. The recursive entry was bypassing pacing entirely becausesession->is_rate_limitingwas already set by the outer frame, causing retransmitted blocks to burst out without delay.Fix
Two-part fix:
Add a new
session->is_doing_q_blockflag, set when processing a Q-Block response handler or timer-driven Q-Block restart. The pacing gate incoap_send_internal()now applies pacing when eitheris_rate_limitingis unset ORis_doing_q_blockis set, so re-entrant Q-Block sends still go through the wait loop.Save and restore the previous value of
is_rate_limitingaround the inner pacing loop, so nested invocations don't clobber the outer frame's state when they exit.Testing
Tested against Contiki-NG / 6TiSCH constrained nodes running RFC 9177 Q-Block subsystem. Retransmitted blocks now respect the configured pacing interval under
-I rate_limit_ppm.