
Q-Block: Better timing / recovering when rate limiting #1976

Open
mrdeep1 wants to merge 1 commit into obgm:develop from mrdeep1:q_block_fixes

Conversation

@mrdeep1
Collaborator

@mrdeep1 mrdeep1 commented May 4, 2026

Make use of an outstanding list of blocks to transmit, to remove duplicates and to send only one at a time when rate-limiting.

@yavuzhaliloglu

Thanks a lot for putting this together. The queue-based approach looks much cleaner than mine — handling duplicates at the data structure level instead of the call flow makes a lot of sense, and it should resolve the timer interaction concerns you mentioned earlier as well.

I'll test #1976 against my Contiki-NG / 6TiSCH setup with rate limiting under various packet loss scenarios and report back as soon as I have results.

@yavuzhaliloglu

Hi Jon, here are my test results against my Contiki-NG / 6TiSCH setup with -I 60 rate limiting and 0-20% induced packet loss on a 43-block payload.

Quick disclaimer: during testing I also have a couple of local changes on top of your branch (NON_TIMEOUT increased from 2s to 8s, NON_RECEIVE_TIMEOUT from 4s to 16s, and STATE_MAX_BLK_CNT_BITS set to a larger value for shorter tokens in my constrained setup). I tried to identify behaviors that don't seem to depend on those values, but please factor in this context.

I observed three behaviors that may indicate bugs, including one possible off-by-one I'd like to flag for verification:

  1. Long pause after retransmitting a payload-set-boundary block (e.g., 9, 19, 29, 39)
    When the missing-blocks list contains a block at the boundary of a MAX_PAYLOADS set, the boundary block is sent correctly but transmission stalls for roughly NON_TIMEOUT seconds before the natural progression resumes. Reading coap_send_q_blocks() it looks like lg_xmit->last_payload is unconditionally refreshed on every call regardless of whether a full burst was sent, so the scheduler treats the partial recovery burst as if a complete set were just transmitted.

  2. Some blocks from a 3+ missing-block list are not retransmitted promptly
    When the missing list contains a boundary block in the middle (e.g., [3, 9, 12]), only the entries up to and including the boundary are sent; the ones after it are deferred and only picked up later through the natural-progression path. The set-boundary check in the main loop ((block.num % MAX_PAYLOADS) + 1 == MAX_PAYLOADS) seems to fire for resend context just like for natural progression, breaking the loop early.

  3. When the missing list contains only the last block (e.g., body's final block 43), it is not retransmitted
    In coap_send_q_blocks() the pre-loop block_pdu is only allocated when block.m is true. For COAP_SEND_SKIP_PDU recovery where the missing list contains only the final block (m=0), no PDU is allocated, the main loop never runs, and last_all_sent is refreshed as if the transfer were complete. The server then times out waiting for the block.

Happy to share full logs for any of these or run additional scenarios you'd like to see. Thanks again for the queue-based redesign — the overall recovery behavior in this PR is significantly cleaner than before.

@mrdeep1
Collaborator Author

mrdeep1 commented May 13, 2026

Thanks for doing this testing and coming back with the information. I have started to look at how to address these issues; I had missed them due to the shorter default timeouts.
