Skip to content

Allow concurrent sending of CAN frames - partial fix for issue #40#41

Merged
jsphuebner merged 3 commits intojsphuebner:can_print_timeoutfrom
davefiddes:fix-concurrent-can-send
Oct 6, 2025
Merged

Allow concurrent sending of CAN frames - partial fix for issue #40#41
jsphuebner merged 3 commits intojsphuebner:can_print_timeoutfrom
davefiddes:fix-concurrent-can-send

Conversation

@davefiddes
Copy link
Contributor

Stm32Can::Send() can be called from multiple contexts in a typical libopeninv application (main loop, CAN RX interrupt or timer interrupt(s)). To avoid CAN TX frames being lost we should disable interrupts while we manipulate the TX mailbox and queue.

Tests:

  • On a ZombieVerterVCU experiencing issue Under heavy load parameter database download fails #40 repeatedly request clean parameter databases with: while true; do oic cache clean && oic -n 3 -t 300 read opmode; done
  • Verify databases are downloaded correctly every time
  • Verify ox100 and 0x300 CAN frames sent to VW SBox continue to be sent

davefiddes and others added 2 commits September 24, 2025 15:51
This exports the list of errors and time at which they
occurred as two CANopen SDO arrays. The sub-
index is used as an index into the list of errors. Any
invalid indices return zero.

Clients can use the "lasterr" parameter database
spot-value which is present in all libopeninv projects
to map the error number to a string.

Tests:
 - Verify with updated OpenInverter CAN Tool that
   errors can be listed.
 - Initiate an new error after a period of time and
   ensure the list tallies with the elapsed wall-clock
   time.
 - Check CANopen SDO decodes as expected in
   Wireshark
…bner#40

Stm32Can::Send() can be called from multiple contexts in a typical
libopeninv application (main loop, CAN RX interrupt or timer interrupt(s)). To
avoid CAN TX frames being lost we should disable interrupts while we
manipulate the TX mailbox and queue.

Tests:
 - On a ZombieVerterVCU experiencing issue jsphuebner#40 repeatedly request clean
   parameter databases with:
 while true; do oic cache clean && oic -n 3 -t 300 read opmode; done
 - Verify databases are downloaded correctly every time
 - Verify ox100 and 0x300 CAN frames sent to VW SBox continue to be sent
This changes uses the capability of ARM Cortex-M3 CPUs to
raise the interrupt priority mask so that low priority interrupts
do not interrupt CAN sending. High priority interrupts form
PWM, overcurrent and emergency stop in applications like
stm32-sine will be processed unimpeded. High priority
interrupts, obviously, cannot send CAN messages.

The change requires:
 - CAN_MAX_IRQ_PRIORITY to be defined in hwdefs.h
 - The priority of all interrupts that can send CAN frames be 1
    or lower (bigger number, lower priority)

Tests:
 - Patch code into Stm32-vcu and stm32-sine
 - Torture test database download with:
  while true; do oic cache clean && oic read opmode; done
 - On stm32-sine set canperiod to 10ms
 - Ensure stm32-sine is in Run. Check jitter on exciter output
   with a DSO with infinite perrsistence. The new code shows
   no jitter. The older code shows a jitter of around 5.2 uSec.
@davefiddes
Copy link
Contributor Author

davefiddes commented Oct 3, 2025

It has been bugging me all week the impact on hard real-time performance this PR would have. I didn't know how to spell it in ARM speak but have done the necessary research and testing. The jitter added seems to be around 5.2 uSec which I don't think is going to destroy the operation of stm32-sine but it is completely avoidable.

The bad code looks like this:
SDS00023

The new code, of course, shows no measurable jitter at all.

@davefiddes davefiddes changed the base branch from master to can_print_timeout October 3, 2025 17:10
@jsphuebner jsphuebner merged commit 9fdcdc6 into jsphuebner:can_print_timeout Oct 6, 2025
1 check passed
@jsphuebner
Copy link
Owner

For backward compatibility could you specify a default value CAN_MAX_IRQ_PRIORITY that behaves as before?

@davefiddes
Copy link
Contributor Author

It could default to 0 and issue a warning. This would essentially remove the mutex.

There isn't a nice default that would just result in a bit of interrupt latency but otherwise correct behaviour on all projects. The ZombieVerter VCU has to have its main timer priority lowered from 0 to 1 before this code does anything at all.

@davefiddes
Copy link
Contributor Author

I've looked at things some more. If CAN_MAX_IRQ_PRIORITY defaults to (0xe << 4) it should work perfectly on stm32-sine and ccs32clara. ZombieVerter VCU and FlyingAdcBms will continue to function as they currently do but need to chill out a little and drop their timer priority to (0x1 << 4) to get the full fix.

Would you be happy with this change?

jsphuebner added a commit that referenced this pull request Oct 8, 2025
* Here is an untested implementation of timeout mechanism for PutChar to handle #39
Call TriggerTimeout() in a fixed period interval

* Fix CAN SDO print timeout (#42)

The class needs to be properly initialised as
the TriggerTimeout() method will likely be
called before any incoming SDO frames. On
the VCU I observed unitialised member
variables being filled with garbage from the
stack painting.

The previous TriggerTimeout() logic causes
timeouts even when no timeout period has
elapsed. This corrupts the parameter
database download. Overshooting and setting
to zero if required seems to work reliably. I
don't understand.

Tests:
 - Repeatedly start a db download, abort and
   immediate retry. This should fail and it does.
  - Repeatedly start a db download, abort and
    wait a few seconds to retry. This should
    succeed and it does.
  - Use callingFrequency values of 97 and 103
    to force over and undershoots. Verify the
    timeout behaviour works as expected.

* Cleanup CanMap, CanSdo and Stm32Can declarations (#43)

Make single parameter constructors explicit.
Use "override" on all overridden virtual methods.
Remove unused timestamp from CanMap.
Remove unused GetFlashAddress from Stm32Can.

Tests:
 - Build as part of VCU project

* Allow concurrent sending of CAN frames - partial fix for issue #40 (#41)

* Allow listing of errors via CANopen SDO (#37)

This exports the list of errors and time at which they
occurred as two CANopen SDO arrays. The sub-
index is used as an index into the list of errors. Any
invalid indices return zero.

Clients can use the "lasterr" parameter database
spot-value which is present in all libopeninv projects
to map the error number to a string.

Tests:
 - Verify with updated OpenInverter CAN Tool that
   errors can be listed.
 - Initiate an new error after a period of time and
   ensure the list tallies with the elapsed wall-clock
   time.
 - Check CANopen SDO decodes as expected in
   Wireshark

* Allow concurrent sending of CAN frames - partial fix for issue #40

Stm32Can::Send() can be called from multiple contexts in a typical
libopeninv application (main loop, CAN RX interrupt or timer interrupt(s)). To
avoid CAN TX frames being lost we should disable interrupts while we
manipulate the TX mailbox and queue.

Tests:
 - On a ZombieVerterVCU experiencing issue #40 repeatedly request clean
   parameter databases with:
 while true; do oic cache clean && oic -n 3 -t 300 read opmode; done
 - Verify databases are downloaded correctly every time
 - Verify ox100 and 0x300 CAN frames sent to VW SBox continue to be sent

* Fix hard realtime performance and have reliable CAN send

This changes uses the capability of ARM Cortex-M3 CPUs to
raise the interrupt priority mask so that low priority interrupts
do not interrupt CAN sending. High priority interrupts form
PWM, overcurrent and emergency stop in applications like
stm32-sine will be processed unimpeded. High priority
interrupts, obviously, cannot send CAN messages.

The change requires:
 - CAN_MAX_IRQ_PRIORITY to be defined in hwdefs.h
 - The priority of all interrupts that can send CAN frames be 1
    or lower (bigger number, lower priority)

Tests:
 - Patch code into Stm32-vcu and stm32-sine
 - Torture test database download with:
  while true; do oic cache clean && oic read opmode; done
 - On stm32-sine set canperiod to 10ms
 - Ensure stm32-sine is in Run. Check jitter on exciter output
   with a DSO with infinite perrsistence. The new code shows
   no jitter. The older code shows a jitter of around 5.2 uSec.

* Don't force projects to define interrupt priorities to fix CAN send (#44)

Realistically only stm32-sine needs to keep PWM timer interrupts
running while CAN frames are being queued for sending. To ease
updating of libopeninv just disable interrupts for all projects
unless CAN_MAX_IRQ_PRIORITY is defined.

Tests:
 - Verify correct operation on an otherwise unmodified Stm32-vcu
 - Verify correct operation with stm32-sine where
    CAN_MAX_IRQ_PRIORITY is defined

---------

Co-authored-by: David J. Fiddes <35607151+davefiddes@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants