[thermalctld] Add per-component polling intervals from platform.json#785
Open
vvolam wants to merge 5 commits into sonic-net:master
Add support for configurable per-component polling intervals via platform.json. Each component type can specify its own polling rate:
- fan_drawers[0].polling_interval: interval for fan updates
- psus[0].polling_interval: interval for PSU thermal updates
- thermals[*].polling_interval: per-thermal sensor interval

When any custom thermal intervals are configured, thermals without an explicit polling_interval fall back to the original 60s default instead of running at the fast-loop rate. This prevents unconfigured sensors from being polled too frequently when the main loop interval is reduced to accommodate fast-polling sensors.

The main loop interval adjusts to the minimum of all configured intervals so that fast-polling components are serviced on time, while slower components are throttled via per-component timestamp tracking.

Signed-off-by: Vasundhara Volam <vvolam@microsoft.com>
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Junchao-Mellanox requested changes, Apr 2, 2026
Signed-off-by: Vasundhara Volam <vvolam@microsoft.com>
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Address review feedback:
- Default _fan_update_interval to update_interval (60s) instead of None so fans are properly throttled when the main loop is sped up.
- Add polling interval gating to _collect_sfp_thermals so SFP thermals also respect the configured default polling interval.
- Update tests accordingly.

Signed-off-by: Vasundhara Volam <vvolam@microsoft.com>
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Description
Add support for configurable per-component polling intervals in thermalctld via platform.json. Each component type can specify its own polling rate:
- `fan_drawers[0].polling_interval`: interval in seconds for fan updates
- `psus[0].polling_interval`: interval in seconds for PSU thermal updates
- `thermals[*].polling_interval`: per-thermal sensor interval in seconds

Config entries (without a `name` key) are inserted at the beginning of the `fan_drawers`/`psus` arrays. Each thermal device entry with a `name` can have its own `polling_interval`.

When any custom thermal intervals are configured, thermals without an explicit `polling_interval` fall back to the original 60s default instead of running at the fast-loop rate. This prevents unconfigured sensors from being polled too frequently when the main loop interval is reduced to accommodate fast-polling sensors.

The main loop interval adjusts to the minimum of all configured intervals so fast-polling components are serviced on time, while slower components are throttled via per-component timestamp tracking.
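The throttling described above can be illustrated with a minimal sketch. This is not the code in this PR: `IntervalGate`, `main_loop_interval`, and `simulate` are hypothetical names, and including the 60s default in the minimum is an assumption made so that unconfigured components are still serviced on time. Simulated integer timestamps stand in for a real clock such as `time.monotonic()`.

```python
DEFAULT_INTERVAL = 60  # seconds; the original default cycle per the description


class IntervalGate:
    """Hypothetical helper: remembers when each component was last updated."""

    def __init__(self, intervals, default_interval=DEFAULT_INTERVAL):
        self.intervals = dict(intervals)      # component name -> seconds
        self.default_interval = default_interval
        self.last_update = {}                 # component name -> timestamp

    def should_update(self, name, now):
        # Components without an explicit interval fall back to the default.
        interval = self.intervals.get(name, self.default_interval)
        last = self.last_update.get(name)
        if last is None or now - last >= interval:
            self.last_update[name] = now
            return True
        return False


def main_loop_interval(intervals, default_interval=DEFAULT_INTERVAL):
    # Assumption: the loop runs at the smallest configured interval,
    # never slower than the default.
    return min(list(intervals.values()) + [default_interval])


def simulate(intervals, names, duration):
    """Count how often each component is updated over `duration` seconds."""
    gate = IntervalGate(intervals)
    step = main_loop_interval(intervals)
    counts = {name: 0 for name in names}
    for now in range(0, duration + 1, step):  # simulated loop wake-ups
        for name in names:
            if gate.should_update(name, now):
                counts[name] += 1
    return counts
```

With `{"ASIC": 5, "CPU Pack Temp": 20}` configured, the simulated loop wakes every 5 seconds, but a sensor with no explicit interval is only updated once per 60 seconds rather than on every wake-up.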
Motivation and Context
On platforms with many thermal sensors, polling all sensors every cycle can add unnecessary overhead. Some sensors (e.g., ASIC temperature) need frequent monitoring while others (e.g., ambient temps, SODIMM) can be polled less often. This change allows platform vendors to fine-tune polling rates per component via platform.json without code changes.
How Has This Been Tested?
Unit tests: Added tests for `_parse_platform_json_polling_intervals()`, `_should_update_thermal()` (including `default_interval` behavior), and PSU interval gating. All pass locally via `pytest`.

Testbed verification on SN5640: Ran tests on a physical testbed with the following platform.json intervals:
- `fan_drawers`: `polling_interval: 40`
- `psus`: `polling_interval: 30`
- `thermals`: ASIC=5s, Ambient Fan Side=10s, Port Side=15s, CPU Pack=20s, SODIMM 2=25s

Verified:

Backward compatibility: Without any `polling_interval` in platform.json, behavior is identical to before (all components update every 60s cycle).

Additional Information (Optional)
Example platform.json snippet:
{
  "chassis": {
    "fan_drawers": [
      {"polling_interval": "40"},
      {"name": "drawer1", "fans": [{"name": "fan1"}]}
    ],
    "psus": [
      {"polling_interval": "30"},
      {"name": "PSU 1", "thermals": [{"name": "PSU-1 Temp"}]}
    ],
    "thermals": [
      {"name": "ASIC", "polling_interval": "5"},
      {"name": "CPU Pack Temp", "polling_interval": "20"}
    ]
  }
}
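As a rough illustration of how a snippet like this could be read, the sketch below extracts the intervals from an already-loaded platform.json dictionary. It is not the PR's `_parse_platform_json_polling_intervals()`: the function names, the returned dictionary shape, and the handling of invalid values (treated as "not configured") are all assumptions made for the example.

```python
def to_seconds(value):
    """Convert a polling_interval value (e.g. "40") to a positive int, else None."""
    try:
        seconds = int(value)
    except (TypeError, ValueError):
        return None  # missing or malformed: treat as not configured (assumption)
    return seconds if seconds > 0 else None


def config_entry_interval(entries):
    """Read the interval from a leading config entry, i.e. one without a 'name' key."""
    if entries and "name" not in entries[0]:
        return to_seconds(entries[0].get("polling_interval"))
    return None


def parse_polling_intervals(platform):
    """Hypothetical parser: returns fan, PSU, and per-thermal intervals in seconds."""
    chassis = platform.get("chassis", {})
    thermals = {}
    for entry in chassis.get("thermals", []):
        seconds = to_seconds(entry.get("polling_interval"))
        if "name" in entry and seconds is not None:
            thermals[entry["name"]] = seconds
    return {
        "fan_drawers": config_entry_interval(chassis.get("fan_drawers", [])),
        "psus": config_entry_interval(chassis.get("psus", [])),
        "thermals": thermals,
    }
```

Loading the snippet above with `json.load` and passing the result to `parse_polling_intervals` would give 40 for fan drawers, 30 for PSUs, and per-thermal values of 5 and 20 for "ASIC" and "CPU Pack Temp". A platform.json with no `polling_interval` keys yields `None` and an empty thermal mapping, which a caller could interpret as the unchanged 60s behavior.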