[thermalctld] Add per-component polling intervals from platform.json #785

Open
vvolam wants to merge 5 commits into sonic-net:master from vvolam:add-in-timers

Conversation

@vvolam
Contributor

@vvolam vvolam commented Apr 2, 2026

Description

Add support for configurable per-component polling intervals in thermalctld via platform.json. Each component type can specify its own polling rate:

  • fan_drawers[0].polling_interval: interval in seconds for fan updates
  • psus[0].polling_interval: interval in seconds for PSU thermal updates
  • thermals[*].polling_interval: per-thermal sensor interval in seconds

For fans and PSUs, the interval is read from a config entry: an object without a name key, placed as the first element of the fan_drawers or psus array. Thermals need no separate config entry; each named entry in the thermals array can carry its own polling_interval.

When any custom thermal intervals are configured, thermals without an explicit polling_interval fall back to the original 60s default instead of running at the fast-loop rate. This prevents unconfigured sensors from being polled too frequently when the main loop interval is reduced to accommodate fast-polling sensors.
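The fallback rule above can be sketched as a single lookup. This is a minimal illustration, not the code from this PR: `effective_thermal_interval` and `DEFAULT_INTERVAL` are hypothetical names, and the sensor name "Ambient Port Side Temp" is used only as an example of an unconfigured thermal.

```python
DEFAULT_INTERVAL = 60  # seconds; the original loop period named in the description


def effective_thermal_interval(name, custom_intervals, default_interval=DEFAULT_INTERVAL):
    """Return the polling interval (in seconds) to apply to one thermal sensor.

    custom_intervals maps sensor name -> interval configured in platform.json.
    A sensor without an entry falls back to the default rather than to the
    (possibly much shorter) main loop period.
    """
    return custom_intervals.get(name, default_interval)


custom = {"ASIC": 5, "CPU Pack Temp": 20}
asic_interval = effective_thermal_interval("ASIC", custom)
ambient_interval = effective_thermal_interval("Ambient Port Side Temp", custom)
print(asic_interval, ambient_interval)
```

With this rule, shortening the main loop to serve the ASIC sensor does not change how often the unconfigured sensor is read.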

The main loop interval adjusts to the minimum of all configured intervals so fast-polling components are serviced on time, while slower components are throttled via per-component timestamp tracking.
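A rough sketch of how the loop period and the per-component throttling could fit together follows. It assumes the loop period is simply the smallest configured interval (60s when nothing is configured) and that each component remembers when it was last updated. `main_loop_interval`, `IntervalGate`, `FakeClock`, and `simulate` are hypothetical names for illustration; the daemon's real helpers may be structured differently.

```python
import time

DEFAULT_INTERVAL = 60  # seconds


def main_loop_interval(configured, default_interval=DEFAULT_INTERVAL):
    """Smallest configured interval, or the default when none are configured."""
    return min(configured, default=default_interval)


class IntervalGate:
    """Tracks a last-update timestamp per component and throttles updates."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last = {}

    def should_update(self, key, interval):
        now = self._clock()
        last = self._last.get(key)
        if last is not None and now - last < interval:
            return False  # polled too recently; skip this loop iteration
        self._last[key] = now
        return True


class FakeClock:
    """Deterministic clock so the simulation does not actually sleep."""

    def __init__(self):
        self.now = 0

    def __call__(self):
        return self.now


def simulate(intervals, duration):
    """Count updates per component over `duration` seconds of loop ticks."""
    clock = FakeClock()
    gate = IntervalGate(clock)
    tick = main_loop_interval(intervals.values())
    counts = dict.fromkeys(intervals, 0)
    while clock.now < duration:
        for key, interval in intervals.items():
            if gate.should_update(key, interval):
                counts[key] += 1
        clock.now += tick
    return counts


counts = simulate({"ASIC": 5, "fans": 40}, duration=60)
print(counts)
```

In this simulation the loop ticks every 5 seconds, the ASIC sensor is updated on every tick, and the fans are updated only on the ticks at 0s and 40s. A real daemon would also have to absorb execution overhead, which this sketch ignores.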

Motivation and Context

On platforms with many thermal sensors, polling all sensors every cycle can add unnecessary overhead. Some sensors (e.g., ASIC temperature) need frequent monitoring while others (e.g., ambient temps, SODIMM) can be polled less often. This change allows platform vendors to fine-tune polling rates per component via platform.json without code changes.

How Has This Been Tested?

  1. Unit tests: Added tests for _parse_platform_json_polling_intervals(), _should_update_thermal() (including default_interval behavior), and PSU interval gating. All pass locally via pytest.

  2. Testbed verification on SN5640: Ran tests on a physical testbed with the following platform.json intervals:

    • fan_drawers: polling_interval: 40
    • psus: polling_interval: 30
    • thermals: ASIC=5s, Ambient Fan Side=10s, Port Side=15s, CPU Pack=20s, SODIMM 2=25s

    Verified:

    • Each component updates at its configured rate (±tolerance for execution overhead)
    • Unconfigured thermals fall back to 60s default
    • Cross-check: fastest thermal (ASIC=5s) updates more often than fans (40s)
    • No errors in thermalctld syslog
  3. Backward compatibility: Without any polling_interval in platform.json, behavior is identical to before (all components update every 60s cycle).

Additional Information (Optional)

Example platform.json snippet (interval values are given as strings, in seconds):

{
    "chassis": {
        "fan_drawers": [
            {"polling_interval": "40"},
            {"name": "drawer1", "fans": [{"name": "fan1"}]}
        ],
        "psus": [
            {"polling_interval": "30"},
            {"name": "PSU 1", "thermals": [{"name": "PSU-1 Temp"}]}
        ],
        "thermals": [
            {"name": "ASIC", "polling_interval": "5"},
            {"name": "CPU Pack Temp", "polling_interval": "20"}
        ]
    }
}
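To make the layout concrete, here is one way the snippet above could be read. This is a sketch under stated assumptions, not the PR's `_parse_platform_json_polling_intervals()`: the function name `parse_polling_intervals`, its return shape, and the use of `int()` to convert the string values are all illustrative choices.

```python
import json

PLATFORM_JSON = """
{
    "chassis": {
        "fan_drawers": [
            {"polling_interval": "40"},
            {"name": "drawer1", "fans": [{"name": "fan1"}]}
        ],
        "psus": [
            {"polling_interval": "30"},
            {"name": "PSU 1", "thermals": [{"name": "PSU-1 Temp"}]}
        ],
        "thermals": [
            {"name": "ASIC", "polling_interval": "5"},
            {"name": "CPU Pack Temp", "polling_interval": "20"}
        ]
    }
}
"""


def _config_entry_interval(entries):
    """Interval from a leading config entry (no "name" key), else None."""
    if entries and "name" not in entries[0] and "polling_interval" in entries[0]:
        return int(entries[0]["polling_interval"])
    return None


def parse_polling_intervals(platform):
    """Extract fan, PSU, and per-thermal intervals (seconds) from parsed JSON."""
    chassis = platform.get("chassis", {})
    thermals = {
        entry["name"]: int(entry["polling_interval"])
        for entry in chassis.get("thermals", [])
        if "name" in entry and "polling_interval" in entry
    }
    return {
        "fan": _config_entry_interval(chassis.get("fan_drawers", [])),
        "psu": _config_entry_interval(chassis.get("psus", [])),
        "thermals": thermals,
    }


intervals = parse_polling_intervals(json.loads(PLATFORM_JSON))
print(intervals)
```

A missing config entry yields `None` here, leaving the caller to decide which default to apply.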

Add support for configurable per-component polling intervals via
platform.json. Each component type can specify its own polling rate:

- fan_drawers[0].polling_interval: interval for fan updates
- psus[0].polling_interval: interval for PSU thermal updates
- thermals[*].polling_interval: per-thermal sensor interval

When any custom thermal intervals are configured, thermals without an
explicit polling_interval fall back to the original 60s default instead
of running at the fast-loop rate. This prevents unconfigured sensors
from being polled too frequently when the main loop interval is reduced
to accommodate fast-polling sensors.

The main loop interval adjusts to the minimum of all configured
intervals so that fast-polling components are serviced on time, while
slower components are throttled via per-component timestamp tracking.

Signed-off-by: Vasundhara Volam <vvolam@microsoft.com>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@vvolam vvolam marked this pull request as ready for review April 2, 2026 01:53
@vvolam vvolam requested a review from judyjoseph April 2, 2026 01:53
@Junchao-Mellanox Junchao-Mellanox requested a review from keboliu April 2, 2026 02:05
Comment thread sonic-thermalctld/scripts/thermalctld Outdated
Signed-off-by: Vasundhara Volam <vvolam@microsoft.com>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@vvolam vvolam requested a review from Junchao-Mellanox April 3, 2026 21:09
Comment thread sonic-thermalctld/scripts/thermalctld Outdated
Comment thread sonic-thermalctld/scripts/thermalctld Outdated
vvolam added 2 commits April 10, 2026 04:14
Address review feedback:
- Default _fan_update_interval to update_interval (60s) instead of None
  so fans are properly throttled when the main loop is sped up.
- Add polling interval gating to _collect_sfp_thermals so SFP thermals
  also respect the configured default polling interval.
- Update tests accordingly.
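
The effect of the first item can be illustrated with a small simulation. It assumes that a fan interval of None means "update on every loop iteration"; that reading, and the helper name `fan_updates_in_window`, are assumptions made for this sketch rather than details taken from the diff.

```python
UPDATE_INTERVAL = 60  # seconds; default loop period


def fan_updates_in_window(fan_interval, loop_interval, window):
    """Count fan updates over `window` seconds with a loop ticking every loop_interval.

    fan_interval=None is treated as "no throttling" (an assumption for this sketch).
    """
    updates = 0
    last = None
    for now in range(0, window, loop_interval):
        if fan_interval is None or last is None or now - last >= fan_interval:
            updates += 1
            last = now
    return updates


# Main loop sped up to 5s to serve a fast thermal sensor, observed for 120s.
unthrottled = fan_updates_in_window(None, loop_interval=5, window=120)
throttled = fan_updates_in_window(UPDATE_INTERVAL, loop_interval=5, window=120)
print(unthrottled, throttled)
```

Under these assumptions, the unthrottled fans are updated on all 24 loop ticks, while the 60s default limits them to 2 updates in the same window.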

Signed-off-by: Vasundhara Volam <vvolam@microsoft.com>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@vvolam vvolam requested a review from Junchao-Mellanox April 10, 2026 04:43

3 participants