
amdxdna: per-context DPM scaling, PMF counter integration, and DPM shim tests#1249

Merged
maxzhen merged 2 commits into amd:main from NishadSaraf:dpm-test
Apr 13, 2026
Conversation

NishadSaraf (Member) commented Apr 10, 2026

Summary

  • Per-context DPM level reference counting (staging driver): Introduce
    per-DPM-level refcounts so each hardware context requests the minimum DPM
    level satisfying its QoS parameters (GOPs, FPS, latency) at creation and
    releases it at destruction. The driver programs the hardware to the highest
    level with active references, skipping redundant set_dpm calls. Separate
    aie2_pm_init() into init and resume paths, add POWER_MODE_LOW/MEDIUM, set
    sys_eff_factor to 2, and add per-platform col_opc for DPM calculation.

  • PMF counter integration (OOT driver): Port Platform Management Framework
    support from the staging driver. When CONFIG_AMD_PMF is available (kernel
    7.0+), query live NPU clock frequencies, power draw, and per-column
    utilization via amd_pmf_get_npu_data() instead of returning stale cached
    values. Add AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION to the UAPI. Unify
    config_kernel.h generation for both PCI and OF Makefile builds so --nocmake
    builds also detect kernel features.

  • DPM shim tests (NPU4-series only): Add three tests exercised via
    dpm_test_bo_set (inherits elf_io_test_bo_set):

    • TEST_dpm_noop_no_qos: context without fps/latency QoS defaults to max DPM
    • TEST_dpm_power_modes: cycle LOW/MEDIUM/HIGH/TURBO and verify H-clock
    • TEST_dpm_refcount_scaling: incrementally create contexts with increasing
      GOPs demand, verify DPM scales up, destroy in reverse, verify scale-down

NishadSaraf (Member, Author) commented:
CI failed due to a dependency on #1221. Marking this as do-not-merge until #1221 is merged.

Port the Platform Management Framework (PMF) integration from the
staging driver to the OOT driver. When CONFIG_AMD_PMF is available
(kernel 7.0+), the driver queries live NPU clock frequencies, power
draw and per-column utilization from the PMF subsystem instead of
relying on stale cached values.

Changes:
- Add HAVE_7_0_amd_pmf_get_npu_data compile test to configure_kernel.sh
- Add update_counters callback to aie_hw_ops and wire npu4_update_counters
- Implement npu4_update_counters() in aie2_smu.c using amd_pmf_get_npu_data
- Update aie2_query_clock_metadata() and aie2_query_resource_info() to
  refresh counters via PMF before reporting
- Rewrite aie2_query_sensors() to return real power and per-column
  utilization from PMF, adding AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION
- Add MODULE_IMPORT_NS(AMD_PMF) with kernel version compat
- Unify config_kernel.h generation for both PCI and OF Makefile builds

Signed-off-by: Nishad Saraf <nishads@amd.com>

Add three DPM tests for NPU4-series devices:

- TEST_dpm_noop_no_qos: verify that a context without fps/latency QoS
  defaults to the maximum DPM level
- TEST_dpm_refcount_scaling: incrementally create contexts with
  increasing GOPs demand, verify DPM level scales up to match, then
  destroy them in reverse order and verify DPM scales back down
- TEST_dpm_power_modes: cycle through LOW, MEDIUM, HIGH, and TURBO
  power modes and verify H-clock matches the expected DPM table entry

Supporting changes:
- Add QoS-parameterized hw_ctx constructor to hwctx.h
- Add submit_noop() helper to force FW context creation on OOT driver
  where contexts are virtual until first command submission
- Add verify_hclk() with polling to account for clock ramp-up latency
- Use 2% H-clock margin for hardware tolerance

Signed-off-by: Nishad Saraf <nishads@amd.com>
@maxzhen maxzhen merged commit c695997 into amd:main Apr 13, 2026
1 check passed