
Conversation


@oreomaker oreomaker commented Jan 1, 2026

Build system and environment variable standardization:

  • Replaced all references to the QNN_SDK_ROOT environment variable with QAIRT_SDK_ROOT in documentation and CMake files to standardize SDK path usage and avoid confusion.

  • Updated CMake build logic to modularize QNN backend source inclusion and subdirectory addition, ensuring that QNN sources are correctly included when building for Qualcomm QNN AOT or QNN backend support.

QNN backend code simplification and robustness:

  • Refactored QNNAllocator by removing unused buffer tracking, statistics, and fallback logic, and simplified the destructor to ensure proper deregistration and memory cleanup. The registerQnnTensorToSharedBuffer interface was also simplified.

  • Simplified the QNNTensorWrapper allocation logic, removing complex buffer reuse and fallback code, and added helpers for setting quantization parameters with proper memory management.

  • Centralized and expanded QNN data type conversion logic, adding mllmDataTypeToQnnDataType and qnnDataTypeToSize utility functions for robust and maintainable type mapping.

These changes collectively improve maintainability, reduce potential for errors, and make the QNN backend codebase cleaner and easier to understand.
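As a rough illustration of the centralized type mapping, the sketch below mirrors the utility names from this PR (mllmDataTypeToQnnDataType, qnnDataTypeToSize), but the enums are simplified stand-ins rather than the real mllm core types or QNN SDK headers:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified stand-ins for the real mllm and QNN SDK enums (assumption:
// the actual definitions live in mllm/core and the QNN SDK headers).
enum class MllmDataType { kFp32, kFp16, kInt8, kUInt8, kInt32 };
enum class QnnDataType { kUndefined, kFloat32, kFloat16, kSFixedPoint8, kUFixedPoint8, kInt32 };

// Maps an mllm dtype to its QNN counterpart; unknown types map to kUndefined
// so callers can detect unsupported conversions.
QnnDataType mllmDataTypeToQnnDataType(MllmDataType dtype) {
  switch (dtype) {
    case MllmDataType::kFp32:  return QnnDataType::kFloat32;
    case MllmDataType::kFp16:  return QnnDataType::kFloat16;
    case MllmDataType::kInt8:  return QnnDataType::kSFixedPoint8;
    case MllmDataType::kUInt8: return QnnDataType::kUFixedPoint8;
    case MllmDataType::kInt32: return QnnDataType::kInt32;
  }
  return QnnDataType::kUndefined;
}

// Returns the element size in bytes for a QNN dtype; an exhaustive switch
// stands in for the removed global QNNDataTypeToSize lookup map.
size_t qnnDataTypeToSize(QnnDataType dtype) {
  switch (dtype) {
    case QnnDataType::kSFixedPoint8:
    case QnnDataType::kUFixedPoint8: return 1;
    case QnnDataType::kFloat16:      return 2;
    case QnnDataType::kFloat32:
    case QnnDataType::kInt32:        return 4;
    default:                         return 0;  // unsupported / undefined
  }
}
```

Returning a sentinel for unknown types is a simplification here; the backend's actual error handling (logging plus a runtime assert) is discussed in the review below.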

Summary by CodeRabbit

  • Documentation

    • Updated SDK environment variable references from QNN_SDK_ROOT to QAIRT_SDK_ROOT across setup and build docs.
  • New Features

    • Added tensor quantization configuration (scale-offset and blockwise) and improved datatype translation utilities.
    • Introduced wrapper-based backend model for more robust graph and tensor handling.
  • Bug Fixes

    • Simplified memory allocation/registration and consolidated backend build logic for more stable builds and reduced overhead.



coderabbitai bot commented Jan 1, 2026

📝 Walkthrough

Refactors the QNN backend: standardizes SDK env var to QAIRT_SDK_ROOT, simplifies QNNAllocator memory management, adds datatype/quantization utilities, and rewires AOT compilation from direct qnn_interface calls to wrapper-based QNNModel abstractions.

Changes

Cohort / File(s) Summary
Documentation & Setup
docs/qnn_backend/setup_env.rst
Replace QNN_SDK_ROOT references with QAIRT_SDK_ROOT in install and verification steps.
Build Configuration – Core & Backend
mllm/CMakeLists.txt, mllm/backends/qnn/CMakeLists.txt
Centralize inclusion of backends/qnn under OR condition for AOT/on-build flags; remove conditional Qualcomm AOT src glob and explicit SDK include blocks; switch include refs to QAIRT_SDK_ROOT; add MllmRT public linkage.
Build Configuration – Examples
examples/qwen3_qnn_aot/CMakeLists.txt
Add MllmQNNBackend to target link libraries; remove explicit QAIRT_SDK_ROOT include block.
Allocator Simplification
mllm/backends/qnn/QNNAllocator.hpp, mllm/backends/qnn/QNNAllocator.cpp
Drastically simplify allocation/register/free flows, remove registration/alias maps and helper APIs, change registerQnnTensorToSharedBuffer signature to (void*, Qnn_Tensor_t&), and add inline destructor cleanup in header.
Utilities & Tensor Wrappers
mllm/backends/qnn/QNNUtils.hpp, mllm/backends/qnn/QNNUtils.cpp
Add mllmDataTypeToQnnDataType and qnnDataTypeToSize; add quantization helpers and state in QNNTensorWrapper; make alloc() idempotent and register tensors via backend allocator; remove global QNNDataTypeToSize map and resetAlloc.
AOT Wrapper API Refactor
mllm/backends/qnn/aot/QnnWrappersAPI.hpp, mllm/backends/qnn/aot/QnnWrappersAPI.cpp
Replace direct qnn_interface usage with QNNModel wrapper paths; change many public/internal signatures to use QNNParam*Wrapper and QNNTensorWrapper; alter QnnAOTGraph constructor to accept QNN_INTERFACE_VER_TYPE, backend/context handles, and use qnn_model_ for graph lifecycle.
AOT Visitor Behavior
mllm/backends/qnn/aot/visitor/Base.hpp
QnnAOTBasePattern::rewrite now delegates to virtual compile(writer, node) instead of returning false, allowing compile to drive rewrite.
Custom Op Package Makefile
mllm/backends/qnn/custom-op-package/LLaMAPackage/Makefile
Switch env var to QAIRT_SDK_ROOT for include/lib paths and add guards that error/warn when target lib missing.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Env as QnnAOTEnv
    participant Model as QNNModel (wrapper)
    participant Backend as QNN Backend / QNN Interface
    participant Context as Qnn_Context
    participant Graph as QNN Graph

    Env->>Model: request/create graph (qnnInterface, backendHandle, contextHandle, graphName)
    alt context present
        Model->>Backend: initialize model (create graph state)
        Backend-->>Model: model handle / capabilities
        Model->>Context: attach context & allocate resources
        Model-->>Env: return wrapped graph handle
        Env->>Graph: compile via qnn_model_->finalizeGraph(...)
        Graph-->>Model: finalize result
        Model-->>Env: report success / graph ready
    else missing context
        Env-->>Env: return nullptr (graph not captured)
    end
    note right of Model: Quant/Type handling delegated\nto QNNTensorWrapper utilities

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • liang1232018
  • yirongjie

Poem

🐰
Hops and hops through QAIRT light,
Wrappers snug and allocs light,
Tensors dressed in quantized art,
Graphs assemble, ready to start,
A rabbit cheers this tidy part!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 2.27%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check (❓ Inconclusive): The title 'Qnn aot refactor' is too generic and vague; it does not clearly convey the specific changes made in this comprehensive refactoring. Resolution: consider a more descriptive title that highlights the main objectives, such as 'Refactor QNN backend with QAIRT_SDK_ROOT standardization and allocator simplification' or 'Standardize QNN SDK paths and simplify QNN allocator logic'.

✅ Passed checks (1 passed)

  • Description check (✅ Passed): The PR description is comprehensive, well-structured, and covers all major changes with relevant reference links to specific diffs for each category of modifications.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6fbccd0 and 0d08209.

📒 Files selected for processing (12)
  • docs/qnn_backend/setup_env.rst
  • examples/qwen3_qnn_aot/CMakeLists.txt
  • mllm/CMakeLists.txt
  • mllm/backends/qnn/CMakeLists.txt
  • mllm/backends/qnn/QNNAllocator.cpp
  • mllm/backends/qnn/QNNAllocator.hpp
  • mllm/backends/qnn/QNNUtils.cpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/backends/qnn/custom-op-package/LLaMAPackage/Makefile
🧰 Additional context used
📓 Path-based instructions (5)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x80–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x80–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/CMakeLists.txt
  • mllm/backends/qnn/custom-op-package/LLaMAPackage/Makefile
  • mllm/CMakeLists.txt
  • mllm/backends/qnn/QNNUtils.cpp
  • mllm/backends/qnn/QNNAllocator.hpp
  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/backends/qnn/QNNAllocator.cpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{toml,json,txt,md}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

When suggesting new dependencies, evaluate their maturity, licensing, and maintenance status. Prefer well-established and actively maintained libraries.

Files:

  • mllm/backends/qnn/CMakeLists.txt
  • mllm/CMakeLists.txt
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/QNNUtils.cpp
  • mllm/backends/qnn/QNNAllocator.hpp
  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/backends/qnn/QNNAllocator.cpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/QNNUtils.cpp
  • mllm/backends/qnn/QNNAllocator.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/QNNUtils.cpp
  • mllm/backends/qnn/QNNAllocator.hpp
  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/backends/qnn/QNNAllocator.cpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
🧬 Code graph analysis (4)
mllm/backends/qnn/QNNUtils.cpp (1)
mllm/backends/qnn/QNNUtils.hpp (8)
  • mllmDataTypeToQnnDataType (52-52)
  • dataContainer_ (204-204)
  • qnnTensor_ (196-196)
  • scaleOffsets (208-208)
  • blockwise (209-209)
  • qnnDataTypeToSize (53-53)
  • qnnParam_ (242-242)
  • qnnParam_ (243-243)
mllm/backends/qnn/QNNAllocator.hpp (1)
mllm/backends/qnn/QNNAllocator.cpp (2)
  • QNNAllocator (16-23)
  • QNNAllocator (25-35)
mllm/backends/qnn/QNNAllocator.cpp (1)
mllm/backends/qnn/QNNAllocator.hpp (16)
  • ptr (58-61)
  • ptr (58-58)
  • ptr (62-62)
  • ptr (62-62)
  • ptr (78-78)
  • ptr (80-80)
  • storage (49-49)
  • storage (51-51)
  • storage (51-51)
  • storage (53-53)
  • storage (55-55)
  • storage (55-55)
  • storage (64-69)
  • storage (64-64)
  • storage (71-71)
  • storage (71-71)
mllm/backends/qnn/QNNUtils.hpp (1)
mllm/core/Tensor.cpp (2)
  • dtype (468-468)
  • dtype (468-468)
🪛 Clang (14.0.6)
mllm/backends/qnn/QNNUtils.cpp

[error] 303-303: variable 'ret' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 415-415: 8 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 454-454: variable 'dataType' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 454-454: invalid case style for variable 'dataType'

(readability-identifier-naming,-warnings-as-errors)


[error] 504-504: 2 adjacent parameters of 'setScaleOffsetQuantization' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 515-515: 2 adjacent parameters of 'setBlockwiseQuantization' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 584-584: variable 'dataSize' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 584-584: invalid case style for variable 'dataSize'

(readability-identifier-naming,-warnings-as-errors)

mllm/backends/qnn/QNNAllocator.hpp

[error] 33-33: use '= default' to define a trivial destructor

(modernize-use-equals-default,-warnings-as-errors)

mllm/backends/qnn/aot/QnnWrappersAPI.hpp

[error] 51-51: parameter name 'v' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

🪛 Cppcheck (2.19.0)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp

[error] 417-417: There is an unknown macro here somewhere. Configuration is required. If MLLM_RT_ASSERT_EQ is a macro then please configure it.

(unknownMacro)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-x86
  • GitHub Check: build-android
  • GitHub Check: build-macos
🔇 Additional comments (25)
docs/qnn_backend/setup_env.rst (1)

34-34: LGTM! Environment variable migration is clear and consistent.

The documentation correctly updates all references from QNN_SDK_ROOT to QAIRT_SDK_ROOT, maintaining consistency across installation instructions, verification steps, and prerequisites.

Also applies to: 71-71, 87-87

mllm/backends/qnn/CMakeLists.txt (2)

16-22: LGTM! AOT source inclusion is properly conditional.

The conditional compilation logic correctly includes AOT sources only when MLLM_QUALCOMM_QNN_AOT_ON_X86_ENABLE is set, using GLOB_RECURSE to gather all relevant source files and appending them to the main source list.


35-35: LGTM! Status message updated consistently.

The status message correctly references QAIRT_SDK_ROOT, maintaining consistency with the environment variable migration.

mllm/backends/qnn/custom-op-package/LLaMAPackage/Makefile (1)

4-5: LGTM! Makefile properly migrates to QAIRT_SDK_ROOT with robust error handling.

The environment variable migration is cleanly executed with clear informational messages and proper fallback logic. The existing error checking (lines 7-20) ensures that missing SDK paths are caught early with helpful error messages.

Also applies to: 11-12

mllm/CMakeLists.txt (1)

117-119: LGTM! Backend inclusion properly consolidated.

The centralized conditional inclusion using an OR operation correctly handles both QNN AOT and QNN backend scenarios, eliminating duplication and improving maintainability. The backend's own CMakeLists.txt now handles the conditional source inclusion internally.

examples/qwen3_qnn_aot/CMakeLists.txt (1)

2-2: LGTM! QNN backend properly linked.

Adding MllmQNNBackend to the link libraries correctly enables QNN AOT functionality. The removal of explicit QAIRT_SDK_ROOT include paths (noted in the summary) is appropriate since these are now provided transitively through MllmQNNBackend's PUBLIC interface.

mllm/backends/qnn/QNNAllocator.hpp (2)

33-40: Destructor cleanup logic looks correct but could silently fail partially.

The destructor correctly iterates and cleans up resources. However, if memDeRegister fails for one entry, the error is logged but iteration continues. This is reasonable for a destructor (shouldn't throw), but consider whether partial cleanup failures could leave the system in an inconsistent state.

Note: The static analysis suggestion to use = default is a false positive since this destructor has non-trivial cleanup logic.
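The log-and-continue destructor behavior discussed above can be sketched as follows. This is a hypothetical model: memDeRegister, the map layout, and the counter are illustrative stand-ins, not the real QNNAllocator API.

```cpp
#include <cassert>
#include <cstdio>
#include <map>

// Counts attempted deregistrations so the continue-on-failure behavior is
// observable from outside the destructor.
static int g_deregister_calls = 0;

struct FakeMemHandle { int fd; bool valid; };

class AllocatorSketch {
 public:
  ~AllocatorSketch() {
    // A destructor must not throw, so each failed deregistration is logged
    // and iteration continues; partial cleanup failures are tolerated.
    for (auto& [ptr, handle] : registered_) {
      (void)ptr;
      if (!memDeRegister(handle)) {
        std::fprintf(stderr, "memDeRegister failed for fd=%d\n", handle.fd);
      }
    }
    registered_.clear();
  }
  void track(void* ptr, FakeMemHandle h) { registered_[ptr] = h; }

 private:
  static bool memDeRegister(const FakeMemHandle& h) {
    ++g_deregister_calls;
    return h.valid;
  }
  std::map<void*, FakeMemHandle> registered_;
};

// Exercises the destructor: one handle deregisters cleanly, one fails, and
// both are still attempted.
int runCleanupDemo() {
  g_deregister_calls = 0;
  {
    AllocatorSketch a;
    static int x = 0, y = 0;
    a.track(&x, FakeMemHandle{1, true});
    a.track(&y, FakeMemHandle{2, false});
  }  // destructor runs here
  return g_deregister_calls;
}
```

The key property is that a failure on one entry never prevents cleanup attempts on the remaining entries.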


78-78: Signature change simplifies the interface.

Changing from Storage* to void* and removing the return type aligns with the refactored implementation that uses assertions for error conditions rather than return values. This is acceptable for internal APIs where failures are programming errors.

mllm/backends/qnn/QNNUtils.hpp (2)

52-54: Good addition of centralized datatype utilities.

These utility functions centralize datatype translation logic, reducing code duplication across the codebase. The declarations align with implementations in QNNUtils.cpp.


207-222: Quantization helpers with proper lifetime management.

The new setScaleOffsetQuantization and setBlockwiseQuantization methods, along with private storage members scaleOffsets_ and blockwiseExpansion_, ensure quantization parameter data outlives the tensor wrapper. This addresses potential dangling pointer issues when QNN references this data.
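The copy-then-point lifetime pattern described here can be sketched roughly as below; ScaleOffset, QuantEncoding, and the wrapper class are simplified stand-ins for Qnn_ScaleOffset_t, the QNN tensor encoding struct, and QNNTensorWrapper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct ScaleOffset { float scale; int32_t offset; };

// Mimics a QNN-side struct that holds a raw pointer into quantization data.
struct QuantEncoding {
  const ScaleOffset* scaleOffsets = nullptr;
  int32_t axis = -1;
};

class TensorWrapperSketch {
 public:
  // Copies the caller's data into member storage first, then points the
  // encoding at that storage, so the pointer outlives the caller's vector.
  void setScaleOffsetQuantization(const std::vector<ScaleOffset>& scaleOffsets, int32_t axis) {
    scaleOffsets_ = scaleOffsets;                   // owned copy
    encoding_.scaleOffsets = scaleOffsets_.data();  // points at the copy
    encoding_.axis = axis;
  }
  const QuantEncoding& encoding() const { return encoding_; }

 private:
  std::vector<ScaleOffset> scaleOffsets_;  // keeps data alive with the wrapper
  QuantEncoding encoding_;
};

// The caller's buffer is cleared after the call; the wrapper's copy stays valid.
int demoOffsetAtIndex1() {
  TensorWrapperSketch w;
  std::vector<ScaleOffset> tmp = {{0.5f, 0}, {0.25f, -8}};
  w.setScaleOffsetQuantization(tmp, 0);
  tmp.clear();  // would dangle if the wrapper had stored tmp.data() directly
  return static_cast<int>(w.encoding().scaleOffsets[1].offset);
}
```

Without the owned copy, the QNN-side pointer would dangle as soon as the caller's vector went out of scope.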

mllm/backends/qnn/aot/visitor/Base.hpp (1)

17-22: Pattern delegation to compile() enables extensibility.

The rewrite() method now delegates to the pure virtual compile() method, enabling derived classes to implement the actual rewriting logic. This is a clean abstraction that separates the pattern interface from the implementation.
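The delegation shape can be sketched as follows; Writer and Node are placeholder types, not the real IR classes in mllm/backends/qnn/aot/visitor/Base.hpp.

```cpp
#include <cassert>

struct Writer {};
struct Node {};

class QnnAOTBasePatternSketch {
 public:
  virtual ~QnnAOTBasePatternSketch() = default;
  // Entry point: the rewrite outcome is whatever compile() reports,
  // instead of the previous hard-coded `return false`.
  bool rewrite(Writer& writer, Node& node) { return compile(writer, node); }

 protected:
  // Derived patterns implement the actual op emission here.
  virtual bool compile(Writer& writer, Node& node) = 0;
};

class MatMulPatternSketch : public QnnAOTBasePatternSketch {
 protected:
  bool compile(Writer&, Node&) override { return true; }  // emit op, report success
};

bool demoRewrite() {
  MatMulPatternSketch p;
  Writer w;
  Node n;
  return p.rewrite(w, n);
}
```

This keeps the pattern-matching interface in the base class while each derived visitor supplies only the compilation step.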

mllm/backends/qnn/QNNUtils.cpp (3)

487-502: Idempotent allocation with proper registration flow.

The early return guard prevents double allocation, and the conditional check before alloc() allows pre-allocated storage to be registered. The flow correctly obtains the QNN allocator from context and registers the tensor.
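The idempotent guard can be illustrated with the sketch below; the names (allocated_, the registration counter) are illustrative, and real registration would go through the backend's QNN allocator rather than a counter.

```cpp
#include <cassert>
#include <cstdlib>

class TensorAllocSketch {
 public:
  ~TensorAllocSketch() { std::free(data_); }

  void alloc(size_t bytes) {
    if (allocated_) return;  // idempotent: a second call is a no-op
    data_ = std::malloc(bytes);
    allocated_ = (data_ != nullptr);
    if (allocated_) { ++register_count_; }  // stands in for allocator registration
  }
  bool isAllocated() const { return allocated_; }
  int registerCount() const { return register_count_; }

 private:
  void* data_ = nullptr;
  bool allocated_ = false;
  int register_count_ = 0;
};

// Calling alloc() twice must allocate and register exactly once.
int demoDoubleAlloc() {
  TensorAllocSketch t;
  t.alloc(64);
  t.alloc(64);  // guarded: no second allocation or registration
  return t.registerCount();
}
```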


504-525: Quantization setters correctly manage memory lifetime.

Both methods copy input data to member storage (scaleOffsets_, blockwiseExpansion_) before setting pointers in the QNN tensor structure. This ensures the quantization data remains valid for the lifetime of the wrapper.


302-392: Verify kInt4/kUInt4 QNN datatype mapping against QNN SDK source.

The mllmDataTypeToQnnDataType function comprehensively handles data types including quantized variants. However, the FIXME comment at line 349 ("Maybe error here") for kInt4/kUInt4 mapping to QNN_DATATYPE_SFIXED_POINT_4/QNN_DATATYPE_UFIXED_POINT_4 indicates unresolved uncertainty by the original author. Public QNN SDK documentation does not explicitly confirm this mapping. Verify against the Qualcomm QNN SDK header files (e.g., QnnDataTypes.h) to confirm the mapping is correct, or update the code and FIXME comment based on findings.

Note: The static analysis warning about ret being uninitialized is a false positive—it is correctly initialized to QNN_DATATYPE_UNDEFINED on line 303.

mllm/backends/qnn/QNNAllocator.cpp (4)

37-46: Simplified allocation path is clean.

The allocation flow directly uses allocSize(storage) and inserts the pointer into the tracking set. The assertion ensures allocation succeeded.


48-55: Free handles both registered and unregistered memory.

The conditional check before deregistration correctly handles memory that may or may not have been registered with QNN. This is important for cleanup of storage that was allocated but never used with registerQnnTensorToSharedBuffer.
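A minimal model of that conditional-deregister free path, with set-based bookkeeping assumed purely for illustration:

```cpp
#include <cassert>
#include <cstdlib>
#include <set>

class SharedBufferAllocatorSketch {
 public:
  void* alloc(size_t bytes) {
    void* p = std::malloc(bytes);
    allocated_.insert(p);
    return p;
  }
  void registerBuffer(void* p) { registered_.insert(p); }

  void freeBuffer(void* p) {
    if (registered_.count(p)) {  // deregister only what was registered
      registered_.erase(p);
      ++deregistered_count_;
    }
    allocated_.erase(p);
    std::free(p);
  }
  int deregisteredCount() const { return deregistered_count_; }

 private:
  std::set<void*> allocated_;
  std::set<void*> registered_;
  int deregistered_count_ = 0;
};

// One buffer is registered and one is not; freeing both must deregister
// exactly the registered one.
int runFreeDemo() {
  SharedBufferAllocatorSketch a;
  void* p = a.alloc(32);
  void* q = a.alloc(32);
  a.registerBuffer(p);
  a.freeBuffer(p);  // registered: deregistered, then freed
  a.freeBuffer(q);  // never registered: freed without deregistration
  return a.deregisteredCount();
}
```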


57-92: Registration flow handles re-registration gracefully.

The early return for already-registered pointers avoids duplicate registration while still updating the tensor's mem handle. The implementation correctly creates the memory descriptor and registers with QNN.


94-98: Deregistration asserts pointer exists before cleanup.

The assertion MLLM_RT_ASSERT_EQ(ptrToFdAndMemHandleMap_.count(ptr), 1) will abort if called with an unregistered pointer. This is appropriate for internal APIs where calling deregister on unregistered memory is a programming error.

mllm/backends/qnn/aot/QnnWrappersAPI.hpp (3)

43-65: QnnAOTNodeTensor now uses wrapper abstraction.

The transition from raw Qnn_Tensor_t to QNNTensorWrapper improves resource management. The getWrapper() accessor provides access to the underlying wrapper.

Note: The static analysis warning about parameter name 'v' being too short is a style preference that can be addressed later if desired.


85-108: Parameter handling uses wrapper types consistently.

The addParamScalar, emplaceParamScalar, addParamTensor, and emplaceParamTensor methods now accept QNNParamScalarWrapper and QNNParamTensorWrapper shared pointers, aligning with the wrapper-based architecture.


115-128: QnnAOTGraph delegates to QNNModel.

The constructor now accepts explicit QNN interface handles and delegates graph management to QNNModel. The private qnn_model_ member encapsulates the underlying QNN operations.

mllm/backends/qnn/aot/QnnWrappersAPI.cpp (4)

24-34: Constructor delegates to wrapper creation methods.

The constructor correctly determines whether to create a static or non-static tensor wrapper based on the force_static_weight parameter and tensor type. This simplifies the tensor creation logic.


99-101: Datatype parsing now uses centralized utility.

Delegating to mllmDataTypeToQnnDataType reduces code duplication and ensures consistent datatype mapping across the codebase.


239-266: QnnAOTGraph uses QNNModel for graph operations.

The constructor creates a QNNModel and initializes it with the context and graph name. addOperation adds tensors and nodes via the model, and compile() calls finalizeGraph(). This abstracts away direct QNN interface calls.


548-559: Graceful handling of missing context.

Returning nullptr instead of asserting when the context is not found allows callers to handle the error condition more gracefully. This is an improvement over hard assertions for recoverable error cases.

Comment on lines +157 to +166
Qnn_BlockwiseExpansion_t blockwise_expansion;
blockwise_expansion.axis = cfg->ch_axis;
blockwise_expansion.axis = cfg->ch_axis;
blockwise_expansion.scaleOffsets = nullptr; // Will be set by setBlockwiseQuantization
blockwise_expansion.numBlocksPerAxis = v->tensor_.size(cfg->ch_axis) / cfg->block_size;
blockwise_expansion.blockScaleBitwidth = 12; // 12 bits for 4 to 16 expansion
blockwise_expansion.blockScaleStorageType = QNN_BLOCKWISE_EXPANSION_BITWIDTH_SCALE_STORAGE_8;
blockwise_expansion.blocksScale8 = cfg->scale_level_0_int.ptr<mllm_uint8_t>();

ret.encodingDefinition = QNN_DEFINITION_DEFINED;
ret.quantizationEncoding = QNN_QUANTIZATION_ENCODING_BLOCKWISE_EXPANSION;
ret.blockwiseExpansion = block_scale_array;
tensor_wrapper_->setBlockwiseQuantization(blockwise_expansion, scale_offsets);

⚠️ Potential issue | 🟡 Minor

Duplicate assignment of blockwise_expansion.axis.

Lines 158 and 159 both assign to blockwise_expansion.axis. This appears to be a copy-paste error.

🔎 Proposed fix to remove duplicate assignment
       Qnn_BlockwiseExpansion_t blockwise_expansion;
       blockwise_expansion.axis = cfg->ch_axis;
-      blockwise_expansion.axis = cfg->ch_axis;
       blockwise_expansion.scaleOffsets = nullptr;  // Will be set by setBlockwiseQuantization
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 157 to 166 there is a
duplicate assignment to blockwise_expansion.axis (lines 158-159); remove the
redundant second assignment so axis is set only once (keep the correct
cfg->ch_axis assignment), leaving the rest of blockwise_expansion initialization
unchanged.


target_include_directories(MllmQNNBackend PUBLIC
-  $ENV{QNN_SDK_ROOT}/include/QNN # QNN SDK include
+  $ENV{QAIRT_SDK_ROOT}/include/QNN # QNN SDK include

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check the CMakeLists.txt file around line 31
cat -n mllm/backends/qnn/CMakeLists.txt | head -50

Repository: UbiquitousLearning/mllm

Length of output: 1379


🏁 Script executed:

# Search for all QAIRT_SDK_ROOT usage in CMake files
rg -n "QAIRT_SDK_ROOT" --type cmake -B 2 -A 2

Repository: UbiquitousLearning/mllm

Length of output: 1040


🏁 Script executed:

# Search for validation patterns of environment variables in CMake files
rg -n "if.*NOT.*DEFINED.*ENV" --type cmake -B 1 -A 3

Repository: UbiquitousLearning/mllm

Length of output: 399


🏁 Script executed:

# Search for Makefile references mentioned in the review
fd -e Makefile | head -20

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

# Check for QNN_SDK_ROOT references to understand migration context
rg -n "QNN_SDK_ROOT" -B 1 -A 1

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

# Search for how QAIRT_SDK_ROOT is set or documented
rg -n "QAIRT_SDK_ROOT" -B 3 -A 3

Repository: UbiquitousLearning/mllm

Length of output: 4854


🏁 Script executed:

# Check if there's a custom-op-package directory and Makefile referenced in review
find . -type d -name "*custom*op*" | head -10
find . -name "Makefile" | head -20

Repository: UbiquitousLearning/mllm

Length of output: 180


🏁 Script executed:

# Check if there's any validation or setup script for QAIRT_SDK_ROOT
fd -e "\.sh$|\.md$" | xargs rg -l "QAIRT_SDK_ROOT" 2>/dev/null | head -10

Repository: UbiquitousLearning/mllm

Length of output: 243


Add validation for QAIRT_SDK_ROOT environment variable before use.

The CMakeLists.txt uses $ENV{QAIRT_SDK_ROOT} without checking if the environment variable is defined. While the associated Makefile in custom-op-package/LLaMAPackage/ validates this with error handling, the CMake build files do not. If QAIRT_SDK_ROOT is unset, the include path becomes /include/QNN, causing a confusing compile failure rather than a clear configuration error.

Add validation similar to other backends in the codebase:

if(NOT DEFINED ENV{QAIRT_SDK_ROOT})
  message(FATAL_ERROR "QAIRT_SDK_ROOT environment variable is not set. Please set it to point to the Qualcomm AI SDK root directory.")
endif()

This check should be added early in the CMakeLists.txt files (both mllm/backends/qnn/CMakeLists.txt and mllm/ffi/CMakeLists.txt) before the environment variable is referenced.

Comment on lines +394 to +422
size_t qnnDataTypeToSize(Qnn_DataType_t dtype) {
switch (dtype) {
case QNN_DATATYPE_INT_8:
case QNN_DATATYPE_UINT_8:
case QNN_DATATYPE_BOOL_8:
case QNN_DATATYPE_SFIXED_POINT_8:
case QNN_DATATYPE_UFIXED_POINT_8: return 1;

case QNN_DATATYPE_INT_16:
case QNN_DATATYPE_UINT_16:
case QNN_DATATYPE_FLOAT_16:
case QNN_DATATYPE_SFIXED_POINT_16:
case QNN_DATATYPE_UFIXED_POINT_16: return 2;

case QNN_DATATYPE_INT_32:
case QNN_DATATYPE_UINT_32:
case QNN_DATATYPE_FLOAT_32:
case QNN_DATATYPE_SFIXED_POINT_32:
case QNN_DATATYPE_UFIXED_POINT_32: return 4;

case QNN_DATATYPE_INT_64:
case QNN_DATATYPE_UINT_64: return 8;

default:
MLLM_ERROR("qnnDataTypeToSize: unsupported Qnn_DataType_t {}", static_cast<int>(dtype));
MLLM_RT_ASSERT(false);
return 0;
}
}

⚠️ Potential issue | 🟡 Minor

Size mapping covers most QNN datatypes.

The function handles common QNN datatypes correctly. However, it doesn't cover QNN_DATATYPE_SFIXED_POINT_4 and QNN_DATATYPE_UFIXED_POINT_4 which are used in mllmDataTypeToQnnDataType for kInt4/kUInt4. This could cause a runtime assertion failure if 4-bit types are used with qnnDataTypeToSize.

🔎 Proposed fix to add 4-bit type handling
 size_t qnnDataTypeToSize(Qnn_DataType_t dtype) {
   switch (dtype) {
+    // 4-bit types are packed, return 1 byte as minimum addressable unit
+    case QNN_DATATYPE_SFIXED_POINT_4:
+    case QNN_DATATYPE_UFIXED_POINT_4: return 1;  // Note: actual size is 0.5 bytes
+
     case QNN_DATATYPE_INT_8:
     case QNN_DATATYPE_UINT_8:
🧰 Tools
🪛 Clang (14.0.6)

[error] 415-415: 8 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

🤖 Prompt for AI Agents
In mllm/backends/qnn/QNNUtils.cpp around lines 394-422, qnnDataTypeToSize is
missing cases for QNN_DATATYPE_SFIXED_POINT_4 and QNN_DATATYPE_UFIXED_POINT_4
which leads to the default assert if 4-bit types are passed; add those two enum
cases to the switch and return an appropriate byte-size (use 1 to represent the
storage unit) for each, and optionally add a brief comment noting these are
4-bit types packed two-per-byte so callers should handle packing/unpacking if
needed.


@UbiquitousLearning UbiquitousLearning left a comment


LGTM


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Fix all issues with AI Agents 🤖
In @mllm/backends/qnn/aot/QnnWrappersAPI.cpp:
- Around line 261-266: The compile() implementation sets is_compiled_
unconditionally which can mark a failed compilation as successful; change
QnnAOTGraph::compile so it calls qnn_model_->finalizeGraph(...) and only sets
is_compiled_ = true when that call returns mllm::qnn::MODEL_NO_ERROR (i.e., on
success), returning the success boolean otherwise; keep the early-return when
is_compiled_ is already true.

In @mllm/backends/qnn/QNNUtils.hpp:
- Around line 218-221: The member Qnn_BlockwiseExpansion_t blockwiseExpansion_
is left uninitialized; update the class constructor to explicitly initialize
blockwiseExpansion_ (for example via the constructor's member initializer list
using blockwiseExpansion_{} or equivalent) so it has a well-defined state from
construction; keep setBlockwiseQuantization() usage unchanged but ensure
blockwiseExpansion_ is default-initialized in the constructor that declares
scaleOffsets_ and blockwiseExpansion_.
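The first fix above (gating is_compiled_ on finalize success) can be sketched as below; MODEL_NO_ERROR and finalizeGraph follow the names in the prompt, but the surrounding class is a simplified stand-in for QnnAOTGraph.

```cpp
#include <cassert>

enum ModelError { MODEL_NO_ERROR = 0, MODEL_GRAPH_ERROR = 1 };

class GraphSketch {
 public:
  explicit GraphSketch(ModelError finalize_result) : finalize_result_(finalize_result) {}

  bool compile() {
    if (is_compiled_) return true;  // keep the early-return on repeat calls
    bool ok = (finalizeGraph() == MODEL_NO_ERROR);
    is_compiled_ = ok;  // only mark compiled on success
    return ok;
  }
  bool isCompiled() const { return is_compiled_; }

 private:
  // Stands in for qnn_model_->finalizeGraph(...).
  ModelError finalizeGraph() { return finalize_result_; }
  ModelError finalize_result_;
  bool is_compiled_ = false;
};

// A failed finalize leaves the graph uncompiled; a later retry is still possible.
bool demoCompile(ModelError r) {
  GraphSketch g(r);
  g.compile();
  return g.isCompiled();
}
```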
♻️ Duplicate comments (1)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)

157-166: Duplicate assignment of blockwise_expansion.axis.

Lines 158 and 159 both assign to blockwise_expansion.axis. This appears to be a copy-paste error.

🔎 Proposed fix to remove duplicate assignment
       Qnn_BlockwiseExpansion_t blockwise_expansion;
       blockwise_expansion.axis = cfg->ch_axis;
-      blockwise_expansion.axis = cfg->ch_axis;
       blockwise_expansion.scaleOffsets = nullptr;  // Will be set by setBlockwiseQuantization
🧹 Nitpick comments (1)
mllm/backends/qnn/QNNUtils.hpp (1)

207-209: Consider adding detailed documentation for quantization methods.

While a brief comment is present, these public methods would benefit from more comprehensive documentation explaining:

  • When these methods should be called relative to alloc()
  • Parameter constraints (e.g., valid range for axis)
  • Expected behavior and side effects
  • Example usage patterns
📝 Suggested documentation format
-  // Helper to set complex quantization params and manage memory
+  /**
+   * @brief Sets scale-offset quantization parameters for the tensor
+   * @param scaleOffsets Vector of scale-offset pairs for quantization
+   * @param axis The axis along which quantization is applied (-1 for per-tensor)
+   * @note Must be called before alloc() to take effect
+   */
   void setScaleOffsetQuantization(const std::vector<Qnn_ScaleOffset_t>& scaleOffsets, int32_t axis);
+  
+  /**
+   * @brief Sets blockwise quantization parameters for the tensor
+   * @param blockwise Blockwise expansion configuration
+   * @param scaleOffsets Vector of scale-offset pairs for blockwise quantization
+   * @note Must be called before alloc() to take effect
+   */
   void setBlockwiseQuantization(const Qnn_BlockwiseExpansion_t& blockwise, const std::vector<Qnn_ScaleOffset_t>& scaleOffsets);
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0d08209 and 550380c.

📒 Files selected for processing (2)
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
🧬 Code graph analysis (2)
mllm/backends/qnn/QNNUtils.hpp (2)
mllm/core/Tensor.hpp (1)
  • dtype (412-412)
mllm/core/Tensor.cpp (2)
  • dtype (506-506)
  • dtype (506-506)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (2)
mllm/backends/qnn/QNNUtils.hpp (7)
  • name (185-186)
  • name (189-190)
  • create (255-257)
  • create (255-255)
  • mllmDataTypeToQnnDataType (52-52)
  • __mllmQnnLoggerCallback (92-130)
  • __mllmQnnLoggerCallback (92-92)
mllm/backends/qnn/QNNUtils.cpp (8)
  • createStaticTensor (466-485)
  • createStaticTensor (466-467)
  • create (447-464)
  • create (447-448)
  • create (551-555)
  • create (551-553)
  • mllmDataTypeToQnnDataType (302-392)
  • mllmDataTypeToQnnDataType (302-302)
🪛 Cppcheck (2.19.0)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp

[error] 417-417: There is an unknown macro here somewhere. Configuration is required. If MLLM_RT_ASSERT_EQ is a macro then please configure it.

(unknownMacro)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-macos
  • GitHub Check: build-android
  • GitHub Check: build-x86
🔇 Additional comments (9)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (7)

24-34: Well-structured constructor refactor.

The constructor now properly delegates tensor creation to wrapper methods based on tensor type, with clean separation between static and dynamic tensor handling.


99-101: Good centralization of data type conversion.

Using the utility function mllmDataTypeToQnnDataType improves maintainability by centralizing type mapping logic.


122-136: Improved memory management for per-channel quantization.

The refactor replaces manual memory allocation with std::vector and delegates quantization setup to the wrapper's setScaleOffsetQuantization method, improving both safety and maintainability.


198-220: Well-designed API modernization.

The parameter handling methods now use std::shared_ptr for wrapper types instead of raw pointers, providing automatic memory management and clearer ownership semantics.


239-243: Clean wrapper-based initialization.

The constructor now properly delegates graph management to QNNModel, following the new wrapper-based architecture.


245-259: Well-structured operation addition.

The method cleanly gathers tensor names, registers wrappers with the model, and adds nodes using the wrapper-based API.


548-559: Proper error handling for missing context.

The method correctly returns nullptr with an error log when the context is not found, and cleanly manages graph creation through the wrapper-based constructor.

mllm/backends/qnn/QNNUtils.hpp (2)

23-23: Documentation updated to reflect SDK variable standardization.

The comment now correctly references QAIRT_SDK_ROOT instead of QNN_SDK_ROOT, consistent with the PR's standardization objectives.


52-53: Implementation confirmed with proper error handling.

Both utility functions are implemented in QNNUtils.cpp with comprehensive switch statements that handle data type conversions and size calculations. The qnnDataTypeToSize() function includes explicit error handling for unsupported types via MLLM_ERROR and MLLM_RT_ASSERT(false), and mllmDataTypeToQnnDataType() similarly validates input with error reporting for invalid data types.

Comment on lines 261 to 266
 bool QnnAOTGraph::compile() {
   if (is_compiled_) { return true; }
 
-  auto env = AOTCompileContext::getInstance().getEnv();
-  auto qnn_interface = env->getFuncSymbol().qnn_interface_;
-  qnn_interface.graphFinalize(qnn_graph_handle_, env->getContext(belongs_context_name_)->profile_bk_handle_, nullptr);
-
+  bool ret = qnn_model_->finalizeGraph(nullptr, nullptr) == mllm::qnn::MODEL_NO_ERROR;
   is_compiled_ = true;
-  return true;
+  return ret;
 }

⚠️ Potential issue | 🟡 Minor

Set is_compiled_ only on successful compilation.

Line 264 sets is_compiled_ = true unconditionally, even when finalizeGraph fails. This could lead to incorrect internal state where the graph appears compiled but actually failed.

🔎 Proposed fix to set flag only on success
 bool QnnAOTGraph::compile() {
   if (is_compiled_) { return true; }
   bool ret = qnn_model_->finalizeGraph(nullptr, nullptr) == mllm::qnn::MODEL_NO_ERROR;
-  is_compiled_ = true;
+  if (ret) { is_compiled_ = true; }
   return ret;
 }
🤖 Prompt for AI Agents
In @mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 261-266, The compile()
implementation sets is_compiled_ unconditionally which can mark a failed
compilation as successful; change QnnAOTGraph::compile so it calls
qnn_model_->finalizeGraph(...) and only sets is_compiled_ = true when that call
returns mllm::qnn::MODEL_NO_ERROR (i.e., on success), returning the success
boolean otherwise; keep the early-return when is_compiled_ is already true.

Comment on lines +218 to +221

// Storage for quantization parameters to ensure lifetime matches the tensor wrapper
std::vector<Qnn_ScaleOffset_t> scaleOffsets_;
Qnn_BlockwiseExpansion_t blockwiseExpansion_;

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify blockwiseExpansion_ initialization

echo "=== Finding QNNTensorWrapper class and constructor ==="
rg -n "class QNNTensorWrapper" --type=hpp -A 20 | head -50

echo -e "\n=== Searching for QNNTensorWrapper constructor in implementation files ==="
rg -n "QNNTensorWrapper::QNNTensorWrapper" --type=cpp

echo -e "\n=== Checking QNNUtils.cpp for constructor implementation ==="
find . -name "QNNUtils.cpp" -o -name "QNNUtils.cc"

Repository: UbiquitousLearning/mllm

Length of output: 480


🏁 Script executed:

#!/bin/bash
# Read the constructor implementation

echo "=== QNNTensorWrapper constructor (lines 420-460) ==="
sed -n '420,460p' mllm/backends/qnn/QNNUtils.cpp

echo -e "\n=== Check for blockwiseExpansion_ initialization in constructor ==="
rg -n "blockwiseExpansion_" mllm/backends/qnn/QNNUtils.cpp

echo -e "\n=== Search for Qnn_BlockwiseExpansion_t definition ==="
rg -A 10 "Qnn_BlockwiseExpansion_t" --type-list

Repository: UbiquitousLearning/mllm

Length of output: 2544


🏁 Script executed:

#!/bin/bash
# Read the header file around the member declarations

echo "=== QNNTensorWrapper header file members (lines 210-230) ==="
sed -n '210,230p' mllm/backends/qnn/QNNUtils.hpp

echo -e "\n=== Full setBlockwiseQuantization method ==="
rg -B 2 -A 15 "void setBlockwiseQuantization" mllm/backends/qnn/QNNUtils.cpp

echo -e "\n=== Search for Qnn_BlockwiseExpansion_t in header files ==="
rg "Qnn_BlockwiseExpansion_t" --type=h --type=hpp

echo -e "\n=== Check for Qnn_BlockwiseExpansion_t definition in QNN SDK ==="
find . -name "*.h" -o -name "*.hpp" | xargs grep -l "Qnn_BlockwiseExpansion" 2>/dev/null

Repository: UbiquitousLearning/mllm

Length of output: 1249


🏁 Script executed:

#!/bin/bash
# Search more carefully for the struct definition and setBlockwiseQuantization

echo "=== Full setBlockwiseQuantization method implementation ==="
grep -n "setBlockwiseQuantization" mllm/backends/qnn/QNNUtils.cpp
sed -n '515,535p' mllm/backends/qnn/QNNUtils.cpp

echo -e "\n=== Check if blockwiseExpansion_ is referenced elsewhere ==="
rg "blockwiseExpansion_" mllm/backends/qnn/QNNUtils.cpp

echo -e "\n=== Look for Qnn_BlockwiseExpansion_t in QNNUtils.hpp ==="
grep -n "Qnn_BlockwiseExpansion_t" mllm/backends/qnn/QNNUtils.hpp

echo -e "\n=== Search for where Qnn_BlockwiseExpansion_t comes from ==="
rg "#include" mllm/backends/qnn/QNNUtils.hpp | head -20

echo -e "\n=== Check if blockwiseExpansion_ needs initialization before use ==="
sed -n '440,530p' mllm/backends/qnn/QNNUtils.cpp

Repository: UbiquitousLearning/mllm

Length of output: 5988


Initialize blockwiseExpansion_ member in the constructor initialization list or with default constructor.

While blockwiseExpansion_ is only accessed after setBlockwiseQuantization() is called, leaving it uninitialized after construction is unsafe. Add explicit initialization in the constructor's member initializer list (e.g., blockwiseExpansion_{}) to follow C++ best practices and ensure the member has a well-defined state from construction.

🤖 Prompt for AI Agents
In @mllm/backends/qnn/QNNUtils.hpp around lines 218-221, The member
Qnn_BlockwiseExpansion_t blockwiseExpansion_ is left uninitialized; update the
class constructor to explicitly initialize blockwiseExpansion_ (for example via
the constructor's member initializer list using blockwiseExpansion_{} or
equivalent) so it has a well-defined state from construction; keep
setBlockwiseQuantization() usage unchanged but ensure blockwiseExpansion_ is
default-initialized in the constructor that declares scaleOffsets_ and
blockwiseExpansion_.
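The requested change amounts to value-initializing the member in the constructor. A minimal sketch, assuming illustrative stand-ins for the SDK structs (the real Qnn_BlockwiseExpansion_t and wrapper constructor differ in detail):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the QNN SDK types referenced by the review;
// field names here are illustrative only.
struct Qnn_ScaleOffset_t { float scale; int32_t offset; };
struct Qnn_BlockwiseExpansion_t { int32_t axis; Qnn_ScaleOffset_t* scaleOffsets; };

class QNNTensorWrapper {
 public:
  // Value-initializing blockwiseExpansion_ in the member initializer list
  // zeroes all of its fields, so the member has a well-defined state even
  // before setBlockwiseQuantization() is ever called.
  QNNTensorWrapper() : blockwiseExpansion_{} {}

  int32_t blockwiseAxis() const { return blockwiseExpansion_.axis; }

 private:
  std::vector<Qnn_ScaleOffset_t> scaleOffsets_;  // default-constructed (empty)
  Qnn_BlockwiseExpansion_t blockwiseExpansion_;
};
```

std::vector already default-constructs to a well-defined empty state, which is why only the plain-struct member needs the explicit `{}`.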

@chenghuaWang chenghuaWang merged commit 6ada683 into UbiquitousLearning:main Jan 5, 2026
5 checks passed
@chenghuaWang chenghuaWang changed the title Qnn aot refactor fix(qualcomm): Qnn AOT refactor Jan 6, 2026