fix(qualcomm): Qnn AOT refactor #578
Conversation
📝 Walkthrough
Refactors the QNN backend: standardizes the SDK env var to QAIRT_SDK_ROOT, simplifies QNNAllocator memory management, adds datatype/quantization utilities, and rewires AOT compilation from direct qnn_interface calls to wrapper-based QNNModel abstractions.
Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Env as QnnAOTEnv
    participant Model as QNNModel (wrapper)
    participant Backend as QNN Backend / QNN Interface
    participant Context as Qnn_Context
    participant Graph as QNN Graph
    Env->>Model: request/create graph (qnnInterface, backendHandle, contextHandle, graphName)
    alt context present
        Model->>Backend: initialize model (create graph state)
        Backend-->>Model: model handle / capabilities
        Model->>Context: attach context & allocate resources
        Model-->>Env: return wrapped graph handle
        Env->>Graph: compile via qnn_model_->finalizeGraph(...)
        Graph-->>Model: finalize result
        Model-->>Env: report success / graph ready
    else missing context
        Env-->>Env: return nullptr (graph not captured)
    end
    note right of Model: Quant/Type handling delegated\nto QNNTensorWrapper utilities
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks: ❌ Failed checks (1 warning, 1 inconclusive) · ✅ Passed checks (1 passed)
Actionable comments posted: 3
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
docs/qnn_backend/setup_env.rst
examples/qwen3_qnn_aot/CMakeLists.txt
mllm/CMakeLists.txt
mllm/backends/qnn/CMakeLists.txt
mllm/backends/qnn/QNNAllocator.cpp
mllm/backends/qnn/QNNAllocator.hpp
mllm/backends/qnn/QNNUtils.cpp
mllm/backends/qnn/QNNUtils.hpp
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
mllm/backends/qnn/aot/QnnWrappersAPI.hpp
mllm/backends/qnn/aot/visitor/Base.hpp
mllm/backends/qnn/custom-op-package/LLaMAPackage/Makefile
🧰 Additional context used
📓 Path-based instructions (5)
{mllm,mllm-cli,pymllm}/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).
Files:
mllm/backends/qnn/CMakeLists.txt, mllm/backends/qnn/custom-op-package/LLaMAPackage/Makefile, mllm/CMakeLists.txt, mllm/backends/qnn/QNNUtils.cpp, mllm/backends/qnn/QNNAllocator.hpp, mllm/backends/qnn/aot/visitor/Base.hpp, mllm/backends/qnn/QNNAllocator.cpp, mllm/backends/qnn/QNNUtils.hpp, mllm/backends/qnn/aot/QnnWrappersAPI.cpp, mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{toml,json,txt,md}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
When suggesting new dependencies, evaluate their maturity, licensing, and maintenance status. Prefer well-established and actively maintained libraries.
Files:
mllm/backends/qnn/CMakeLists.txt, mllm/CMakeLists.txt
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.
Files:
mllm/backends/qnn/QNNUtils.cpp, mllm/backends/qnn/QNNAllocator.hpp, mllm/backends/qnn/aot/visitor/Base.hpp, mllm/backends/qnn/QNNAllocator.cpp, mllm/backends/qnn/QNNUtils.hpp, mllm/backends/qnn/aot/QnnWrappersAPI.cpp, mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
Files:
mllm/backends/qnn/QNNUtils.cpp, mllm/backends/qnn/QNNAllocator.cpp, mllm/backends/qnn/aot/QnnWrappersAPI.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).
Files:
mllm/backends/qnn/QNNUtils.cpp, mllm/backends/qnn/QNNAllocator.hpp, mllm/backends/qnn/aot/visitor/Base.hpp, mllm/backends/qnn/QNNAllocator.cpp, mllm/backends/qnn/QNNUtils.hpp, mllm/backends/qnn/aot/QnnWrappersAPI.cpp, mllm/backends/qnn/aot/QnnWrappersAPI.hpp
🧬 Code graph analysis (4)
mllm/backends/qnn/QNNUtils.cpp (1)
mllm/backends/qnn/QNNUtils.hpp (8)
mllmDataTypeToQnnDataType (52-52), dataContainer_ (204-204), qnnTensor_ (196-196), scaleOffsets (208-208), blockwise (209-209), qnnDataTypeToSize (53-53), qnnParam_ (242-242), qnnParam_ (243-243)
mllm/backends/qnn/QNNAllocator.hpp (1)
mllm/backends/qnn/QNNAllocator.cpp (2)
QNNAllocator (16-23), QNNAllocator (25-35)
mllm/backends/qnn/QNNAllocator.cpp (1)
mllm/backends/qnn/QNNAllocator.hpp (16)
ptr (58-61), ptr (58-58), ptr (62-62), ptr (62-62), ptr (78-78), ptr (80-80), storage (49-49), storage (51-51), storage (51-51), storage (53-53), storage (55-55), storage (55-55), storage (64-69), storage (64-64), storage (71-71), storage (71-71)
mllm/backends/qnn/QNNUtils.hpp (1)
mllm/core/Tensor.cpp (2)
dtype (468-468), dtype (468-468)
🪛 Clang (14.0.6)
mllm/backends/qnn/QNNUtils.cpp
[error] 303-303: variable 'ret' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 415-415: 8 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 454-454: variable 'dataType' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 454-454: invalid case style for variable 'dataType'
(readability-identifier-naming,-warnings-as-errors)
[error] 504-504: 2 adjacent parameters of 'setScaleOffsetQuantization' of similar type are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 515-515: 2 adjacent parameters of 'setBlockwiseQuantization' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 584-584: variable 'dataSize' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 584-584: invalid case style for variable 'dataSize'
(readability-identifier-naming,-warnings-as-errors)
mllm/backends/qnn/QNNAllocator.hpp
[error] 33-33: use '= default' to define a trivial destructor
(modernize-use-equals-default,-warnings-as-errors)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp
[error] 51-51: parameter name 'v' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
🪛 Cppcheck (2.19.0)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
[error] 417-417: There is an unknown macro here somewhere. Configuration is required. If MLLM_RT_ASSERT_EQ is a macro then please configure it.
(unknownMacro)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-x86
- GitHub Check: build-android
- GitHub Check: build-macos
🔇 Additional comments (25)
docs/qnn_backend/setup_env.rst (1)
34-34: LGTM! Environment variable migration is clear and consistent. The documentation correctly updates all references from QNN_SDK_ROOT to QAIRT_SDK_ROOT, maintaining consistency across installation instructions, verification steps, and prerequisites.
Also applies to: 71-71, 87-87
mllm/backends/qnn/CMakeLists.txt (2)
16-22: LGTM! AOT source inclusion is properly conditional. The conditional compilation logic correctly includes AOT sources only when MLLM_QUALCOMM_QNN_AOT_ON_X86_ENABLE is set, using GLOB_RECURSE to gather all relevant source files and appending them to the main source list.
35-35: LGTM! Status message updated consistently. The status message correctly references QAIRT_SDK_ROOT, maintaining consistency with the environment variable migration.
mllm/backends/qnn/custom-op-package/LLaMAPackage/Makefile (1)
4-5: LGTM! Makefile properly migrates to QAIRT_SDK_ROOT with robust error handling. The environment variable migration is cleanly executed with clear informational messages and proper fallback logic. The existing error checking (lines 7-20) ensures that missing SDK paths are caught early with helpful error messages.
Also applies to: 11-12
mllm/CMakeLists.txt (1)
117-119: LGTM! Backend inclusion properly consolidated. The centralized conditional inclusion using an OR operation correctly handles both QNN AOT and QNN backend scenarios, eliminating duplication and improving maintainability. The backend's own CMakeLists.txt now handles the conditional source inclusion internally.
examples/qwen3_qnn_aot/CMakeLists.txt (1)
2-2: LGTM! QNN backend properly linked. Adding MllmQNNBackend to the link libraries correctly enables QNN AOT functionality. The removal of explicit QAIRT_SDK_ROOT include paths (noted in the summary) is appropriate since these are now provided transitively through MllmQNNBackend's PUBLIC interface.
mllm/backends/qnn/QNNAllocator.hpp (2)
33-40: Destructor cleanup logic looks correct but could silently fail partially. The destructor correctly iterates and cleans up resources. However, if memDeRegister fails for one entry, the error is logged but iteration continues. This is reasonable for a destructor (shouldn't throw), but consider whether partial cleanup failures could leave the system in an inconsistent state.
Note: The static analysis suggestion to use = default is a false positive since this destructor has non-trivial cleanup logic.
78-78: Signature change simplifies the interface. Changing from Storage* to void* and removing the return type aligns with the refactored implementation that uses assertions for error conditions rather than return values. This is acceptable for internal APIs where failures are programming errors.
mllm/backends/qnn/QNNUtils.hpp (2)
52-54: Good addition of centralized datatype utilities. These utility functions centralize datatype translation logic, reducing code duplication across the codebase. The declarations align with implementations in QNNUtils.cpp.
207-222: Quantization helpers with proper lifetime management. The new setScaleOffsetQuantization and setBlockwiseQuantization methods, along with private storage members scaleOffsets_ and blockwiseExpansion_, ensure quantization parameter data outlives the tensor wrapper. This addresses potential dangling pointer issues when QNN references this data.
mllm/backends/qnn/aot/visitor/Base.hpp (1)
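The lifetime issue the reviewer describes can be illustrated with a minimal sketch. ScaleOffset and TensorWrapper below are hypothetical stand-ins (the real types are Qnn_ScaleOffset_t and QNNTensorWrapper from QnnTypes.h/QNNUtils.hpp); the point is that the wrapper copies the quantization data into member storage before handing out a raw pointer, so the pointer stays valid after the caller's vector is destroyed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for Qnn_ScaleOffset_t.
struct ScaleOffset {
  float scale;
  int32_t offset;
};

class TensorWrapper {
 public:
  void setScaleOffsetQuantization(const std::vector<ScaleOffset>& scaleOffsets,
                                  int32_t axis) {
    scaleOffsets_ = scaleOffsets;       // copy: data now outlives the caller's vector
    axis_ = axis;
    quantData_ = scaleOffsets_.data();  // QNN-side pointer targets member storage
  }
  const ScaleOffset* quantData() const { return quantData_; }
  int32_t axis() const { return axis_; }

 private:
  std::vector<ScaleOffset> scaleOffsets_;  // owns the quantization parameters
  const ScaleOffset* quantData_ = nullptr;
  int32_t axis_ = -1;
};
```

Without the copy, quantData_ would dangle as soon as the caller's temporary vector went out of scope.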
17-22: Pattern delegation to compile() enables extensibility. The rewrite() method now delegates to the pure virtual compile() method, enabling derived classes to implement the actual rewriting logic. This is a clean abstraction that separates the pattern interface from the implementation.
mllm/backends/qnn/QNNUtils.cpp (3)
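The delegation described here is essentially the template-method pattern. A minimal sketch, with hypothetical names (BasePattern/MatMulPattern are illustrative, not the real Base.hpp classes):

```cpp
#include <cassert>

// rewrite() is the stable public entry point; compile() is the pure
// virtual hook that derived patterns implement.
struct BasePattern {
  virtual ~BasePattern() = default;
  bool rewrite() { return compile(); }  // delegate to the derived hook
 protected:
  virtual bool compile() = 0;
};

struct MatMulPattern : BasePattern {
  bool compiled = false;
 protected:
  bool compile() override {
    compiled = true;  // real code would emit QNN ops here
    return true;
  }
};
```

Callers only ever see rewrite(), so the base class can later add pre/post steps (validation, logging) around compile() without touching any derived pattern.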
487-502: Idempotent allocation with proper registration flow. The early return guard prevents double allocation, and the conditional check before alloc() allows pre-allocated storage to be registered. The flow correctly obtains the QNN allocator from context and registers the tensor.
504-525: Quantization setters correctly manage memory lifetime. Both methods copy input data to member storage (scaleOffsets_, blockwiseExpansion_) before setting pointers in the QNN tensor structure. This ensures the quantization data remains valid for the lifetime of the wrapper.
302-392: Verify kInt4/kUInt4 QNN datatype mapping against QNN SDK source. The mllmDataTypeToQnnDataType function comprehensively handles data types including quantized variants. However, the FIXME comment at line 349 ("Maybe error here") for kInt4/kUInt4 mapping to QNN_DATATYPE_SFIXED_POINT_4/QNN_DATATYPE_UFIXED_POINT_4 indicates unresolved uncertainty by the original author. Public QNN SDK documentation does not explicitly confirm this mapping. Verify against the Qualcomm QNN SDK header files (e.g., QnnDataTypes.h) to confirm the mapping is correct, or update the code and FIXME comment based on findings.
Note: The static analysis warning about ret being uninitialized is a false positive: it is correctly initialized to QNN_DATATYPE_UNDEFINED on line 303.
mllm/backends/qnn/QNNAllocator.cpp (4)
37-46: Simplified allocation path is clean. The allocation flow directly uses allocSize(storage) and inserts the pointer into the tracking set. The assertion ensures allocation succeeded.
48-55: Free handles both registered and unregistered memory. The conditional check before deregistration correctly handles memory that may or may not have been registered with QNN. This is important for cleanup of storage that was allocated but never used with registerQnnTensorToSharedBuffer.
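A minimal sketch of this conditional-deregistration policy (Allocator and its methods are illustrative names only, not the real QNNAllocator interface): free() always releases the storage, but only deregisters pointers that were actually registered with the runtime.

```cpp
#include <cassert>
#include <set>

class Allocator {
 public:
  void markRegistered(void* ptr) { registered_.insert(ptr); }

  // free() deregisters only when the pointer was actually registered,
  // so storage that was allocated but never shared with QNN is still
  // cleaned up without an invalid deregister call.
  void free(void* ptr) {
    if (registered_.count(ptr) > 0) {
      registered_.erase(ptr);  // stand-in for memDeRegister
      ++deregistered;
    }
    ++freed;  // stand-in for releasing the underlying buffer
  }

  int freed = 0;
  int deregistered = 0;

 private:
  std::set<void*> registered_;
};
```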
57-92: Registration flow handles re-registration gracefully. The early return for already-registered pointers avoids duplicate registration while still updating the tensor's mem handle. The implementation correctly creates the memory descriptor and registers with QNN.
94-98: Deregistration asserts pointer exists before cleanup. The assertion MLLM_RT_ASSERT_EQ(ptrToFdAndMemHandleMap_.count(ptr), 1) will abort if called with an unregistered pointer. This is appropriate for internal APIs where calling deregister on unregistered memory is a programming error.
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (3)
43-65: QnnAOTNodeTensor now uses wrapper abstraction. The transition from raw Qnn_Tensor_t to QNNTensorWrapper improves resource management. The getWrapper() accessor provides access to the underlying wrapper.
Note: The static analysis warning about parameter name 'v' being too short is a style preference that can be addressed later if desired.
85-108: Parameter handling uses wrapper types consistently. The addParamScalar, emplaceParamScalar, addParamTensor, and emplaceParamTensor methods now accept QNNParamScalarWrapper and QNNParamTensorWrapper shared pointers, aligning with the wrapper-based architecture.
115-128: QnnAOTGraph delegates to QNNModel. The constructor now accepts explicit QNN interface handles and delegates graph management to QNNModel. The private qnn_model_ member encapsulates the underlying QNN operations.
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (4)
24-34: Constructor delegates to wrapper creation methods. The constructor correctly determines whether to create a static or non-static tensor wrapper based on the force_static_weight parameter and tensor type. This simplifies the tensor creation logic.
99-101: Datatype parsing now uses centralized utility. Delegating to mllmDataTypeToQnnDataType reduces code duplication and ensures consistent datatype mapping across the codebase.
239-266: QnnAOTGraph uses QNNModel for graph operations. The constructor creates a QNNModel and initializes it with the context and graph name. addOperation adds tensors and nodes via the model, and compile() calls finalizeGraph(). This abstracts away direct QNN interface calls.
548-559: Graceful handling of missing context. Returning nullptr instead of asserting when the context is not found allows callers to handle the error condition more gracefully. This is an improvement over hard assertions for recoverable error cases.
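The "nullptr for recoverable lookup failure" policy can be sketched as follows (GraphRegistry/Graph are hypothetical stand-ins for the real context-to-graph bookkeeping, not the actual QnnWrappersAPI types):

```cpp
#include <cassert>
#include <map>
#include <memory>

struct Graph {
  int id;
};

class GraphRegistry {
 public:
  void addContext(int ctx) {
    graphs_[ctx] = std::make_shared<Graph>(Graph{ctx});
  }

  // Recoverable lookup failure: return nullptr rather than asserting,
  // so callers can decide how to handle a missing context.
  std::shared_ptr<Graph> captureGraph(int ctx) const {
    auto it = graphs_.find(ctx);
    if (it == graphs_.end()) return nullptr;
    return it->second;
  }

 private:
  std::map<int, std::shared_ptr<Graph>> graphs_;
};
```

Hard assertions remain the right tool for programming errors (as in the deregistration path above); nullptr is for conditions the caller can legitimately recover from.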
Qnn_BlockwiseExpansion_t blockwise_expansion;
blockwise_expansion.axis = cfg->ch_axis;
blockwise_expansion.axis = cfg->ch_axis;
blockwise_expansion.scaleOffsets = nullptr;  // Will be set by setBlockwiseQuantization
blockwise_expansion.numBlocksPerAxis = v->tensor_.size(cfg->ch_axis) / cfg->block_size;
blockwise_expansion.blockScaleBitwidth = 12;  // 12 bits for 4 to 16 expansion
blockwise_expansion.blockScaleStorageType = QNN_BLOCKWISE_EXPANSION_BITWIDTH_SCALE_STORAGE_8;
blockwise_expansion.blocksScale8 = cfg->scale_level_0_int.ptr<mllm_uint8_t>();

ret.encodingDefinition = QNN_DEFINITION_DEFINED;
ret.quantizationEncoding = QNN_QUANTIZATION_ENCODING_BLOCKWISE_EXPANSION;
ret.blockwiseExpansion = block_scale_array;
tensor_wrapper_->setBlockwiseQuantization(blockwise_expansion, scale_offsets);
Duplicate assignment of blockwise_expansion.axis.
Line 158 and 159 both assign to blockwise_expansion.axis. This appears to be a copy-paste error.
🔎 Proposed fix to remove duplicate assignment
Qnn_BlockwiseExpansion_t blockwise_expansion;
blockwise_expansion.axis = cfg->ch_axis;
- blockwise_expansion.axis = cfg->ch_axis;
blockwise_expansion.scaleOffsets = nullptr; // Will be set by setBlockwiseQuantization

🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 157 to 166 there is a
duplicate assignment to blockwise_expansion.axis (lines 158-159); remove the
redundant second assignment so axis is set only once (keep the correct
cfg->ch_axis assignment), leaving the rest of blockwise_expansion initialization
unchanged.
  target_include_directories(MllmQNNBackend PUBLIC
-   $ENV{QNN_SDK_ROOT}/include/QNN # QNN SDK include
+   $ENV{QAIRT_SDK_ROOT}/include/QNN # QNN SDK include
🧩 Analysis chain
🏁 Script executed:
# First, check the CMakeLists.txt file around line 31
cat -n mllm/backends/qnn/CMakeLists.txt | head -50

Repository: UbiquitousLearning/mllm
Length of output: 1379
🏁 Script executed:
# Search for all QAIRT_SDK_ROOT usage in CMake files
rg -n "QAIRT_SDK_ROOT" --type cmake -B 2 -A 2

Repository: UbiquitousLearning/mllm
Length of output: 1040
🏁 Script executed:
# Search for validation patterns of environment variables in CMake files
rg -n "if.*NOT.*DEFINED.*ENV" --type cmake -B 1 -A 3

Repository: UbiquitousLearning/mllm
Length of output: 399
🏁 Script executed:
# Search for Makefile references mentioned in the review
fd -e Makefile | head -20

Repository: UbiquitousLearning/mllm
Length of output: 49
🏁 Script executed:
# Check for QNN_SDK_ROOT references to understand migration context
rg -n "QNN_SDK_ROOT" -B 1 -A 1

Repository: UbiquitousLearning/mllm
Length of output: 49
🏁 Script executed:
# Search for how QAIRT_SDK_ROOT is set or documented
rg -n "QAIRT_SDK_ROOT" -B 3 -A 3

Repository: UbiquitousLearning/mllm
Length of output: 4854
🏁 Script executed:
# Check if there's a custom-op-package directory and Makefile referenced in review
find . -type d -name "*custom*op*" | head -10
find . -name "Makefile" | head -20

Repository: UbiquitousLearning/mllm
Length of output: 180
🏁 Script executed:
# Check if there's any validation or setup script for QAIRT_SDK_ROOT
fd -e "\.sh$|\.md$" | xargs rg -l "QAIRT_SDK_ROOT" 2>/dev/null | head -10

Repository: UbiquitousLearning/mllm
Length of output: 243
Add validation for QAIRT_SDK_ROOT environment variable before use.
The CMakeLists.txt uses $ENV{QAIRT_SDK_ROOT} without checking if the environment variable is defined. While the associated Makefile in custom-op-package/LLaMAPackage/ validates this with error handling, the CMake build files do not. If QAIRT_SDK_ROOT is unset, the include path becomes /include/QNN, causing a confusing compile failure rather than a clear configuration error.
Add validation similar to other backends in the codebase:
if(NOT DEFINED ENV{QAIRT_SDK_ROOT})
message(FATAL_ERROR "QAIRT_SDK_ROOT environment variable is not set. Please set it to point to the Qualcomm AI SDK root directory.")
endif()

This check should be added early in the CMakeLists.txt files (both mllm/backends/qnn/CMakeLists.txt and mllm/ffi/CMakeLists.txt) before the environment variable is referenced.
size_t qnnDataTypeToSize(Qnn_DataType_t dtype) {
  switch (dtype) {
    case QNN_DATATYPE_INT_8:
    case QNN_DATATYPE_UINT_8:
    case QNN_DATATYPE_BOOL_8:
    case QNN_DATATYPE_SFIXED_POINT_8:
    case QNN_DATATYPE_UFIXED_POINT_8: return 1;

    case QNN_DATATYPE_INT_16:
    case QNN_DATATYPE_UINT_16:
    case QNN_DATATYPE_FLOAT_16:
    case QNN_DATATYPE_SFIXED_POINT_16:
    case QNN_DATATYPE_UFIXED_POINT_16: return 2;

    case QNN_DATATYPE_INT_32:
    case QNN_DATATYPE_UINT_32:
    case QNN_DATATYPE_FLOAT_32:
    case QNN_DATATYPE_SFIXED_POINT_32:
    case QNN_DATATYPE_UFIXED_POINT_32: return 4;

    case QNN_DATATYPE_INT_64:
    case QNN_DATATYPE_UINT_64: return 8;

    default:
      MLLM_ERROR("qnnDataTypeToSize: unsupported Qnn_DataType_t {}", static_cast<int>(dtype));
      MLLM_RT_ASSERT(false);
      return 0;
  }
}
Size mapping covers most QNN datatypes.
The function handles common QNN datatypes correctly. However, it doesn't cover QNN_DATATYPE_SFIXED_POINT_4 and QNN_DATATYPE_UFIXED_POINT_4 which are used in mllmDataTypeToQnnDataType for kInt4/kUInt4. This could cause a runtime assertion failure if 4-bit types are used with qnnDataTypeToSize.
🔎 Proposed fix to add 4-bit type handling
size_t qnnDataTypeToSize(Qnn_DataType_t dtype) {
switch (dtype) {
+ // 4-bit types are packed, return 1 byte as minimum addressable unit
+ case QNN_DATATYPE_SFIXED_POINT_4:
+ case QNN_DATATYPE_UFIXED_POINT_4: return 1; // Note: actual size is 0.5 bytes
+
case QNN_DATATYPE_INT_8:
case QNN_DATATYPE_UINT_8:

🧰 Tools
🪛 Clang (14.0.6)
[error] 415-415: 8 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
🤖 Prompt for AI Agents
In mllm/backends/qnn/QNNUtils.cpp around lines 394-422, qnnDataTypeToSize is
missing cases for QNN_DATATYPE_SFIXED_POINT_4 and QNN_DATATYPE_UFIXED_POINT_4
which leads to the default assert if 4-bit types are passed; add those two enum
cases to the switch and return an appropriate byte-size (use 1 to represent the
storage unit) for each, and optionally add a brief comment noting these are
4-bit types packed two-per-byte so callers should handle packing/unpacking if
needed.
UbiquitousLearning left a comment
LGTM
Actionable comments posted: 2
Fix all issues with AI Agents 🤖
In @mllm/backends/qnn/aot/QnnWrappersAPI.cpp:
- Around line 261-266: The compile() implementation sets is_compiled_
unconditionally which can mark a failed compilation as successful; change
QnnAOTGraph::compile so it calls qnn_model_->finalizeGraph(...) and only sets
is_compiled_ = true when that call returns mllm::qnn::MODEL_NO_ERROR (i.e., on
success), returning the success boolean otherwise; keep the early-return when
is_compiled_ is already true.
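The suggested compile() fix can be sketched with stand-in types (AotGraph, ModelError, and the injected finalize callback are hypothetical stand-ins for QnnAOTGraph, mllm::qnn's error enum, and qnn_model_->finalizeGraph(...)):

```cpp
#include <functional>
#include <utility>

enum class ModelError { kNoError, kGraphError };

class AotGraph {
 public:
  explicit AotGraph(std::function<ModelError()> finalize)
      : finalize_(std::move(finalize)) {}

  bool compile() {
    if (is_compiled_) return true;  // keep the early return
    const bool ok = (finalize_() == ModelError::kNoError);
    if (ok) is_compiled_ = true;    // only mark compiled on success
    return ok;
  }

  bool isCompiled() const { return is_compiled_; }

 private:
  std::function<ModelError()> finalize_;  // stand-in for finalizeGraph
  bool is_compiled_ = false;
};
```

With the unconditional assignment, a failed finalize would make every later compile() call report success via the early return; gating the flag on the result avoids that.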
In @mllm/backends/qnn/QNNUtils.hpp:
- Around line 218-221: The member Qnn_BlockwiseExpansion_t blockwiseExpansion_
is left uninitialized; update the class constructor to explicitly initialize
blockwiseExpansion_ (for example via the constructor's member initializer list
using blockwiseExpansion_{} or equivalent) so it has a well-defined state from
construction; keep setBlockwiseQuantization() usage unchanged but ensure
blockwiseExpansion_ is default-initialized in the constructor that declares
scaleOffsets_ and blockwiseExpansion_.
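The default-initialization fix can be sketched as follows. BlockwiseExpansion and TensorWrapper are hypothetical stand-ins for Qnn_BlockwiseExpansion_t and QNNTensorWrapper; the key detail is the `blockwiseExpansion_{}` member initializer, which value-initializes every field of the aggregate.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for Qnn_BlockwiseExpansion_t (a plain aggregate of POD fields).
struct BlockwiseExpansion {
  int32_t axis;
  uint32_t numBlocksPerAxis;
  const void* scaleOffsets;
};

class TensorWrapper {
 public:
  // Value-initialize the aggregate member: all fields zeroed/null,
  // so the wrapper has a well-defined state before
  // setBlockwiseQuantization() is ever called.
  TensorWrapper() : blockwiseExpansion_{} {}

  const BlockwiseExpansion& blockwise() const { return blockwiseExpansion_; }

 private:
  std::vector<float> scaleOffsets_;      // vectors default-construct empty anyway
  BlockwiseExpansion blockwiseExpansion_;
};
```

Without the `{}`, a default-constructed wrapper would carry indeterminate values in the struct's fields, and reading them (even to copy the struct) is undefined behavior.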
♻️ Duplicate comments (1)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)
157-166: Duplicate assignment of blockwise_expansion.axis. Lines 158 and 159 both assign to blockwise_expansion.axis. This appears to be a copy-paste error.

🔎 Proposed fix to remove duplicate assignment

  Qnn_BlockwiseExpansion_t blockwise_expansion;
  blockwise_expansion.axis = cfg->ch_axis;
- blockwise_expansion.axis = cfg->ch_axis;
  blockwise_expansion.scaleOffsets = nullptr; // Will be set by setBlockwiseQuantization
🧹 Nitpick comments (1)
mllm/backends/qnn/QNNUtils.hpp (1)
207-209: Consider adding detailed documentation for quantization methods. While a brief comment is present, these public methods would benefit from more comprehensive documentation explaining:
- When these methods should be called relative to alloc()
- Parameter constraints (e.g., valid range for axis)
- Expected behavior and side effects
- Example usage patterns

📝 Suggested documentation format

- // Helper to set complex quantization params and manage memory
+ /**
+  * @brief Sets scale-offset quantization parameters for the tensor
+  * @param scaleOffsets Vector of scale-offset pairs for quantization
+  * @param axis The axis along which quantization is applied (-1 for per-tensor)
+  * @note Must be called before alloc() to take effect
+  */
  void setScaleOffsetQuantization(const std::vector<Qnn_ScaleOffset_t>& scaleOffsets, int32_t axis);
+
+ /**
+  * @brief Sets blockwise quantization parameters for the tensor
+  * @param blockwise Blockwise expansion configuration
+  * @param scaleOffsets Vector of scale-offset pairs for blockwise quantization
+  * @note Must be called before alloc() to take effect
+  */
  void setBlockwiseQuantization(const Qnn_BlockwiseExpansion_t& blockwise, const std::vector<Qnn_ScaleOffset_t>& scaleOffsets);
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
mllm/backends/qnn/QNNUtils.hpp
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
🧰 Additional context used
📓 Path-based instructions (4)
Same path-based guideline sets as in the first review above, applied to mllm/backends/qnn/QNNUtils.hpp and mllm/backends/qnn/aot/QnnWrappersAPI.cpp.
🧬 Code graph analysis (2)
mllm/backends/qnn/QNNUtils.hpp (2)
mllm/core/Tensor.hpp (1)
dtype (412-412)
mllm/core/Tensor.cpp (2)
dtype (506-506), dtype (506-506)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (2)
mllm/backends/qnn/QNNUtils.hpp (7)
name (185-186), name (189-190), create (255-257), create (255-255), mllmDataTypeToQnnDataType (52-52), __mllmQnnLoggerCallback (92-130), __mllmQnnLoggerCallback (92-92)
mllm/backends/qnn/QNNUtils.cpp (8)
createStaticTensor (466-485), createStaticTensor (466-467), create (447-464), create (447-448), create (551-555), create (551-553), mllmDataTypeToQnnDataType (302-392), mllmDataTypeToQnnDataType (302-302)
🪛 Cppcheck (2.19.0)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
[error] 417-417: There is an unknown macro here somewhere. Configuration is required. If MLLM_RT_ASSERT_EQ is a macro then please configure it.
(unknownMacro)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-macos
- GitHub Check: build-android
- GitHub Check: build-x86
🔇 Additional comments (9)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (7)
24-34: Well-structured constructor refactor. The constructor now properly delegates tensor creation to wrapper methods based on tensor type, with clean separation between static and dynamic tensor handling.
99-101: Good centralization of data type conversion. Using the utility function `mllmDataTypeToQnnDataType` improves maintainability by centralizing type mapping logic.
122-136: Improved memory management for per-channel quantization. The refactor replaces manual memory allocation with `std::vector` and delegates quantization setup to the wrapper's `setScaleOffsetQuantization` method, improving both safety and maintainability.
198-220: Well-designed API modernization. The parameter handling methods now use `std::shared_ptr` for wrapper types instead of raw pointers, providing automatic memory management and clearer ownership semantics.
239-243: Clean wrapper-based initialization. The constructor now properly delegates graph management to `QNNModel`, following the new wrapper-based architecture.
245-259: Well-structured operation addition. The method cleanly gathers tensor names, registers wrappers with the model, and adds nodes using the wrapper-based API.
548-559: Proper error handling for missing context. The method correctly returns `nullptr` with an error log when the context is not found, and cleanly manages graph creation through the wrapper-based constructor.
mllm/backends/qnn/QNNUtils.hpp (2)
23-23: Documentation updated to reflect SDK variable standardization. The comment now correctly references `QAIRT_SDK_ROOT` instead of `QNN_SDK_ROOT`, consistent with the PR's standardization objectives.
52-53: Implementation confirmed with proper error handling. Both utility functions are implemented in `QNNUtils.cpp` with comprehensive switch statements that handle data type conversions and size calculations. The `qnnDataTypeToSize()` function includes explicit error handling for unsupported types via `MLLM_ERROR` and `MLLM_RT_ASSERT(false)`, and `mllmDataTypeToQnnDataType()` similarly validates input with error reporting for invalid data types.
```diff
 bool QnnAOTGraph::compile() {
   if (is_compiled_) { return true; }
-  auto env = AOTCompileContext::getInstance().getEnv();
-  auto qnn_interface = env->getFuncSymbol().qnn_interface_;
-  qnn_interface.graphFinalize(qnn_graph_handle_, env->getContext(belongs_context_name_)->profile_bk_handle_, nullptr);
+  bool ret = qnn_model_->finalizeGraph(nullptr, nullptr) == mllm::qnn::MODEL_NO_ERROR;
   is_compiled_ = true;
-  return true;
+  return ret;
 }
```
Set is_compiled_ only on successful compilation.
Line 264 sets is_compiled_ = true unconditionally, even when finalizeGraph fails. This could lead to incorrect internal state where the graph appears compiled but actually failed.
🔎 Proposed fix to set flag only on success

```diff
 bool QnnAOTGraph::compile() {
   if (is_compiled_) { return true; }
   bool ret = qnn_model_->finalizeGraph(nullptr, nullptr) == mllm::qnn::MODEL_NO_ERROR;
-  is_compiled_ = true;
+  if (ret) { is_compiled_ = true; }
   return ret;
 }
```

🤖 Prompt for AI Agents
In @mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 261-266, The compile()
implementation sets is_compiled_ unconditionally which can mark a failed
compilation as successful; change QnnAOTGraph::compile so it calls
qnn_model_->finalizeGraph(...) and only sets is_compiled_ = true when that call
returns mllm::qnn::MODEL_NO_ERROR (i.e., on success), returning the success
boolean otherwise; keep the early-return when is_compiled_ is already true.
```cpp
// Storage for quantization parameters to ensure lifetime matches the tensor wrapper
std::vector<Qnn_ScaleOffset_t> scaleOffsets_;
Qnn_BlockwiseExpansion_t blockwiseExpansion_;
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Verify blockwiseExpansion_ initialization
echo "=== Finding QNNTensorWrapper class and constructor ==="
rg -n "class QNNTensorWrapper" --type=hpp -A 20 | head -50
echo -e "\n=== Searching for QNNTensorWrapper constructor in implementation files ==="
rg -n "QNNTensorWrapper::QNNTensorWrapper" --type=cpp
echo -e "\n=== Checking QNNUtils.cpp for constructor implementation ==="
find . -name "QNNUtils.cpp" -o -name "QNNUtils.cc"
```

Repository: UbiquitousLearning/mllm
Length of output: 480
🏁 Script executed:

```shell
#!/bin/bash
# Read the constructor implementation
echo "=== QNNTensorWrapper constructor (lines 420-460) ==="
sed -n '420,460p' mllm/backends/qnn/QNNUtils.cpp
echo -e "\n=== Check for blockwiseExpansion_ initialization in constructor ==="
rg -n "blockwiseExpansion_" mllm/backends/qnn/QNNUtils.cpp
echo -e "\n=== Search for Qnn_BlockwiseExpansion_t definition ==="
rg -A 10 "Qnn_BlockwiseExpansion_t" --type-list
```

Repository: UbiquitousLearning/mllm
Length of output: 2544
🏁 Script executed:

```shell
#!/bin/bash
# Read the header file around the member declarations
echo "=== QNNTensorWrapper header file members (lines 210-230) ==="
sed -n '210,230p' mllm/backends/qnn/QNNUtils.hpp
echo -e "\n=== Full setBlockwiseQuantization method ==="
rg -B 2 -A 15 "void setBlockwiseQuantization" mllm/backends/qnn/QNNUtils.cpp
echo -e "\n=== Search for Qnn_BlockwiseExpansion_t in header files ==="
rg "Qnn_BlockwiseExpansion_t" --type=h --type=hpp
echo -e "\n=== Check for Qnn_BlockwiseExpansion_t definition in QNN SDK ==="
find . -name "*.h" -o -name "*.hpp" | xargs grep -l "Qnn_BlockwiseExpansion" 2>/dev/null
```

Repository: UbiquitousLearning/mllm
Length of output: 1249
🏁 Script executed:

```shell
#!/bin/bash
# Search more carefully for the struct definition and setBlockwiseQuantization
echo "=== Full setBlockwiseQuantization method implementation ==="
grep -n "setBlockwiseQuantization" mllm/backends/qnn/QNNUtils.cpp
sed -n '515,535p' mllm/backends/qnn/QNNUtils.cpp
echo -e "\n=== Check if blockwiseExpansion_ is referenced elsewhere ==="
rg "blockwiseExpansion_" mllm/backends/qnn/QNNUtils.cpp
echo -e "\n=== Look for Qnn_BlockwiseExpansion_t in QNNUtils.hpp ==="
grep -n "Qnn_BlockwiseExpansion_t" mllm/backends/qnn/QNNUtils.hpp
echo -e "\n=== Search for where Qnn_BlockwiseExpansion_t comes from ==="
rg "#include" mllm/backends/qnn/QNNUtils.hpp | head -20
echo -e "\n=== Check if blockwiseExpansion_ needs initialization before use ==="
sed -n '440,530p' mllm/backends/qnn/QNNUtils.cpp
```

Repository: UbiquitousLearning/mllm
Length of output: 5988
Initialize blockwiseExpansion_ member in the constructor initialization list or with default constructor.
While blockwiseExpansion_ is only accessed after setBlockwiseQuantization() is called, leaving it uninitialized after construction is unsafe. Add explicit initialization in the constructor's member initializer list (e.g., blockwiseExpansion_{}) to follow C++ best practices and ensure the member has a well-defined state from construction.
🤖 Prompt for AI Agents
In @mllm/backends/qnn/QNNUtils.hpp around lines 218-221, The member
Qnn_BlockwiseExpansion_t blockwiseExpansion_ is left uninitialized; update the
class constructor to explicitly initialize blockwiseExpansion_ (for example via
the constructor's member initializer list using blockwiseExpansion_{} or
equivalent) so it has a well-defined state from construction; keep
setBlockwiseQuantization() usage unchanged but ensure blockwiseExpansion_ is
default-initialized in the constructor that declares scaleOffsets_ and
blockwiseExpansion_.
Build system and environment variable standardization:

- Replaced all references to the `QNN_SDK_ROOT` environment variable with `QAIRT_SDK_ROOT` in documentation and CMake files to standardize SDK path usage and avoid confusion. [1] [2] [3] [4] [5]
- Updated CMake build logic to modularize QNN backend source inclusion and subdirectory addition, ensuring that QNN sources are correctly included when building for Qualcomm QNN AOT or QNN backend support. [1] [2] [3] [4]

QNN backend code simplification and robustness:

- Refactored `QNNAllocator` by removing unused buffer tracking, statistics, and fallback logic, and simplified the destructor to ensure proper deregistration and memory cleanup. The `registerQnnTensorToSharedBuffer` interface was also simplified. [1] [2] [3] [4]
- Simplified the `QNNTensorWrapper` allocation logic, removing complex buffer reuse and fallback code, and added helpers for setting quantization parameters with proper memory management. [1] [2]
- Centralized and expanded QNN data type conversion logic, adding `mllmDataTypeToQnnDataType` and `qnnDataTypeToSize` utility functions for robust and maintainable type mapping. [1] [2] [3] [4]

These changes collectively improve maintainability, reduce the potential for errors, and make the QNN backend codebase cleaner and easier to understand.
Summary by CodeRabbit
Documentation
New Features
Bug Fixes