fix(qualcomm): Qnn AOT refactor #578
Conversation
📝 Walkthrough
Refactors the QNN backend: standardizes the SDK env var to QAIRT_SDK_ROOT, simplifies QNNAllocator memory management, adds datatype/quantization utilities, and rewires AOT compilation from direct qnn_interface calls to wrapper-based QNNModel abstractions.
Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Env as QnnAOTEnv
    participant Model as QNNModel (wrapper)
    participant Backend as QNN Backend / QNN Interface
    participant Context as Qnn_Context
    participant Graph as QNN Graph
    Env->>Model: request/create graph (qnnInterface, backendHandle, contextHandle, graphName)
    alt context present
        Model->>Backend: initialize model (create graph state)
        Backend-->>Model: model handle / capabilities
        Model->>Context: attach context & allocate resources
        Model-->>Env: return wrapped graph handle
        Env->>Graph: compile via qnn_model_->finalizeGraph(...)
        Graph-->>Model: finalize result
        Model-->>Env: report success / graph ready
    else missing context
        Env-->>Env: return nullptr (graph not captured)
    end
    note right of Model: Quant/Type handling delegated\nto QNNTensorWrapper utilities
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks: ❌ Failed checks (1 warning, 1 inconclusive) · ✅ Passed checks (1 passed)
Actionable comments posted: 3
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
docs/qnn_backend/setup_env.rst
examples/qwen3_qnn_aot/CMakeLists.txt
mllm/CMakeLists.txt
mllm/backends/qnn/CMakeLists.txt
mllm/backends/qnn/QNNAllocator.cpp
mllm/backends/qnn/QNNAllocator.hpp
mllm/backends/qnn/QNNUtils.cpp
mllm/backends/qnn/QNNUtils.hpp
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
mllm/backends/qnn/aot/QnnWrappersAPI.hpp
mllm/backends/qnn/aot/visitor/Base.hpp
mllm/backends/qnn/custom-op-package/LLaMAPackage/Makefile
🧰 Additional context used
📓 Path-based instructions (5)
{mllm,mllm-cli,pymllm}/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).
Files:
mllm/backends/qnn/CMakeLists.txt, mllm/backends/qnn/custom-op-package/LLaMAPackage/Makefile, mllm/CMakeLists.txt, mllm/backends/qnn/QNNUtils.cpp, mllm/backends/qnn/QNNAllocator.hpp, mllm/backends/qnn/aot/visitor/Base.hpp, mllm/backends/qnn/QNNAllocator.cpp, mllm/backends/qnn/QNNUtils.hpp, mllm/backends/qnn/aot/QnnWrappersAPI.cpp, mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{toml,json,txt,md}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
When suggesting new dependencies, evaluate their maturity, licensing, and maintenance status. Prefer well-established and actively maintained libraries.
Files:
mllm/backends/qnn/CMakeLists.txt, mllm/CMakeLists.txt
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.
Files:
mllm/backends/qnn/QNNUtils.cpp, mllm/backends/qnn/QNNAllocator.hpp, mllm/backends/qnn/aot/visitor/Base.hpp, mllm/backends/qnn/QNNAllocator.cpp, mllm/backends/qnn/QNNUtils.hpp, mllm/backends/qnn/aot/QnnWrappersAPI.cpp, mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
Files:
mllm/backends/qnn/QNNUtils.cpp, mllm/backends/qnn/QNNAllocator.cpp, mllm/backends/qnn/aot/QnnWrappersAPI.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).
Files:
mllm/backends/qnn/QNNUtils.cpp, mllm/backends/qnn/QNNAllocator.hpp, mllm/backends/qnn/aot/visitor/Base.hpp, mllm/backends/qnn/QNNAllocator.cpp, mllm/backends/qnn/QNNUtils.hpp, mllm/backends/qnn/aot/QnnWrappersAPI.cpp, mllm/backends/qnn/aot/QnnWrappersAPI.hpp
🧬 Code graph analysis (4)
mllm/backends/qnn/QNNUtils.cpp (1)
mllm/backends/qnn/QNNUtils.hpp (8)
mllmDataTypeToQnnDataType (52-52), dataContainer_ (204-204), qnnTensor_ (196-196), scaleOffsets (208-208), blockwise (209-209), qnnDataTypeToSize (53-53), qnnParam_ (242-242), qnnParam_ (243-243)
mllm/backends/qnn/QNNAllocator.hpp (1)
mllm/backends/qnn/QNNAllocator.cpp (2)
QNNAllocator (16-23), QNNAllocator (25-35)
mllm/backends/qnn/QNNAllocator.cpp (1)
mllm/backends/qnn/QNNAllocator.hpp (16)
ptr (58-61), ptr (58-58), ptr (62-62), ptr (62-62), ptr (78-78), ptr (80-80), storage (49-49), storage (51-51), storage (51-51), storage (53-53), storage (55-55), storage (55-55), storage (64-69), storage (64-64), storage (71-71), storage (71-71)
mllm/backends/qnn/QNNUtils.hpp (1)
mllm/core/Tensor.cpp (2)
dtype (468-468), dtype (468-468)
🪛 Clang (14.0.6)
mllm/backends/qnn/QNNUtils.cpp
[error] 303-303: variable 'ret' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 415-415: 8 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 454-454: variable 'dataType' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 454-454: invalid case style for variable 'dataType'
(readability-identifier-naming,-warnings-as-errors)
[error] 504-504: 2 adjacent parameters of 'setScaleOffsetQuantization' of similar type are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 515-515: 2 adjacent parameters of 'setBlockwiseQuantization' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 584-584: variable 'dataSize' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 584-584: invalid case style for variable 'dataSize'
(readability-identifier-naming,-warnings-as-errors)
mllm/backends/qnn/QNNAllocator.hpp
[error] 33-33: use '= default' to define a trivial destructor
(modernize-use-equals-default,-warnings-as-errors)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp
[error] 51-51: parameter name 'v' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
🪛 Cppcheck (2.19.0)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
[error] 417-417: There is an unknown macro here somewhere. Configuration is required. If MLLM_RT_ASSERT_EQ is a macro then please configure it.
(unknownMacro)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-x86
- GitHub Check: build-android
- GitHub Check: build-macos
🔇 Additional comments (25)
docs/qnn_backend/setup_env.rst (1)
34-34: LGTM! Environment variable migration is clear and consistent. The documentation correctly updates all references from QNN_SDK_ROOT to QAIRT_SDK_ROOT, maintaining consistency across installation instructions, verification steps, and prerequisites.
Also applies to: 71-71, 87-87
mllm/backends/qnn/CMakeLists.txt (2)
16-22: LGTM! AOT source inclusion is properly conditional. The conditional compilation logic correctly includes AOT sources only when MLLM_QUALCOMM_QNN_AOT_ON_X86_ENABLE is set, using GLOB_RECURSE to gather all relevant source files and appending them to the main source list.
35-35: LGTM! Status message updated consistently. The status message correctly references QAIRT_SDK_ROOT, maintaining consistency with the environment variable migration.
mllm/backends/qnn/custom-op-package/LLaMAPackage/Makefile (1)
4-5: LGTM! Makefile properly migrates to QAIRT_SDK_ROOT with robust error handling. The environment variable migration is cleanly executed with clear informational messages and proper fallback logic. The existing error checking (lines 7-20) ensures that missing SDK paths are caught early with helpful error messages.
Also applies to: 11-12
mllm/CMakeLists.txt (1)
117-119: LGTM! Backend inclusion properly consolidated. The centralized conditional inclusion using an OR operation correctly handles both QNN AOT and QNN backend scenarios, eliminating duplication and improving maintainability. The backend's own CMakeLists.txt now handles the conditional source inclusion internally.
examples/qwen3_qnn_aot/CMakeLists.txt (1)
2-2: LGTM! QNN backend properly linked. Adding MllmQNNBackend to the link libraries correctly enables QNN AOT functionality. The removal of explicit QAIRT_SDK_ROOT include paths (noted in the summary) is appropriate since these are now provided transitively through MllmQNNBackend's PUBLIC interface.
mllm/backends/qnn/QNNAllocator.hpp (2)
33-40: Destructor cleanup logic looks correct but could silently fail partially. The destructor correctly iterates and cleans up resources. However, if memDeRegister fails for one entry, the error is logged but iteration continues. This is reasonable for a destructor (shouldn't throw), but consider whether partial cleanup failures could leave the system in an inconsistent state.
Note: The static analysis suggestion to use = default is a false positive since this destructor has non-trivial cleanup logic.
78-78: Signature change simplifies the interface. Changing from Storage* to void* and removing the return type aligns with the refactored implementation that uses assertions for error conditions rather than return values. This is acceptable for internal APIs where failures are programming errors.
mllm/backends/qnn/QNNUtils.hpp (2)
52-54: Good addition of centralized datatype utilities. These utility functions centralize datatype translation logic, reducing code duplication across the codebase. The declarations align with implementations in QNNUtils.cpp.
207-222: Quantization helpers with proper lifetime management. The new setScaleOffsetQuantization and setBlockwiseQuantization methods, along with private storage members scaleOffsets_ and blockwiseExpansion_, ensure quantization parameter data outlives the tensor wrapper. This addresses potential dangling pointer issues when QNN references this data.
mllm/backends/qnn/aot/visitor/Base.hpp (1)
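The lifetime issue the reviewer describes can be illustrated with a minimal sketch. ScaleOffset and TensorWrapper below are hypothetical stand-ins (the real types are Qnn_ScaleOffset_t and QNNTensorWrapper from QnnTypes.h/QNNUtils.hpp); the point is that the wrapper copies the quantization data into member storage before handing out a raw pointer, so the pointer stays valid after the caller's vector is destroyed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for Qnn_ScaleOffset_t.
struct ScaleOffset {
  float scale;
  int32_t offset;
};

class TensorWrapper {
 public:
  void setScaleOffsetQuantization(const std::vector<ScaleOffset>& scaleOffsets,
                                  int32_t axis) {
    scaleOffsets_ = scaleOffsets;       // copy: data now outlives the caller's vector
    axis_ = axis;
    quantData_ = scaleOffsets_.data();  // QNN-side pointer targets member storage
  }
  const ScaleOffset* quantData() const { return quantData_; }
  int32_t axis() const { return axis_; }

 private:
  std::vector<ScaleOffset> scaleOffsets_;  // owns the quantization parameters
  const ScaleOffset* quantData_ = nullptr;
  int32_t axis_ = -1;
};
```

Without the copy, quantData_ would dangle as soon as the caller's temporary vector went out of scope.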
17-22: Pattern delegation to compile() enables extensibility. The rewrite() method now delegates to the pure virtual compile() method, enabling derived classes to implement the actual rewriting logic. This is a clean abstraction that separates the pattern interface from the implementation.
mllm/backends/qnn/QNNUtils.cpp (3)
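The delegation described here is essentially the template-method pattern. A minimal sketch, with hypothetical names (BasePattern/MatMulPattern are illustrative, not the real Base.hpp classes):

```cpp
#include <cassert>

// rewrite() is the stable public entry point; compile() is the pure
// virtual hook that derived patterns implement.
struct BasePattern {
  virtual ~BasePattern() = default;
  bool rewrite() { return compile(); }  // delegate to the derived hook
 protected:
  virtual bool compile() = 0;
};

struct MatMulPattern : BasePattern {
  bool compiled = false;
 protected:
  bool compile() override {
    compiled = true;  // real code would emit QNN ops here
    return true;
  }
};
```

Callers only ever see rewrite(), so the base class can later add pre/post steps (validation, logging) around compile() without touching any derived pattern.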
487-502: Idempotent allocation with proper registration flow. The early return guard prevents double allocation, and the conditional check before alloc() allows pre-allocated storage to be registered. The flow correctly obtains the QNN allocator from context and registers the tensor.
504-525: Quantization setters correctly manage memory lifetime. Both methods copy input data to member storage (scaleOffsets_, blockwiseExpansion_) before setting pointers in the QNN tensor structure. This ensures the quantization data remains valid for the lifetime of the wrapper.
302-392: Verify kInt4/kUInt4 QNN datatype mapping against QNN SDK source. The mllmDataTypeToQnnDataType function comprehensively handles data types including quantized variants. However, the FIXME comment at line 349 ("Maybe error here") for kInt4/kUInt4 mapping to QNN_DATATYPE_SFIXED_POINT_4/QNN_DATATYPE_UFIXED_POINT_4 indicates unresolved uncertainty by the original author. Public QNN SDK documentation does not explicitly confirm this mapping. Verify against the Qualcomm QNN SDK header files (e.g., QnnDataTypes.h) to confirm the mapping is correct, or update the code and FIXME comment based on findings.
Note: The static analysis warning about ret being uninitialized is a false positive: it is correctly initialized to QNN_DATATYPE_UNDEFINED on line 303.
mllm/backends/qnn/QNNAllocator.cpp (4)
37-46: Simplified allocation path is clean. The allocation flow directly uses allocSize(storage) and inserts the pointer into the tracking set. The assertion ensures allocation succeeded.
48-55: Free handles both registered and unregistered memory. The conditional check before deregistration correctly handles memory that may or may not have been registered with QNN. This is important for cleanup of storage that was allocated but never used with registerQnnTensorToSharedBuffer.
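A minimal sketch of this conditional-deregistration policy (Allocator and its methods are illustrative names only, not the real QNNAllocator interface): free() always releases the storage, but only deregisters pointers that were actually registered with the runtime.

```cpp
#include <cassert>
#include <set>

class Allocator {
 public:
  void markRegistered(void* ptr) { registered_.insert(ptr); }

  // free() deregisters only when the pointer was actually registered,
  // so storage that was allocated but never shared with QNN is still
  // cleaned up without an invalid deregister call.
  void free(void* ptr) {
    if (registered_.count(ptr) > 0) {
      registered_.erase(ptr);  // stand-in for memDeRegister
      ++deregistered;
    }
    ++freed;  // stand-in for releasing the underlying buffer
  }

  int freed = 0;
  int deregistered = 0;

 private:
  std::set<void*> registered_;
};
```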
57-92: Registration flow handles re-registration gracefully. The early return for already-registered pointers avoids duplicate registration while still updating the tensor's mem handle. The implementation correctly creates the memory descriptor and registers with QNN.
94-98: Deregistration asserts pointer exists before cleanup. The assertion MLLM_RT_ASSERT_EQ(ptrToFdAndMemHandleMap_.count(ptr), 1) will abort if called with an unregistered pointer. This is appropriate for internal APIs where calling deregister on unregistered memory is a programming error.
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (3)
43-65: QnnAOTNodeTensor now uses wrapper abstraction. The transition from raw Qnn_Tensor_t to QNNTensorWrapper improves resource management. The getWrapper() accessor provides access to the underlying wrapper.
Note: The static analysis warning about parameter name 'v' being too short is a style preference that can be addressed later if desired.
85-108: Parameter handling uses wrapper types consistently. The addParamScalar, emplaceParamScalar, addParamTensor, and emplaceParamTensor methods now accept QNNParamScalarWrapper and QNNParamTensorWrapper shared pointers, aligning with the wrapper-based architecture.
115-128: QnnAOTGraph delegates to QNNModel. The constructor now accepts explicit QNN interface handles and delegates graph management to QNNModel. The private qnn_model_ member encapsulates the underlying QNN operations.
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (4)
24-34: Constructor delegates to wrapper creation methods. The constructor correctly determines whether to create a static or non-static tensor wrapper based on the force_static_weight parameter and tensor type. This simplifies the tensor creation logic.
99-101: Datatype parsing now uses centralized utility. Delegating to mllmDataTypeToQnnDataType reduces code duplication and ensures consistent datatype mapping across the codebase.
239-266: QnnAOTGraph uses QNNModel for graph operations. The constructor creates a QNNModel and initializes it with the context and graph name. addOperation adds tensors and nodes via the model, and compile() calls finalizeGraph(). This abstracts away direct QNN interface calls.
548-559: Graceful handling of missing context. Returning nullptr instead of asserting when the context is not found allows callers to handle the error condition more gracefully. This is an improvement over hard assertions for recoverable error cases.
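The "nullptr for recoverable lookup failure" policy can be sketched as follows (GraphRegistry/Graph are hypothetical stand-ins for the real context-to-graph bookkeeping, not the actual QnnWrappersAPI types):

```cpp
#include <cassert>
#include <map>
#include <memory>

struct Graph {
  int id;
};

class GraphRegistry {
 public:
  void addContext(int ctx) {
    graphs_[ctx] = std::make_shared<Graph>(Graph{ctx});
  }

  // Recoverable lookup failure: return nullptr rather than asserting,
  // so callers can decide how to handle a missing context.
  std::shared_ptr<Graph> captureGraph(int ctx) const {
    auto it = graphs_.find(ctx);
    if (it == graphs_.end()) return nullptr;
    return it->second;
  }

 private:
  std::map<int, std::shared_ptr<Graph>> graphs_;
};
```

Hard assertions remain the right tool for programming errors (as in the deregistration path above); nullptr is for conditions the caller can legitimately recover from.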
Qnn_BlockwiseExpansion_t blockwise_expansion;
blockwise_expansion.axis = cfg->ch_axis;
blockwise_expansion.axis = cfg->ch_axis;
blockwise_expansion.scaleOffsets = nullptr;  // Will be set by setBlockwiseQuantization
blockwise_expansion.numBlocksPerAxis = v->tensor_.size(cfg->ch_axis) / cfg->block_size;
blockwise_expansion.blockScaleBitwidth = 12;  // 12 bits for 4 to 16 expansion
blockwise_expansion.blockScaleStorageType = QNN_BLOCKWISE_EXPANSION_BITWIDTH_SCALE_STORAGE_8;
blockwise_expansion.blocksScale8 = cfg->scale_level_0_int.ptr<mllm_uint8_t>();

ret.encodingDefinition = QNN_DEFINITION_DEFINED;
ret.quantizationEncoding = QNN_QUANTIZATION_ENCODING_BLOCKWISE_EXPANSION;
ret.blockwiseExpansion = block_scale_array;
tensor_wrapper_->setBlockwiseQuantization(blockwise_expansion, scale_offsets);
Duplicate assignment of blockwise_expansion.axis.
Line 158 and 159 both assign to blockwise_expansion.axis. This appears to be a copy-paste error.
🔎 Proposed fix to remove duplicate assignment
Qnn_BlockwiseExpansion_t blockwise_expansion;
blockwise_expansion.axis = cfg->ch_axis;
- blockwise_expansion.axis = cfg->ch_axis;
blockwise_expansion.scaleOffsets = nullptr; // Will be set by setBlockwiseQuantization

🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 157 to 166 there is a
duplicate assignment to blockwise_expansion.axis (lines 158-159); remove the
redundant second assignment so axis is set only once (keep the correct
cfg->ch_axis assignment), leaving the rest of blockwise_expansion initialization
unchanged.
  target_include_directories(MllmQNNBackend PUBLIC
-   $ENV{QNN_SDK_ROOT}/include/QNN # QNN SDK include
+   $ENV{QAIRT_SDK_ROOT}/include/QNN # QNN SDK include
🧩 Analysis chain
🏁 Script executed:
# First, check the CMakeLists.txt file around line 31
cat -n mllm/backends/qnn/CMakeLists.txt | head -50

Repository: UbiquitousLearning/mllm
Length of output: 1379
🏁 Script executed:
# Search for all QAIRT_SDK_ROOT usage in CMake files
rg -n "QAIRT_SDK_ROOT" --type cmake -B 2 -A 2

Repository: UbiquitousLearning/mllm
Length of output: 1040
🏁 Script executed:
# Search for validation patterns of environment variables in CMake files
rg -n "if.*NOT.*DEFINED.*ENV" --type cmake -B 1 -A 3

Repository: UbiquitousLearning/mllm
Length of output: 399
🏁 Script executed:
# Search for Makefile references mentioned in the review
fd -e Makefile | head -20

Repository: UbiquitousLearning/mllm
Length of output: 49
🏁 Script executed:
# Check for QNN_SDK_ROOT references to understand migration context
rg -n "QNN_SDK_ROOT" -B 1 -A 1

Repository: UbiquitousLearning/mllm
Length of output: 49
🏁 Script executed:
# Search for how QAIRT_SDK_ROOT is set or documented
rg -n "QAIRT_SDK_ROOT" -B 3 -A 3

Repository: UbiquitousLearning/mllm
Length of output: 4854
🏁 Script executed:
# Check if there's a custom-op-package directory and Makefile referenced in review
find . -type d -name "*custom*op*" | head -10
find . -name "Makefile" | head -20

Repository: UbiquitousLearning/mllm
Length of output: 180
🏁 Script executed:
# Check if there's any validation or setup script for QAIRT_SDK_ROOT
fd -e "\.sh$|\.md$" | xargs rg -l "QAIRT_SDK_ROOT" 2>/dev/null | head -10

Repository: UbiquitousLearning/mllm
Length of output: 243
Add validation for QAIRT_SDK_ROOT environment variable before use.
The CMakeLists.txt uses $ENV{QAIRT_SDK_ROOT} without checking if the environment variable is defined. While the associated Makefile in custom-op-package/LLaMAPackage/ validates this with error handling, the CMake build files do not. If QAIRT_SDK_ROOT is unset, the include path becomes /include/QNN, causing a confusing compile failure rather than a clear configuration error.
Add validation similar to other backends in the codebase:
if(NOT DEFINED ENV{QAIRT_SDK_ROOT})
message(FATAL_ERROR "QAIRT_SDK_ROOT environment variable is not set. Please set it to point to the Qualcomm AI SDK root directory.")
endif()

This check should be added early in the CMakeLists.txt files (both mllm/backends/qnn/CMakeLists.txt and mllm/ffi/CMakeLists.txt) before the environment variable is referenced.
size_t qnnDataTypeToSize(Qnn_DataType_t dtype) {
  switch (dtype) {
    case QNN_DATATYPE_INT_8:
    case QNN_DATATYPE_UINT_8:
    case QNN_DATATYPE_BOOL_8:
    case QNN_DATATYPE_SFIXED_POINT_8:
    case QNN_DATATYPE_UFIXED_POINT_8: return 1;

    case QNN_DATATYPE_INT_16:
    case QNN_DATATYPE_UINT_16:
    case QNN_DATATYPE_FLOAT_16:
    case QNN_DATATYPE_SFIXED_POINT_16:
    case QNN_DATATYPE_UFIXED_POINT_16: return 2;

    case QNN_DATATYPE_INT_32:
    case QNN_DATATYPE_UINT_32:
    case QNN_DATATYPE_FLOAT_32:
    case QNN_DATATYPE_SFIXED_POINT_32:
    case QNN_DATATYPE_UFIXED_POINT_32: return 4;

    case QNN_DATATYPE_INT_64:
    case QNN_DATATYPE_UINT_64: return 8;

    default:
      MLLM_ERROR("qnnDataTypeToSize: unsupported Qnn_DataType_t {}", static_cast<int>(dtype));
      MLLM_RT_ASSERT(false);
      return 0;
  }
}
Size mapping covers most QNN datatypes.
The function handles common QNN datatypes correctly. However, it doesn't cover QNN_DATATYPE_SFIXED_POINT_4 and QNN_DATATYPE_UFIXED_POINT_4 which are used in mllmDataTypeToQnnDataType for kInt4/kUInt4. This could cause a runtime assertion failure if 4-bit types are used with qnnDataTypeToSize.
🔎 Proposed fix to add 4-bit type handling
size_t qnnDataTypeToSize(Qnn_DataType_t dtype) {
switch (dtype) {
+ // 4-bit types are packed, return 1 byte as minimum addressable unit
+ case QNN_DATATYPE_SFIXED_POINT_4:
+ case QNN_DATATYPE_UFIXED_POINT_4: return 1; // Note: actual size is 0.5 bytes
+
case QNN_DATATYPE_INT_8:
case QNN_DATATYPE_UINT_8:

🧰 Tools
🪛 Clang (14.0.6)
[error] 415-415: 8 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
🤖 Prompt for AI Agents
In mllm/backends/qnn/QNNUtils.cpp around lines 394-422, qnnDataTypeToSize is
missing cases for QNN_DATATYPE_SFIXED_POINT_4 and QNN_DATATYPE_UFIXED_POINT_4
which leads to the default assert if 4-bit types are passed; add those two enum
cases to the switch and return an appropriate byte-size (use 1 to represent the
storage unit) for each, and optionally add a brief comment noting these are
4-bit types packed two-per-byte so callers should handle packing/unpacking if
needed.
UbiquitousLearning left a comment
LGTM
Actionable comments posted: 2
Fix all issues with AI Agents 🤖
In @mllm/backends/qnn/aot/QnnWrappersAPI.cpp:
- Around line 261-266: The compile() implementation sets is_compiled_
unconditionally which can mark a failed compilation as successful; change
QnnAOTGraph::compile so it calls qnn_model_->finalizeGraph(...) and only sets
is_compiled_ = true when that call returns mllm::qnn::MODEL_NO_ERROR (i.e., on
success), returning the success boolean otherwise; keep the early-return when
is_compiled_ is already true.
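The suggested compile() fix can be sketched with stand-in types (AotGraph, ModelError, and the injected finalize callback are hypothetical stand-ins for QnnAOTGraph, mllm::qnn's error enum, and qnn_model_->finalizeGraph(...)):

```cpp
#include <functional>
#include <utility>

enum class ModelError { kNoError, kGraphError };

class AotGraph {
 public:
  explicit AotGraph(std::function<ModelError()> finalize)
      : finalize_(std::move(finalize)) {}

  bool compile() {
    if (is_compiled_) return true;  // keep the early return
    const bool ok = (finalize_() == ModelError::kNoError);
    if (ok) is_compiled_ = true;    // only mark compiled on success
    return ok;
  }

  bool isCompiled() const { return is_compiled_; }

 private:
  std::function<ModelError()> finalize_;  // stand-in for finalizeGraph
  bool is_compiled_ = false;
};
```

With the unconditional assignment, a failed finalize would make every later compile() call report success via the early return; gating the flag on the result avoids that.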
In @mllm/backends/qnn/QNNUtils.hpp:
- Around line 218-221: The member Qnn_BlockwiseExpansion_t blockwiseExpansion_
is left uninitialized; update the class constructor to explicitly initialize
blockwiseExpansion_ (for example via the constructor's member initializer list
using blockwiseExpansion_{} or equivalent) so it has a well-defined state from
construction; keep setBlockwiseQuantization() usage unchanged but ensure
blockwiseExpansion_ is default-initialized in the constructor that declares
scaleOffsets_ and blockwiseExpansion_.
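The default-initialization fix can be sketched as follows. BlockwiseExpansion and TensorWrapper are hypothetical stand-ins for Qnn_BlockwiseExpansion_t and QNNTensorWrapper; the key detail is the `blockwiseExpansion_{}` member initializer, which value-initializes every field of the aggregate.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for Qnn_BlockwiseExpansion_t (a plain aggregate of POD fields).
struct BlockwiseExpansion {
  int32_t axis;
  uint32_t numBlocksPerAxis;
  const void* scaleOffsets;
};

class TensorWrapper {
 public:
  // Value-initialize the aggregate member: all fields zeroed/null,
  // so the wrapper has a well-defined state before
  // setBlockwiseQuantization() is ever called.
  TensorWrapper() : blockwiseExpansion_{} {}

  const BlockwiseExpansion& blockwise() const { return blockwiseExpansion_; }

 private:
  std::vector<float> scaleOffsets_;      // vectors default-construct empty anyway
  BlockwiseExpansion blockwiseExpansion_;
};
```

Without the `{}`, a default-constructed wrapper would carry indeterminate values in the struct's fields, and reading them (even to copy the struct) is undefined behavior.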
♻️ Duplicate comments (1)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)
157-166: Duplicate assignment of blockwise_expansion.axis. Lines 158 and 159 both assign to blockwise_expansion.axis. This appears to be a copy-paste error.

🔎 Proposed fix to remove duplicate assignment

  Qnn_BlockwiseExpansion_t blockwise_expansion;
  blockwise_expansion.axis = cfg->ch_axis;
- blockwise_expansion.axis = cfg->ch_axis;
  blockwise_expansion.scaleOffsets = nullptr; // Will be set by setBlockwiseQuantization
🧹 Nitpick comments (1)
mllm/backends/qnn/QNNUtils.hpp (1)
207-209: Consider adding detailed documentation for quantization methods. While a brief comment is present, these public methods would benefit from more comprehensive documentation explaining:
- When these methods should be called relative to alloc()
- Parameter constraints (e.g., valid range for axis)
- Expected behavior and side effects
- Example usage patterns

📝 Suggested documentation format

- // Helper to set complex quantization params and manage memory
+ /**
+  * @brief Sets scale-offset quantization parameters for the tensor
+  * @param scaleOffsets Vector of scale-offset pairs for quantization
+  * @param axis The axis along which quantization is applied (-1 for per-tensor)
+  * @note Must be called before alloc() to take effect
+  */
  void setScaleOffsetQuantization(const std::vector<Qnn_ScaleOffset_t>& scaleOffsets, int32_t axis);
+
+ /**
+  * @brief Sets blockwise quantization parameters for the tensor
+  * @param blockwise Blockwise expansion configuration
+  * @param scaleOffsets Vector of scale-offset pairs for blockwise quantization
+  * @note Must be called before alloc() to take effect
+  */
  void setBlockwiseQuantization(const Qnn_BlockwiseExpansion_t& blockwise, const std::vector<Qnn_ScaleOffset_t>& scaleOffsets);
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
mllm/backends/qnn/QNNUtils.hpp
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
🧰 Additional context used
📓 Path-based instructions (4)
Same path-based guideline sets as in the first review above, applied to mllm/backends/qnn/QNNUtils.hpp and mllm/backends/qnn/aot/QnnWrappersAPI.cpp.
🧬 Code graph analysis (2)
mllm/backends/qnn/QNNUtils.hpp (2)
mllm/core/Tensor.hpp (1)
dtype (412-412)
mllm/core/Tensor.cpp (2)
dtype (506-506), dtype (506-506)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (2)
mllm/backends/qnn/QNNUtils.hpp (7)
name (185-186), name (189-190), create (255-257), create (255-255), mllmDataTypeToQnnDataType (52-52), __mllmQnnLoggerCallback (92-130), __mllmQnnLoggerCallback (92-92)
mllm/backends/qnn/QNNUtils.cpp (8)
createStaticTensor (466-485), createStaticTensor (466-467), create (447-464), create (447-448), create (551-555), create (551-553), mllmDataTypeToQnnDataType (302-392), mllmDataTypeToQnnDataType (302-302)
🪛 Cppcheck (2.19.0)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
[error] 417-417: There is an unknown macro here somewhere. Configuration is required. If MLLM_RT_ASSERT_EQ is a macro then please configure it.
(unknownMacro)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-macos
- GitHub Check: build-android
- GitHub Check: build-x86
🔇 Additional comments (9)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (7)
24-34: Well-structured constructor refactor. The constructor now properly delegates tensor creation to wrapper methods based on tensor type, with clean separation between static and dynamic tensor handling.
99-101: Good centralization of data type conversion. Using the utility function `mllmDataTypeToQnnDataType` improves maintainability by centralizing type mapping logic.
122-136: Improved memory management for per-channel quantization. The refactor replaces manual memory allocation with `std::vector` and delegates quantization setup to the wrapper's `setScaleOffsetQuantization` method, improving both safety and maintainability.
198-220: Well-designed API modernization. The parameter handling methods now use `std::shared_ptr` for wrapper types instead of raw pointers, providing automatic memory management and clearer ownership semantics.
239-243: Clean wrapper-based initialization. The constructor now properly delegates graph management to `QNNModel`, following the new wrapper-based architecture.
245-259: Well-structured operation addition. The method cleanly gathers tensor names, registers wrappers with the model, and adds nodes using the wrapper-based API.
548-559: Proper error handling for missing context. The method correctly returns `nullptr` with an error log when the context is not found, and cleanly manages graph creation through the wrapper-based constructor.
mllm/backends/qnn/QNNUtils.hpp (2)
23-23: Documentation updated to reflect SDK variable standardization. The comment now correctly references `QAIRT_SDK_ROOT` instead of `QNN_SDK_ROOT`, consistent with the PR's standardization objectives.
52-53: Implementation confirmed with proper error handling. Both utility functions are implemented in `QNNUtils.cpp` with comprehensive switch statements that handle data type conversions and size calculations. The `qnnDataTypeToSize()` function includes explicit error handling for unsupported types via `MLLM_ERROR` and `MLLM_RT_ASSERT(false)`, and `mllmDataTypeToQnnDataType()` similarly validates input with error reporting for invalid data types.
```diff
 bool QnnAOTGraph::compile() {
   if (is_compiled_) { return true; }
-  auto env = AOTCompileContext::getInstance().getEnv();
-  auto qnn_interface = env->getFuncSymbol().qnn_interface_;
-  qnn_interface.graphFinalize(qnn_graph_handle_, env->getContext(belongs_context_name_)->profile_bk_handle_, nullptr);
+  bool ret = qnn_model_->finalizeGraph(nullptr, nullptr) == mllm::qnn::MODEL_NO_ERROR;
   is_compiled_ = true;
-  return true;
+  return ret;
 }
```
Set is_compiled_ only on successful compilation.
Line 264 sets is_compiled_ = true unconditionally, even when finalizeGraph fails. This could lead to incorrect internal state where the graph appears compiled but actually failed.
🔎 Proposed fix to set flag only on success

```diff
 bool QnnAOTGraph::compile() {
   if (is_compiled_) { return true; }
   bool ret = qnn_model_->finalizeGraph(nullptr, nullptr) == mllm::qnn::MODEL_NO_ERROR;
-  is_compiled_ = true;
+  if (ret) { is_compiled_ = true; }
   return ret;
 }
```

🤖 Prompt for AI Agents
In @mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 261-266, The compile()
implementation sets is_compiled_ unconditionally which can mark a failed
compilation as successful; change QnnAOTGraph::compile so it calls
qnn_model_->finalizeGraph(...) and only sets is_compiled_ = true when that call
returns mllm::qnn::MODEL_NO_ERROR (i.e., on success), returning the success
boolean otherwise; keep the early-return when is_compiled_ is already true.
```cpp
// Storage for quantization parameters to ensure lifetime matches the tensor wrapper
std::vector<Qnn_ScaleOffset_t> scaleOffsets_;
Qnn_BlockwiseExpansion_t blockwiseExpansion_;
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Verify blockwiseExpansion_ initialization
echo "=== Finding QNNTensorWrapper class and constructor ==="
rg -n "class QNNTensorWrapper" --type=hpp -A 20 | head -50
echo -e "\n=== Searching for QNNTensorWrapper constructor in implementation files ==="
rg -n "QNNTensorWrapper::QNNTensorWrapper" --type=cpp
echo -e "\n=== Checking QNNUtils.cpp for constructor implementation ==="
find . -name "QNNUtils.cpp" -o -name "QNNUtils.cc"
```

Repository: UbiquitousLearning/mllm
Length of output: 480
🏁 Script executed:

```shell
#!/bin/bash
# Read the constructor implementation
echo "=== QNNTensorWrapper constructor (lines 420-460) ==="
sed -n '420,460p' mllm/backends/qnn/QNNUtils.cpp
echo -e "\n=== Check for blockwiseExpansion_ initialization in constructor ==="
rg -n "blockwiseExpansion_" mllm/backends/qnn/QNNUtils.cpp
echo -e "\n=== Search for Qnn_BlockwiseExpansion_t definition ==="
rg -A 10 "Qnn_BlockwiseExpansion_t" --type-list
```

Repository: UbiquitousLearning/mllm
Length of output: 2544
🏁 Script executed:

```shell
#!/bin/bash
# Read the header file around the member declarations
echo "=== QNNTensorWrapper header file members (lines 210-230) ==="
sed -n '210,230p' mllm/backends/qnn/QNNUtils.hpp
echo -e "\n=== Full setBlockwiseQuantization method ==="
rg -B 2 -A 15 "void setBlockwiseQuantization" mllm/backends/qnn/QNNUtils.cpp
echo -e "\n=== Search for Qnn_BlockwiseExpansion_t in header files ==="
rg "Qnn_BlockwiseExpansion_t" --type=h --type=hpp
echo -e "\n=== Check for Qnn_BlockwiseExpansion_t definition in QNN SDK ==="
find . -name "*.h" -o -name "*.hpp" | xargs grep -l "Qnn_BlockwiseExpansion" 2>/dev/null
```

Repository: UbiquitousLearning/mllm
Length of output: 1249
🏁 Script executed:

```shell
#!/bin/bash
# Search more carefully for the struct definition and setBlockwiseQuantization
echo "=== Full setBlockwiseQuantization method implementation ==="
grep -n "setBlockwiseQuantization" mllm/backends/qnn/QNNUtils.cpp
sed -n '515,535p' mllm/backends/qnn/QNNUtils.cpp
echo -e "\n=== Check if blockwiseExpansion_ is referenced elsewhere ==="
rg "blockwiseExpansion_" mllm/backends/qnn/QNNUtils.cpp
echo -e "\n=== Look for Qnn_BlockwiseExpansion_t in QNNUtils.hpp ==="
grep -n "Qnn_BlockwiseExpansion_t" mllm/backends/qnn/QNNUtils.hpp
echo -e "\n=== Search for where Qnn_BlockwiseExpansion_t comes from ==="
rg "#include" mllm/backends/qnn/QNNUtils.hpp | head -20
echo -e "\n=== Check if blockwiseExpansion_ needs initialization before use ==="
sed -n '440,530p' mllm/backends/qnn/QNNUtils.cpp
```

Repository: UbiquitousLearning/mllm
Length of output: 5988
Initialize blockwiseExpansion_ member in the constructor initialization list or with default constructor.
While blockwiseExpansion_ is only accessed after setBlockwiseQuantization() is called, leaving it uninitialized after construction is unsafe. Add explicit initialization in the constructor's member initializer list (e.g., blockwiseExpansion_{}) to follow C++ best practices and ensure the member has a well-defined state from construction.
🤖 Prompt for AI Agents
In @mllm/backends/qnn/QNNUtils.hpp around lines 218-221, The member
Qnn_BlockwiseExpansion_t blockwiseExpansion_ is left uninitialized; update the
class constructor to explicitly initialize blockwiseExpansion_ (for example via
the constructor's member initializer list using blockwiseExpansion_{} or
equivalent) so it has a well-defined state from construction; keep
setBlockwiseQuantization() usage unchanged but ensure blockwiseExpansion_ is
default-initialized in the constructor that declares scaleOffsets_ and
blockwiseExpansion_.
Build system and environment variable standardization:

- Replaced all references to the `QNN_SDK_ROOT` environment variable with `QAIRT_SDK_ROOT` in documentation and CMake files to standardize SDK path usage and avoid confusion. [1] [2] [3] [4] [5]
- Updated CMake build logic to modularize QNN backend source inclusion and subdirectory addition, ensuring that QNN sources are correctly included when building for Qualcomm QNN AOT or QNN backend support. [1] [2] [3] [4]

QNN backend code simplification and robustness:

- Refactored `QNNAllocator` by removing unused buffer tracking, statistics, and fallback logic, and simplified the destructor to ensure proper deregistration and memory cleanup. The `registerQnnTensorToSharedBuffer` interface was also simplified. [1] [2] [3] [4]
- Simplified the `QNNTensorWrapper` allocation logic, removing complex buffer reuse and fallback code, and added helpers for setting quantization parameters with proper memory management. [1] [2]
- Centralized and expanded QNN data type conversion logic, adding `mllmDataTypeToQnnDataType` and `qnnDataTypeToSize` utility functions for robust and maintainable type mapping. [1] [2] [3] [4]

These changes collectively improve maintainability, reduce the potential for errors, and make the QNN backend codebase cleaner and easier to understand.
Summary by CodeRabbit
Documentation
New Features
Bug Fixes