-
Notifications
You must be signed in to change notification settings - Fork 160
feat(qualcomm): Qnn AOT Lowering pass #580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…iew operations - Add new QnnAOTCastTypePattern to handle CastType operations with quantization support for float/int conversions and QNN Quantize/Dequantize/Cast operations - Add new QnnAOTEmbeddingPattern to convert embedding operations to QNN Gather - Add new QnnAOTIndexPattern to handle index operations using QNN Gather with proper axis detection - Add new QnnAOTViewPattern to convert view operations to QNN Reshape - Refactor LLM2QnnLoweringPass to use template-based pattern registration instead of manual insertion - Update Elewise visitor to add QnnAOTMulPattern for multiplication operations - Modify base pattern to use rewrite method instead of compile method
📝 WalkthroughWalkthroughThe PR adds five new QNN AOT visitor patterns (CastType, Embedding, Index, View, Mul), introduces a Sigmoid neural network layer, and refactors the pattern registration system to support bulk initialization. It also removes tensor caching and runtime registration side effects from QNN tensor creation and changes the base pattern class from compile-based to rewrite-based API. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (2 warnings, 1 inconclusive)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Fix all issues with AI Agents 🤖
In @mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp:
- Line 24: registerPatterns is missing QnnAOTMulPattern registration even though
QnnAOTMulPattern is implemented (in Elewise.hpp) like QnnAOTAddPattern; update
the registerPatterns template instantiation to include QnnAOTMulPattern
alongside QnnAOTAddPattern (so the list becomes QnnAOTEmbeddingPattern,
QnnAOTCastTypePattern, QnnAOTAddPattern, QnnAOTMulPattern, QnnAOTViewPattern,
QnnAOTIndexPattern) to ensure multiplication ops are lowered by
LLM2QnnLoweringPass.
In @mllm/backends/qnn/aot/visitor/Embedding.cpp:
- Around line 34-35: The lookup of the weight symbol via
writer.getContext()->lookupSymbolTable(a_op->getName() + ".weight") can return
nullptr; validate the returned pointer before calling outputs().front() (the
code that assigns to variable weight). If lookup returns nullptr, handle it
(e.g., log or raise an informative error including a_op->getName(), or
return/throw) to avoid null dereference; only call
outputs().front()->cast_<ir::tensor::TensorValue>() when the symbol table
pointer is non-null. Ensure the error path provides clear context about the
missing "<opName>.weight" symbol.
In @mllm/backends/qnn/aot/visitor/View.cpp:
- Around line 38-55: The computed but unused shape variables should be removed:
delete the local variables `shape_data` and `shape_tensor_shape` and the loop
that fills `shape_data` (the for-loop over `output_shape`) in View.cpp so only
the output shape query remains; keep the QNN Reshape op creation
(`QnnAOTNodeOperation::create("Reshape")`), package name set, inputs/outputs via
`env->captureQnnAOTNodeTensor(...)`, `setName(view_op->getAOp()->getName())`,
and the final `env->captureAOTNodeOp(...)` call unchanged.
🧹 Nitpick comments (4)
mllm/backends/qnn/aot/visitor/CastType.cpp (1)
15-25: Consider using a switch statement or lookup for maintainability.The
isIntfunction has a long chain of OR conditions covering all integer and quantized types. While correct, a more maintainable approach could use a switch statement or a set-based lookup.🔎 Alternative implementation
static bool isFloat(DataTypes dtype) { return dtype == kFloat32 || dtype == kFloat16; } static bool isInt(DataTypes dtype) { - return dtype == kInt8 || dtype == kInt16 || dtype == kInt32 || dtype == kUInt8 || dtype == kUInt16 || dtype == kUInt32 - || dtype == kInt8PerTensorSym || dtype == kInt8PerChannelSym || dtype == kUInt8PerTensorSym - || dtype == kUInt8PerChannelSym || dtype == kInt16PerTensorSym || dtype == kInt16PerChannelSym - || dtype == kUInt16PerTensorSym || dtype == kUInt16PerChannelSym || dtype == kInt8PerTensorAsy - || dtype == kInt8PerChannelAsy || dtype == kUInt8PerTensorAsy || dtype == kUInt8PerChannelAsy - || dtype == kInt16PerTensorAsy || dtype == kInt16PerChannelAsy || dtype == kUInt16PerTensorAsy - || dtype == kUInt16PerChannelAsy; + switch (dtype) { + case kInt8: case kInt16: case kInt32: + case kUInt8: case kUInt16: case kUInt32: + case kInt8PerTensorSym: case kInt8PerChannelSym: + case kUInt8PerTensorSym: case kUInt8PerChannelSym: + case kInt16PerTensorSym: case kInt16PerChannelSym: + case kUInt16PerTensorSym: case kUInt16PerChannelSym: + case kInt8PerTensorAsy: case kInt8PerChannelAsy: + case kUInt8PerTensorAsy: case kUInt8PerChannelAsy: + case kInt16PerTensorAsy: case kInt16PerChannelAsy: + case kUInt16PerTensorAsy: case kUInt16PerChannelAsy: + return true; + default: + return false; + } }mllm/compile/ir/linalg/Op.hpp (1)
31-31: Duplicate forward declaration ofSigmoidOp.
SigmoidOpis forward-declared both at line 31 and line 83. The duplicate at line 83 appears to be a remnant. Consider removing one to avoid redundancy.🔎 Proposed fix
Remove line 83:
class WhereOp; -class SigmoidOp; } // namespace mllmAlso applies to: 83-83
mllm/backends/qnn/aot/visitor/Index.cpp (1)
47-76: Axis detection logic is correct but complex—consider a brief inline comment.The loop correctly identifies the gather axis by skipping "select all" slices while incrementing the axis counter, then breaks upon finding vector or tensor indices. The early exit and error handling for multiple axes are well implemented.
A brief comment explaining the axis-counting strategy could improve readability for future maintainers.
mllm/backends/qnn/aot/visitor/Elewise.cpp (1)
48-81: LGTM with optional refactor opportunity.
QnnAOTMulPattern::rewritecorrectly implements element-wise multiplication support usingElementWiseMultiply. The implementation mirrorsAddPatternwhich is acceptable for clarity.Optional: Consider extracting a shared helper for the common boilerplate (env retrieval, attribute validation, input/output extraction, node registration) if more element-wise patterns are added in the future.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
mllm/backends/qnn/aot/QnnWrappersAPI.cppmllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cppmllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hppmllm/backends/qnn/aot/visitor/Base.hppmllm/backends/qnn/aot/visitor/CastType.cppmllm/backends/qnn/aot/visitor/CastType.hppmllm/backends/qnn/aot/visitor/Elewise.cppmllm/backends/qnn/aot/visitor/Elewise.hppmllm/backends/qnn/aot/visitor/Embedding.cppmllm/backends/qnn/aot/visitor/Embedding.hppmllm/backends/qnn/aot/visitor/Index.cppmllm/backends/qnn/aot/visitor/Index.hppmllm/backends/qnn/aot/visitor/View.cppmllm/backends/qnn/aot/visitor/View.hppmllm/compile/ir/linalg/Op.cppmllm/compile/ir/linalg/Op.hppmllm/nn/Nn.hppmllm/nn/layers/Sigmoid.cppmllm/nn/layers/Sigmoid.hpp
💤 Files with no reviewable changes (1)
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).
Files:
mllm/backends/qnn/aot/visitor/Embedding.hppmllm/backends/qnn/aot/visitor/CastType.hppmllm/backends/qnn/aot/visitor/View.cppmllm/backends/qnn/aot/visitor/Elewise.hppmllm/backends/qnn/aot/visitor/Index.hppmllm/backends/qnn/aot/visitor/Elewise.cppmllm/backends/qnn/aot/visitor/Base.hppmllm/backends/qnn/aot/visitor/Embedding.cppmllm/backends/qnn/aot/visitor/CastType.cppmllm/backends/qnn/aot/visitor/View.hppmllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hppmllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cppmllm/backends/qnn/aot/visitor/Index.cppmllm/nn/layers/Sigmoid.cppmllm/compile/ir/linalg/Op.hppmllm/nn/Nn.hppmllm/compile/ir/linalg/Op.cppmllm/nn/layers/Sigmoid.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.
Files:
mllm/backends/qnn/aot/visitor/Embedding.hppmllm/backends/qnn/aot/visitor/CastType.hppmllm/backends/qnn/aot/visitor/View.cppmllm/backends/qnn/aot/visitor/Elewise.hppmllm/backends/qnn/aot/visitor/Index.hppmllm/backends/qnn/aot/visitor/Elewise.cppmllm/backends/qnn/aot/visitor/Base.hppmllm/backends/qnn/aot/visitor/Embedding.cppmllm/backends/qnn/aot/visitor/CastType.cppmllm/backends/qnn/aot/visitor/View.hppmllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hppmllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cppmllm/backends/qnn/aot/visitor/Index.cppmllm/nn/layers/Sigmoid.cppmllm/compile/ir/linalg/Op.hppmllm/nn/Nn.hppmllm/compile/ir/linalg/Op.cppmllm/nn/layers/Sigmoid.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).
Files:
mllm/backends/qnn/aot/visitor/Embedding.hppmllm/backends/qnn/aot/visitor/CastType.hppmllm/backends/qnn/aot/visitor/View.cppmllm/backends/qnn/aot/visitor/Elewise.hppmllm/backends/qnn/aot/visitor/Index.hppmllm/backends/qnn/aot/visitor/Elewise.cppmllm/backends/qnn/aot/visitor/Base.hppmllm/backends/qnn/aot/visitor/Embedding.cppmllm/backends/qnn/aot/visitor/CastType.cppmllm/backends/qnn/aot/visitor/View.hppmllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hppmllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cppmllm/backends/qnn/aot/visitor/Index.cppmllm/nn/layers/Sigmoid.cppmllm/compile/ir/linalg/Op.hppmllm/nn/Nn.hppmllm/compile/ir/linalg/Op.cppmllm/nn/layers/Sigmoid.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
Files:
mllm/backends/qnn/aot/visitor/View.cppmllm/backends/qnn/aot/visitor/Elewise.cppmllm/backends/qnn/aot/visitor/Embedding.cppmllm/backends/qnn/aot/visitor/CastType.cppmllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cppmllm/backends/qnn/aot/visitor/Index.cppmllm/nn/layers/Sigmoid.cppmllm/compile/ir/linalg/Op.cpp
🧬 Code graph analysis (12)
mllm/backends/qnn/aot/visitor/Embedding.hpp (1)
mllm/backends/qnn/aot/visitor/Base.hpp (8)
op(17-17)op(17-17)op(26-26)op(26-26)writer(19-19)writer(19-19)writer(28-28)writer(28-28)
mllm/backends/qnn/aot/visitor/CastType.hpp (5)
mllm/backends/qnn/aot/visitor/Base.hpp (8)
op(17-17)op(17-17)op(26-26)op(26-26)writer(19-19)writer(19-19)writer(28-28)writer(28-28)mllm/backends/qnn/aot/visitor/Elewise.hpp (4)
op(14-14)op(25-25)writer(16-16)writer(27-27)mllm/backends/qnn/aot/visitor/Embedding.hpp (2)
op(14-14)writer(16-16)mllm/backends/qnn/aot/visitor/Index.hpp (2)
op(14-14)writer(16-16)mllm/backends/qnn/aot/visitor/View.hpp (2)
op(14-14)writer(16-16)
mllm/backends/qnn/aot/visitor/View.cpp (9)
mllm/backends/qnn/aot/visitor/CastType.cpp (4)
isMatch(27-29)isMatch(27-27)rewrite(31-81)rewrite(31-31)mllm/backends/qnn/aot/visitor/Elewise.cpp (8)
isMatch(13-15)isMatch(13-13)isMatch(48-50)isMatch(48-48)rewrite(17-46)rewrite(17-17)rewrite(52-81)rewrite(52-52)mllm/backends/qnn/aot/visitor/Embedding.cpp (4)
isMatch(13-15)isMatch(13-13)rewrite(17-59)rewrite(17-17)mllm/backends/qnn/aot/visitor/Index.cpp (4)
isMatch(14-16)isMatch(14-14)rewrite(18-99)rewrite(18-18)mllm/backends/qnn/aot/visitor/Base.hpp (8)
op(17-17)op(17-17)op(26-26)op(26-26)writer(19-19)writer(19-19)writer(28-28)writer(28-28)mllm/backends/qnn/aot/visitor/Elewise.hpp (4)
op(14-14)op(25-25)writer(16-16)writer(27-27)mllm/backends/qnn/aot/visitor/Embedding.hpp (2)
op(14-14)writer(16-16)mllm/backends/qnn/aot/visitor/Index.hpp (2)
op(14-14)writer(16-16)mllm/backends/qnn/aot/visitor/View.hpp (2)
op(14-14)writer(16-16)
mllm/backends/qnn/aot/visitor/Elewise.hpp (9)
mllm/backends/qnn/aot/visitor/Base.hpp (8)
writer(19-19)writer(19-19)writer(28-28)writer(28-28)op(17-17)op(17-17)op(26-26)op(26-26)mllm/backends/qnn/aot/visitor/CastType.hpp (2)
writer(16-16)op(14-14)mllm/backends/qnn/aot/visitor/Embedding.hpp (2)
writer(16-16)op(14-14)mllm/backends/qnn/aot/visitor/Index.hpp (2)
writer(16-16)op(14-14)mllm/backends/qnn/aot/visitor/View.hpp (2)
writer(16-16)op(14-14)mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp (1)
op(22-22)mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)
op(17-17)mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)
op(17-17)mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
op(17-17)
mllm/backends/qnn/aot/visitor/Elewise.cpp (2)
mllm/backends/qnn/aot/visitor/Base.hpp (8)
writer(19-19)writer(19-19)writer(28-28)writer(28-28)op(17-17)op(17-17)op(26-26)op(26-26)mllm/backends/qnn/aot/QnnWrappersAPI.hpp (3)
qnn_context_name(216-216)qnn_context_name(218-219)qnn_context_name(221-222)
mllm/backends/qnn/aot/visitor/Base.hpp (6)
mllm/backends/qnn/aot/visitor/CastType.hpp (1)
writer(16-16)mllm/backends/qnn/aot/visitor/Elewise.hpp (2)
writer(16-16)writer(27-27)mllm/backends/qnn/aot/visitor/Embedding.hpp (1)
writer(16-16)mllm/backends/qnn/aot/visitor/Index.hpp (1)
writer(16-16)mllm/backends/qnn/aot/visitor/View.hpp (1)
writer(16-16)mllm/compile/ir/linalg/Attribute.hpp (6)
node(27-27)node(27-27)node(327-327)node(327-327)node(348-348)node(348-348)
mllm/backends/qnn/aot/visitor/CastType.cpp (1)
mllm/backends/qnn/aot/visitor/CastType.hpp (2)
op(14-14)writer(16-16)
mllm/backends/qnn/aot/visitor/View.hpp (6)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp (1)
op(22-22)mllm/backends/qnn/aot/visitor/Base.hpp (8)
op(17-17)op(17-17)op(26-26)op(26-26)writer(19-19)writer(19-19)writer(28-28)writer(28-28)mllm/backends/qnn/aot/visitor/CastType.hpp (2)
op(14-14)writer(16-16)mllm/backends/qnn/aot/visitor/Elewise.hpp (4)
op(14-14)op(25-25)writer(16-16)writer(27-27)mllm/backends/qnn/aot/visitor/Embedding.hpp (2)
op(14-14)writer(16-16)mllm/backends/qnn/aot/visitor/Index.hpp (2)
op(14-14)writer(16-16)
mllm/backends/qnn/aot/visitor/Index.cpp (3)
mllm/backends/qnn/aot/visitor/CastType.cpp (4)
isMatch(27-29)isMatch(27-27)rewrite(31-81)rewrite(31-31)mllm/backends/qnn/aot/visitor/Embedding.cpp (4)
isMatch(13-15)isMatch(13-13)rewrite(17-59)rewrite(17-17)mllm/backends/qnn/aot/visitor/View.cpp (4)
isMatch(14-16)isMatch(14-14)rewrite(18-58)rewrite(18-18)
mllm/nn/layers/Sigmoid.cpp (1)
mllm/nn/layers/Sigmoid.hpp (2)
Sigmoid(13-13)Sigmoid(15-15)
mllm/compile/ir/linalg/Op.hpp (2)
mllm/core/aops/SigmoidOp.cpp (1)
SigmoidOp(12-12)mllm/core/aops/SigmoidOp.hpp (1)
SigmoidOp(15-15)
mllm/compile/ir/linalg/Op.cpp (2)
mllm/core/aops/SigmoidOp.cpp (1)
SigmoidOp(12-12)mllm/core/aops/SigmoidOp.hpp (1)
SigmoidOp(15-15)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/visitor/Embedding.hpp
[error] 14-14: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 16-16: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
mllm/backends/qnn/aot/visitor/CastType.hpp
[error] 14-14: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 16-16: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
mllm/backends/qnn/aot/visitor/Elewise.hpp
[error] 16-16: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 25-25: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 27-27: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
mllm/backends/qnn/aot/visitor/Index.hpp
[error] 14-14: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 16-16: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
mllm/backends/qnn/aot/visitor/Base.hpp
[error] 19-19: method 'rewrite' can be made static
(readability-convert-member-functions-to-static,-warnings-as-errors)
mllm/backends/qnn/aot/visitor/View.hpp
[error] 14-14: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 16-16: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
mllm/compile/ir/linalg/Op.hpp
[error] 190-190: parameter name 'p' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 190-190: class 'SigmoidOp' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator
(cppcoreguidelines-special-member-functions,-warnings-as-errors)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-android
- GitHub Check: build-macos
- GitHub Check: build-x86
🔇 Additional comments (25)
mllm/backends/qnn/aot/visitor/Embedding.cpp (2)
13-15: LGTM!The pattern matching logic correctly identifies
EmbeddingOpwith theusing_qnnattribute.
17-29: LGTM!The validation logic is thorough, with proper error handling for invalid casts and missing attributes.
mllm/backends/qnn/aot/visitor/Embedding.hpp (1)
12-21: LGTM!The class design follows the established pattern for QNN AOT visitors, with consistent
isMatch,rewrite, and factorycreate()methods. The structure aligns well with other pattern classes in the codebase.mllm/backends/qnn/aot/visitor/CastType.cpp (3)
27-29: LGTM!The pattern matching correctly identifies
CastTypeOpwith theusing_qnnattribute.
31-58: LGTM!The validation and setup logic is thorough, with proper error handling for casts and missing attributes.
59-69: Verify coverage of all required cast types.The current logic supports:
- Float → Int (Quantize)
- Int → Float (Dequantize)
- Float32 → Float16 (Cast)
However, it does not handle:
- Float16 → Float32 conversions
- Int → Int conversions (e.g., int8 → int32)
If these conversions are expected in the pipeline, they should be handled. If they are intentionally unsupported, consider adding a comment explaining why.
Additionally, the error message on Line 67 logs dtype values as integers. Consider logging the enum names or symbolic representations for better debuggability.
mllm/backends/qnn/aot/visitor/Base.hpp (1)
19-19: LGTM!The refactoring from a compile-based to rewrite-based API is consistent with the new pattern implementations. The default no-op implementations allow derived classes to selectively override behavior.
Also applies to: 28-28
mllm/backends/qnn/aot/visitor/CastType.hpp (1)
12-21: LGTM!The class design follows the established pattern for QNN AOT visitors and is consistent with the other pattern classes (Embedding, Index, View). The factory method enables proper registration in the lowering pass.
mllm/nn/Nn.hpp (1)
13-13: LGTM!The Sigmoid layer inclusion is correctly placed in alphabetical order and follows the existing pattern with the IWYU pragma export comment.
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp (1)
25-28: LGTM! Clean template-based refactor.The variadic template helper with fold expression is a clean approach to bulk pattern registration. This eliminates repetitive initialization code while maintaining type safety.
mllm/compile/ir/linalg/Op.cpp (1)
57-57: LGTM!The Sigmoid operation declaration is correctly placed alongside other activation functions (after SiLU) and properly integrates with the LINALG operation set.
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (1)
14-19: LGTM! Clean includes for new visitor patterns.The new includes are properly organized and support the expanded pattern registration.
mllm/nn/layers/Sigmoid.cpp (1)
9-11: LGTM!The Sigmoid layer constructors are correctly implemented, properly delegating to the base Layer class with the appropriate operation type and options.
mllm/compile/ir/linalg/Op.hpp (1)
190-190: LGTM!The
SigmoidOpregistration viaLINALG_AOPS_DEFINEfollows the established pattern used by other activation operations (e.g.,SiLUOpat line 189). The static analysis warnings about short parameter names and missing special member functions are false positives—these are macro-generated and consistent with other operators in this file.mllm/nn/layers/Sigmoid.hpp (1)
1-21: LGTM!The
Sigmoidlayer follows the established pattern for NN layers in this codebase, using the standard macrosMLLM_LAYER_ANY_INPUTS_1_OUTPUTS_FORWARDandMLLM_LAYER_ENABLE_INPLACE_ATTRIBUTE. The dual constructor approach (default and options-based) provides flexibility consistent with other layers.mllm/backends/qnn/aot/visitor/Index.hpp (1)
12-21: LGTM!The
QnnAOTIndexPatternfollows the established visitor pattern structure withisMatch,rewrite, andcreate()factory method. The static analysis warnings about short parameter names (op) can be safely ignored—this naming convention is consistent across all QNN AOT visitors (Base.hpp, Embedding.hpp, CastType.hpp, View.hpp, Elewise.hpp).mllm/backends/qnn/aot/visitor/View.hpp (1)
12-21: LGTM!The
QnnAOTViewPatternmirrors the structure of other AOT patterns. The static analysis warning about theopparameter name is a false positive given the codebase convention.mllm/backends/qnn/aot/visitor/Index.cpp (2)
14-16: LGTM!The
isMatchimplementation correctly checks forlinalg::IndexOpwith theusing_qnnattribute, following the same pattern as other AOT visitors.
18-99: LGTM!The
rewriteimplementation properly:
- Validates required attributes (
qnn_graph_name,qnn_context_name)- Handles casting failures with appropriate error messages
- Constructs the QNN Gather node with correct input/output wiring
- Attaches the
axisscalar parameterThe error handling covers edge cases (missing indices, multiple axes) appropriately.
mllm/backends/qnn/aot/visitor/Elewise.hpp (2)
16-16: API change fromcompiletorewritealigns with base class.The method rename aligns with
QnnAOTBasePattern::rewritein Base.hpp (line 18). This is a consistent API change.
23-32: LGTM!
QnnAOTMulPatternfollows the established structure ofQnnAOTAddPattern, providing symmetric support for element-wise multiplication. The static factory method correctly returnsOpTypes::kMul.mllm/backends/qnn/aot/visitor/Elewise.cpp (2)
17-46: LGTM!The method rename from
compiletorewritealigns with the base class interface. The implementation correctly creates anElementWiseAddQNN node with proper input/output wiring.
20-20:quant_recipeattribute is validated but not used.Both
AddPattern(line 20) andMulPattern(line 55) validate the presence ofquant_recipeviaMLLM_RETURN_FALSE_IF_NOT, but neither uses the value. If this is intentional gating (ensuring the op is properly marked for quantization), consider adding a brief comment. Otherwise, the check may be vestigial.Also applies to: 55-55
mllm/backends/qnn/aot/visitor/View.cpp (2)
14-16: LGTM!The
isMatchimplementation correctly identifiesViewOpoperations marked for QNN processing, consistent with other AOT visitor patterns.
18-58: Overall structure is well-implemented.The
rewritemethod follows the established pattern from other AOT visitors:
- Proper error handling with early returns (lines 21-25, 27-30)
- Consistent attribute validation using
MLLM_RETURN_FALSE_IF_NOTmacros- Correct input/output extraction and type casting
- Proper method chaining for QNN operation configuration
- Correct registration with the AOT context
Once the shape specification issue is resolved, this implementation will be consistent with the rest of the codebase.
| // TODO reg other patterns here. | ||
| // TODO reg other patterns here. | ||
| // TODO reg other patterns here. | ||
| registerPatterns<QnnAOTEmbeddingPattern, QnnAOTCastTypePattern, QnnAOTAddPattern, QnnAOTViewPattern, QnnAOTIndexPattern>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for QnnAOTMulPattern class definition
echo "=== Searching for QnnAOTMulPattern class definition ==="
ast-grep --pattern $'class QnnAOTMulPattern $$$' 2>/dev/null || rg -n 'class QnnAOTMulPattern' --type=cpp --type=hpp
# Search for Mul pattern references
echo -e "\n=== Searching for Mul pattern references ==="
rg -nC2 'QnnAOTMulPattern|MulPattern' --type=cpp --type=hpp | head -50
# Check the actual file at line 24
echo -e "\n=== Checking LLM2QnnLoweringPass.cpp around line 24 ==="
fd -e cpp 'LLM2QnnLoweringPass' --exec cat -n {} \; | head -50
# Search for all registerPatterns calls
echo -e "\n=== Searching for all registerPatterns calls ==="
rg -nC5 'registerPatterns' --type=cpp --type=hppRepository: UbiquitousLearning/mllm
Length of output: 2650
🏁 Script executed:
#!/bin/bash
# Search for QnnAOTMulPattern without file type restrictions
echo "=== Searching for QnnAOTMulPattern ==="
rg -r 'QnnAOTMulPattern' --max-count=20
# Look for Elewise visitor files
echo -e "\n=== Looking for Elewise visitor files ==="
fd 'Elewise' -type f
# Check Elewise.hpp if it exists
echo -e "\n=== Checking Elewise.hpp content ==="
find . -name 'Elewise.hpp' -type f -exec head -100 {} \;
# Search for MulPattern or Mul handler registration
echo -e "\n=== Searching for Mul handler in visitor files ==="
rg -n 'Mul' mllm/backends/qnn/aot/visitor/ 2>/dev/null || trueRepository: UbiquitousLearning/mllm
Length of output: 2779
🏁 Script executed:
#!/bin/bash
# Check if Elewise.hpp is included and if MulPattern import is there
echo "=== Checking LLM2QnnLoweringPass.hpp ==="
fd 'LLM2QnnLoweringPass.hpp' -type f -exec cat {} \;
# Search for any other pattern registrations
echo -e "\n=== Searching for other registerPatterns calls ==="
rg -n 'registerPatterns|QnnAOTMulPattern' mllm/backends/qnn/aot/passes/
# Check if there's a reason MulPattern might be excluded
echo -e "\n=== Checking git history or comments around line 24 ==="
rg -B5 -A5 'registerPatterns' mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cppRepository: UbiquitousLearning/mllm
Length of output: 1063
🏁 Script executed:
#!/bin/bash
# Get the full content of LLM2QnnLoweringPass.hpp to understand registerPatterns
echo "=== Full LLM2QnnLoweringPass.hpp ==="
fd 'LLM2QnnLoweringPass.hpp' -type f -exec cat -n {} \;
# Check what patterns are imported in the .cpp file
echo -e "\n=== Checking all includes in LLM2QnnLoweringPass.cpp ==="
head -20 mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
# Verify Elewise.hpp exports both patterns
echo -e "\n=== Verifying Elewise.hpp exports ==="
fd 'Elewise.hpp' -type f -exec cat -n {} \; | head -40Repository: UbiquitousLearning/mllm
Length of output: 1264
🏁 Script executed:
#!/bin/bash
# Find the files first
echo "=== Finding header files ==="
find mllm/backends/qnn/aot/passes -name "*.hpp" -o -name "*.h"
find mllm/backends/qnn/aot/visitor -name "*.hpp" -o -name "*.h"
# Read LLM2QnnLoweringPass.hpp directly
echo -e "\n=== Content of LLM2QnnLoweringPass.hpp ==="
cat mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp
# Read Elewise.hpp directly
echo -e "\n=== Content of Elewise.hpp (full) ==="
cat mllm/backends/qnn/aot/visitor/Elewise.hpp
# Check if MulPattern is mentioned anywhere in the registration context
echo -e "\n=== Search for MulPattern in all visitor files ==="
find mllm/backends/qnn/aot/visitor -type f \( -name "*.hpp" -o -name "*.cpp" \) -exec grep -l "Mul" {} \;Repository: UbiquitousLearning/mllm
Length of output: 3516
Add QnnAOTMulPattern to the registered patterns list.
QnnAOTMulPattern is fully defined and implemented in Elewise.hpp (matching the structure of QnnAOTAddPattern), but it is missing from the registerPatterns call on line 24. Since QnnAOTAddPattern is registered, QnnAOTMulPattern should be registered as well for consistency and to handle multiplication operations through the QNN backend.
Suggested change
registerPatterns<QnnAOTEmbeddingPattern, QnnAOTCastTypePattern, QnnAOTAddPattern, QnnAOTMulPattern, QnnAOTViewPattern, QnnAOTIndexPattern>();🤖 Prompt for AI Agents
In @mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp around line 24,
registerPatterns is missing QnnAOTMulPattern registration even though
QnnAOTMulPattern is implemented (in Elewise.hpp) like QnnAOTAddPattern; update
the registerPatterns template instantiation to include QnnAOTMulPattern
alongside QnnAOTAddPattern (so the list becomes QnnAOTEmbeddingPattern,
QnnAOTCastTypePattern, QnnAOTAddPattern, QnnAOTMulPattern, QnnAOTViewPattern,
QnnAOTIndexPattern) to ensure multiplication ops are lowered by
LLM2QnnLoweringPass.
| auto weight = | ||
| writer.getContext()->lookupSymbolTable(a_op->getName() + ".weight")->outputs().front()->cast_<ir::tensor::TensorValue>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for symbol table lookup.
The symbol table lookup could return nullptr if the weight symbol is not found, leading to a null pointer dereference when calling outputs().front(). Add validation to ensure the symbol exists before accessing its outputs.
🔎 Proposed fix
- auto weight =
- writer.getContext()->lookupSymbolTable(a_op->getName() + ".weight")->outputs().front()->cast_<ir::tensor::TensorValue>();
+ auto weight_symbol = writer.getContext()->lookupSymbolTable(a_op->getName() + ".weight");
+ if (!weight_symbol) {
+ MLLM_ERROR("Failed to find weight symbol for embedding: {}.weight", a_op->getName());
+ return false;
+ }
+ auto weight = weight_symbol->outputs().front()->cast_<ir::tensor::TensorValue>();🤖 Prompt for AI Agents
In @mllm/backends/qnn/aot/visitor/Embedding.cpp around lines 34-35, The lookup
of the weight symbol via writer.getContext()->lookupSymbolTable(a_op->getName()
+ ".weight") can return nullptr; validate the returned pointer before calling
outputs().front() (the code that assigns to variable weight). If lookup returns
nullptr, handle it (e.g., log or raise an informative error including
a_op->getName(), or return/throw) to avoid null dereference; only call
outputs().front()->cast_<ir::tensor::TensorValue>() when the symbol table
pointer is non-null. Ensure the error path provides clear context about the
missing "<opName>.weight" symbol.
| // Create Shape Tensor | ||
| auto output_shape = output->tensor_.shape(); | ||
| std::vector<int32_t> shape_data; | ||
| for (auto dim : output_shape) { shape_data.push_back(static_cast<int32_t>(dim)); } | ||
|
|
||
| // Shape tensor shape is [rank(output)] | ||
| std::vector<int32_t> shape_tensor_shape = {static_cast<int32_t>(shape_data.size())}; | ||
|
|
||
| // Create QNN Reshape Op | ||
| auto qnn_op_node = QnnAOTNodeOperation::create("Reshape"); | ||
| qnn_op_node->setPackageName("qti.aisw"); | ||
|
|
||
| qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, input)) | ||
| ->emplaceOutput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, output)) | ||
| ->setName(view_op->getAOp()->getName()); | ||
|
|
||
| // Register this op node into one graph. | ||
| env->captureAOTNodeOp(qnn_context_name, qnn_graph_name, qnn_op_node); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, locate and read the View.cpp file
find . -name "View.cpp" -path "*/qnn/aot/visitor/*" | head -5Repository: UbiquitousLearning/mllm
Length of output: 107
🏁 Script executed:
#!/bin/bash
# Search for QNN Reshape patterns and how other operations handle multiple inputs
rg -n "Reshape|RESHAPE" --type=cpp -A 15 -B 5 | head -100Repository: UbiquitousLearning/mllm
Length of output: 9304
🏁 Script executed:
#!/bin/bash
# Look for patterns of how emplaceInput and shape tensors are used
rg -n "emplaceInput.*shape|shape.*emplaceInput" --type=cpp -B 5 -A 5 | head -80Repository: UbiquitousLearning/mllm
Length of output: 49
🏁 Script executed:
#!/bin/bash
# Search for other similar visitor files to understand the pattern
fd -n "\.cpp$" -path "*/qnn/aot/visitor/*" | xargs wc -l | sort -nRepository: UbiquitousLearning/mllm
Length of output: 316
🏁 Script executed:
#!/bin/bash
# Read the View.cpp file to see the exact code
wc -l mllm/backends/qnn/aot/visitor/View.cppRepository: UbiquitousLearning/mllm
Length of output: 108
🏁 Script executed:
#!/bin/bash
# Read the specific lines mentioned in the review
sed -n '30,60p' mllm/backends/qnn/aot/visitor/View.cppRepository: UbiquitousLearning/mllm
Length of output: 1186
🏁 Script executed:
#!/bin/bash
# Search for how other visitor files handle shape tensors and emplaceInput
find mllm/backends/qnn/aot/visitor -name "*.cpp" | xargs grep -l "emplaceInput.*Tensor\|shape_tensor"Repository: UbiquitousLearning/mllm
Length of output: 274
🏁 Script executed:
#!/bin/bash
# Search for QNN Reshape operations more broadly
rg "QnnAOTNodeOperation.*Reshape|Reshape.*QnnAOTNodeOperation" --type=cpp -A 20 -B 5Repository: UbiquitousLearning/mllm
Length of output: 1476
🏁 Script executed:
#!/bin/bash
# Look for Embedding and Index patterns mentioned in the review to compare
find mllm/backends/qnn/aot/visitor -name "*.cpp" | xargs grep -l "Embedding\|Index"Repository: UbiquitousLearning/mllm
Length of output: 150
🏁 Script executed:
#!/bin/bash
# Read Embedding.cpp to see the pattern
cat -n mllm/backends/qnn/aot/visitor/Embedding.cppRepository: UbiquitousLearning/mllm
Length of output: 2813
🏁 Script executed:
#!/bin/bash
# Read Index.cpp to see the pattern
cat -n mllm/backends/qnn/aot/visitor/Index.cppRepository: UbiquitousLearning/mllm
Length of output: 4304
🏁 Script executed:
#!/bin/bash
# Search for other Reshape usages in QNN backend
find mllm/backends/qnn -name "*.cpp" -o -name "*.hpp" | xargs rg -i "reshape" -B 5 -A 10 | head -150Repository: UbiquitousLearning/mllm
Length of output: 10133
🏁 Script executed:
#!/bin/bash
# Look for QNN Reshape documentation or wrappers
find mllm/backends/qnn -name "*.hpp" | xargs rg "Reshape" -B 3 -A 5Repository: UbiquitousLearning/mllm
Length of output: 51
🏁 Script executed:
#!/bin/bash
# Check for any QNN API documentation or examples
find mllm/backends/qnn -name "*Reshape*" -o -name "*reshape*"Repository: UbiquitousLearning/mllm
Length of output: 49
🏁 Script executed:
#!/bin/bash
# Look at the QnnWrappersAPI to understand op creation patterns
find mllm/backends/qnn/aot -name "*Wrapper*" -o -name "*API*" | xargs ls -laRepository: UbiquitousLearning/mllm
Length of output: 248
🏁 Script executed:
#!/bin/bash
# Read the full View.cpp file to see context
cat -n mllm/backends/qnn/aot/visitor/View.cppRepository: UbiquitousLearning/mllm
Length of output: 2664
🏁 Script executed:
#!/bin/bash
# Check the QnnWrappersAPI to understand how Reshape nodes are created
head -100 mllm/backends/qnn/aot/QnnWrappersAPI.hppRepository: UbiquitousLearning/mllm
Length of output: 3715
🏁 Script executed:
#!/bin/bash
# Look for how the qnn_op_node is processed after creation
rg -n "captureAOTNodeOp" --type=cpp -B 3 -A 10 | head -80Repository: UbiquitousLearning/mllm
Length of output: 5687
🏁 Script executed:
#!/bin/bash
# Search for any other Reshape patterns in the codebase
rg -i "reshape" mllm/backends/qnn --type=cpp --type=hpp -B 5 -A 10 | grep -A 15 "emplaceInput\|emplaceParam\|paramTensor\|paramScalar" | head -100Repository: UbiquitousLearning/mllm
Length of output: 94
🏁 Script executed:
#!/bin/bash
# Look for QNN documentation comments about Reshape
rg -i "reshape" mllm/backends/qnn/aot -B 3 -A 3 --type=cpp --type=hpp | grep -i "shape\|param"Repository: UbiquitousLearning/mllm
Length of output: 94
🏁 Script executed:
#!/bin/bash
# Check the entire QnnWrappersAPI.cpp to see how operations are finalized
sed -n '400,700p' mllm/backends/qnn/aot/QnnWrappersAPI.cppRepository: UbiquitousLearning/mllm
Length of output: 9376
🏁 Script executed:
#!/bin/bash
# Search for any documentation or comments about QNN Reshape behavior
rg -i "reshape.*identity\|identity.*reshape\|reshape.*shape\|reshape.*param" mllm/backends/qnn -A 3 -B 3 --type=cppRepository: UbiquitousLearning/mllm
Length of output: 49
🏁 Script executed:
#!/bin/bash
# Check if there are other AOT visitor patterns that compute but don't use variables
find mllm/backends/qnn/aot/visitor -name "*.cpp" | xargs rg -n "std::vector.*shape\|std::vector.*data" -A 5Repository: UbiquitousLearning/mllm
Length of output: 51
🏁 Script executed:
#!/bin/bash
# Verify the exact context around the "identity reshape" comment
sed -n '70,90p' mllm/backends/qnn/op/QNNViewOp.cppRepository: UbiquitousLearning/mllm
Length of output: 886
🏁 Script executed:
#!/bin/bash
# Check the entire graphAddNode signature to understand parameters
rg "graphAddNode" mllm/backends/qnn -B 5 -A 5 --type=cpp | head -60Repository: UbiquitousLearning/mllm
Length of output: 4080
🏁 Script executed:
#!/bin/bash
# Check how output shape is used in the QNN operations
rg "output.*shape\|output_shape" mllm/backends/qnn/aot/visitor -B 2 -A 2 --type=cppRepository: UbiquitousLearning/mllm
Length of output: 49
Remove unused shape computation variables.
Lines 38-44 compute shape_data and shape_tensor_shape, which are never used. QNN Reshape operations infer the target shape from the output tensor and do not require explicit shape parameters. Delete these unused variables.
- // Create Shape Tensor
- auto output_shape = output->tensor_.shape();
- std::vector<int32_t> shape_data;
- for (auto dim : output_shape) { shape_data.push_back(static_cast<int32_t>(dim)); }
-
- // Shape tensor shape is [rank(output)]
- std::vector<int32_t> shape_tensor_shape = {static_cast<int32_t>(shape_data.size())};
-
// Create QNN Reshape OpCommittable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In @mllm/backends/qnn/aot/visitor/View.cpp around lines 38-55, The computed but
unused shape variables should be removed: delete the local variables
`shape_data` and `shape_tensor_shape` and the loop that fills `shape_data` (the
for-loop over `output_shape`) in View.cpp so only the output shape query
remains; keep the QNN Reshape op creation
(`QnnAOTNodeOperation::create("Reshape")`), package name set, inputs/outputs via
`env->captureQnnAOTNodeTensor(...)`, `setName(view_op->getAOp()->getName())`,
and the final `env->captureAOTNodeOp(...)` call unchanged.
chenghuaWang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Please check Guidelines for Contributing.
Summary by CodeRabbit
Release Notes
New Features
Refactor
✏️ Tip: You can customize this high-level summary in your review settings.