feat(Qnn AOT): Implement LLMQuantRecipePass and associated patterns for quantization #572
Conversation
- Added LLMQuantRecipePass to handle quantization recipes for various operations.
- Introduced multiple pattern classes (Index, Elementwise, Transpose, Concat, Repeat, MatMul, Equal, Where, Softmax, Linear, View, Embedding, Qwen3 Attention) to match and rewrite specific Linalg operations.
- Removed obsolete QuantPrecisionsCfgPass files.
- Created SplitLLMGraphPass with a placeholder for future graph splitting logic.
- Enhanced quantization attributes with detailed specifications and improved dumping functionality.
- Updated RTTI generation to include new quantization spec classes.
- Added utility macros for error handling in common operations.
📝 Walkthrough

Adds a pattern-driven LLM quantization-recipe pass, two new QNN AOT passes (merge/split LLM head), makes the AOT pipeline config-driven, extends IR quantization RTTI/attributes and numeric types (4-bit, VectorInt16, QuantizationSpec), updates Region ownership types, refactors QNN AOT wrappers/visitor APIs, and updates build/examples to include FlatBuffers and LLM config changes.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as User/CLI
    participant AOT as AOTPipeline
    participant Pass as LLMQuantRecipePass
    participant Merge as MergeLLMHeadPass
    participant Env as QnnAOTEnv
    participant Graph as QnnAOTGraph
    CLI->>AOT: request compile (config with quant_recipe.llm_recipe=true)
    AOT->>Pass: run LLMQuantRecipePass on IR
    Note right of Pass: pattern-match & annotate ops<br/>with QuantizationSpec attrs
    AOT->>Merge: run MergeLLMHeadIntoMainGraphPass
    Merge->>AOT: relocate lm_head and rewire returns
    AOT->>Env: lowering invokes captureAOTGraph / captureQnnAOTNodeTensor / captureAOTNodeOp
    Env->>Graph: register tensors & ops into graph
    Graph->>Env: compile graph (set is_compiled_)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed)
Actionable comments posted: 4
♻️ Duplicate comments (1)
mllm/compile/ir/linalg/Attribute.cpp (1)
251-254: Same parenthesis issue for `kNone` in the second lambda.

This `kNone` case has the same missing closing parenthesis issue as the first lambda.
🧹 Nitpick comments (10)
mllm/utils/Common.hpp (1)
57-61: Consider adding brief documentation for the new macros.

The new early-return guard macros are clear and follow existing patterns in the codebase. However, adding a brief comment explaining their purpose and typical usage would improve developer ergonomics.
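As a minimal sketch of how these guards read at a call site (the `isValidShape` matcher and its inputs are illustrative, not mllm code):

```cpp
#include <cassert>
#include <string>

// Guard macros mirroring the ones added in mllm/utils/Common.hpp.
#define MLLM_RETURN_FALSE_IF_NOT(x) \
  if (!(x)) { return false; }
#define MLLM_RETURN_FALSE_IF(x) \
  if ((x)) { return false; }

// Hypothetical matcher: validates a tensor-like description.
bool isValidShape(int rank, const std::string& name) {
  MLLM_RETURN_FALSE_IF_NOT(rank > 0);  // reject non-positive ranks
  MLLM_RETURN_FALSE_IF(name.empty());  // reject unnamed tensors
  return true;
}
```

The guards keep early-exit validation chains flat, which is the main ergonomic win over nested `if` blocks.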
📝 Suggested documentation
```diff
+// Early-return guards for boolean predicates
 #define MLLM_RETURN_FALSE_IF_NOT(x) \
   if (!(x)) { return false; }

 #define MLLM_RETURN_FALSE_IF(x) \
   if ((x)) { return false; }
```

Note: Static analysis suggests using constexpr template functions instead of macros, but this is a stylistic preference. The macro approach is consistent with existing codebase conventions (e.g., `NYI`, `RT_ASSERT`, `FORCE_INLINE`) and is appropriate for simple control-flow guards.

mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp (2)
25-30: Consider breaking early after finding CallGraphOp.

The walk asserts `call_main_graph_op == nullptr` on each iteration, meaning it expects exactly one `CallGraphOp`. Consider returning `WALK_BREAK` after finding the first one to avoid unnecessary iteration.

🔎 Suggested optimization
```diff
 writer.walk<ir::graph::CallGraphOp>(
     [&](ir::IRWriter& /*writer*/, const ir::graph::CallGraphOp::ptr_t& call_op) -> ir::IRWriter::WalkResult {
       MLLM_RT_ASSERT_EQ(call_main_graph_op, nullptr);
       call_main_graph_op = call_op;
-      return ir::IRWriter::WalkResult::WALK_CONTINUE;
+      return ir::IRWriter::WalkResult::WALK_BREAK;
     });
```
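The early-break idiom can be sketched generically; `WalkResult`, `Node`, and `walk` below are simplified stand-ins for the mllm IRWriter API, not the real types:

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Simplified stand-in for ir::IRWriter::WalkResult.
enum class WalkResult { WALK_CONTINUE, WALK_BREAK };

struct Node { int id; };

// Visits nodes in order, stopping as soon as the callback returns WALK_BREAK.
// Returns the number of nodes actually visited.
int walk(const std::vector<Node>& nodes,
         const std::function<WalkResult(const Node&)>& fn) {
  int visited = 0;
  for (const auto& n : nodes) {
    ++visited;
    if (fn(n) == WalkResult::WALK_BREAK) break;
  }
  return visited;
}

// Finds the first matching node and breaks, avoiding a full traversal.
int findFirst(const std::vector<Node>& nodes, int target, const Node** out) {
  *out = nullptr;
  return walk(nodes, [&](const Node& n) {
    if (n.id == target) { *out = &n; return WalkResult::WALK_BREAK; }
    return WalkResult::WALK_CONTINUE;
  });
}
```

With `WALK_BREAK`, traversal cost drops from always-O(n) to O(position of first match), which matters for large graphs.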
82-106: Redundant assertions can be consolidated.

Lines 83-84 and 98-99 perform identical assertions (`inputs().size() == 1` and `outputs().size() == 1`). Since the assertions at lines 83-84 already validate these invariants, the second set at lines 98-99 is redundant.

🔎 Suggested consolidation
```diff
 // Make sure the output is right
 {
   MLLM_RT_ASSERT_EQ(llm_head_op->inputs().size(), 1);
   MLLM_RT_ASSERT_EQ(llm_head_op->outputs().size(), 1);
   auto llm_head_input = llm_head_op->inputs().front();
   auto llm_head_output = llm_head_op->outputs().front();
   auto& graph_region_outs = main_graph->outputs();

   // Replace llm_head_input with llm_head_output in graph outputs
   std::replace(graph_region_outs.begin(), graph_region_outs.end(), llm_head_input, llm_head_output);
   std::replace(return_op->inputs().begin(), return_op->inputs().end(), llm_head_input, llm_head_output);
   std::replace(main_graph->getTopRegion()->outputs().begin(), main_graph->getTopRegion()->outputs().end(),
                llm_head_input, llm_head_output);
-}
-// Make sure the call_graph_op's output is right
-{
-  MLLM_RT_ASSERT_EQ(llm_head_op->inputs().size(), 1);
-  MLLM_RT_ASSERT_EQ(llm_head_op->outputs().size(), 1);
-  auto llm_head_input = llm_head_op->inputs().front();
-  auto llm_head_output = llm_head_op->outputs().front();
-  auto& graph_region_outs = call_main_graph_op->outputs();
-
-  // Replace llm_head_input with llm_head_output in call graph outputs
-  std::replace(graph_region_outs.begin(), graph_region_outs.end(), llm_head_input, llm_head_output);
+
+  // Also update call_graph_op's outputs
+  auto& call_graph_outs = call_main_graph_op->outputs();
+  std::replace(call_graph_outs.begin(), call_graph_outs.end(), llm_head_input, llm_head_output);
 }
```

mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (2)
21-21: Unused variable: `writer` is created but never used.

The `IRWriter` instance is constructed but not utilized in the current placeholder implementation. Either remove it until the graph splitting logic is implemented, or mark it as `[[maybe_unused]]` to suppress compiler warnings.

🔎 Suggested fix
```diff
 auto module_op = op->cast_<ir::ModuleOp>();
-auto writer = ir::IRWriter(getCtx(), module_op->getTopRegion());
+[[maybe_unused]] auto writer = ir::IRWriter(getCtx(), module_op->getTopRegion());
 // TODO: Implement graph splitting logic here
```

Or simply remove the line until it's needed.
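A compilable sketch of the `[[maybe_unused]]` approach, with `Context` and `makeWriter` as placeholder names rather than mllm types:

```cpp
#include <cassert>

struct Context { int id = 0; };

inline int makeWriter(const Context& ctx) { return ctx.id; }

// Keeps a not-yet-used local around for a planned implementation
// without triggering -Wunused-variable under -Werror builds.
int runPlaceholderPass(const Context& ctx) {
  [[maybe_unused]] auto writer = makeWriter(ctx);  // reserved for future logic
  // TODO: Implement graph splitting logic here
  return 0;  // success sentinel in this sketch
}
```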
23-23: Placeholder TODO noted.

The pass is a placeholder for future graph-splitting logic. The TODO comment follows the coding guidelines format.
Would you like me to open an issue to track the implementation of this graph splitting logic?
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (1)
11-18: Consider explicitly deleting copy/move operations for Rule of 5 compliance.

Per cppcoreguidelines-special-member-functions, defining a destructor typically implies copy/move should also be addressed. Since passes are meant to be used via `shared_ptr` from the factory, explicitly deleting copy/move operations clarifies intent and prevents accidental slicing.

🔎 Proposed fix
```diff
 class SplitLLMGraphPass final : public ir::Pass {
  public:
   SplitLLMGraphPass() = default;
+  SplitLLMGraphPass(const SplitLLMGraphPass&) = delete;
+  SplitLLMGraphPass& operator=(const SplitLLMGraphPass&) = delete;
+  SplitLLMGraphPass(SplitLLMGraphPass&&) = delete;
+  SplitLLMGraphPass& operator=(SplitLLMGraphPass&&) = delete;
   ~SplitLLMGraphPass() override = default;

   uint8_t run(const ir::node_ptr_t& op) override;
 };
```

mllm/compile/ir/linalg/Attribute.cpp (1)
166-270: Extract the duplicated `gen_quant_spec_str` lambda into a shared utility.

The `gen_quant_spec_str` lambda is duplicated verbatim in both `LinalgIRQuantizatonAnnotationAttr::dump` (lines 26-126) and `LinalgIRQuantizatonSpecAttr::dump` (lines 167-267). Consider extracting this into a free function or static method to avoid code duplication and simplify future maintenance.

mllm/compile/ir/linalg/Attribute.hpp (1)
325-344: Consider applying Rule of 5 for consistency.

Per static analysis, this class defines a non-default destructor but lacks explicit copy/move operations. While the base class may handle this, explicitly deleting them would clarify intent. The short parameter name `p` in `dump()` is consistent with other dump methods in the codebase, so that hint can be ignored.

🔎 Proposed fix
```diff
 class LinalgIRQuantizatonSpecAttr final : public LinalgIRAttr {
  public:
   QuantizationSpec::ptr_t spec_;

   DEFINE_SPECIFIC_IR_CLASS(LinalgIRQuantizatonSpecAttr);

   ~LinalgIRQuantizatonSpecAttr() override;
+  LinalgIRQuantizatonSpecAttr(const LinalgIRQuantizatonSpecAttr&) = delete;
+  LinalgIRQuantizatonSpecAttr& operator=(const LinalgIRQuantizatonSpecAttr&) = delete;

   LinalgIRQuantizatonSpecAttr();
```

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)
23-26: Multiple TODO patterns are placeholders.

These patterns match but don't perform actual rewrites. Ensure these are tracked for completion before the quantization pipeline is used in production.
Would you like me to open issues to track completion of these TODO rewrite implementations?
Also applies to: 39-42, 52-55, 65-68, 78-81, 91-94, 104-107, 117-120, 130-133, 143-146
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (1)
187-200: Consider explicit copy/move deletion for Rule of 5 compliance.

Similar to other pass classes, `LLMQuantRecipePass` defines a destructor without explicit copy/move operations. Since passes are used via `shared_ptr`, explicitly deleting these operations clarifies intent.

🔎 Proposed fix
```diff
 class LLMQuantRecipePass final : public ir::Pass {
  public:
   LLMQuantRecipePass();
+  LLMQuantRecipePass(const LLMQuantRecipePass&) = delete;
+  LLMQuantRecipePass& operator=(const LLMQuantRecipePass&) = delete;
   ~LLMQuantRecipePass() override = default;
```
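The non-copyable pass shape recommended above can be sketched end to end; `Pass`, `ExamplePass`, and `createExamplePass` below are stand-ins for the real `ir::Pass` hierarchy and `createXxxPass()` factories:

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Stand-in polymorphic base for a compiler pass.
class Pass {
 public:
  virtual ~Pass() = default;
  virtual int run() = 0;
};

// Pass owned only through shared_ptr; copies and moves are compile errors,
// which rules out accidental slicing through the base class.
class ExamplePass final : public Pass {
 public:
  ExamplePass() = default;
  ExamplePass(const ExamplePass&) = delete;
  ExamplePass& operator=(const ExamplePass&) = delete;
  ExamplePass(ExamplePass&&) = delete;
  ExamplePass& operator=(ExamplePass&&) = delete;
  ~ExamplePass() override = default;

  int run() override { return 0; }
};

// Factory in the createXxxPass() style: callers share ownership.
std::shared_ptr<Pass> createExamplePass() {
  return std::make_shared<ExamplePass>();
}
```

Deleting all four copy/move members makes the "use via `shared_ptr` only" contract enforceable by the compiler instead of by convention.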
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
- `examples/CMakeLists.txt`
- `examples/qwen3_qnn_aot/compile.cpp`
- `examples/qwen3_qnn_aot/qnn_aot_cfg.json`
- `mllm/backends/qnn/aot/passes/AOTPipeline.cpp`
- `mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp`
- `mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp`
- `mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp`
- `mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp`
- `mllm/backends/qnn/aot/passes/QuantPrecisionsCfgPass.cpp`
- `mllm/backends/qnn/aot/passes/QuantPrecisionsCfgPass.hpp`
- `mllm/backends/qnn/aot/passes/SplitGraphPass.cpp`
- `mllm/backends/qnn/aot/passes/SplitGraphPass.hpp`
- `mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp`
- `mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp`
- `mllm/compile/ir/GeneratedRTTIKind.hpp`
- `mllm/compile/ir/Node.cpp`
- `mllm/compile/ir/Node.hpp`
- `mllm/compile/ir/NodeRTTIClassOfImpl.hpp`
- `mllm/compile/ir/linalg/Attribute.cpp`
- `mllm/compile/ir/linalg/Attribute.hpp`
- `mllm/compile/ir/rtti_kind_gen.py`
- `mllm/utils/Common.hpp`
💤 Files with no reviewable changes (1)
- examples/qwen3_qnn_aot/compile.cpp
🧰 Additional context used
📓 Path-based instructions (5)
{mllm,mllm-cli,pymllm}/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).
Files:
`mllm/utils/Common.hpp`, `mllm/compile/ir/rtti_kind_gen.py`, `mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp`, `mllm/compile/ir/GeneratedRTTIKind.hpp`, `mllm/compile/ir/Node.hpp`, `mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp`, `mllm/compile/ir/linalg/Attribute.hpp`, `mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp`, `mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp`, `mllm/compile/ir/Node.cpp`, `mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp`, `mllm/compile/ir/linalg/Attribute.cpp`, `mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp`, `mllm/backends/qnn/aot/passes/AOTPipeline.cpp`, `mllm/compile/ir/NodeRTTIClassOfImpl.hpp`
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.
Files:
`mllm/utils/Common.hpp`, `mllm/compile/ir/rtti_kind_gen.py`, `mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp`, `mllm/compile/ir/GeneratedRTTIKind.hpp`, `mllm/compile/ir/Node.hpp`, `mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp`, `mllm/compile/ir/linalg/Attribute.hpp`, `mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp`, `mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp`, `mllm/compile/ir/Node.cpp`, `mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp`, `mllm/compile/ir/linalg/Attribute.cpp`, `mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp`, `mllm/backends/qnn/aot/passes/AOTPipeline.cpp`, `mllm/compile/ir/NodeRTTIClassOfImpl.hpp`
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).
Files:
`mllm/utils/Common.hpp`, `mllm/compile/ir/rtti_kind_gen.py`, `mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp`, `mllm/compile/ir/GeneratedRTTIKind.hpp`, `mllm/compile/ir/Node.hpp`, `mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp`, `mllm/compile/ir/linalg/Attribute.hpp`, `mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp`, `mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp`, `mllm/compile/ir/Node.cpp`, `mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp`, `mllm/compile/ir/linalg/Attribute.cpp`, `mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp`, `mllm/backends/qnn/aot/passes/AOTPipeline.cpp`, `mllm/compile/ir/NodeRTTIClassOfImpl.hpp`
{mllm,mllm-cli,pymllm}/**/*.{sh,py}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
If a file starts with a shebang, it must be the first line and contain only the interpreter path and optional single argument, encoded in UTF-8.
Files:
mllm/compile/ir/rtti_kind_gen.py
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
Files:
`mllm/compile/ir/rtti_kind_gen.py`, `mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp`, `mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp`, `mllm/compile/ir/Node.cpp`, `mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp`, `mllm/compile/ir/linalg/Attribute.cpp`, `mllm/backends/qnn/aot/passes/AOTPipeline.cpp`
🧠 Learnings (3)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Ensure functions that can fail return appropriate error codes or raise exceptions.
Applied to files:
mllm/utils/Common.hpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Use named constants instead of magic numbers.
Applied to files:
mllm/utils/Common.hpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Add comments for complex algorithms or non-obvious logic.
Applied to files:
examples/CMakeLists.txt
🧬 Code graph analysis (9)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (3)
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (13): `op` (18-18, 30-30, 44-44, 58-58, 72-72, 86-86, 100-100, 112-112, 124-124, 138-138, 152-152, 164-164, 178-178)
- mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp (1): `op` (17-17)
- mllm/compile/ir/Node.hpp (2): `op` (468-468, 470-470)

mllm/compile/ir/linalg/Attribute.hpp (1)
- mllm/compile/ir/linalg/Attribute.cpp (3): `LinalgIRQuantizatonSpecAttr` (160-160, 162-162, 164-164)

mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp (2)
- mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (1): `op` (17-17)
- mllm/compile/ir/Node.hpp (2): `op` (468-468, 470-470)

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (4)
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (43): `op` (18-18, 30-30, 44-44, 58-58, 72-72, 86-86, 100-100, 112-112, 124-124, 138-138, 152-152, 164-164, 178-178, 193-193), `writer` (20-20, 32-32, 46-46, 60-60, 74-74, 88-88, 102-102, 114-114, 126-126, 140-140, 154-154, 166-166, 180-180), `LLMQuantRecipePass` (189-189, 191-191), `p` (195-195), `make_shared` (22-22, 34-36, 48-50, 62-64, 76-78, 90-92, 104-104, 116-116, 128-130, 142-144, 156-156, 168-170, 182-184)
- mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (1): `op` (17-17)
- mllm/compile/ir/linalg/Attribute.hpp (9): `node` (27-27, 27-27, 322-322, 322-322, 343-343, 343-343), `create` (76-76), `p` (318-318, 337-337)
- mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (2): `run` (16-26, 16-16)

mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (4)
- mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp (2): `run` (16-109, 16-16)
- mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (1): `op` (17-17)
- mllm/compile/ir/Node.hpp (4): `op` (468-468, 470-470), `IRWriter` (354-354, 355-355)
- mllm/compile/ir/Node.cpp (1): `IRWriter` (290-291)

mllm/compile/ir/linalg/Attribute.cpp (2)
- mllm/core/DataTypes.hpp (1): `nameOfType` (713-713)
- mllm/compile/ir/linalg/Attribute.hpp (12): `p` (318-318, 337-337), `spec` (99-104, 129-134, 159-164, 191-196, 225-230, 258-262, 293-297), `ctx` (320-320, 339-339, 341-341)

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (1)
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1): `LLMQuantRecipePass` (293-311)

mllm/backends/qnn/aot/passes/AOTPipeline.cpp (4)
- mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp (2): `createMarkQnnGraphPass` (68-68, 68-68)
- mllm/backends/qnn/aot/passes/OpNamingPass.cpp (2): `createOpNamingPass` (88-88, 88-88)
- mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp (2): `createMergeLLMHeadIntoMainGraphPass` (111-111, 111-111)
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2): `createLLMQuantRecipePass` (364-364, 364-364)

mllm/compile/ir/NodeRTTIClassOfImpl.hpp (1)
- mllm/compile/ir/GeneratedRTTIKind.hpp (2): `v` (161-161, 161-161)
🪛 Clang (14.0.6)
mllm/utils/Common.hpp
[error] 57-57: function-like macro 'MLLM_RETURN_FALSE_IF_NOT' used; consider a 'constexpr' template function
(cppcoreguidelines-macro-usage,-warnings-as-errors)
[error] 60-60: function-like macro 'MLLM_RETURN_FALSE_IF' used; consider a 'constexpr' template function
(cppcoreguidelines-macro-usage,-warnings-as-errors)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp
[error] 11-11: class 'SplitLLMGraphPass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator
(cppcoreguidelines-special-member-functions,-warnings-as-errors)
mllm/compile/ir/linalg/Attribute.hpp
[error] 325-325: class 'LinalgIRQuantizatonSpecAttr' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator
(cppcoreguidelines-special-member-functions,-warnings-as-errors)
[error] 337-337: parameter name 'p' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp
[error] 11-11: class 'MergeLLMHeadIntoMainGraphPass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator
(cppcoreguidelines-special-member-functions,-warnings-as-errors)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
[error] 18-18: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 30-30: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 44-44: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 58-58: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 72-72: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 86-86: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 100-100: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 112-112: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 124-124: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 138-138: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 152-152: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 164-164: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 178-178: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 187-187: class 'LLMQuantRecipePass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator
(cppcoreguidelines-special-member-functions,-warnings-as-errors)
[error] 195-195: parameter name 'p' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
mllm/compile/ir/NodeRTTIClassOfImpl.hpp
[error] 340-340: function-like macro 'RTTI_RK_ATTR_LINALGIRATTR_QUANTIZATIONSPEC_IMPL' used; consider a 'constexpr' template function
(cppcoreguidelines-macro-usage,-warnings-as-errors)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-macos
- GitHub Check: build-android
- GitHub Check: build-x86
🔇 Additional comments (13)
examples/CMakeLists.txt (1)
6-7: The subdirectories are properly configured. Both `examples/minicpm_o` and `examples/minicpm4` exist with valid CMakeLists.txt files and appropriate source code, confirming that the `add_subdirectory()` calls are valid.

mllm/compile/ir/GeneratedRTTIKind.hpp (1)
139-139: LGTM!

The new `RK_Attr_LinalgIRAttr_QuantizationSpec` enum value is correctly positioned between `QuantizationAnnotation` and `Last`, maintaining the RTTI hierarchy ordering required for `classof` range checks.

mllm/compile/ir/rtti_kind_gen.py (1)
219-219: LGTM!

The `QuantizationSpec` attribute derivation is correctly added alongside `QuantizationAnnotation`, ensuring the RTTI generator produces the corresponding enum value and macro in the generated files.

mllm/compile/ir/Node.cpp (2)
63-65: LGTM!

The implementation correctly aligns with the header changes, returning `std::list<node_weak_ptr_t>&` for both `inputs()` and `outputs()`.
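For callers, the key change is that each element must be locked before use. A minimal consumption sketch, with `Val` as a stand-in for the real IR value type:

```cpp
#include <cassert>
#include <list>
#include <memory>

struct Val { int idx; };

// Alias mirroring the weak-pointer element type returned by
// Region::inputs()/outputs() in this sketch.
using node_weak_ptr_t = std::weak_ptr<Val>;

// Counts inputs whose underlying node is still alive, promoting each
// weak_ptr to a shared_ptr before touching it and skipping expired entries.
int countLiveInputs(std::list<node_weak_ptr_t>& inputs) {
  int alive = 0;
  for (auto& w : inputs) {
    if (auto sp = w.lock()) {  // nullptr if the node was destroyed
      ++alive;
    }
  }
  return alive;
}
```

The `lock()`-then-check pattern is what keeps iteration safe when ops are removed from a region while weak references to them still exist elsewhere.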
298-311: Good addition: clearing belongsTo on op removal.

The addition of `op->belongsTo() = nullptr` ensures proper cleanup of the parent relationship when an op is completely removed. This is intentional: `removeOp` is used for deletion with full cleanup, while `removeOpWithoutEdgeCut` is used for moving ops between regions, where the parent reference will be updated by the subsequent insert function (e.g., `insertOpAtLast`, `insertOpAtPos`).

mllm/compile/ir/NodeRTTIClassOfImpl.hpp (1)
340-342: LGTM!

The new `RTTI_RK_ATTR_LINALGIRATTR_QUANTIZATIONSPEC_IMPL` macro follows the established pattern in this auto-generated file. While static analysis suggests using `constexpr` template functions instead of macros (which is generally good advice), this would require refactoring the entire RTTI generation system and is outside the scope of this change.

mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp (2)
11-18: LGTM!

The pass class follows the established pattern for IR passes in this codebase. Regarding the static analysis hint about the rule of five: since this class has no data members and is primarily used via `shared_ptr` through the factory function, the default-generated copy/move operations are safe.
17-17: No action needed: return types are already consistent across all passes.

Verification shows that all passes in the codebase, including `MergeLLMHeadIntoMainGraphPass` and `SplitLLMGraphPass`, consistently use `uint8_t` as the return type for the `run()` method. The `SplitLLMGraphPass.hpp` file at line 17 contains `uint8_t run(const ir::node_ptr_t& op) override;`, which is identical in style to the method in question. No inconsistency exists.

Likely an incorrect or invalid review comment.
mllm/compile/ir/Node.hpp (1)
125-127: Breaking API change is properly implemented.

The `Region::inputs()` and `Region::outputs()` methods return `std::list<node_weak_ptr_t>&` and all usages throughout the codebase have been updated to work with this type. Callers properly iterate over the lists, assign them directly, or explicitly cast to `Val` via `->cast_<ir::Val>()` when the underlying value is needed. No incompatibilities remain.

examples/qwen3_qnn_aot/qnn_aot_cfg.json (1)
18-46: LGTM!

The new `builtin_llm_pass` structure is well-organized with explicit configuration for model type, lm_head, linear, and kv_cache quantization settings. The per-field descriptors with method, symmetry, and precision provide clear and maintainable configuration.

mllm/compile/ir/linalg/Attribute.hpp (1)
65-77: LGTM!

The `QuantizationSpecRaw` struct follows the established pattern of other quantization spec types with proper UUID assignment and factory methods.

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)
293-311: Unused `config` variable except for model check.

The `config` variable is fetched at line 294 but only used at line 308 to check the model type. This is fine for now, but consider moving the config fetch inside the conditional or using it for other pattern configurations as well.

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (1)
16-185: LGTM!

The pattern class declarations follow a consistent structure with `isMatch`, `rewrite`, and static `create` factory methods. The short parameter names (`op`, `p`) are idiomatic for this codebase and can be safely ignored per static analysis.
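The isMatch/rewrite/create structure can be sketched in miniature; `Op`, `Pattern`, and `SoftmaxPattern` below are simplified stand-ins for the mllm IR types, not the real API:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct Op {
  std::string kind;
  bool annotated = false;
};

// Match/rewrite interface in the style described above.
class Pattern {
 public:
  virtual ~Pattern() = default;
  virtual bool isMatch(const Op& op) const = 0;
  virtual bool rewrite(Op& op) const = 0;
};

class SoftmaxPattern final : public Pattern {
 public:
  bool isMatch(const Op& op) const override { return op.kind == "softmax"; }
  bool rewrite(Op& op) const override { op.annotated = true; return true; }
  // Static factory mirroring the create() convention.
  static std::shared_ptr<Pattern> create() {
    return std::make_shared<SoftmaxPattern>();
  }
};

// Driver loop: every pattern is offered every op; matches get rewritten.
int applyPatterns(std::vector<Op>& ops,
                  const std::vector<std::shared_ptr<Pattern>>& patterns) {
  int rewritten = 0;
  for (auto& op : ops)
    for (const auto& p : patterns)
      if (p->isMatch(op) && p->rewrite(op)) ++rewritten;
  return rewritten;
}
```

Separating `isMatch` from `rewrite` keeps the matching predicate cheap and side-effect-free, so the driver can filter before mutating the IR.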
…pdate related patterns
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
mllm/compile/ir/linalg/Attribute.cpp (1)
26-126: Extract duplicated lambda to eliminate 240+ lines of repetition.

The `gen_quant_spec_str` lambda is defined identically in both `LinalgIRQuantizatonAnnotationAttr::dump()` and `LinalgIRQuantizatonSpecAttr::dump()`, resulting in significant code duplication. Extract this logic to a static helper function or free function in an anonymous namespace.

🔎 Proposed refactoring to eliminate duplication
Add a helper function before the class method implementations:
```diff
+namespace {
+std::string gen_quant_spec_str(const QuantizationSpec::ptr_t& q) {
+  std::stringstream ss;
+  ss << "QuantSpec(";
+  switch (q->type) {
+    case QuantizationSpecType::kSymPerTensor: {
+      auto _q = std::static_pointer_cast<QuantizationSpecSymPerTensor>(q);
+      ss << "SymPerTensor(";
+      ss << "quant_min: " << _q->quant_min;
+      ss << ", quant_max: " << _q->quant_max;
+      ss << ", quant_to_type: " << nameOfType(_q->quant_to_type);
+      ss << ", scale_type: " << nameOfType(_q->scale_type);
+      ss << ")";
+      break;
+    }
+    // ... (include all other cases)
+    case QuantizationSpecType::kNone: {
+      ss << "None()";
+      break;
+    }
+    case QuantizationSpecType::kRaw: {
+      auto _q = std::static_pointer_cast<QuantizationSpecRaw>(q);
+      ss << "Raw(";
+      ss << "type: " << nameOfType(_q->type_);
+      ss << ")";
+      break;
+    }
+  }
+  ss << ", uuid=" << q->uuid << ")";
+  return ss.str();
+}
+}  // namespace
```

Then replace both lambda definitions with calls to the helper:
```diff
 void LinalgIRQuantizatonAnnotationAttr::dump(IRPrinter& p) {
-  auto gen_quant_spec_str = [](const QuantizationSpec::ptr_t& q) -> std::string {
-    // ... 100 lines of duplicated code
-  };
-
   p.print("QuantAnnotation");
   // ... rest of method

 void LinalgIRQuantizatonSpecAttr::dump(IRPrinter& p) {
-  auto gen_quant_spec_str = [](const QuantizationSpec::ptr_t& q) -> std::string {
-    // ... 100 lines of duplicated code
-  };
-
   p.print("{}", gen_quant_spec_str(spec_));
```

Also applies to: 167-267
♻️ Duplicate comments (2)
mllm/compile/ir/linalg/Attribute.cpp (1)
251-254: Critical: Missing closing parenthesis in the `kNone` case.

The `kNone` case outputs `"None("` but breaks without emitting the closing parenthesis, producing malformed output like `"None(, uuid=...)"`. The first occurrence of this lambda (line 111) correctly outputs `"None()"`, but this second occurrence still has the bug.

🔎 Required fix
```diff
   case QuantizationSpecType::kNone: {
-    ss << "None(";
+    ss << "None()";
     break;
   }
```

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)
365-371: Assertion on `rewrite()` failure will crash the pass.

This was flagged in a past review. Using `MLLM_RT_ASSERT(pattern.second->rewrite(...))` causes the pass to abort on a failed rewrite. Consider logging a warning and continuing to process other ops, or document that failing rewrites are considered fatal errors.
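The warn-and-continue alternative can be sketched in a minimal, self-contained form. The names below (`applyRewrites`, the `std::function` stand-ins) are illustrative only, not the actual pass API:

```cpp
#include <cstddef>
#include <cstdio>
#include <functional>
#include <vector>

// Illustrative sketch only: count rewrite failures instead of asserting,
// so one failing pattern does not abort processing of the remaining ops.
int applyRewrites(const std::vector<std::function<bool()>>& rewrites) {
  int failures = 0;
  for (std::size_t i = 0; i < rewrites.size(); ++i) {
    if (!rewrites[i]()) {
      // In the real pass this would be an MLLM_WARN with the op name.
      std::fprintf(stderr, "warning: rewrite %zu failed, continuing\n", i);
      ++failures;
    }
  }
  return failures;
}
```

The caller can then decide whether any failures are fatal after the whole graph has been visited, instead of aborting at the first one.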
🧹 Nitpick comments (7)
mllm/compile/ir/linalg/Attribute.cpp (2)
144-150: Consider using `size_t` for the loop counter.

`weight_idx` is declared as `int` but is compared against `annotation_.weights.size()`, which returns `size_t`. While not causing a bug here, using `size_t` would avoid signed/unsigned comparison warnings and better express intent.

🔎 Suggested type adjustment
```diff
-  int weight_idx = 0;
+  size_t weight_idx = 0;
   for (const auto& [name, spec] : annotation_.weights) {
```
155-158: Unused `ctx` parameter in build methods.

All three `build()` methods accept an `IRContext* ctx` parameter but never use it. If this parameter is reserved for future functionality or API consistency, consider adding a comment or `[[maybe_unused]]` attribute to document the intent and suppress compiler warnings.

🔎 Example with attribute annotation
```diff
-LinalgIRQuantizatonAnnotationAttr::ptr_t LinalgIRQuantizatonAnnotationAttr::build(IRContext* ctx) {
+LinalgIRQuantizatonAnnotationAttr::ptr_t LinalgIRQuantizatonAnnotationAttr::build([[maybe_unused]] IRContext* ctx) {
   auto ret = std::make_shared<LinalgIRQuantizatonAnnotationAttr>();
   return ret;
 }
```

Also applies to: 272-281
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (3)
4-11: Missing standard library headers.

The header uses `std::vector` and `std::string` in the class declaration but doesn't include `<vector>` or `<string>`. While these might be transitively included through other headers, it's best practice to explicitly include what you use.

🔎 Proposed fix
```diff
 #pragma once
+#include <string>
 #include <unordered_map>
+#include <vector>
 #include "mllm/compile/ir/Node.hpp"
 #include "mllm/compile/passes/Pass.hpp"
```
194-207: Consider explicitly deleting copy and move operations.The class defines a custom constructor and destructor but doesn't explicitly define or delete copy/move constructors and assignment operators. Since this pass holds registered patterns that likely shouldn't be duplicated, explicitly deleting these operations clarifies intent and prevents accidental copies.
🔎 Proposed fix
```diff
 class LLMQuantRecipePass final : public ir::Pass {
  public:
   LLMQuantRecipePass();
+  LLMQuantRecipePass(const LLMQuantRecipePass&) = delete;
+  LLMQuantRecipePass& operator=(const LLMQuantRecipePass&) = delete;
+  LLMQuantRecipePass(LLMQuantRecipePass&&) = delete;
+  LLMQuantRecipePass& operator=(LLMQuantRecipePass&&) = delete;
   ~LLMQuantRecipePass() override = default;
```
14-18: Add documentation for the utility function.

Per coding guidelines, public APIs should have clear documentation. This function name is descriptive but quite long; a brief comment explaining when/why to use it and what conditions must hold would help maintainers.
🔎 Example documentation
```diff
 //===----------------------------------------------------------------------===//
 // Utility functions
 //===----------------------------------------------------------------------===//
+/// Propagates the quantization spec from a single-input op's input to its single output,
+/// and sets the op's quant_recipe annotation attribute. Returns false if preconditions
+/// (single input/output, input has quant_recipe) are not met.
 bool shareQuantSpecSingleInputToSingleOutputAndSetOpQuantAnnoAttr(const ir::IRContext::ptr_t& ctx,
                                                                   const ir::linalg::LinalgIROp::ptr_t& op);
```

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)
224-275: EmbeddingOp itself is not marked with `quant_recipe`.

Line 272 sets `quant_recipe` only on `node->nextOp()` (the CastTypeOp), but the original `node` (EmbeddingOp) is not marked. While `isMatch` checks `nextOp()->getAttr("quant_recipe")` to avoid re-processing, this asymmetry means inspecting the EmbeddingOp alone won't reveal it's been processed. Consider marking both ops for consistency and easier debugging.

🔎 Proposed fix
```diff
   // Attach to quantize node
   node->nextOp()->setAttr("quant_recipe", annotation_attr);
+  // Mark embedding op as processed (can reuse annotation or create a marker attribute)
+  node->setAttr("quant_recipe", annotation_attr);
   return true;
```
377-377: Inconsistent return value.

Line 348 returns `ir::PASS_RET_SUCCESS` but line 377 returns the literal `0`. Use the named constant consistently for clarity and maintainability.

🔎 Proposed fix
```diff
-  return 0;
+  return ir::PASS_RET_SUCCESS;
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
- mllm/compile/ir/linalg/Attribute.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).
Files:
- mllm/compile/ir/linalg/Attribute.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.
Files:
- mllm/compile/ir/linalg/Attribute.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
Files:
- mllm/compile/ir/linalg/Attribute.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).
Files:
- mllm/compile/ir/linalg/Attribute.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
🧬 Code graph analysis (2)
mllm/compile/ir/linalg/Attribute.cpp (2)
mllm/core/DataTypes.hpp (1)
`nameOfType` (713-713)

mllm/compile/ir/linalg/Attribute.hpp (9)

`p` (318-318), `p` (337-337), `spec` (99-104), `spec` (129-134), `spec` (159-164), `spec` (191-196), `spec` (225-230), `spec` (258-262), `spec` (293-297)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (2)
mllm/compile/ir/linalg/Attribute.hpp (11)
`ctx` (320-320), `ctx` (339-339), `ctx` (341-341), `node` (27-27), `node` (27-27), `node` (322-322), `node` (322-322), `node` (343-343), `node` (343-343), `p` (318-318), `p` (337-337)

mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (1)
op(17-17)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
[error] 14-14: 2 adjacent parameters of 'shareQuantSpecSingleInputToSingleOutputAndSetOpQuantAnnoAttr' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 15-15: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
[error] 18-18: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 25-25: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 37-37: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 51-51: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 65-65: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 79-79: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 93-93: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 107-107: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 119-119: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 131-131: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 145-145: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 159-159: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 171-171: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 185-185: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 194-194: class 'LLMQuantRecipePass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator
(cppcoreguidelines-special-member-functions,-warnings-as-errors)
[error] 202-202: parameter name 'p' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-x86
- GitHub Check: build-macos
- GitHub Check: build-android
🔇 Additional comments (2)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)
176-204: LGTM - Past review issue addressed.

Both branches now consistently set `view_op->setAttr("quant_recipe", annotation_attr)`, resolving the previously flagged asymmetry.
380-383: `patterns_` map usage requires codebase verification.

The review claims the `patterns_` map is unused and suggests removal. However, this requires confirming whether the map is accessed in other methods, derived classes, or external code that depends on this class. Before removal, verify all usages of `patterns_` across the codebase.
- Introduced new methods in QnnAOTNodeTensor for parsing tensor types, data types, and quantization parameters from IR.
- Updated QnnAOTNodeTensor constructor to accept a flag for static weight handling.
- Implemented QnnAOTGraph methods for adding operations and compiling the graph.
- Enhanced capture methods in QnnAOTEnv for managing AOT graphs and tensors.
- Refactored LLMQuantRecipePass to support recursive graph visiting and improved quantization pattern matching.
- Updated Elewise visitor patterns to compile operations and register them in the AOT graph.
- Removed obsolete methods and streamlined the interface for better clarity and maintainability.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)
102-104: Factory method doesn't pass `force_static_weight` to constructor.

`QnnAOTNodeTensor::create()` accepts `force_static_weight` but doesn't forward it to the constructor.

🔎 Proposed fix
```diff
 static inline ptr_t create(const ir::tensor::TensorValue::ptr_t& v, bool force_static_weight = false) {
-  return std::make_shared<QnnAOTNodeTensor>(v);
+  return std::make_shared<QnnAOTNodeTensor>(v, force_static_weight);
 }
```

This fix should be applied in `QnnWrappersAPI.hpp` at lines 102-104.
♻️ Duplicate comments (3)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (3)
264-265: Missing defensive config validation before nested access.

Accessing `config["quant_recipe"]["builtin_llm_pass"]["linear"]` without checking if these keys exist can throw an exception. This was flagged in a previous review.
520-538: Missing defensive config validation for Qwen3 pattern registration.

Line 535 accesses nested config keys without validation. This was flagged in a previous review.
78-81: TODO stub `rewrite()` returns true, silently consuming matched ops.

This was flagged in a previous review.
🧹 Nitpick comments (5)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)
169-229: Tensor type parsing handles cases correctly but has a silent fallthrough.

The switch at lines 173-174 for `kTensorMemTypes_Start` does nothing and falls through, leaving `ret_qnn_tensor_type` as `QNN_TENSOR_TYPE_UNDEFINED`. Consider adding a log or assertion if this is unexpected.

🔎 Optional: Add explicit handling
```diff
   case kTensorMemTypes_Start: {
+    MLLM_WARN("Unexpected kTensorMemTypes_Start in parseQnnTensorTypeFromIR");
     break;
   }
```

mllm/backends/qnn/aot/QnnWrappersAPI.hpp (2)
127-131: Factory method initializes only `op_name_`, leaving other members uninitialized.

The `create()` method only sets `op_name_`. The static analysis correctly flags that `name_`, `param_scalar`, `param_tensor`, `inputs`, and `outputs` are not initialized. While vectors default-construct to empty, `name_` will be empty, which may be intentional. Consider adding a default constructor that initializes all members.
170-176: Public member variables in `QnnAOTGraph`.

The static analysis flags `is_compiled_`, `op_node_`, and `all_tensors_` as public. While this may be intentional for convenience, consider using accessors for better encapsulation.
24-31: `QnnAOTQuantRecipeBasePattern` has no abstract methods.

The class only has default implementations returning false. Consider making it abstract or clarifying its purpose with a comment.
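The abstract-base alternative can be sketched with simplified stand-in types (the `Op` struct and function names below are hypothetical, not the real mllm IR API):

```cpp
#include <memory>
#include <vector>

struct Op {};  // stand-in for the real IR op type

// Making the base abstract forces each derived pattern to supply a real
// implementation instead of silently inheriting a default that returns false.
class QuantRecipeBasePattern {
 public:
  virtual ~QuantRecipeBasePattern() = default;
  virtual bool isMatch(const Op& op) const = 0;  // pure virtual: must override
  virtual bool rewrite(const Op& op) = 0;        // pure virtual: must override
};

class AddPattern final : public QuantRecipeBasePattern {
 public:
  bool isMatch(const Op&) const override { return true; }
  bool rewrite(const Op&) override { return true; }
};

// Applies the first matching pattern; returns false when nothing matches.
bool runFirstMatch(const std::vector<std::unique_ptr<QuantRecipeBasePattern>>& patterns,
                   const Op& op) {
  for (const auto& p : patterns) {
    if (p->isMatch(op)) { return p->rewrite(op); }
  }
  return false;
}
```

With pure virtuals, forgetting to override `rewrite` in a new pattern becomes a compile error rather than a silent no-op at runtime.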
mllm/backends/qnn/aot/visitor/Elewise.cpp (1)
17-46: Implementation correctly uses new capture API.

The `compile` method properly:
- Retrieves environment context
- Validates required attributes
- Captures input/output tensors
- Creates and registers the QNN operation
However, consider adding null checks for the cast results on lines 33-35.
🔎 Optional: Add null checks for casts
```diff
   // Start to attach!
   auto i_0 = op->inputs().front()->cast_<ir::tensor::TensorValue>();
   auto i_1 = (*(std::next(op->inputs().begin())))->cast_<ir::tensor::TensorValue>();
   auto o_0 = op->outputs().front()->cast_<ir::tensor::TensorValue>();
+  MLLM_RETURN_FALSE_IF_NOT(i_0 && i_1 && o_0);
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
- examples/qwen3_qnn_aot/qwen3_qnn_aot.mir
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
- mllm/backends/qnn/aot/QnnWrappersAPI.hpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
- mllm/backends/qnn/aot/visitor/Base.hpp
- mllm/backends/qnn/aot/visitor/Elewise.cpp
- mllm/backends/qnn/aot/visitor/Elewise.hpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).
Files:
- mllm/backends/qnn/aot/visitor/Base.hpp
- mllm/backends/qnn/aot/visitor/Elewise.cpp
- mllm/backends/qnn/aot/visitor/Elewise.hpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
- mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.
Files:
- mllm/backends/qnn/aot/visitor/Base.hpp
- mllm/backends/qnn/aot/visitor/Elewise.cpp
- mllm/backends/qnn/aot/visitor/Elewise.hpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
- mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).
Files:
- mllm/backends/qnn/aot/visitor/Base.hpp
- mllm/backends/qnn/aot/visitor/Elewise.cpp
- mllm/backends/qnn/aot/visitor/Elewise.hpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
- mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
Files:
- mllm/backends/qnn/aot/visitor/Elewise.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
🧠 Learnings (2)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Applied to files:
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Suggest adding unit tests for untested complex logic or edge cases.
Applied to files:
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
🧬 Code graph analysis (5)
mllm/backends/qnn/aot/visitor/Base.hpp (3)
mllm/backends/qnn/aot/visitor/Elewise.hpp (2)
`writer` (16-16), `op` (14-14)

mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)

`op` (17-17)

mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)
op(17-17)
mllm/backends/qnn/aot/visitor/Elewise.cpp (4)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (16)
`isMatch` (73-76), `isMatch` (73-73), `isMatch` (86-92), `isMatch` (86-86), `isMatch` (119-122), `isMatch` (119-119), `isMatch` (132-135), `isMatch` (132-132), `isMatch` (169-172), `isMatch` (169-169), `isMatch` (182-185), `isMatch` (182-182), `isMatch` (195-198), `isMatch` (195-195), `isMatch` (208-211), `isMatch` (208-208)

mllm/backends/qnn/aot/visitor/Elewise.hpp (2)

`op` (14-14), `writer` (16-16)

mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)

`op` (17-17)

mllm/backends/qnn/aot/QnnWrappersAPI.hpp (5)

`qnn_context_name` (268-268), `qnn_context_name` (270-271), `qnn_context_name` (273-274), `create` (47-49), `create` (47-47)
mllm/backends/qnn/aot/visitor/Elewise.hpp (4)
mllm/backends/qnn/aot/visitor/Base.hpp (9)
`writer` (19-19), `writer` (19-19), `writer` (21-21), `writer` (30-30), `writer` (30-30), `op` (17-17), `op` (17-17), `op` (28-28), `op` (28-28)

mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)

`op` (17-17)

mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)

`op` (17-17)

mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
op(17-17)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (5)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (17)
`QnnAOTNodeTensor` (106-106), `v` (102-104), `v` (102-102), `v` (109-109), `v` (111-111), `v` (113-113), `v` (115-115), `name` (151-151), `name` (253-253), `name` (255-255), `name` (257-257), `qnn_op` (170-170), `qnn_context_name` (268-268), `qnn_context_name` (270-271), `qnn_context_name` (273-274), `create` (47-49), `create` (47-47)

mllm/compile/ir/Node.hpp (8)

`name` (302-302), `name` (306-306), `op` (468-468), `op` (470-470), `create` (265-300), `create` (265-265), `create` (369-401), `create` (369-369)

mllm/backends/qnn/QNNUtils.hpp (4)

`name` (191-192), `name` (195-196), `create` (255-257), `create` (255-255)

mllm/backends/qnn/aot/passes/AOTCompileContext.hpp (2)

`env` (26-26), `env` (26-26)

mllm/backends/qnn/aot/visitor/Elewise.cpp (2)

`compile` (17-46), `compile` (17-17)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (1)
mllm/backends/qnn/QNNUtils.hpp (2)
`name` (191-192), `name` (195-196)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/visitor/Base.hpp
[error] 21-21: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
mllm/backends/qnn/aot/visitor/Elewise.hpp
[error] 16-16: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
[error] 22-22: all parameters should be named in a function
(readability-named-parameter,-warnings-as-errors)
[error] 24-24: variable name 'rw' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 49-49: 2 adjacent parameters of 'shareQuantSpecSingleInputToSingleOutputAndSetOpQuantAnnoAttr' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 50-50: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp
[error] 106-106: parameter name 'v' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 123-123: constructor does not initialize these fields: name_, op_name_, param_scalar, param_tensor, inputs, outputs
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 174-174: member variable 'is_compiled_' has public visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 175-175: member variable 'op_node_' has public visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 176-176: member variable 'all_tensors_' has public visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 271-271: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-android
- GitHub Check: build-x86
- GitHub Check: build-macos
🔇 Additional comments (10)
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (1)
179-180: Establishes consistent quantization handling for KV cache states.

Adding the de-quantization step for `value_states` aligns its precision handling with `key_states` (lines 172-173), ensuring both KV components undergo the documented "de-quantization and quantization again" pattern before concatenation. This quantization recipe handling is intentional for QNN AOT compatibility.

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)
21-42: LGTM for recursive graph visiting logic.

The recursive traversal correctly walks through subgraphs via `CallGraphOp` and applies pattern matching. The guard for existing `quant_recipe` prevents reprocessing.
49-68: Utility function logic is correct.

The helper properly validates preconditions, creates annotation attributes, and propagates quant specs from input to output.
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (3)
772-808: Tensor capture and caching logic looks correct.

The method properly caches static tensors at context level and normal tensors at graph level, avoiding duplicate creation.
141-167: Constructor properly initializes QNN tensor structure.

The v2 tensor initialization is comprehensive and handles the static weight case correctly.
231-321: Data type mapping is comprehensive.

The switch covers all expected data types with appropriate QNN mappings. The FIXME at lines 278-279 is noted but acceptable for initial implementation.
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (1)
267-276: New capture API methods are well-designed.

The new `captureAOTGraph`, `captureAOTNodeOp`, and `captureQnnAOTNodeTensor` methods provide a clean interface for building QNN graphs programmatically.

mllm/backends/qnn/aot/visitor/Elewise.hpp (1)
12-21: Interface update aligns with base class change.

The `QnnAOTAddPattern` now correctly implements the `compile` method from the updated `QnnAOTBasePattern` interface.

mllm/backends/qnn/aot/visitor/Elewise.cpp (1)
13-15: Match condition includes `using_qnn` attribute check.

The pattern only matches AddOps that have been marked for QNN processing, preventing unintended matches.
mllm/backends/qnn/aot/visitor/Base.hpp (1)
13-22: Interface change from `addNode` to `compile` is properly implemented across all derived classes.

The new `compile(ir::IRWriter& writer, const ir::op_ptr_t& op)` signature provides a cleaner interface that aligns with the writer-based compilation approach. All derived classes (e.g., `QnnAOTAddPattern` in Elewise.hpp) have been updated to implement this method, and there are no remaining references to the old interface.
```cpp
Qnn_QuantizeParams_t QnnAOTNodeTensor::parseQnnQuantizeParamFromIR(const ir::tensor::TensorValue::ptr_t& v) {
  Qnn_QuantizeParams_t ret = QNN_QUANTIZE_PARAMS_INIT;

  MLLM_RT_ASSERT(v->getAttr("quant_recipe"));
  auto quant_spec = v->getAttr("quant_recipe")->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;

  switch (quant_spec->type) {
    case ir::linalg::QuantizationSpecType::kRaw: {
      break;
    }
    case ir::linalg::QuantizationSpecType::kSymPerTensor: {
      auto cfg = std::static_pointer_cast<ir::linalg::QuantizationSpecSymPerTensor>(quant_spec);
      ret.encodingDefinition = QNN_DEFINITION_DEFINED;
      ret.quantizationEncoding = QNN_QUANTIZATION_ENCODING_SCALE_OFFSET;
      ret.scaleOffsetEncoding = Qnn_ScaleOffset_t{.scale = cfg->scale.item<float>(), .offset = 0};
      break;
    }
    case ir::linalg::QuantizationSpecType::kSymPerChannel: {
      auto cfg = std::static_pointer_cast<ir::linalg::QuantizationSpecSymPerChannel>(quant_spec);
      ret.encodingDefinition = QNN_DEFINITION_DEFINED;
      ret.quantizationEncoding = QNN_QUANTIZATION_ENCODING_AXIS_SCALE_OFFSET;
      ret.axisScaleOffsetEncoding = Qnn_AxisScaleOffset_t{
          .axis = cfg->ch_axis,
          .numScaleOffsets = (uint32_t)v->tensor_.shape()[cfg->ch_axis],
          .scaleOffset = nullptr,  // TODO make this correct.
      };
      break;
    }
    case ir::linalg::QuantizationSpecType::kSymPerBlock:
    case ir::linalg::QuantizationSpecType::kAsymPerTensor:
    case ir::linalg::QuantizationSpecType::kAsymPerChannel:
    case ir::linalg::QuantizationSpecType::kAsymPerBlock: {
      MLLM_ERROR_EXIT(ExitCode::kCoreError, "Can't handle [kSymPerBlock, kAsymPerTensor, kAsymPerChannel, kAsymPerBlock] type");
    }
    case ir::linalg::QuantizationSpecType::kLPBQ: {
      ret.encodingDefinition = QNN_DEFINITION_DEFINED;
      ret.quantizationEncoding = QNN_QUANTIZATION_ENCODING_BLOCKWISE_EXPANSION;
      ret.blockwiseExpansion = nullptr;  // TODO
      break;
    }
    default: {
      MLLM_ERROR_EXIT(ExitCode::kCoreError, "Can't handle kNone type");
    }
  }

  return ret;
}
```
parseQnnQuantizeParamFromIR asserts quant_recipe exists but calling code may not guarantee this.
Line 328 uses MLLM_RT_ASSERT which will crash if quant_recipe is missing. Consider returning a default or handling gracefully.
🔎 Proposed fix: return default params if no quant_recipe
Qnn_QuantizeParams_t QnnAOTNodeTensor::parseQnnQuantizeParamFromIR(const ir::tensor::TensorValue::ptr_t& v) {
Qnn_QuantizeParams_t ret = QNN_QUANTIZE_PARAMS_INIT;
- MLLM_RT_ASSERT(v->getAttr("quant_recipe"));
+ if (!v->getAttr("quant_recipe")) {
+ return ret; // Return default params if no quant_recipe
+ }
  auto quant_spec = v->getAttr("quant_recipe")->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;

🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 325-371, the code
unconditionally asserts that v->getAttr("quant_recipe") exists which can crash
when callers don't provide it; instead check if v->getAttr("quant_recipe") is
null and if so return the default-initialized Qnn_QuantizeParams_t (ret)
immediately (optionally emit a debug/info log), otherwise cast and proceed as
existing; replace the MLLM_RT_ASSERT with a null-check and early return to
handle missing quant_recipe gracefully.
bool QnnAOTGraph::compile() {
  if (is_compiled_) { return true; }
  // TODO

  is_compiled_ = true;
  return true;
}
compile() is a stub that only sets is_compiled_ to true.
The method doesn't perform actual compilation. This may cause issues if code expects compiled graphs.
static inline ptr_t create(const ir::tensor::TensorValue::ptr_t& v, bool force_static_weight = false) {
  return std::make_shared<QnnAOTNodeTensor>(v);
}
Factory method doesn't forward force_static_weight to constructor.
The create() method accepts force_static_weight but ignores it when calling the constructor.
🔎 Proposed fix
static inline ptr_t create(const ir::tensor::TensorValue::ptr_t& v, bool force_static_weight = false) {
- return std::make_shared<QnnAOTNodeTensor>(v);
+ return std::make_shared<QnnAOTNodeTensor>(v, force_static_weight);
  }

🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.hpp around lines 102 to 104, the static
factory method accepts force_static_weight but does not pass it to the
QnnAOTNodeTensor constructor; update the create() implementation to forward the
force_static_weight argument to the constructor (or call the correct constructor
overload that accepts it) so the created QnnAOTNodeTensor is constructed with
the intended static-weight behavior.
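A general way to keep such factories from silently dropping constructor arguments is perfect forwarding, so the factory signature never has to be updated by hand. A sketch with a hypothetical `Node` type standing in for `QnnAOTNodeTensor`:

```cpp
#include <memory>
#include <utility>

struct Node {  // hypothetical stand-in for QnnAOTNodeTensor
  int value;
  bool static_weight;
  explicit Node(int v, bool force_static_weight = false)
      : value(v), static_weight(force_static_weight) {}

  // Forwards every argument to the constructor, so a parameter added to
  // the constructor cannot be ignored by the factory the way the
  // hand-written create(v) overload ignored force_static_weight.
  template <typename... Args>
  static std::shared_ptr<Node> create(Args&&... args) {
    return std::make_shared<Node>(std::forward<Args>(args)...);
  }
};
```

The trade-off is that a forwarding factory cannot easily intercept individual parameters; if `create()` needs to inspect `force_static_weight`, the explicit two-argument form in the proposed fix is the simpler choice.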
Actionable comments posted: 5
♻️ Duplicate comments (8)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)
535-537: Add defensive validation for config access to prevent runtime exceptions.

Line 535 directly accesses `config["quant_recipe"]["builtin_llm_pass"]["model"]` without validating key existence. The nlohmann::json library throws exceptions when keys are missing. Other code in this codebase (AOTPipeline.cpp:17-18, MarkTensorIO.cpp:22) defensively uses `.contains()` before access.

🔎 Recommended defensive check

- if (config["quant_recipe"]["builtin_llm_pass"]["model"] == "qwen3") {
+ if (config.contains("quant_recipe") &&
+     config["quant_recipe"].contains("builtin_llm_pass") &&
+     config["quant_recipe"]["builtin_llm_pass"].contains("model") &&
+     config["quant_recipe"]["builtin_llm_pass"]["model"] == "qwen3") {
    addPattern(LLMQuantRecipeQwen3AttentionPattern::create(), "qwen3_attention", 100);
  }
264-264: Add defensive validation for nested config access.

Line 264 directly accesses `config["quant_recipe"]["builtin_llm_pass"]["linear"]` without validating key existence. The nlohmann::json library throws exceptions for missing keys. Other code in the codebase (AOTPipeline.cpp, MarkTensorIO.cpp) uses `.contains()` to defensively check keys before access.

🔎 Recommended defensive check

+ if (!config.contains("quant_recipe") ||
+     !config["quant_recipe"].contains("builtin_llm_pass") ||
+     !config["quant_recipe"]["builtin_llm_pass"].contains("linear")) {
+   MLLM_WARN("Missing required config keys for linear quantization");
+   return false;
+ }
  auto config = AOTCompileContext::getInstance().getConfig()["quant_recipe"]["builtin_llm_pass"]["linear"];

mllm/backends/qnn/aot/QnnWrappersAPI.cpp (5)
141-168: Direct access to `quant_recipe` attribute without null check.

Line 144 unconditionally accesses `v->getAttr("quant_recipe")` and casts it without verifying the attribute exists. If `quant_recipe` is missing, this will crash. Additionally, line 154 calls `parseQnnQuantizeParamFromIR(v)` which also asserts on `quant_recipe` (line 329), creating redundant checks. Consider consolidating the attribute access and adding a null check at the beginning of the constructor.

🔎 Proposed fix: Check for quant_recipe once and handle gracefully

  QnnAOTNodeTensor::QnnAOTNodeTensor(const ir::tensor::TensorValue::ptr_t& v, bool force_static_weight) {
    name_ = v->name();
    mllm_tensor_ = v->tensor_;
-   quant_spec_ = v->getAttr("quant_recipe")->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;
+   auto quant_attr = v->getAttr("quant_recipe");
+   if (quant_attr) {
+     quant_spec_ = quant_attr->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;
+   }
    for (auto s : v->tensor_.shape()) { shape_.emplace_back(s); }
326-406: `parseQnnQuantizeParamFromIR` asserts `quant_recipe` exists but calling code may not guarantee this.

Line 329 uses `MLLM_RT_ASSERT` which will crash if `quant_recipe` is missing. This is the same attribute accessed in the constructor (line 144). Consider returning a default `Qnn_QuantizeParams_t` if the attribute is missing, or checking at a higher level before construction.

🔎 Proposed fix: Return default params if no quant_recipe

  Qnn_QuantizeParams_t QnnAOTNodeTensor::parseQnnQuantizeParamFromIR(const ir::tensor::TensorValue::ptr_t& v) {
    Qnn_QuantizeParams_t ret = QNN_QUANTIZE_PARAMS_INIT;
-   MLLM_RT_ASSERT(v->getAttr("quant_recipe"));
+   if (!v->getAttr("quant_recipe")) {
+     return ret;  // Return default params if no quant_recipe
+   }
    auto quant_spec = v->getAttr("quant_recipe")->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;
466-486: `addOperation` has multiple TODOs and doesn't actually add the op to QNN.

The method constructs `qnn_op_config` but never calls the QNN interface to register the operation with the backend. Lines 477-483 contain TODOs for params, inputs, outputs, validations, and graph registration. This leaves operations in an incomplete state.

Would you like me to help implement the operation registration logic or open an issue to track this?

488-494: `compile()` is a stub that only sets `is_compiled_` to true.

The method doesn't perform actual graph compilation or validation. This may cause issues if downstream code expects a fully compiled graph.
795-798: `captureAOTGraph` always returns nullptr.

The stub implementation returns nullptr, which will cause null pointer dereferences in callers expecting a valid `QnnAOTGraph::ptr_t`.

mllm/backends/qnn/aot/QnnWrappersAPI.hpp (1)
103-105: Factory method doesn't forward `force_static_weight` to constructor.

The `create()` method accepts `force_static_weight` but ignores it when calling the constructor (line 104). This means the parameter has no effect, breaking the intended static weight functionality.

🔎 Proposed fix

  static inline ptr_t create(const ir::tensor::TensorValue::ptr_t& v, bool force_static_weight = false) {
-   return std::make_shared<QnnAOTNodeTensor>(v);
+   return std::make_shared<QnnAOTNodeTensor>(v, force_static_weight);
  }
🧹 Nitpick comments (4)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)
21-23: Name the anonymous function parameter for clarity.

Line 22's `patterns_w_priority_` parameter is unnamed in the function signature, which reduces readability and violates the coding guidelines requiring all parameters to be named.

🔎 Proposed fix

- void recursiveVisitGraph(const ir::IRContext::ptr_t& ctx,
-                          const std::vector<std::pair<int, ir::Pattern::ptr_t>>& patterns_w_priority_,
-                          const ir::graph::SubGraphOp::ptr_t& sub_graph_ir) {
+ void recursiveVisitGraph(const ir::IRContext::ptr_t& ctx,
+                          const std::vector<std::pair<int, ir::Pattern::ptr_t>>& patterns_with_priority,
+                          const ir::graph::SubGraphOp::ptr_t& sub_graph_ir) {

Then update references inside the function accordingly.
As per coding guidelines, Clang requires all parameters to be named.
284-297: Consider caching compiled regex patterns to avoid repeated compilation.

The loop compiles a regex for each config key on every Linear operation processed (line 288). For models with many Linear ops, this repeated compilation could impact performance.
🔎 Performance optimization
Consider caching compiled patterns in a static map initialized once, or compile patterns during pass construction rather than per-op processing.
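The static-map approach can be sketched with the standard `<regex>` library; the helper names and the name-matching semantics here are illustrative assumptions, not the mllm implementation:

```cpp
#include <regex>
#include <string>
#include <unordered_map>

// Compile each pattern string once and reuse it across all Linear ops.
// Not thread-safe as written; guard the cache with a mutex if patterns
// are looked up from multiple threads.
const std::regex& getCachedRegex(const std::string& pattern) {
  static std::unordered_map<std::string, std::regex> cache;
  auto it = cache.find(pattern);
  if (it == cache.end()) {
    it = cache.emplace(pattern, std::regex(pattern)).first;
  }
  return it->second;
}

// Example consumer: match an op name against a config-key pattern.
bool nameMatches(const std::string& op_name, const std::string& pattern) {
  return std::regex_search(op_name, getCachedRegex(pattern));
}
```

Compiling during pass construction (the alternative mentioned above) avoids the lookup entirely at match time, at the cost of eagerly compiling patterns that may never be used.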
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)
807-843: Consider defensive checks for static weight determination.

Lines 814-816 determine `__qnn_enable_static_weight` based on memory type checks. The logic appears correct, but the compound condition `(v->tensor_.memType() <= kParams_End && v->tensor_.memType() >= kParams_Start)` assumes the enum ordering is contiguous. If the enum definition changes or additional memory types are inserted, this could break.

🔎 More robust static weight check

  // Check if this value want static qnn weight. The static qnn weight will be shared through one context in diff graphs!
- if (v->tensor_.memType() == kGlobal || (v->tensor_.memType() <= kParams_End && v->tensor_.memType() >= kParams_Start)) {
+ bool is_param_type = (v->tensor_.memType() == kParamsMMAP ||
+                       v->tensor_.memType() == kParamsNormal);
+ if (v->tensor_.memType() == kGlobal || is_param_type) {
    __qnn_enable_static_weight = true;
  }

mllm/backends/qnn/aot/QnnWrappersAPI.hpp (1)
162-162: Consider encapsulating public data members.

Static analysis flags public members (`name_` on line 162, `is_compiled_`, `op_node_`, `all_tensors_` on lines 182-184) as violating encapsulation. While the current struct-like design may be intentional for AOT tooling, exposing internal state increases coupling and makes it harder to enforce invariants. If the codebase evolves toward stricter encapsulation, consider adding accessor methods.

Based on coding guidelines recommending adherence to best practices like the Google C++ Style Guide.
Also applies to: 182-184
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
- mllm/backends/qnn/aot/QnnWrappersAPI.hpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).
Files:
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.
Files:
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
Files:
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).
Files:
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
mllm/backends/qnn/aot/QnnWrappersAPI.hpp
🧠 Learnings (3)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Applied to files:
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Applied to files:
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Suggest adding unit tests for untested complex logic or edge cases.
Applied to files:
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
[error] 22-22: all parameters should be named in a function
(readability-named-parameter,-warnings-as-errors)
[error] 24-24: variable name 'rw' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 49-49: 2 adjacent parameters of 'shareQuantSpecSingleInputToSingleOutputAndSetOpQuantAnnoAttr' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 50-50: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp
[error] 107-107: parameter name 'v' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 128-128: constructor does not initialize these fields: name_, op_name_, param_scalar, param_tensor, inputs, outputs, unreachable_handle_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 174-174: constructor does not initialize these fields: op_node_, all_tensors_, graph_name_, belongs_context_name_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 182-182: member variable 'is_compiled_' has public visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 183-183: member variable 'op_node_' has public visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 184-184: member variable 'all_tensors_' has public visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 279-279: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-macos
- GitHub Check: build-android
- GitHub Check: build-x86
🔇 Additional comments (6)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)
28-33: Verify that aborting on rewrite failure is the intended behavior.

Line 30 uses `MLLM_RT_ASSERT(pattern.second->rewrite(iw, some_op))`, which will abort execution if any pattern's `rewrite` returns false. Given that several patterns have TODO stubs that may return true/false inconsistently, this could lead to unexpected crashes during development or runtime.

If rewrite failures should be non-fatal, consider logging a warning and continuing:

🔎 Alternative approach

  if (pattern.second->isMatch(some_op)) {
-   MLLM_RT_ASSERT(pattern.second->rewrite(iw, some_op));
+   if (!pattern.second->rewrite(iw, some_op)) {
+     MLLM_WARN("Pattern rewrite failed for op, continuing with next pattern");
+   } else {
-   break;
+     break;
+   }
  }
312-312: Handle asymmetric quantization if sym requirement is relaxed.

Line 312 returns false when `sym` is false: `MLLM_RETURN_FALSE_IF_NOT(sym)`. If asymmetric quantization support is planned, this will require additional implementation.

Is asymmetric quantization planned for future releases? If so, consider adding a TODO comment indicating the enhancement, or document this limitation.
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (2)
170-230: LGTM!

The tensor type parsing logic is comprehensive and correctly maps IR memory types to QNN tensor types. The attribute-based override mechanism (lines 225-227) appropriately prioritizes explicit graph input/output attributes over inferred types.

454-459: LGTM!

Simple getter/setter methods for operation naming. The fluent interface pattern (returning `this` from `setName`) is consistent with other methods in the class.
118-126: LGTM!

The refactored internal representation improves efficiency by caching frequently accessed fields (`name_`, `mllm_tensor_`, `shape_`) rather than repeatedly querying the IR. The `unreachable_handle_` vector for memory management is a practical approach to ensure cleanup of dynamically allocated QNN structures.

132-136: LGTM!

The factory method for `QnnAOTNodeOperation` and the name accessors follow consistent patterns. The fluent interface (returning `ptr_t` from `setName`) enables method chaining, which is used throughout the codebase.

Also applies to: 156-159
  // FIXME: Maybe error here.
  case kInt4: {
    ret = QNN_DATATYPE_SFIXED_POINT_4;
    break;
  }
  case kUInt4: {
    ret = QNN_DATATYPE_UFIXED_POINT_4;
    break;
🧩 Analysis chain
🏁 Script executed:
cat -n mllm/backends/qnn/aot/QnnWrappersAPI.cpp | head -330 | tail -100

Repository: UbiquitousLearning/mllm
Length of output: 3274
🏁 Script executed:
rg "kInt4|kUInt4" --type cpp --type h -B 2 -A 2

Repository: UbiquitousLearning/mllm
Length of output: 1338
🏁 Script executed:
web_search
QNN QNN_DATATYPE_SFIXED_POINT_4 UFIXED_POINT_4 Qualcomm neural network fixed point 4-bit datatype

Repository: UbiquitousLearning/mllm
Length of output: 159
🌐 Web query:
QNN Qualcomm SFIXED_POINT_4 UFIXED_POINT_4 datatype specification
💡 Result:
Short answer: In Qualcomm QNN, QNN_DATATYPE_SFIXED_POINT_4 and QNN_DATATYPE_UFIXED_POINT_4 are 4‑bit signed/unsigned fixed‑point (quantized) tensor formats (two 4‑bit values are packed into one byte when packing is enabled). Representations and semantics:
- SFIXED_POINT_4: 4‑bit signed two’s‑complement integer (range −8..+7) stored as fixed‑point quantized values (dequantize with per‑tensor / per‑channel scale ± zero‑point as provided by the model).
- UFIXED_POINT_4: 4‑bit unsigned integer (range 0..15) stored as fixed‑point quantized values (dequantize with scale/zero‑point).
- Packed storage: when 4‑bit packing is used, two 4‑bit elements are stored per byte (low nibble = first value, high nibble = second). When not packed the values may be stored using 8‑bit containers (i.e., as S/UFIXED_POINT_8). QNN toolchains provide flags such as --pack_4_bit_weights to enable packing.
Sources: Qualcomm QNN toolchain notes and examples (discussion of QNN_DATATYPE_SFIXED_POINT_4 / UFIXED_POINT_4 and 4‑bit packing). [1][2]
Sources
[1] Qualcomm AI Hub / QAIRT job docs (shows QNN support for INT4/4‑bit quantization options).
[2] Implementation notes / community writeup describing 4‑bit packing and QNN_DATATYPE_SFIXED_POINT_4 / QNN_DATATYPE_UFIXED_POINT_4 (packing two 4‑bit values per byte and relation to 8‑bit storage).
Remove the unnecessary FIXME comment on line 279.
The mapping of kInt4/kUInt4 to QNN_DATATYPE_SFIXED_POINT_4/QNN_DATATYPE_UFIXED_POINT_4 is correct per QNN documentation (4-bit signed and unsigned fixed-point formats respectively). The types are actively used in quantization recipes (e.g., "w4a16" in LLMQuantRecipePass.cpp), and the mapping follows the same pattern as the 8-bit and 16-bit quantized variants. The FIXME comment can be safely removed.
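The packed layout described in the web-query result (two 4-bit values per byte, low nibble first, two's complement for the signed variant) can be sketched in portable C++. This is an illustration of the storage format, not the mllm or QNN implementation:

```cpp
#include <cstdint>

// Pack two signed 4-bit values (range -8..7) into one byte:
// low nibble holds the first element, high nibble the second.
uint8_t pack_int4_pair(int8_t first, int8_t second) {
  return static_cast<uint8_t>((first & 0x0F) | ((second & 0x0F) << 4));
}

// Recover the low nibble, sign-extending the 4-bit two's-complement value.
int8_t unpack_low_int4(uint8_t byte) {
  int8_t v = static_cast<int8_t>(byte & 0x0F);
  return (v & 0x08) ? static_cast<int8_t>(v | 0xF0) : v;
}

// Recover the high nibble the same way.
int8_t unpack_high_int4(uint8_t byte) {
  return unpack_low_int4(static_cast<uint8_t>(byte >> 4));
}
```

The unsigned variant (UFIXED_POINT_4, range 0..15) is the same packing without the sign extension step.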
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 279 to 286, remove the
unnecessary "// FIXME: Maybe error." comment preceding the kInt4/kUInt4 cases;
the mapping to QNN_DATATYPE_SFIXED_POINT_4 and QNN_DATATYPE_UFIXED_POINT_4 is
correct so simply delete that FIXME line and keep the case branches as-is
without altering logic or indentation.
- Introduced `VectorInt16Attr` for handling vectors of int16_t attributes.
- Added new 4-bit integer types: `mllm_int4_t` and `mllm_uint4_t` with corresponding type info.
- Enhanced `Tensor` class with new methods for element-wise addition, subtraction, and multiplication that accept a `DataTypes` parameter.
- Updated element-wise operation handling to support `VectorInt16Attr` for int16 tensor constants.
- Ensured UUID generation for quantization specifications in `QuantizationSpecAsymPerBlock` and `QuantizationSpecLPBQ`.
Actionable comments posted: 7
♻️ Duplicate comments (2)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)
795-798: `captureAOTGraph` returns nullptr; implement graph retrieval.

The `captureAOTGraph` method (lines 795-798) always returns nullptr. Callers expecting a valid `QnnAOTGraph::ptr_t` will encounter null pointer errors. Implement the method to look up the graph from `contexts_[qnn_context_name]->graphs_[g_name]` and return it, or document why this stub is acceptable.

🔎 Proposed fix

  QnnAOTGraph::ptr_t QnnAOTEnv::captureAOTGraph(const std::string& qnn_context_name, const std::string& g_name) {
-   // TODO
-   return nullptr;
+   MLLM_RT_ASSERT_EQ(contexts_.count(qnn_context_name), 1);
+   MLLM_RT_ASSERT_EQ(contexts_[qnn_context_name]->graphs_.count(g_name), 1);
+   return contexts_[qnn_context_name]->graphs_[g_name];
  }

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)
865-867: Add defensive validation for nested config key access to prevent runtime exceptions.

Line 865 directly accesses nested config keys without validation: `config["quant_recipe"]["builtin_llm_pass"]["model"]`. The nlohmann::json library throws exceptions when accessing missing keys via `operator[]`. Other code in the codebase (e.g., AOTPipeline.cpp, MarkTensorIO.cpp) defensively checks keys with `.contains()` before access. Apply the same pattern here:

- if (config["quant_recipe"]["builtin_llm_pass"]["model"] == "qwen3") {
+ if (config.contains("quant_recipe")
+     && config["quant_recipe"].contains("builtin_llm_pass")
+     && config["quant_recipe"]["builtin_llm_pass"].contains("model")
+     && config["quant_recipe"]["builtin_llm_pass"]["model"] == "qwen3") {
    addPattern(LLMQuantRecipeQwen3AttentionPattern::create(), "qwen3_attention", 100);
  }
🧹 Nitpick comments (7)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)
837-837: Complete the Qwen3 Attention pattern implementation or document deferral.

Line 837's TODO comment indicates incomplete implementation. The pattern successfully identifies Q/K/V/O linear operations (lines 814-835) but performs no transformation. This creates a sophisticated stub that matches complex patterns but applies no quantization annotations, which may confuse maintainers.
Do you want me to open an issue to track completing this implementation, or should a comment be added explaining why transformation is deferred?
69-71: Initialize local pointer variables to nullptr for safety.

Lines 69 and 71 declare pointer variables without initialization. While they're assigned in all switch branches before use, defensive initialization to `nullptr` prevents undefined behavior if the switch logic changes or an unexpected case is added.

🔎 Proposed fix

- ir::linalg::LinalgIRQuantizatonSpecAttr::ptr_t ret = nullptr;
+ ir::linalg::LinalgIRQuantizatonSpecAttr::ptr_t ret = nullptr;  // Already initialized
- ir::linalg::QuantizationSpec::ptr_t spec = nullptr;
+ ir::linalg::QuantizationSpec::ptr_t spec = nullptr;  // Already initialized

Note: These are already initialized correctly. The static analyzer warning can be safely ignored.
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (1)
272-285: Add copy/move constructor/assignment operators or document the rule-of-zero.

The `LLMQuantRecipePass` class (line 272) defines a default destructor but lacks copy constructor, copy assignment operator, move constructor, and move assignment operator. Since the class manages STL containers with shared_ptr members, consider explicitly deleting copy operations and defaulting move operations to make ownership semantics clear.

🔎 Proposed fix

  class LLMQuantRecipePass final : public ir::Pass {
   public:
    LLMQuantRecipePass();
+   // Explicitly delete copy operations
+   LLMQuantRecipePass(const LLMQuantRecipePass&) = delete;
+   LLMQuantRecipePass& operator=(const LLMQuantRecipePass&) = delete;
+
+   // Explicitly default move operations
+   LLMQuantRecipePass(LLMQuantRecipePass&&) = default;
+   LLMQuantRecipePass& operator=(LLMQuantRecipePass&&) = default;
+
    ~LLMQuantRecipePass() override = default;

examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (1)
194-195: Extract magic number to a named constant.

Line 194 uses the magic number `-20` for the masking offset. This value appears to be a minimum attention score for masked positions. Extract it to a named constant to improve readability and maintainability.

🔎 Proposed fix

+ // Attention mask offset for masked positions
+ static constexpr float kMaskedAttentionOffset = -20.0f;
+
  // Masked Softmax
  auto attn_min = attn.min(-1, true);
- float minus_value = -20;
- attn = nn::functional::where(causal_mask.equal(0.f), attn, attn_min.add(minus_value, kInt16));
+ attn = nn::functional::where(causal_mask.equal(0.f), attn, attn_min.add(kMaskedAttentionOffset, kInt16));

As per coding guidelines: Use named constants instead of magic numbers.
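The masking trick in the snippet above, replacing masked scores with the row minimum plus a large negative offset before softmax, can be illustrated on plain floats. The helper below is an assumption-laden sketch (the function name and the `keep` mask representation are not from the mllm API):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax over one row where masked positions are pushed well below the
// row minimum, so they receive near-zero probability after exp().
std::vector<float> maskedSoftmax(const std::vector<float>& scores,
                                 const std::vector<bool>& keep) {
  const float kMaskedOffset = -20.0f;  // matches the constant suggested above
  float row_min = scores[0];
  float row_max = scores[0];
  for (float s : scores) {
    row_min = std::min(row_min, s);
    row_max = std::max(row_max, s);
  }
  std::vector<float> out(scores.size());
  float sum = 0.0f;
  for (size_t i = 0; i < scores.size(); ++i) {
    float s = keep[i] ? scores[i] : row_min + kMaskedOffset;
    out[i] = std::exp(s - row_max);  // subtract max for numerical stability
    sum += out[i];
  }
  for (float& v : out) { v /= sum; }
  return out;
}
```

Using a finite offset below the row minimum, rather than negative infinity, keeps the values representable in a narrow fixed-point type such as the int16 used in the model code.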
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)
488-494: Implement graph compilation or add tracking comment.

The `compile()` method (lines 488-494) only sets the `is_compiled_` flag without performing actual compilation. The TODO comment suggests this is incomplete. If compilation is deferred intentionally, document the expected behavior; otherwise, implement the QNN graph compilation via the backend interface.

Additional context

Typical QNN graph compilation involves calling `qnn_interface_.graphFinalize()` to optimize and prepare the graph for execution. If this is intentionally deferred to a later pass, add a detailed comment explaining the design decision.

mllm/core/Tensor.hpp (1)
289-291: LGTM - new scalar arithmetic methods with explicit `DataTypes` parameter.

The new `add`, `sub`, and `mul` methods provide useful control over the data type of the scalar RHS operand, complementing the existing `operator+`, `operator-`, `operator*` methods.

Consider adding a `div(float rhs, DataTypes data_type = kFloat32)` method for symmetry with `operator/(float)` if this use case is anticipated.

mllm/core/Tensor.cpp (1)
219-226: Consider adding kInt16PerTensorSym support for consistency.

The `sub()` method's type switch omits `kInt16PerTensorSym`, while both `add()` (line 182) and `mul()` (line 268) include it. This matches the existing inconsistency in `operator-` (lines 198-205), but for consistency and completeness, consider adding this case unless it's intentionally excluded.

🔎 Proposed addition

  switch (data_type) {
    case kFloat32: *(rhs_tensor.ptr<float>()) = rhs; break;
    case kFloat16: *(rhs_tensor.ptr<half_float::half>()) = half_float::half(rhs); break;
    case kInt32: *(rhs_tensor.ptr<int32_t>()) = rhs; break;
    case kInt16: *(rhs_tensor.ptr<int16_t>()) = rhs; break;
    case kInt8: *(rhs_tensor.ptr<int8_t>()) = rhs; break;
+   case kInt16PerTensorSym: *(rhs_tensor.ptr<int16_t>()) = rhs; break;
    default: NYI("Type is not supported"); break;
  }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
- examples/qwen3_qnn_aot/qwen3_qnn_aot.mir
- examples/qwen3_qnn_aot/qwen3_qnn_aot_quant_recipe.mir
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
- mllm/compile/ir/GeneratedRTTIKind.hpp
- mllm/compile/ir/NodeRTTIClassOfImpl.hpp
- mllm/compile/ir/builtin/Attribute.cpp
- mllm/compile/ir/builtin/Attribute.hpp
- mllm/compile/ir/linalg/Attribute.hpp
- mllm/compile/ir/rtti_kind_gen.py
- mllm/core/DataTypes.cpp
- mllm/core/DataTypes.hpp
- mllm/core/Tensor.cpp
- mllm/core/Tensor.hpp
- mllm/core/aops/ElewiseOps.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
- mllm/compile/ir/rtti_kind_gen.py
- mllm/compile/ir/GeneratedRTTIKind.hpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).
Files:
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
- mllm/compile/ir/builtin/Attribute.hpp
- mllm/compile/ir/linalg/Attribute.hpp
- mllm/core/DataTypes.hpp
- mllm/core/aops/ElewiseOps.cpp
- mllm/core/DataTypes.cpp
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
- mllm/core/Tensor.hpp
- mllm/compile/ir/NodeRTTIClassOfImpl.hpp
- mllm/core/Tensor.cpp
- mllm/compile/ir/builtin/Attribute.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.
Files:
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
- mllm/compile/ir/builtin/Attribute.hpp
- mllm/compile/ir/linalg/Attribute.hpp
- mllm/core/DataTypes.hpp
- mllm/core/aops/ElewiseOps.cpp
- mllm/core/DataTypes.cpp
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
- mllm/core/Tensor.hpp
- mllm/compile/ir/NodeRTTIClassOfImpl.hpp
- mllm/core/Tensor.cpp
- mllm/compile/ir/builtin/Attribute.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
Files:
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
- mllm/core/aops/ElewiseOps.cpp
- mllm/core/DataTypes.cpp
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
- mllm/core/Tensor.cpp
- mllm/compile/ir/builtin/Attribute.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).
Files:
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
- mllm/compile/ir/builtin/Attribute.hpp
- mllm/compile/ir/linalg/Attribute.hpp
- mllm/core/DataTypes.hpp
- mllm/core/aops/ElewiseOps.cpp
- mllm/core/DataTypes.cpp
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
- mllm/core/Tensor.hpp
- mllm/compile/ir/NodeRTTIClassOfImpl.hpp
- mllm/core/Tensor.cpp
- mllm/compile/ir/builtin/Attribute.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
🧠 Learnings (3)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Applied to files:
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Applied to files:
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Suggest adding unit tests for untested complex logic or edge cases.
Applied to files:
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
🧬 Code graph analysis (7)
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (2)
- mllm/nn/Functional.hpp (1): `where` (161-161)
- mllm/nn/Functional.cpp (2): `where` (203-206), `where` (203-203)

mllm/compile/ir/builtin/Attribute.hpp (1)
- mllm/compile/ir/builtin/Attribute.cpp (14): `VectorInt16Attr` (107-107), `VectorInt16Attr` (109-109), `data` (23-23), `data` (23-23), `data` (39-39), `data` (39-39), `data` (53-53), `data` (53-53), `data` (85-85), `data` (85-85), `data` (99-99), `data` (99-99), `data` (113-113), `data` (113-113)

mllm/compile/ir/linalg/Attribute.hpp (1)
- mllm/compile/ir/linalg/Attribute.cpp (3): `LinalgIRQuantizatonSpecAttr` (160-160), `LinalgIRQuantizatonSpecAttr` (162-162), `LinalgIRQuantizatonSpecAttr` (164-164)

mllm/core/Tensor.hpp (1)
- mllm/core/Tensor.cpp (16): `rhs` (126-128), `rhs` (126-126), `rhs` (130-132), `rhs` (130-130), `rhs` (134-136), `rhs` (134-134), `rhs` (138-140), `rhs` (138-138), `rhs` (148-168), `rhs` (148-148), `rhs` (192-211), `rhs` (192-192), `rhs` (234-254), `rhs` (234-234), `rhs` (278-297), `rhs` (278-278)

mllm/core/Tensor.cpp (4)
- mllm/core/Tensor.hpp (17): `rhs` (272-272), `rhs` (273-273), `rhs` (274-274), `rhs` (275-275), `rhs` (278-278), `rhs` (283-283), `rhs` (284-284), `rhs` (285-285), `rhs` (286-286), `rhs` (289-289), `rhs` (290-290), `rhs` (291-291), `rhs` (296-296), `rhs` (298-298), `rhs` (300-300), `rhs` (302-302), `device` (399-399)
- mllm/engine/Context.hpp (1): `device` (31-31)
- mllm/core/TensorViewImpl.cpp (2): `device` (10-10), `device` (10-10)
- mllm/engine/Context.cpp (2): `instance` (14-17), `instance` (14-14)

mllm/compile/ir/builtin/Attribute.cpp (1)
- mllm/compile/ir/builtin/Attribute.hpp (15): `VectorInt16Attr` (158-158), `VectorInt16Attr` (160-160), `VectorInt16Attr` (162-162), `p` (42-42), `p` (82-82), `p` (100-100), `p` (120-120), `p` (142-142), `p` (164-164), `data` (40-40), `data` (62-62), `data` (80-80), `data` (124-124), `data` (146-146), `data` (168-168)

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (1)
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1): `LLMQuantRecipePass` (845-868)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
[error] 23-23: all parameters should be named in a function
(readability-named-parameter,-warnings-as-errors)
[error] 26-26: variable name 'rw' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 67-67: 2 adjacent parameters of 'genSimpleQuantizationSpecAttr' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 69-69: variable 'ret' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 71-71: variable 'spec' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 114-114: 2 adjacent parameters of 'shareQuantSpecSingleInputToSingleOutputAndSetOpQuantAnnoAttr' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 115-115: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 135-135: 2 adjacent parameters of 'noSharingSingleInAndSingleOutQuantAnnoAttr' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 135-135: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 145-145: variable 'quant_spec_i0' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 146-146: variable 'quant_spec_o0' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
[error] 24-24: 2 adjacent parameters of 'makeRotaryPosEmbedding' of convertible types are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 194-194: 20 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
mllm/compile/ir/builtin/Attribute.hpp
[error] 154-154: class 'VectorInt16Attr' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator
(cppcoreguidelines-special-member-functions,-warnings-as-errors)
[error] 164-164: parameter name 'p' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
mllm/compile/ir/linalg/Attribute.hpp
[error] 329-329: class 'LinalgIRQuantizatonSpecAttr' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator
(cppcoreguidelines-special-member-functions,-warnings-as-errors)
[error] 341-341: parameter name 'p' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
mllm/compile/ir/NodeRTTIClassOfImpl.hpp
[error] 340-340: function-like macro 'RTTI_RK_ATTR_LINALGIRATTR_QUANTIZATIONSPEC_IMPL' used; consider a 'constexpr' template function
(cppcoreguidelines-macro-usage,-warnings-as-errors)
[error] 369-369: function-like macro 'RTTI_RK_ATTR_BUILTINIRATTR_VECTORINT16ATTR_IMPL' used; consider a 'constexpr' template function
(cppcoreguidelines-macro-usage,-warnings-as-errors)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
[error] 28-28: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 30-30: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 37-37: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 51-51: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 63-63: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 77-77: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 91-91: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 103-103: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 115-115: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 129-129: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 143-143: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 157-157: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 171-171: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 185-185: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 197-197: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 209-209: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 223-223: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 237-237: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 249-249: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 263-263: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 272-272: class 'LLMQuantRecipePass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator
(cppcoreguidelines-special-member-functions,-warnings-as-errors)
[error] 280-280: parameter name 'p' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-android
- GitHub Check: build-macos
- GitHub Check: build-x86
🔇 Additional comments (8)
mllm/compile/ir/NodeRTTIClassOfImpl.hpp (1)
340-342: LGTM - new RTTI macros follow established patterns.

The new macros for `QuantizationSpec` and `VectorInt16Attr` are consistent with the existing RTTI infrastructure. Since this is auto-generated code, the static analysis hints about preferring `constexpr` template functions can be safely disregarded.

Also applies to: 369-371
mllm/compile/ir/builtin/Attribute.cpp (1)

107-119: LGTM - `VectorInt16Attr` implementation follows established pattern.

The implementation mirrors `VectorFP32Attr` consistently, including constructors, `dump`, the `data()` accessor, and the `build()` factory method.

mllm/compile/ir/builtin/Attribute.hpp (1)

154-174: LGTM - `VectorInt16Attr` declaration follows existing patterns.

The class mirrors `VectorFP32Attr` exactly. The static analysis warnings about the Rule of Five and the short parameter name `p` can be disregarded since:
- The same pattern is used by `VectorFP32Attr` and other attribute classes - the `shared_ptr` semantics of `DEFINE_SPECIFIC_IR_CLASS` handle proper copying.
- The parameter name `p` for `IRPrinter` is used consistently across all `dump` methods in this file.

mllm/core/aops/ElewiseOps.cpp (1)

42-54: LGTM - type-specific constant attribute creation.

The switch statement appropriately handles `kFloat32` and `kInt16` constant types with corresponding attribute classes, and logs a warning for unsupported types. This provides a clear path for extending support to additional types in the future.

mllm/core/DataTypes.hpp (1)

588-593: LGTM - type info registrations and macros for 4-bit types.

The `MLLM_DEFINE_BASIC_TYPE_INFO`, `MLLM_DEFINE_SELF_TYPE_INFO`, and type alias macros are correctly defined for the new 4-bit types, with appropriate min/max values (signed: -8 to 7, unsigned: 0 to 15).

Also applies to: 707-708, 771-772

mllm/compile/ir/linalg/Attribute.hpp (3)

65-77: LGTM - `QuantizationSpecRaw` follows established spec pattern.

The new `QuantizationSpecRaw` struct correctly inherits from `QuantizationSpec`, sets the type to `kRaw`, and generates a UUID. The factory methods (`create(DataTypes)` and `create()`) mirror the pattern used by other spec types.

247-247: LGTM - UUID generation added to `QuantizationSpecAsymPerBlock` and `QuantizationSpecLPBQ`.

Adding UUID generation to the `create()` methods ensures these specs have unique identifiers, consistent with other quantization spec types.

Also applies to: 262-262, 283-283, 299-299

329-348: LGTM - `LinalgIRQuantizatonSpecAttr` follows existing attribute patterns.

The new class mirrors `LinalgIRQuantizatonAnnotationAttr` in structure. The static analysis warnings about the Rule of Five and short parameter name `p` can be disregarded per established codebase conventions (same pattern used by other IR attribute classes).
```cpp
void QnnAOTGraph::addOperation(const QnnAOTNodeOperation::ptr_t& qnn_op) {
  auto env = AOTCompileContext::getInstance().getEnv();
  auto qnn_interface = env->getFuncSymbol().qnn_interface_;

  Qnn_OpConfig_t qnn_op_config = QNN_OPCONFIG_INIT;
  qnn_op_config.version = QNN_OPCONFIG_VERSION_1;
  qnn_op_config.v1 = QNN_OPCONFIG_V1_INIT;
  qnn_op_config.v1.name = qnn_op->name_.c_str();
  qnn_op_config.v1.packageName = qnn_op->package_name_.c_str();
  qnn_op_config.v1.typeName = qnn_op->op_name_.c_str();

  // TODO PARAMs
  // TODO Inputs
  // TODO Outputs

  // TODO node validations

  // TODO add node to graph.

  op_node_.insert({qnn_op->getName(), qnn_op});
}
```
Complete the QNN operation addition or document the stub.
The addOperation method (lines 466-486) constructs a Qnn_OpConfig_t and populates basic fields but has TODO comments for critical parts (lines 477-483): parameters, inputs, outputs, validation, and actually adding the node to the QNN graph. The method stores the operation locally but never registers it with the QNN backend, making this a non-functional stub.
Would you like me to help implement the QNN operation addition logic, or should this be tracked in a separate issue?
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 466-486, implement the
TODOs: build the full Qnn_OpConfig_t by converting the qnn_op's parameters into
QNN parameter entries, map its inputs and outputs into QNN tensor
descriptors/array entries, and set any required versioned fields (sizes/counts)
instead of leaving them empty; perform node-level validations (required fields
present, compatible shapes/dtypes) and return/raise errors on failure; call the
QNN backend API via qnn_interface (use the appropriate graph add/register
function on the interface, pass the populated Qnn_OpConfig_t and capture the
resulting node handle/ID), check the backend return status and log or propagate
errors, and only then store the operation in op_node_ (or update the stored
entry with the backend node handle). Ensure any temporary C strings or buffers
referenced by Qnn_OpConfig_t remain valid for the call (allocate/copy if
necessary) and free them on error/cleanup.
```cpp
CASE(kInt4)
CASE(kUInt4)
```
Missing kInt4/kUInt4 cases in lanesOfType function.
The new 4-bit types are added to bytesOfType and nameOfType, but lanesOfType (lines 9–56) lacks corresponding CASE entries. Since these types have MllmDataTypeInfo specializations defined (DataTypes.hpp lines 707–708), calling lanesOfType(kInt4) or lanesOfType(kUInt4) will trigger the NYI("Unknown data type") path instead of returning valid lane counts.
Add CASE(kInt4) and CASE(kUInt4) to lanesOfType to maintain consistency and prevent runtime failures.
🤖 Prompt for AI Agents
In mllm/core/DataTypes.cpp around lines 100–101, the lanesOfType function is
missing CASE(kInt4) and CASE(kUInt4) entries; add CASE(kInt4) and CASE(kUInt4)
to the switch so those enum values return the correct lane counts (use the same
pattern as the other integer TYPE cases and reference the MllmDataTypeInfo
specializations defined in DataTypes.hpp for the 4-bit types) to avoid falling
through to NYI("Unknown data type").
```cpp
Tensor Tensor::add(float rhs, DataTypes data_type) {
  auto rhs_tensor = Tensor::empty({1}, data_type, device()).alloc();
  if (device() != kCPU) {
    Context::instance().buildOpAndSubmitTask(
        OpTypes::kFill, aops::FillOpOptions{.type = aops::FillOpTypes::kSpecific, .value = rhs}, {rhs_tensor});
  } else {
    switch (data_type) {
      case kFloat32: *(rhs_tensor.ptr<float>()) = rhs; break;
      case kFloat16: *(rhs_tensor.ptr<half_float::half>()) = half_float::half(rhs); break;
      case kInt32: *(rhs_tensor.ptr<int32_t>()) = rhs; break;
      case kInt16: *(rhs_tensor.ptr<int16_t>()) = rhs; break;
      case kInt8: *(rhs_tensor.ptr<int8_t>()) = rhs; break;
      case kInt16PerTensorSym: *(rhs_tensor.ptr<int16_t>()) = rhs; break;
      default: NYI("Type is not supported"); break;
    }
  }
  auto opts = aops::AddOpOptions{};
  opts.setInputsConstant(0, 0);
  opts.setInputsConstant(1, 1);
  return Context::instance().buildOpAndSubmitTask(OpTypes::kAdd, opts, {*this, rhs_tensor})[0];
}
```
🛠️ Refactor suggestion | 🟠 Major
Consider extracting common scalar operation logic to reduce duplication.
The add(), sub(), and mul() methods duplicate significant logic from their corresponding operators (operator+, operator-, operator*). Each method repeats ~21 lines of tensor creation, type switching, and option configuration.
Consider refactoring to extract a common helper function that handles scalar-to-tensor conversion with configurable type, then use it in both the operators and these new methods.
🔎 Example refactoring approach
```cpp
// Helper to convert scalar to tensor with specified type
Tensor Tensor::scalarToTensor(float value, DataTypes data_type) {
  auto rhs_tensor = Tensor::empty({1}, data_type, device()).alloc();
  if (device() != kCPU) {
    Context::instance().buildOpAndSubmitTask(
        OpTypes::kFill, aops::FillOpOptions{.type = aops::FillOpTypes::kSpecific, .value = value}, {rhs_tensor});
  } else {
    switch (data_type) {
      case kFloat32: *(rhs_tensor.ptr<float>()) = value; break;
      case kFloat16: *(rhs_tensor.ptr<half_float::half>()) = half_float::half(value); break;
      case kInt32: *(rhs_tensor.ptr<int32_t>()) = value; break;
      case kInt16: *(rhs_tensor.ptr<int16_t>()) = value; break;
      case kInt8: *(rhs_tensor.ptr<int8_t>()) = value; break;
      case kInt16PerTensorSym: *(rhs_tensor.ptr<int16_t>()) = value; break;
      default: NYI("Type is not supported"); break;
    }
  }
  return rhs_tensor;
}

Tensor Tensor::add(float rhs, DataTypes data_type) {
  auto rhs_tensor = scalarToTensor(rhs, data_type);
  auto opts = aops::AddOpOptions{};
  opts.setInputsConstant(0, 0);
  opts.setInputsConstant(1, 1);
  return Context::instance().buildOpAndSubmitTask(OpTypes::kAdd, opts, {*this, rhs_tensor})[0];
}
```

Then apply the same pattern to `sub()`, `mul()`, and potentially update the operator overloads to use this helper.
Also applies to: 213-232, 256-276
🧩 Analysis chain
🏁 Script executed:
# First, find the file and confirm the code location
find . -name "Tensor.cpp" -type f | head -5Repository: UbiquitousLearning/mllm
Length of output: 89
🏁 Script executed:
# Search for usage patterns of these methods with explicit data_type arguments
rg -n '\.add\(.*,\s*k[A-Z]' --type=cpp -A 2 -B 2Repository: UbiquitousLearning/mllm
Length of output: 592
🏁 Script executed:
# Look for similar sub() and mul() methods
rg -n '\.sub\(.*,\s*k[A-Z]|\.mul\(.*,\s*k[A-Z]' --type=cpp -A 2 -B 2Repository: UbiquitousLearning/mllm
Length of output: 463
🏁 Script executed:
# Check operation backend definitions for type handling
rg -n 'AddOp|SubOp|MulOp' mllm --type=cpp | head -20Repository: UbiquitousLearning/mllm
Length of output: 2168
🏁 Script executed:
# Look for validation logic in Tensor class
rg -n 'buildOpAndSubmitTask.*Add|buildOpAndSubmitTask.*Sub|buildOpAndSubmitTask.*Mul' --type=cpp -A 5 -B 2Repository: UbiquitousLearning/mllm
Length of output: 6330
🏁 Script executed:
# Check if there's any type validation or casting logic
rg -n 'dtype\(\)|data_type' mllm/core/Tensor.cpp | head -20Repository: UbiquitousLearning/mllm
Length of output: 1083
🏁 Script executed:
# Find and examine AddOp, SubOp, MulOp implementations
find . -name "*Op.hpp" -o -name "*Op.cpp" | grep -E "Add|Sub|Mul" | head -10Repository: UbiquitousLearning/mllm
Length of output: 446
🏁 Script executed:
# Look at the operation backend headers/implementations
find mllm/core/aops -name "*.hpp" -o -name "*.cpp" | head -20Repository: UbiquitousLearning/mllm
Length of output: 646
🏁 Script executed:
# Check if there's type compatibility logic in the operation system
rg -n 'typeCompatible|validateType|checkType|dtype.*dtype' mllm/core/aops --type=cpp --type=hpp | head -20Repository: UbiquitousLearning/mllm
Length of output: 94
🏁 Script executed:
# Look for any type casting or broadcasting in arithmetic operations
rg -n 'cast|broadcast|promote.*type' mllm/core/aops --type=cpp --type=hpp -i | head -20Repository: UbiquitousLearning/mllm
Length of output: 94
🏁 Script executed:
# Examine the full context of how add method is declared in header
rg -n 'Tensor::add\(float' mllm/core --type=hpp -B 2 -A 5Repository: UbiquitousLearning/mllm
Length of output: 94
🏁 Script executed:
# Check the actual file size and see more context around these methods
wc -l mllm/core/Tensor.cppRepository: UbiquitousLearning/mllm
Length of output: 91
🏁 Script executed:
# Find AddOp, SubOp, MulOp implementations
find mllm/core/aops -name "*AddOp*" -o -name "*SubOp*" -o -name "*MulOp*"Repository: UbiquitousLearning/mllm
Length of output: 122
🏁 Script executed:
# Look at the add method declaration in Tensor.h
find mllm -name "Tensor.h" -o -name "Tensor.hpp" | xargs cat | grep -A 5 "Tensor add"Repository: UbiquitousLearning/mllm
Length of output: 357
🏁 Script executed:
# Check if there are any type validation or compatibility checks in the tensor operations
rg 'dtype\(\).*!=.*data_type|data_type.*!=.*dtype' mllm --type-list | head -5Repository: UbiquitousLearning/mllm
Length of output: 456
🏁 Script executed:
# Look for type compatibility in the backend operations
find mllm -name "*.h" -o -name "*.hpp" | xargs grep -l "AddOp\|SubOp\|MulOp" 2>/dev/null | head -5Repository: UbiquitousLearning/mllm
Length of output: 272
🏁 Script executed:
# Examine the AddOp implementation
find mllm/core/aops -name "AddOp.cpp" -o -name "AddOp.h"Repository: UbiquitousLearning/mllm
Length of output: 49
🏁 Script executed:
# Get the full context of the add() method and nearby methods
sed -n '148,280p' mllm/core/Tensor.cppRepository: UbiquitousLearning/mllm
Length of output: 6076
🏁 Script executed:
# Check the ElewiseOps implementations to see if they handle mixed types
head -100 mllm/backends/cpu/ops/ElewiseOps.hppRepository: UbiquitousLearning/mllm
Length of output: 3068
🏁 Script executed:
# Look for any type broadcasting or casting logic in the elewise operations
grep -n 'dtype\|type.*cast\|promote' mllm/backends/cpu/ops/ElewiseOps.hpp | head -20Repository: UbiquitousLearning/mllm
Length of output: 49
🏁 Script executed:
# Check if there's documentation about type compatibility in AddOp, SubOp, MulOp
find mllm/core/aops -name "*.hpp" | xargs grep -l "AddOp\|SubOp\|MulOp"Repository: UbiquitousLearning/mllm
Length of output: 124
🏁 Script executed:
# Look at the actual implementation or at least the class definition
sed -n '1,50p' mllm/core/aops/MatMulOp.hppRepository: UbiquitousLearning/mllm
Length of output: 1710
🏁 Script executed:
# Search for any tests that show valid type combinations for add/sub/mul
find . -name "*test*" -type f | xargs grep -l "\.add\(.*k[A-Z]\|\.sub\(.*k[A-Z]\|\.mul\(.*k[A-Z]" 2>/dev/null | head -5Repository: UbiquitousLearning/mllm
Length of output: 49
🏁 Script executed:
# Check if there's a pattern for mixed-type operations - look at the Qwen example more carefully
sed -n '188,200p' examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hppRepository: UbiquitousLearning/mllm
Length of output: 555
🏁 Script executed:
# Look at the ElewiseOps.hpp to see AddOp, SubOp, MulOp base class definitions
head -200 mllm/core/aops/ElewiseOps.hppRepository: UbiquitousLearning/mllm
Length of output: 3196
🏁 Script executed:
# Check CPU implementation of AddOp to see if there's type handling
find mllm/backends/cpu/ops -name "*.cpp" | xargs grep -l "CPUAddOp\|CPUSubOp\|CPUMulOp" | head -1 | xargs head -300Repository: UbiquitousLearning/mllm
Length of output: 12943
🏁 Script executed:
# Look for any comments or documentation about type compatibility
rg -n 'type.*compati|mixed.*type|broadcast.*type' mllm/core --type=cpp | head -10Repository: UbiquitousLearning/mllm
Length of output: 208
🏁 Script executed:
# Check if operations validate input tensor types before execution
rg -n 'dtype.*==\|dtype.*!=\|type.*mismatch' mllm/backends/cpu/ops --type=cpp | head -20Repository: UbiquitousLearning/mllm
Length of output: 49
🏁 Script executed:
# Look at the actual CPU Add operation implementation
find mllm/backends/cpu -name "*.cpp" | xargs grep -A 30 "void CPUAddOp::forward"Repository: UbiquitousLearning/mllm
Length of output: 2883
🏁 Script executed:
# Check how output dtype is determined in the operation context
rg -n 'buildOpAndSubmitTask.*Add|buildOpAndSubmitTask.*Sub|buildOpAndSubmitTask.*Mul' mllm/core --type=cpp -B 5 -A 10 | head -50Repository: UbiquitousLearning/mllm
Length of output: 3179
🏁 Script executed:
# Look for how output tensor dtype is set when operations are executed
rg -n 'output\.dtype\(\)|dtype.*output' mllm/core/aops/ElewiseOps.cpp | head -20Repository: UbiquitousLearning/mllm
Length of output: 178
🏁 Script executed:
# Check if there's any type validation or conversion in the BaseOp or Context when executing operations
rg -n 'dtype.*mismatch|type.*check|validateInput' mllm/core --type=cpp | head -20Repository: UbiquitousLearning/mllm
Length of output: 49
🏁 Script executed:
# Look at how the output tensor is created to see what dtype it gets
sed -n '100,150p' mllm/core/aops/ElewiseOps.cppRepository: UbiquitousLearning/mllm
Length of output: 1503
🏁 Script executed:
# Check how output dtype is determined more carefully - look at the macro definition
sed -n '1,120p' mllm/core/aops/ElewiseOps.cppRepository: UbiquitousLearning/mllm
Length of output: 9353
🏁 Script executed:
# Verify the actual output type by looking at the full macro
grep -A 20 '__MLLM_ELEWISE_OP_IMPL\|__MLLM_ELEWISE_OP_DEFINE' mllm/core/aops/ElewiseOps.cpp | head -60Repository: UbiquitousLearning/mllm
Length of output: 2887
🏁 Script executed:
# Check what happens when inputs have mismatched dtypes - look for any type casting
rg -n 'cast.*dtype|dtype.*cast|input.*cast' mllm/backends/cpu/ops/ElewiseOps.cpp | head -20Repository: UbiquitousLearning/mllm
Length of output: 713
🏁 Script executed:
# See if there's validation in the Context that checks input compatibility
find mllm/core -name "Context*" | head -3
Length of output: 49
🏁 Script executed:
# Check if there are any tests showing mixed-type add/sub/mul operations
find . -path "*/test*" -name "*.cpp" -o -name "*.hpp" | xargs grep -l "add.*kInt\|mul.*kFloat\|sub.*kInt" 2>/dev/null
Length of output: 875
🏁 Script executed:
# Let me understand the exact problem - when add() uses different data_type, what happens at the backend
# Check if there's any type casting or validation when inputs have different dtypes
rg -n 'input0\.ptr.*input1\.ptr|inputs\[0\]\.ptr.*inputs\[1\]\.ptr' mllm/backends/cpu/ops/ElewiseOps.cpp | head -10
Length of output: 1330
🏁 Script executed:
# Look more carefully at what happens when input types don't match
sed -n '200,250p' mllm/backends/cpu/ops/ElewiseOps.cpp
Length of output: 2334
🏁 Script executed:
# Check if there are type compatibility assertions or checks
rg -n 'assert.*dtype|CHECK.*dtype|validate.*dtype' mllm --type=cpp | head -10
Length of output: 181
🏁 Script executed:
# See if the Qwen example actually uses the result of the mixed-type operation
sed -n '188,200p' examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
Length of output: 555
🏁 Script executed:
# Let me trace exactly what happens in the Qwen3 case - what is attn_min's dtype?
sed -n '185,198p' examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
Length of output: 627
🏁 Script executed:
# Check what type scale_ is in the Qwen3 model
grep -n 'scale_' examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp | head -10
Length of output: 179
🏁 Script executed:
# Now verify the critical issue - if input1 is int16 but output is float, what happens?
# The backend will call input1.ptr<float>() on an int16 tensor - this is unsafe casting
# Let's confirm the reshape macro determines output dtype as inputs[0].dtype()
sed -n '28,50p' mllm/core/aops/ElewiseOps.cpp
Length of output: 1991
🏁 Script executed:
# Check if there's any type validation before the backend operation is called
rg -n 'input.*dtype.*==\|inputs\[.*\]\.dtype' mllm/core/Tensor.cpp | grep -A 2 -B 2 'buildOpAndSubmitTask.*Add\|buildOpAndSubmitTask.*Sub\|buildOpAndSubmitTask.*Mul'
Length of output: 49
🏁 Script executed:
# Look at what happens when we have a float tensor and call .add(scalar, kInt16)
# Does it actually work or fail? Check CPU backend implementation more carefully
sed -n '140,160p' mllm/backends/cpu/ops/ElewiseOps.cpp
Length of output: 1390
🏁 Script executed:
# Verify the critical type safety issue - when input1 is int16 but we're casting to float
# Let's see what happens when we have float input0 and int16 input1 being added
sed -n '140,180p' mllm/backends/cpu/ops/ElewiseOps.cpp
Length of output: 2432
🏁 Script executed:
# Check if there's any backend that properly handles mixed-type operations
find mllm/backends -name "ElewiseOps.cpp" -o -name "ElewiseOps.hpp" | head -5
Length of output: 140
🏁 Script executed:
# Look for any OpenCL or other backend implementations that might handle mixed types
find mllm/backends -path "*/ops/Elewise*" -type f | head -10
Length of output: 218
🏁 Script executed:
# Search for any documentation or comments about type compatibility in mixed operations
rg -n 'mixed.*type|type.*mismatch|type.*compat' mllm --type=cpp --type=hpp | head -20
Length of output: 94
🏁 Script executed:
# Critical check: verify what happens when add() is called with a different data_type
# Let's trace through the exact scenario: float tensor with .add(scalar, kInt16)
# The output will be float (from reshape using inputs[0].dtype())
# But input1 will be int16
# In CPU backend, when dtype=kFloat32, it does input1.ptr<mllm_fp32_t>() on an int16 tensor
# This is unsafe! Let's verify by looking at the actual case
python3 << 'EOF'
# Let's trace what happens in the Qwen3 example:
# attn_min is float32 (output of min operation)
# attn_min.add(minus_value, kInt16) creates:
# - input0 = attn_min (float32)
# - input1 = rhs_tensor (int16, shape {1})
# - output = Tensor::empty with dtype = inputs[0].dtype() = float32
#
# In CPU backend forward() for AddOp:
# - dtype = output.dtype() = float32
# - Executes case kFloat32:
# - Calls input1.ptr<mllm_fp32_t>() on a tensor that is actually int16
# - This is undefined behavior / type safety violation
print("Identified critical issue:")
print("- add/sub/mul with different data_type allows mixed-type operations")
print("- Output dtype is always LHS dtype (inputs[0].dtype())")
print("- Backend assumes input1 has same dtype as output")
print("- input1.ptr<float>() called on int16 data = TYPE SAFETY BUG")
EOF
Length of output: 153
Mixed-type scalar operations contain a critical type safety bug where the backend unsafely casts mismatched input types.
When add(), sub(), or mul() are called with a data_type parameter differing from the tensor's dtype, the operation framework creates a type mismatch:
- Output dtype is forced to match the LHS tensor (via `inputs[0].dtype()` in the reshape macro)
- Input1 is created with the specified `data_type` parameter
- The CPU backend's `forward()` method switches on output dtype and directly casts both inputs to that type without validation
This causes undefined behavior. For instance, float_tensor.add(scalar, kInt16) results in:
- input0: float32
- input1: int16
- output: float32
When the backend processes the float32 case, it calls input1.ptr<mllm_fp32_t>() on an int16 tensor, violating type safety.
This bug is triggered in the Qwen3 model where attn_min.add(minus_value, kInt16) mixes float32 and int16 types.
Required fixes:
- Validate type compatibility in the reshape macro (reject mismatches or auto-cast consistently)
- Ensure backend implementations only cast between compatible types
- Add assertions or checks before unsafe pointer casts
🤖 Prompt for AI Agents
In mllm/core/Tensor.cpp around lines 170-190, the scalar add implementation
creates rhs tensors with an explicit data_type that can differ from the LHS
tensor dtype, which leads to unsafe backend pointer casts and UB; fix by (1)
validating that the requested data_type matches the LHS dtype or performing an
explicit, safe cast/conversion of the scalar into the LHS dtype before creating
rhs_tensor (i.e., always allocate rhs_tensor with the same dtype as *this and
convert rhs into that representation), (2) update the reshape/op construction so
input/output dtypes are consistent (reject mixed-type ops or canonicalize types
before building ops), and (3) add runtime checks/assertions in CPU/backend
forward paths to assert input tensor dtypes match expected output dtype before
performing ptr<T>() casts so mismatches fail early rather than causing UB.
Actionable comments posted: 0
🧹 Nitpick comments (1)
CMakeLists.txt (1)
263-264: LGTM! Consider adding a brief comment.
The FlatBuffers integration is configured correctly:
- Tests are disabled to avoid unnecessary builds
- `EXCLUDE_FROM_ALL` ensures it's only built when needed by dependent targets
Optionally, consider adding a brief comment explaining the purpose:
🔎 Suggested documentation
+# FlatBuffers for IR serialization
 set(FLATBUFFERS_BUILD_TESTS OFF)
 add_subdirectory(third_party/flatbuffers EXCLUDE_FROM_ALL)
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- .gitmodules
- CMakeLists.txt
- mllm/CMakeLists.txt
- third_party/flatbuffers
- tools/mllm-ir-netron/README.md
✅ Files skipped from review due to trivial changes (1)
- third_party/flatbuffers
🧰 Additional context used
📓 Path-based instructions (2)
{mllm,mllm-cli,pymllm}/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).
Files:
mllm/CMakeLists.txt
{mllm,mllm-cli,pymllm}/**/*.{toml,json,txt,md}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
When suggesting new dependencies, evaluate their maturity, licensing, and maintenance status. Prefer well-established and actively maintained libraries.
Files:
mllm/CMakeLists.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-android
- GitHub Check: build-macos
- GitHub Check: build-x86
🔇 Additional comments (2)
.gitmodules (1)
40-42: No issues with the FlatBuffers target name. The code correctly uses `flatbuffers` (lowercase), which is the proper CMake target exported by FlatBuffers when included via `add_subdirectory(third_party/flatbuffers)` in the root CMakeLists.txt. The namespaced target `flatbuffers::flatbuffers` does not exist in FlatBuffers; only `FlatBuffers::FlatBuffers` (CamelCase) is provided as an alias for find_package usage, which is not relevant here.
Likely an incorrect or invalid review comment.
mllm/CMakeLists.txt (1)
35-35: Remove the unused FlatBuffers dependency, or initialize the git submodule and use the correct target name.
The target name `flatbuffers` is incorrect for the current build configuration. When using `add_subdirectory(third_party/flatbuffers)`, the CMake target should be `FlatBuffers` (the interface alias exported by FlatBuffers' CMakeLists.txt). Additionally, the `third_party/flatbuffers` git submodule is currently empty and will cause the build to fail. More importantly, FlatBuffers is not used anywhere in the codebase (no includes or references exist), making this a dead dependency that should be removed entirely.
UbiquitousLearning
left a comment
LGTM
- Added a function `cloneQuantizationSpecType` to clone quantization specifications without duplicating scale data.
- Updated `LLMQuantRecipeRMSNormPattern` to ensure weight quantization methods match input types without sharing.
- Modified `RMSNormOp` to register parameters in the IR context if they do not already exist.
- Changed `mllm_uint4_t` structure to use `uint8_t` instead of `int8_t` for data representation.
- Added multiple TODO comments in `QnnAOTNodeTensor` and `QnnAOTEnv` regarding the handling of constant values and static weights.
Actionable comments posted: 8
♻️ Duplicate comments (6)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)
684-685: Add defensive config validation to prevent runtime exceptions.
Lines 684-685 directly access nested config keys without validation. The nlohmann::json library throws exceptions if keys don't exist. Follow the defensive pattern used elsewhere in the codebase (e.g., AOTPipeline.cpp, MarkTensorIO.cpp) by checking key existence with `.contains()` before access.
🔎 Proposed fix
- auto config = AOTCompileContext::getInstance().getConfig()["quant_recipe"]["builtin_llm_pass"]["linear"];
- auto use_config = config["fallback"];
+ if (!AOTCompileContext::getInstance().getConfig().contains("quant_recipe") ||
+     !AOTCompileContext::getInstance().getConfig()["quant_recipe"].contains("builtin_llm_pass") ||
+     !AOTCompileContext::getInstance().getConfig()["quant_recipe"]["builtin_llm_pass"].contains("linear")) {
+   MLLM_WARN("LinearPattern: quant_recipe.builtin_llm_pass.linear config not found");
+   return false;
+ }
+ auto config = AOTCompileContext::getInstance().getConfig()["quant_recipe"]["builtin_llm_pass"]["linear"];
+ if (!config.contains("fallback")) {
+   MLLM_WARN("LinearPattern: fallback config not found");
+   return false;
+ }
+ auto use_config = config["fallback"];
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (5)
286-293: FIXME comment should be removed (duplicate concern).
The mapping of `kInt4`/`kUInt4` to `QNN_DATATYPE_SFIXED_POINT_4`/`QNN_DATATYPE_UFIXED_POINT_4` is correct per QNN documentation. The FIXME comment on line 286 is misleading and should be removed.
As noted in a previous review, the 4-bit quantized types are actively used (e.g., "w4a16" quantization recipes), and the mapping follows the same pattern as the 8-bit and 16-bit variants.
336-336: Assertion on `quant_recipe` should be replaced with graceful handling (duplicate concern).
Line 336 uses `MLLM_RT_ASSERT`, which will crash if `quant_recipe` is missing. A previous review suggested checking for null and returning default parameters instead.
This is the same issue flagged in previous reviews. Consider implementing the suggested fix from the past review comment.
473-493: `addOperation` implementation incomplete (duplicate concern).
The method constructs a `Qnn_OpConfig_t` but has TODO comments for critical parts (lines 484-490): parameters, inputs, outputs, validation, and actually adding the node to the QNN graph. Line 492 only stores the operation locally without registering it with the QNN backend.
This is the same issue flagged in previous reviews. The method is currently non-functional as a stub.
495-501: `compile()` is a stub (duplicate concern).
The method only sets `is_compiled_` to true without performing actual compilation. This may cause issues if code expects compiled graphs.
This was flagged in previous reviews. Consider tracking this with a TODO comment or implementing the compilation logic.
802-805: `captureAOTGraph` returns nullptr (duplicate concern).
The stub implementation returns nullptr, which will cause null pointer issues for callers expecting a valid graph.
This was flagged in previous reviews.
🧹 Nitpick comments (5)
mllm/core/DataTypes.hpp (2)
331-334: Add documentation for the 4-bit signed integer type.
Unlike the GGUF quantization types above (lines 111-194), this struct lacks documentation explaining its purpose and usage. Consider adding a Doxygen comment clarifying how the 4-bit value is stored within the 8-bit field and any usage constraints.
🔎 Suggested documentation
+/**
+ * @struct mllm_int4_t
+ * @brief 4-bit signed integer type.
+ *
+ * Represents a 4-bit signed integer value stored in an 8-bit field.
+ * Valid range: -8 to 7.
+ */
 typedef struct {  // NOLINT
   int8_t data;
 } mllm_int4_t;
As per coding guidelines, public APIs should have clear docstrings.
336-339: Previous issue resolved; consider adding documentation.
The previous concern about using `int8_t` instead of `uint8_t` has been addressed: line 337 now correctly uses `uint8_t` for the unsigned 4-bit type. However, like `mllm_int4_t` above, this struct would benefit from a Doxygen comment explaining its purpose and valid range (0 to 15).
🔎 Suggested documentation
+/**
+ * @struct mllm_uint4_t
+ * @brief 4-bit unsigned integer type.
+ *
+ * Represents a 4-bit unsigned integer value stored in an 8-bit field.
+ * Valid range: 0 to 15.
+ */
 typedef struct {  // NOLINT
   uint8_t data;
 } mllm_uint4_t;
As per coding guidelines, public APIs should have clear docstrings.
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (1)
275-288: Consider explicitly defining or deleting special member functions.
The class declares a default destructor but doesn't explicitly declare copy/move constructors or assignment operators. While the default behavior is correct for the member types used, explicitly deleting these operations would make the intent clearer and prevent accidental copying of Pass instances, which is typically undesirable.
🔎 Proposed enhancement
 class LLMQuantRecipePass final : public ir::Pass {
  public:
   LLMQuantRecipePass();
   ~LLMQuantRecipePass() override = default;
+
+  LLMQuantRecipePass(const LLMQuantRecipePass&) = delete;
+  LLMQuantRecipePass& operator=(const LLMQuantRecipePass&) = delete;
+  LLMQuantRecipePass(LLMQuantRecipePass&&) = default;
+  LLMQuantRecipePass& operator=(LLMQuantRecipePass&&) = default;

   uint8_t run(const ir::node_ptr_t& op) override;
142-147: Remove duplicate TODO comments.
Six identical TODO comments clutter the code. Consolidate to a single comment at line 142.
🔎 Proposed fix
- // TODO Constant value should also use Static!!! And they can be pruned
- // TODO Constant value should also use Static!!! And they can be pruned
- // TODO Constant value should also use Static!!! And they can be pruned
- // TODO Constant value should also use Static!!! And they can be pruned
- // TODO Constant value should also use Static!!! And they can be pruned
- // TODO Constant value should also use Static!!! And they can be pruned
+ // TODO: Constant values should also use Static and can be pruned
816-821: Remove duplicate TODO comments.
Six identical TODO comments clutter the code (same issue as lines 142-147). Consolidate to a single comment at line 816.
🔎 Proposed fix
- // TODO Constant value should also use Static!!! And they can be pruned
- // TODO Constant value should also use Static!!! And they can be pruned
- // TODO Constant value should also use Static!!! And they can be pruned
- // TODO Constant value should also use Static!!! And they can be pruned
- // TODO Constant value should also use Static!!! And they can be pruned
- // TODO Constant value should also use Static!!! And they can be pruned
+ // TODO: Constant values should also use Static and can be pruned
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- examples/qwen3_qnn_aot/qwen3_qnn_aot.mir
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
- mllm/core/DataTypes.hpp
- mllm/core/aops/RMSNormOp.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).
Files:
- mllm/core/aops/RMSNormOp.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
- mllm/core/DataTypes.hpp
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.
Files:
- mllm/core/aops/RMSNormOp.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
- mllm/core/DataTypes.hpp
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
Files:
- mllm/core/aops/RMSNormOp.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).
Files:
- mllm/core/aops/RMSNormOp.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
- mllm/core/DataTypes.hpp
- mllm/backends/qnn/aot/QnnWrappersAPI.cpp
- mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
🧠 Learnings (4)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Applied to files:
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Applied to files:
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Suggest adding unit tests for untested complex logic or edge cases.
Applied to files:
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Use named constants instead of magic numbers.
Applied to files:
mllm/core/DataTypes.hpp
🧬 Code graph analysis (3)
mllm/core/aops/RMSNormOp.cpp (2)
mllm/core/aops/RMSNormOp.hpp (1)
weight_ (34-34)
mllm/core/aops/EmbeddingOp.cpp (1)
guard (54-54)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (3)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (13)
QnnAOTNodeTensor (107-107), v (103-105), v (103-103), v (110-110), v (112-112), v (114-114), v (116-116), name (156-156), name (261-261), name (263-263), name (265-265), create (48-50), create (48-48)
mllm/compile/ir/linalg/Attribute.hpp (3)
type (68-74), type (68-68), create (76-76)
mllm/core/DataTypes.hpp (1)
nameOfType (733-733)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (1)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)
LLMQuantRecipePass(941-964)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
[error] 23-23: all parameters should be named in a function
(readability-named-parameter,-warnings-as-errors)
[error] 26-26: variable name 'rw' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 67-67: 2 adjacent parameters of 'genSimpleQuantizationSpecAttr' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 69-69: variable 'ret' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 71-71: variable 'spec' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 114-114: 2 adjacent parameters of 'shareQuantSpecSingleInputToSingleOutputAndSetOpQuantAnnoAttr' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 115-115: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 135-135: 2 adjacent parameters of 'noSharingSingleInAndSingleOutQuantAnnoAttr' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 135-135: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 145-145: variable 'quant_spec_i0' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 146-146: variable 'quant_spec_o0' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 169-169: 2 adjacent parameters of 'cloneQuantizationSpecType' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 174-174: variable 'cloned_spec' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
[error] 28-28: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 30-30: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 40-40: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 54-54: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 66-66: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 80-80: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 94-94: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 106-106: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 118-118: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 132-132: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 146-146: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 160-160: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 174-174: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 188-188: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 200-200: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 212-212: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 226-226: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 240-240: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 252-252: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 266-266: parameter name 'op' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 275-275: class 'LLMQuantRecipePass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator
(cppcoreguidelines-special-member-functions,-warnings-as-errors)
[error] 283-283: parameter name 'p' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-x86
- GitHub Check: build-macos
- GitHub Check: build-android
🔇 Additional comments (4)
mllm/core/aops/RMSNormOp.cpp (1)
9-10: LGTM! Required includes for new IR registration logic.
These includes provide the necessary types (`ir::tensor::RegisterOp` and `ir::graph::SubGraphOp`) used in the parameter registration block below.
mllm/core/DataTypes.hpp (3)
588-593: LGTM!
The 4-bit type information is correctly defined with accurate ranges: signed 4-bit (-8 to 7) and unsigned 4-bit (0 to 15). The implementation follows the established pattern used for other basic types in the file.
707-708: LGTM!
The self-type-info registrations correctly map the 4-bit enum values to their corresponding C++ types, following the established pattern used throughout the file.
771-772: LGTM!
The macro definitions for 4-bit types follow the established naming convention and maintain consistency with the existing compatibility layer, even though lines 737-739 indicate these macros are deprecated.
break;
}
default: {
  NYI("Only support [int16, int8, bf16, f16, sf32] for now.");
Fix typo in data type list.
Line 439 contains "sf32" which should be "f32" to match the standard float32 type abbreviation.
🔎 Proposed fix
```diff
  default: {
-   NYI("Only support [int16, int8, bf16, f16, sf32] for now.");
+   NYI("Only support [int16, int8, bf16, f16, f32] for now.");
  }
```
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp around line 439, the
error message lists "sf32" but it should be the standard "f32"; update the
string literal to replace "sf32" with "f32" so the message reads "Only support
[int16, int8, bf16, f16, f32] for now." and ensure no other occurrences of
"sf32" remain in this file.
```cpp
      MLLM_WARN("LLMQuantRecipeEqualPattern Only support constant Value as second inputs right now. Pls send us a issue or PR "
                "if you want to compare two normal tensor(rather than static-tensor).");
      return false;
```
Correct the pattern name in warning message.
The warning message references "LLMQuantRecipeEqualPattern" but this code is in LLMQuantRecipeElementwisePattern::rewrite. This copy-paste error could confuse debugging.
🔎 Proposed fix
```diff
  } else {
-   MLLM_WARN("LLMQuantRecipeEqualPattern Only support constant Value as second inputs right now. Pls send us a issue or PR "
+   MLLM_WARN("LLMQuantRecipeElementwisePattern Only support constant Value as second inputs right now. Pls send us a issue or PR "
              "if you want to compare two normal tensor(rather than static-tensor).");
    return false;
  }
```
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp around lines 444-446, the
warning message incorrectly references "LLMQuantRecipeEqualPattern" (copy-paste
error) while inside LLMQuantRecipeElementwisePattern::rewrite; update the string
to reference "LLMQuantRecipeElementwisePattern" (or the exact class/function
name used in this file) so the log correctly identifies the emitting pattern,
keeping the rest of the message unchanged.
```cpp
        break;
      }
      default: {
        NYI("Only support [int16, int8, bf16, f16, f32] for now.");
```
Make the data type list consistent.
Line 597 lists "f32", which should match the type naming used in the switch cases above; keep the supported-types list consistent with the other patterns.
🔎 Proposed fix
Ensure the data type list matches the switch cases above (lines 585-591) which include kFloat32, kFloat16, kBFloat16, etc.
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp around line 597, the NYI
message lists "f32" which is inconsistent with the type names used earlier in
the switch (e.g., kFloat32, kFloat16, kBFloat16); update the message to use the
matching standard name (e.g., "float32") so the supported-types list reads
consistently (for example: [int16, int8, bf16, f16, float32]).
```cpp
      if (use_config["method"] == "LPBQ") {
        // Unpack
        std::string precision = use_config["precision"];
        bool sym = use_config["sym"];
        int block_size = use_config["block_size"];
        MLLM_RETURN_FALSE_IF_NOT(sym);
```
Add defensive validation for nested config values.
Lines 727-732 access nested config values (use_config["method"], use_config["precision"], etc.) without checking if these keys exist. Add defensive checks before accessing these values to prevent runtime exceptions.
🔎 Proposed fix
```diff
- if (use_config["method"] == "LPBQ") {
+ if (!use_config.contains("method") || !use_config.contains("precision") || !use_config.contains("sym") || !use_config.contains("block_size")) {
+   MLLM_WARN("LinearPattern: incomplete config for linear operation");
+   return false;
+ }
+ if (use_config["method"] == "LPBQ") {
    // Unpack
    std::string precision = use_config["precision"];
    bool sym = use_config["sym"];
    int block_size = use_config["block_size"];
```
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp around lines 727-732, the
code reads nested config keys (use_config["method"], ["precision"], ["sym"],
["block_size"]) without verifying those keys exist or have expected types; add
defensive validation before each access: check use_config.contains("method") and
that it is a string and equals "LPBQ" (or handle missing), then check
contains("precision") (string), contains("sym") (bool) and
contains("block_size") (int), and use MLLM_RETURN_FALSE_IF_NOT or equivalent to
fail early with clear messages if any key is missing or has the wrong type; if
appropriate provide safe defaults or short-circuit the LPBQ branch when
validations fail.
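The check-before-access idea from the prompt can be sketched like this, with std::map<std::string, std::string> standing in for the nlohmann::json object and hypothetical helper and key-value names (nlohmann::json itself provides contains() and value() for the same purpose):

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

using Config = std::map<std::string, std::string>;  // stand-in for the json object

// Look a key up without throwing or inserting.
std::optional<std::string> getKey(const Config& cfg, const std::string& key) {
  auto it = cfg.find(key);
  if (it == cfg.end()) return std::nullopt;
  return it->second;
}

// Fail early when any expected key is absent, instead of indexing blindly.
bool validateLpbqConfig(const Config& cfg) {
  for (const char* key : {"method", "precision", "sym", "block_size"}) {
    if (!getKey(cfg, key)) return false;
  }
  return getKey(cfg, "method").value() == "LPBQ";
}
```

With the real json type, the same shape holds: each contains() check happens before the corresponding operator[] access, so a malformed config produces a clean false rather than an exception.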
```cpp
    if (config["quant_recipe"]["builtin_llm_pass"]["model"] == "qwen3") {
      addPattern(LLMQuantRecipeQwen3AttentionPattern::create(), "qwen3_attention", 100);
    }
```
Add defensive config validation for conditional pattern registration.
Line 961 directly accesses nested config keys without validation: config["quant_recipe"]["builtin_llm_pass"]["model"]. The nlohmann::json library throws exceptions if keys don't exist. Use defensive checking with .contains() before access, as noted in past review comments and consistent with patterns used elsewhere in the codebase.
🔎 Proposed fix
```diff
- if (config["quant_recipe"]["builtin_llm_pass"]["model"] == "qwen3") {
+ if (config.contains("quant_recipe") &&
+     config["quant_recipe"].contains("builtin_llm_pass") &&
+     config["quant_recipe"]["builtin_llm_pass"].contains("model") &&
+     config["quant_recipe"]["builtin_llm_pass"]["model"] == "qwen3") {
    addPattern(LLMQuantRecipeQwen3AttentionPattern::create(), "qwen3_attention", 100);
  }
```
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp around lines 961-963, the
code directly indexes config["quant_recipe"]["builtin_llm_pass"]["model"] which
can throw if any key is missing; add defensive checks before registering the
qwen3 pattern: verify config.contains("quant_recipe") &&
config["quant_recipe"].is_object() &&
config["quant_recipe"].contains("builtin_llm_pass") &&
config["quant_recipe"]["builtin_llm_pass"].is_object() &&
config["quant_recipe"]["builtin_llm_pass"].contains("model") (or use a safe
value lookup) and only compare the model string if present and of string type;
if missing, skip registration (or use a sensible default) to avoid exceptions.
```cpp
    name_ = v->name();
    mllm_tensor_ = v->tensor_;
    quant_spec_ = v->getAttr("quant_recipe")->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;
```
Unsafe access to quant_recipe attribute.
Line 151 directly accesses the quant_recipe attribute without checking if it exists. If the attribute is missing, this will cause a null pointer dereference before the assertion on line 336 in parseQnnQuantizeParamFromIR.
🔎 Proposed fix
```diff
  name_ = v->name();
  mllm_tensor_ = v->tensor_;
- quant_spec_ = v->getAttr("quant_recipe")->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;
+ if (auto attr = v->getAttr("quant_recipe")) {
+   quant_spec_ = attr->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;
+ }
  for (auto s : v->tensor_.shape()) { shape_.emplace_back(s); }
```
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around line 151, the code directly
dereferences the "quant_recipe" attribute which can be absent; modify the code
to first check for the attribute's presence (e.g., use
v->hasAttr("quant_recipe") or capture getAttr("quant_recipe") into a variable
and check for null) before calling cast_ and accessing spec_; if the attribute
is missing, return or throw a clear error (or set a default) so you avoid a null
dereference and ensure the later assertion in parseQnnQuantizeParamFromIR is not
hit unexpectedly.
```cpp
    // Prepare data
    auto num_scale_offsets = (uint32_t)v->tensor_.size(cfg->ch_axis);
    Qnn_ScaleOffset_t* scale_array = (Qnn_ScaleOffset_t*)malloc(sizeof(Qnn_ScaleOffset_t) * num_scale_offsets);
```
Missing null checks after malloc calls.
Lines 355, 384, and 393 call malloc without checking if allocation succeeded. If allocation fails, subsequent code will dereference null pointers, causing crashes.
🔎 Proposed fix
Add null checks after each malloc:
```diff
  Qnn_ScaleOffset_t* scale_array = (Qnn_ScaleOffset_t*)malloc(sizeof(Qnn_ScaleOffset_t) * num_scale_offsets);
+ if (!scale_array) {
+   MLLM_ERROR_EXIT(ExitCode::kCoreError, "Failed to allocate memory for scale_array");
+ }
  MLLM_RT_ASSERT_EQ(num_scale_offsets, cfg->scale.size(0));
```
Apply similar checks at lines 384 and 393.
Also applies to: 384-384, 393-393
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 355, 384 and 393, the
code calls malloc without checking for NULL which can lead to dereferencing null
on allocation failure; after each malloc call check the returned pointer for
NULL, log or propagate an appropriate error, free any previously allocated
memory as needed, and return or set an error status instead of continuing
execution so no null pointer is dereferenced.
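The checked-allocation pattern the prompt describes can be sketched as follows; ScaleOffset stands in for Qnn_ScaleOffset_t and the helper name is hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

struct ScaleOffset {  // stand-in for Qnn_ScaleOffset_t
  float scale;
  int32_t offset;
};

// Returns false (and a null out-pointer) on allocation failure instead
// of letting later code dereference null.
bool allocScaleOffsets(uint32_t count, ScaleOffset** out) {
  auto* arr = static_cast<ScaleOffset*>(std::malloc(sizeof(ScaleOffset) * count));
  if (arr == nullptr) {
    std::fprintf(stderr, "failed to allocate %u scale offsets\n",
                 static_cast<unsigned>(count));
    *out = nullptr;
    return false;  // propagate the error to the caller
  }
  *out = arr;
  return true;
}
```

In the real code the failure branch would use MLLM_ERROR_EXIT (or free any earlier allocations and return an error status), but the shape — check, report, bail — is the same.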
```cpp
    // Register Params
    if (weight_ && !ir_ctx->lookupSymbolTable(getName() + ".weight")) {
      ir::IRWriterGuard guard(ir_ctx, ir_ctx->lookupSymbolTable("init")->cast_<ir::graph::SubGraphOp>()->getTopRegion());
      ir_ctx->create<ir::tensor::RegisterOp>(ir_ctx->create<ir::tensor::TensorValue>(weight_));
    }
```
Add null-safety checks to prevent potential crash.
Line 37 chains multiple pointer operations without null checks:
- `lookupSymbolTable("init")` could return nullptr if the "init" subgraph doesn't exist
- `cast_<ir::graph::SubGraphOp>()` could fail if the type is incorrect
- `getTopRegion()` could return nullptr
Any failure in this chain will cause undefined behavior or a crash.
🔎 Proposed fix with defensive checks
```diff
  // Register Params
  if (weight_ && !ir_ctx->lookupSymbolTable(getName() + ".weight")) {
-   ir::IRWriterGuard guard(ir_ctx, ir_ctx->lookupSymbolTable("init")->cast_<ir::graph::SubGraphOp>()->getTopRegion());
+   auto init_symbol = ir_ctx->lookupSymbolTable("init");
+   if (!init_symbol) {
+     NYI("init subgraph not found in IR context");
+   }
+   auto init_subgraph = init_symbol->cast_<ir::graph::SubGraphOp>();
+   if (!init_subgraph) {
+     NYI("init symbol is not a SubGraphOp");
+   }
+   auto init_region = init_subgraph->getTopRegion();
+   if (!init_region) {
+     NYI("init subgraph has no top region");
+   }
+   ir::IRWriterGuard guard(ir_ctx, init_region);
    ir_ctx->create<ir::tensor::RegisterOp>(ir_ctx->create<ir::tensor::TensorValue>(weight_));
  }
```
🤖 Prompt for AI Agents
In mllm/core/aops/RMSNormOp.cpp around lines 34-39, the chained calls
ir_ctx->lookupSymbolTable("init")->cast_<ir::graph::SubGraphOp>()->getTopRegion()
can dereference null/invalid pointers; add null-safety checks: first store the
result of lookupSymbolTable("init") in a local variable and verify it's
non-null, then attempt the cast and verify the cast succeeded (or use a
safe/dynamic cast API), then check the returned region is non-null before
constructing the IRWriterGuard; if any check fails, skip registration or
log/handle the error gracefully instead of proceeding to call methods on null.
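The step-by-step guarding described above can be sketched with a toy hierarchy; Base, SubGraph, and Region are hypothetical stand-ins for the IR types, and dynamic_cast plays the role of a checked cast_:

```cpp
#include <cassert>

struct Region {};  // stand-in for the IR region type

struct Base {  // stand-in for the symbol-table entry type
  virtual ~Base() = default;
};

struct SubGraph : Base {  // stand-in for ir::graph::SubGraphOp
  Region* top_region = nullptr;
  Region* getTopRegion() { return top_region; }
};

// Each step is verified before the next dereference; a null anywhere in
// the chain yields null instead of undefined behavior.
Region* lookupTopRegion(Base* symbol) {
  if (symbol == nullptr) return nullptr;       // symbol-table miss
  auto* sub = dynamic_cast<SubGraph*>(symbol);
  if (sub == nullptr) return nullptr;          // wrong op type
  return sub->getTopRegion();                  // may still be null; caller checks
}
```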
Actionable comments posted: 0
🧹 Nitpick comments (2)
pymllm/backends/qualcomm/transformers/static_qwen3.py (1)
1-3: Module appears incomplete with only unused imports. This file contains only import statements with no implementation, docstring, or module-level documentation. Please verify whether this is intentional (e.g., a stub for future implementation) or if implementation is missing from this PR.
If this is a placeholder, consider adding:
- A module-level docstring explaining the intended purpose
- A TODO comment indicating what needs to be implemented
If you'd like, I can help generate the implementation structure based on the quantization patterns mentioned in the PR objectives. Would you like me to open an issue to track the completion of this module?
pyproject.toml (1)
31-32: Add version constraints for torch and torchao. Both `torch` and `torchao` are used in quantization features (particularly in pymllm/backends/qualcomm/transformers/static_qwen3.py for pt2e quantization) without version constraints, which can lead to reproducibility issues and unexpected breaking changes. PyTorch is particularly known for introducing breaking changes between major versions, and torchao's quantization functionality is version-sensitive.
Consider pinning to specific versions or specifying minimum versions, similar to other dependencies in this file (e.g., `apache-tvm-ffi == 0.1.0b4`, `pyyaml >= 6.0.2`). At minimum: `torch >= 2.0` and `torchao >= 0.1`.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- pymllm/backends/qualcomm/transformers/README.md
- pymllm/backends/qualcomm/transformers/__init__.py
- pymllm/backends/qualcomm/transformers/static_qwen3.py
- pymllm/backends/qualcomm/transformers/train.py
- pyproject.toml
✅ Files skipped from review due to trivial changes (1)
- pymllm/backends/qualcomm/transformers/README.md
🧰 Additional context used
📓 Path-based instructions (5)
{mllm,mllm-cli,pymllm}/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).
Files:
pymllm/backends/qualcomm/transformers/static_qwen3.py
{mllm,mllm-cli,pymllm}/**/*.{sh,py}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
If a file starts with a shebang, it must be the first line and contain only the interpreter path and optional single argument, encoded in UTF-8.
Files:
pymllm/backends/qualcomm/transformers/static_qwen3.py
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.
Files:
pymllm/backends/qualcomm/transformers/static_qwen3.py
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.
Files:
pymllm/backends/qualcomm/transformers/static_qwen3.py
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).
Files:
pymllm/backends/qualcomm/transformers/static_qwen3.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-macos
- GitHub Check: build-x86
- GitHub Check: build-android
Summary by CodeRabbit
New Features
Refactor
Chores