Conversation

@chenghuaWang (Collaborator) commented Dec 24, 2025

  • Added LLMQuantRecipePass to handle quantization recipes for various operations.
  • Introduced multiple pattern classes (Index, Elementwise, Transpose, Concat, Repeat, MatMul, Equal, Where, Softmax, Linear, View, Embedding, Qwen3 Attention) to match and rewrite specific Linalg operations.
  • Removed obsolete QuantPrecisionsCfgPass files.
  • Created SplitLLMGraphPass with a placeholder for future graph splitting logic.
  • Enhanced quantization attributes with detailed specifications and improved dumping functionality.
  • Updated RTTI generation to include new quantization spec classes.
  • Added utility macros for error handling in common operations.

Summary by CodeRabbit

  • New Features

    • Full LLM quantization recipe and richer LLM quant config; new passes to merge/split LLM head graphs.
    • Backend AOT graph/tensor capture with improved tensor quantization handling and QNN integration.
    • Typed scalar Tensor ops (add/sub/mul with explicit data-type).
    • Added public 4-bit integer types and a 16-bit vector attribute.
  • Refactor

    • Unified backend visitor/pattern interface and IR/RTTI/attribute changes to support quantization specs and new pass flow.
  • Chores

    • Enabled extra example modules, added FlatBuffers submodule, added Python deps and docs for Qualcomm backend.


chenghuaWang and others added 3 commits December 24, 2025 06:43
@coderabbitai (coderabbitai bot, Contributor) commented Dec 24, 2025

📝 Walkthrough

Adds a pattern-driven LLM quantization-recipe pass, two new QNN AOT passes (merge/split LLM head), makes the AOT pipeline config-driven, extends IR quantization RTTI/attributes and numeric types (4-bit, VectorInt16, QuantizationSpec), updates Region ownership types, refactors QNN AOT wrappers/visitor APIs, and updates build/examples to include FlatBuffers and LLM config changes.

Changes

  • AOT pipeline & new passes
    Files: mllm/backends/qnn/aot/passes/AOTPipeline.cpp, mllm/backends/qnn/aot/passes/LLMQuantRecipePass.{cpp,hpp}, mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.{cpp,hpp}, mllm/backends/qnn/aot/passes/SplitLLMGraphPass.{cpp,hpp}
    Add a config-gated pipeline (runs LLMQuantRecipePass when quant_recipe.llm_recipe is set), implement LLMQuantRecipePass (many pattern match/rewrite rules), add MergeLLMHeadIntoMainGraphPass (move/rewire lm_head), and add a placeholder SplitLLMGraphPass.

  • QNN AOT wrappers & env API
    Files: mllm/backends/qnn/aot/QnnWrappersAPI.{hpp,cpp}
    Major API/state changes: QnnAOTNodeTensor gains force_static_weight and parsing helpers; parse helpers accept an IR tensor; QnnAOTNodeOperation gains set/get naming; QnnAOTGraph addOperation/compile semantics changed; QnnAOTEnv gains captureAOTGraph/captureAOTNodeOp/captureQnnAOTNodeTensor and getFuncSymbol.

  • Visitor interface & elementwise pattern
    Files: mllm/backends/qnn/aot/visitor/Base.hpp, .../Elewise.{cpp,hpp}
    Replace virtual addNode(...) with compile(IRWriter&, op); refactor the elementwise Add pattern to construct a QnnAOTNodeOperation, attach captured tensors, validate context/graph, and register via QnnAOTEnv; remove QnnAOTAddQuantRecipePattern.

  • IR quantization attrs & RTTI
    Files: mllm/compile/ir/linalg/Attribute.{hpp,cpp}, mllm/compile/ir/GeneratedRTTIKind.hpp, mllm/compile/ir/NodeRTTIClassOfImpl.hpp, mllm/compile/ir/rtti_kind_gen.py
    Introduce QuantizationSpec (including kRaw) and the new LinalgIRQuantizatonSpecAttr with builders and rich dump/stringification; update RTTI enums/macros and the generator to include the new attributes.

  • IR ownership & removal semantics
    Files: mllm/compile/ir/Node.hpp, mllm/compile/ir/Node.cpp
    Region::inputs()/outputs() now use std::list<node_weak_ptr_t> (instead of val_ptr_t); IRWriter::removeOp clears op->belongsTo() when removing ops.

  • IR builtin attrs & VectorInt16
    Files: mllm/compile/ir/builtin/Attribute.{hpp,cpp}, mllm/compile/ir/NodeRTTIClassOfImpl.hpp
    Add VectorInt16Attr with builder, data accessor, and dump; add the corresponding RTTI macro.

  • DataTypes & 4-bit types
    Files: mllm/core/DataTypes.{hpp,cpp}
    Add DataTypes kInt4/kUInt4 and mllm_int4_t/mllm_uint4_t types, register basic/self type infos, and update bytesOfType and nameOfType.

  • Tensor scalar ops
    Files: mllm/core/Tensor.{hpp,cpp}
    Add typed scalar Tensor methods add/sub/mul(float rhs, DataTypes) and corresponding overloads (see the sketch after this table).

  • Elementwise constant handling
    Files: mllm/core/aops/ElewiseOps.cpp
    When a constant input is present, select VectorFP32Attr for kFloat32 or VectorInt16Attr for kInt16; warn on unsupported constant types.

  • RMSNorm trace registration
    Files: mllm/core/aops/RMSNormOp.cpp
    Register the runtime parameter (weight) in the symbol table during trace when needed and emit RegisterOp in the init subgraph.

  • QNN visitor lowering changes
    Files: mllm/backends/qnn/aot/visitor/Elewise.cpp
    New compile flow: retrieve the AOTCompileContext, validate quant_recipe/graph/context, build and name a QnnAOTNodeOperation, attach captured tensors, and register the op via QnnAOTEnv->captureAOTNodeOp.

  • Utilities & macros
    Files: mllm/utils/Common.hpp
    Add MLLM_RETURN_FALSE_IF_NOT and MLLM_RETURN_FALSE_IF macros for early-false returns.

  • Examples & build config
    Files: examples/CMakeLists.txt, examples/qwen3_qnn_aot/compile.cpp, examples/qwen3_qnn_aot/qnn_aot_cfg.json, examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp, CMakeLists.txt, mllm/CMakeLists.txt, .gitmodules, third_party/flatbuffers
    Re-enable example subdirs, remove the explicit qnn_aot_env.createContext call in compile.cpp, restructure qnn_aot_cfg.json to builtin_llm_pass/kvs/fallbacks, adjust attention quant/dequant and masked-softmax code, and add the FlatBuffers submodule and link flatbuffers.
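
To make the new typed scalar ops concrete, here is a minimal usage sketch. It assumes the add/sub/mul(float, DataTypes) signatures listed above and that the enumerators are spelled kFloat32/kInt16 as elsewhere in this review; the call sites themselves are hypothetical.

#include "mllm/core/Tensor.hpp"     // Tensor, per the cohort table above
#include "mllm/core/DataTypes.hpp"  // DataTypes (kFloat32, kInt16)

using namespace mllm;

// Hypothetical call sites: each op takes the scalar rhs plus an explicit
// DataTypes, so the constant is materialized in the requested precision
// (cf. the VectorFP32Attr vs. VectorInt16Attr selection in ElewiseOps.cpp).
void typed_scalar_ops_sketch(Tensor& x) {
  auto a = x.add(1.0f, kFloat32);
  auto b = x.sub(2.0f, kInt16);
  auto c = x.mul(0.5f, kFloat32);
  (void)a; (void)b; (void)c;
}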

Sequence Diagram(s)

sequenceDiagram
  participant CLI as User/CLI
  participant AOT as AOTPipeline
  participant Pass as LLMQuantRecipePass
  participant Merge as MergeLLMHeadPass
  participant Env as QnnAOTEnv
  participant Graph as QnnAOTGraph

  CLI->>AOT: request compile (config with quant_recipe.llm_recipe=true)
  AOT->>Pass: run LLMQuantRecipePass on IR
  Note right of Pass: pattern-match & annotate ops with QuantizationSpec attrs
  AOT->>Merge: run MergeLLMHeadIntoMainGraphPass
  Merge->>AOT: relocate lm_head and rewire returns
  AOT->>Env: lowering invokes captureAOTGraph / captureQnnAOTNodeTensor / captureAOTNodeOp
  Env->>Graph: register tensors & ops into graph
  Graph->>Env: compile graph (set is_compiled_)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Suggested reviewers

  • liang1232018
  • oreomaker
  • yirongjie

Poem

"I hop through IR by moonlit code,
I stitch the passes, shift the node.
Specs whisper, heads find their place,
Tensors hop and join the race.
A rabbit's patch—compiled with grace. 🐇"

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 15.71%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (2 passed)

  • Title check: ✅ Passed. The title accurately describes the main change: implementing LLMQuantRecipePass and associated patterns for quantization in the QNN AOT backend.
  • Description check: ✅ Passed. The description is mostly complete, listing major changes (the LLMQuantRecipePass implementation, pattern classes, and quantization enhancements); however, it lacks specifics on why these changes are necessary and their expected impact.
✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai (coderabbitai bot) left a comment

Actionable comments posted: 4

♻️ Duplicate comments (1)
mllm/compile/ir/linalg/Attribute.cpp (1)

251-254: Same parenthesis issue for kNone in the second lambda.

This kNone case has the same missing closing parenthesis issue as the first lambda.

🧹 Nitpick comments (10)
mllm/utils/Common.hpp (1)

57-61: Consider adding brief documentation for the new macros.

The new early-return guard macros are clear and follow existing patterns in the codebase. However, adding a brief comment explaining their purpose and typical usage would improve developer ergonomics.

📝 Suggested documentation
+// Early-return guards for boolean predicates
 #define MLLM_RETURN_FALSE_IF_NOT(x) \
   if (!(x)) { return false; }
 
 #define MLLM_RETURN_FALSE_IF(x) \
   if ((x)) { return false; }

Note: Static analysis suggests using constexpr template functions instead of macros, but this is a stylistic preference. The macro approach is consistent with existing codebase conventions (e.g., NYI, RT_ASSERT, FORCE_INLINE) and is appropriate for simple control-flow guards.
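
For illustration, a minimal sketch of the intended usage, assuming the macros expand exactly as shown in the diff above; the matcher function and its parameters are hypothetical.

#include "mllm/utils/Common.hpp"  // MLLM_RETURN_FALSE_IF_NOT / MLLM_RETURN_FALSE_IF

// Hypothetical matcher: each guard replaces an explicit
// "if (...) { return false; }" block with a one-line early return.
bool is_match_sketch(int num_inputs, int num_outputs, bool already_annotated) {
  MLLM_RETURN_FALSE_IF_NOT(num_inputs == 1);   // fail unless exactly one input
  MLLM_RETURN_FALSE_IF_NOT(num_outputs == 1);  // fail unless exactly one output
  MLLM_RETURN_FALSE_IF(already_annotated);     // fail if already processed
  return true;
}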

mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp (2)

25-30: Consider breaking early after finding CallGraphOp.

The walk asserts call_main_graph_op == nullptr on each iteration, meaning it expects exactly one CallGraphOp. Consider returning WALK_BREAK after finding the first one to avoid unnecessary iteration.

🔎 Suggested optimization
   writer.walk<ir::graph::CallGraphOp>(
       [&](ir::IRWriter& /*writer*/, const ir::graph::CallGraphOp::ptr_t& call_op) -> ir::IRWriter::WalkResult {
         MLLM_RT_ASSERT_EQ(call_main_graph_op, nullptr);
         call_main_graph_op = call_op;
-        return ir::IRWriter::WalkResult::WALK_CONTINUE;
+        return ir::IRWriter::WalkResult::WALK_BREAK;
       });

82-106: Redundant assertions can be consolidated.

Lines 83-84 and 98-99 perform identical assertions (inputs().size() == 1 and outputs().size() == 1). Since the assertions at lines 83-84 already validate these invariants, the second set at lines 98-99 is redundant.

🔎 Suggested consolidation
   // Make sure the output is right
   {
     MLLM_RT_ASSERT_EQ(llm_head_op->inputs().size(), 1);
     MLLM_RT_ASSERT_EQ(llm_head_op->outputs().size(), 1);
     auto llm_head_input = llm_head_op->inputs().front();
     auto llm_head_output = llm_head_op->outputs().front();
     auto& graph_region_outs = main_graph->outputs();

     // Replace llm_head_input with llm_head_output in graph outputs
     std::replace(graph_region_outs.begin(), graph_region_outs.end(), llm_head_input, llm_head_output);
     std::replace(return_op->inputs().begin(), return_op->inputs().end(), llm_head_input, llm_head_output);
     std::replace(main_graph->getTopRegion()->outputs().begin(), main_graph->getTopRegion()->outputs().end(), llm_head_input,
                  llm_head_output);
-  }
 
-  // Make sure the call_graph_op's output is right
-  {
-    MLLM_RT_ASSERT_EQ(llm_head_op->inputs().size(), 1);
-    MLLM_RT_ASSERT_EQ(llm_head_op->outputs().size(), 1);
-    auto llm_head_input = llm_head_op->inputs().front();
-    auto llm_head_output = llm_head_op->outputs().front();
-    auto& graph_region_outs = call_main_graph_op->outputs();
-
-    // Replace llm_head_input with llm_head_output in call graph outputs
-    std::replace(graph_region_outs.begin(), graph_region_outs.end(), llm_head_input, llm_head_output);
+    // Also update call_graph_op's outputs
+    auto& call_graph_outs = call_main_graph_op->outputs();
+    std::replace(call_graph_outs.begin(), call_graph_outs.end(), llm_head_input, llm_head_output);
   }
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (2)

21-21: Unused variable: writer is created but never used.

The IRWriter instance is constructed but not utilized in the current placeholder implementation. Either remove it until the graph splitting logic is implemented, or mark it as [[maybe_unused]] to suppress compiler warnings.

🔎 Suggested fix
   auto module_op = op->cast_<ir::ModuleOp>();
-  auto writer = ir::IRWriter(getCtx(), module_op->getTopRegion());
+  [[maybe_unused]] auto writer = ir::IRWriter(getCtx(), module_op->getTopRegion());

   // TODO: Implement graph splitting logic here

Or simply remove the line until it's needed.


23-23: Placeholder TODO noted.

The pass is a placeholder for future graph-splitting logic. The TODO comment follows the coding guidelines format.

Would you like me to open an issue to track the implementation of this graph splitting logic?

mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (1)

11-18: Consider explicitly deleting copy/move operations for Rule of 5 compliance.

Per cppcoreguidelines-special-member-functions, defining a destructor typically implies copy/move should also be addressed. Since passes are meant to be used via shared_ptr from the factory, explicitly deleting copy/move operations clarifies intent and prevents accidental slicing.

🔎 Proposed fix
 class SplitLLMGraphPass final : public ir::Pass {
  public:
   SplitLLMGraphPass() = default;
+  SplitLLMGraphPass(const SplitLLMGraphPass&) = delete;
+  SplitLLMGraphPass& operator=(const SplitLLMGraphPass&) = delete;
+  SplitLLMGraphPass(SplitLLMGraphPass&&) = delete;
+  SplitLLMGraphPass& operator=(SplitLLMGraphPass&&) = delete;
 
   ~SplitLLMGraphPass() override = default;
 
   uint8_t run(const ir::node_ptr_t& op) override;
 };
mllm/compile/ir/linalg/Attribute.cpp (1)

166-270: Extract duplicated gen_quant_spec_str lambda into a shared utility.

The gen_quant_spec_str lambda is duplicated verbatim in both LinalgIRQuantizatonAnnotationAttr::dump (lines 26-126) and LinalgIRQuantizatonSpecAttr::dump (lines 167-267). Consider extracting this into a free function or static method to avoid code duplication and simplify future maintenance.

mllm/compile/ir/linalg/Attribute.hpp (1)

325-344: Consider applying Rule of 5 for consistency.

Per static analysis, this class defines a non-default destructor but lacks explicit copy/move operations. While the base class may handle this, explicitly deleting them would clarify intent. The short parameter name p in dump() is consistent with other dump methods in the codebase, so that hint can be ignored.

🔎 Proposed fix
 class LinalgIRQuantizatonSpecAttr final : public LinalgIRAttr {
  public:
   QuantizationSpec::ptr_t spec_;
 
   DEFINE_SPECIFIC_IR_CLASS(LinalgIRQuantizatonSpecAttr);
 
   ~LinalgIRQuantizatonSpecAttr() override;
+  LinalgIRQuantizatonSpecAttr(const LinalgIRQuantizatonSpecAttr&) = delete;
+  LinalgIRQuantizatonSpecAttr& operator=(const LinalgIRQuantizatonSpecAttr&) = delete;
 
   LinalgIRQuantizatonSpecAttr();
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)

23-26: Multiple TODO patterns are placeholders.

These patterns match but don't perform actual rewrites. Ensure these are tracked for completion before the quantization pipeline is used in production.

Would you like me to open issues to track completion of these TODO rewrite implementations?

Also applies to: 39-42, 52-55, 65-68, 78-81, 91-94, 104-107, 117-120, 130-133, 143-146

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (1)

187-200: Consider explicit copy/move deletion for Rule of 5 compliance.

Similar to other pass classes, LLMQuantRecipePass defines a destructor without explicit copy/move operations. Since passes are used via shared_ptr, explicitly deleting these operations clarifies intent.

🔎 Proposed fix
 class LLMQuantRecipePass final : public ir::Pass {
  public:
   LLMQuantRecipePass();
+  LLMQuantRecipePass(const LLMQuantRecipePass&) = delete;
+  LLMQuantRecipePass& operator=(const LLMQuantRecipePass&) = delete;
 
   ~LLMQuantRecipePass() override = default;
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4cbe15 and eff1182.

📒 Files selected for processing (22)
  • examples/CMakeLists.txt
  • examples/qwen3_qnn_aot/compile.cpp
  • examples/qwen3_qnn_aot/qnn_aot_cfg.json
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp
  • mllm/backends/qnn/aot/passes/QuantPrecisionsCfgPass.cpp
  • mllm/backends/qnn/aot/passes/QuantPrecisionsCfgPass.hpp
  • mllm/backends/qnn/aot/passes/SplitGraphPass.cpp
  • mllm/backends/qnn/aot/passes/SplitGraphPass.hpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp
  • mllm/compile/ir/GeneratedRTTIKind.hpp
  • mllm/compile/ir/Node.cpp
  • mllm/compile/ir/Node.hpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
  • mllm/compile/ir/linalg/Attribute.cpp
  • mllm/compile/ir/linalg/Attribute.hpp
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/utils/Common.hpp
💤 Files with no reviewable changes (1)
  • examples/qwen3_qnn_aot/compile.cpp
🧰 Additional context used
📓 Path-based instructions (5)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/utils/Common.hpp
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
  • mllm/compile/ir/GeneratedRTTIKind.hpp
  • mllm/compile/ir/Node.hpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp
  • mllm/compile/ir/linalg/Attribute.hpp
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/compile/ir/Node.cpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
  • mllm/compile/ir/linalg/Attribute.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/utils/Common.hpp
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
  • mllm/compile/ir/GeneratedRTTIKind.hpp
  • mllm/compile/ir/Node.hpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp
  • mllm/compile/ir/linalg/Attribute.hpp
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/compile/ir/Node.cpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
  • mllm/compile/ir/linalg/Attribute.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/utils/Common.hpp
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
  • mllm/compile/ir/GeneratedRTTIKind.hpp
  • mllm/compile/ir/Node.hpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp
  • mllm/compile/ir/linalg/Attribute.hpp
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/compile/ir/Node.cpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
  • mllm/compile/ir/linalg/Attribute.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
{mllm,mllm-cli,pymllm}/**/*.{sh,py}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

If a file starts with a shebang, it must be the first line and contain only the interpreter path and optional single argument, encoded in UTF-8.

Files:

  • mllm/compile/ir/rtti_kind_gen.py
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/compile/ir/Node.cpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
  • mllm/compile/ir/linalg/Attribute.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
🧠 Learnings (3)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Ensure functions that can fail return appropriate error codes or raise exceptions.

Applied to files:

  • mllm/utils/Common.hpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Use named constants instead of magic numbers.

Applied to files:

  • mllm/utils/Common.hpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Add comments for complex algorithms or non-obvious logic.

Applied to files:

  • examples/CMakeLists.txt
🧬 Code graph analysis (9)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (3)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (13)
  • op (18, 30, 44, 58, 72, 86, 100, 112, 124, 138, 152, 164, 178)
mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp (1)
  • op (17-17)
mllm/compile/ir/Node.hpp (2)
  • op (468-468)
  • op (470-470)
mllm/compile/ir/linalg/Attribute.hpp (1)
mllm/compile/ir/linalg/Attribute.cpp (3)
  • LinalgIRQuantizatonSpecAttr (160-160)
  • LinalgIRQuantizatonSpecAttr (162-162)
  • LinalgIRQuantizatonSpecAttr (164-164)
mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp (2)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (1)
  • op (17-17)
mllm/compile/ir/Node.hpp (2)
  • op (468-468)
  • op (470-470)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (4)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (43)
  • op (18, 30, 44, 58, 72, 86, 100, 112, 124, 138, 152, 164, 178, 193)
  • writer (20, 32, 46, 60, 74, 88, 102, 114, 126, 140, 154, 166, 180)
  • LLMQuantRecipePass (189, 191)
  • p (195)
  • make_shared (22, 34-36, 48-50, 62-64, 76-78, 90-92, 104, 116, 128-130, 142-144, 156, 168-170, 182-184)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (1)
  • op (17-17)
mllm/compile/ir/linalg/Attribute.hpp (9)
  • node (27-27)
  • node (27-27)
  • node (322-322)
  • node (322-322)
  • node (343-343)
  • node (343-343)
  • create (76-76)
  • p (318-318)
  • p (337-337)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (2)
  • run (16-26)
  • run (16-16)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (4)
mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp (2)
  • run (16-109)
  • run (16-16)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (1)
  • op (17-17)
mllm/compile/ir/Node.hpp (4)
  • op (468-468)
  • op (470-470)
  • IRWriter (354-354)
  • IRWriter (355-355)
mllm/compile/ir/Node.cpp (1)
  • IRWriter (290-291)
mllm/compile/ir/linalg/Attribute.cpp (2)
mllm/core/DataTypes.hpp (1)
  • nameOfType (713-713)
mllm/compile/ir/linalg/Attribute.hpp (12)
  • p (318-318)
  • p (337-337)
  • spec (99-104)
  • spec (129-134)
  • spec (159-164)
  • spec (191-196)
  • spec (225-230)
  • spec (258-262)
  • spec (293-297)
  • ctx (320-320)
  • ctx (339-339)
  • ctx (341-341)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (1)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)
  • LLMQuantRecipePass (293-311)
mllm/backends/qnn/aot/passes/AOTPipeline.cpp (4)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp (2)
  • createMarkQnnGraphPass (68-68)
  • createMarkQnnGraphPass (68-68)
mllm/backends/qnn/aot/passes/OpNamingPass.cpp (2)
  • createOpNamingPass (88-88)
  • createOpNamingPass (88-88)
mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp (2)
  • createMergeLLMHeadIntoMainGraphPass (111-111)
  • createMergeLLMHeadIntoMainGraphPass (111-111)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)
  • createLLMQuantRecipePass (364-364)
  • createLLMQuantRecipePass (364-364)
mllm/compile/ir/NodeRTTIClassOfImpl.hpp (1)
mllm/compile/ir/GeneratedRTTIKind.hpp (2)
  • v (161-161)
  • v (161-161)
🪛 Clang (14.0.6)
mllm/utils/Common.hpp

[error] 57-57: function-like macro 'MLLM_RETURN_FALSE_IF_NOT' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 60-60: function-like macro 'MLLM_RETURN_FALSE_IF' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)

mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp

[error] 11-11: class 'SplitLLMGraphPass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/compile/ir/linalg/Attribute.hpp

[error] 325-325: class 'LinalgIRQuantizatonSpecAttr' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 337-337: parameter name 'p' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp

[error] 11-11: class 'MergeLLMHeadIntoMainGraphPass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp

[error] lines 18, 30, 44, 58, 72, 86, 100, 112, 124, 138, 152, 164, 178: parameter name 'op' is too short, expected at least 3 characters (readability-identifier-length,-warnings-as-errors)

[error] 187-187: class 'LLMQuantRecipePass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator (cppcoreguidelines-special-member-functions,-warnings-as-errors)

[error] 195-195: parameter name 'p' is too short, expected at least 3 characters (readability-identifier-length,-warnings-as-errors)

mllm/compile/ir/NodeRTTIClassOfImpl.hpp

[error] 340-340: function-like macro 'RTTI_RK_ATTR_LINALGIRATTR_QUANTIZATIONSPEC_IMPL' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-macos
  • GitHub Check: build-android
  • GitHub Check: build-x86
🔇 Additional comments (13)
examples/CMakeLists.txt (1)

6-7: The subdirectories are properly configured. Both examples/minicpm_o and examples/minicpm4 exist with valid CMakeLists.txt files and appropriate source code, confirming that the add_subdirectory() calls are valid.

mllm/compile/ir/GeneratedRTTIKind.hpp (1)

139-139: LGTM!

The new RK_Attr_LinalgIRAttr_QuantizationSpec enum value is correctly positioned between QuantizationAnnotation and Last, maintaining the RTTI hierarchy ordering required for classof range checks.

mllm/compile/ir/rtti_kind_gen.py (1)

219-219: LGTM!

The QuantizationSpec attribute derivation is correctly added alongside QuantizationAnnotation, ensuring the RTTI generator produces the corresponding enum value and macro in the generated files.

mllm/compile/ir/Node.cpp (2)

63-65: LGTM!

The implementation correctly aligns with the header changes, returning std::list<node_weak_ptr_t>& for both inputs() and outputs().


298-311: Good addition: clearing belongsTo on op removal.

The addition of op->belongsTo() = nullptr ensures proper cleanup of the parent relationship when an op is completely removed. This is intentional—removeOp is used for deletion with full cleanup, while removeOpWithoutEdgeCut is used for moving ops between regions where the parent reference will be updated by the subsequent insert function (e.g., insertOpAtLast, insertOpAtPos).
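
To make the two removal flavors concrete, a sketch only: the method names come from this comment, but the parameter lists are assumptions.

#include "mllm/compile/ir/Node.hpp"  // ir::IRWriter, ir::node_ptr_t

using namespace mllm;

// removeOp deletes with full cleanup (edges cut and, after this PR,
// op->belongsTo() reset to nullptr). removeOpWithoutEdgeCut detaches only,
// leaving the parent to be updated by the subsequent insert call.
void removal_semantics_sketch(ir::IRWriter& writer, const ir::node_ptr_t& to_delete,
                              const ir::node_ptr_t& to_move) {
  writer.removeOp(to_delete);              // delete: edges cut, parent cleared
  writer.removeOpWithoutEdgeCut(to_move);  // detach for a move
  writer.insertOpAtLast(to_move);          // hypothetical signature; re-parents the op
}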

mllm/compile/ir/NodeRTTIClassOfImpl.hpp (1)

340-342: LGTM!

The new RTTI_RK_ATTR_LINALGIRATTR_QUANTIZATIONSPEC_IMPL macro follows the established pattern in this auto-generated file. While static analysis suggests using constexpr template functions instead of macros (which is generally good advice), this would require refactoring the entire RTTI generation system and is outside the scope of this change.

mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp (2)

11-18: LGTM!

The pass class follows the established pattern for IR passes in this codebase. Regarding the static analysis hint about the rule of five: since this class has no data members and is primarily used via shared_ptr through the factory function, the default-generated copy/move operations are safe.


17-17: No action needed: return types are already consistent across all passes.

Verification shows that all passes in the codebase—including MergeLLMHeadIntoMainGraphPass and SplitLLMGraphPass—consistently use uint8_t as the return type for the run() method. The SplitLLMGraphPass.hpp file at line 17 contains uint8_t run(const ir::node_ptr_t& op) override;, which is identical in style to the method in question. No inconsistency exists.

Likely an incorrect or invalid review comment.

mllm/compile/ir/Node.hpp (1)

125-127: Breaking API change is properly implemented.

The Region::inputs() and Region::outputs() methods return std::list<node_weak_ptr_t>& and all usages throughout the codebase have been updated to work with this type. Callers properly iterate over the lists, assign them directly, or explicitly cast to Val via ->cast_<ir::Val>() when the underlying value is needed. No incompatibilities remain.
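
A sketch of the consumption pattern this implies: lock() each weak pointer, then cast when the value is needed. Beyond Region::inputs(), node_weak_ptr_t, and cast_<ir::Val>(), the names here are assumptions.

#include "mllm/compile/ir/Node.hpp"  // ir::Region, ir::Val, node_weak_ptr_t

using namespace mllm;

// Iterate a region's inputs under the new std::list<node_weak_ptr_t> type:
// entries may have expired, so lock() before use, then cast to Val when the
// underlying value is needed, matching the caller pattern described above.
void walk_region_inputs_sketch(ir::Region& region) {
  for (auto& weak_node : region.inputs()) {
    if (auto node = weak_node.lock()) {
      auto val = node->cast_<ir::Val>();  // explicit cast, as in existing callers
      (void)val;
    }
  }
}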

examples/qwen3_qnn_aot/qnn_aot_cfg.json (1)

18-46: LGTM!

The new builtin_llm_pass structure is well-organized with explicit configuration for model type, lm_head, linear, and kv_cache quantization settings. The per-field descriptors with method, symmetry, and precision provide clear and maintainable configuration.

mllm/compile/ir/linalg/Attribute.hpp (1)

65-77: LGTM!

The QuantizationSpecRaw struct follows the established pattern of other quantization spec types with proper UUID assignment and factory methods.
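
As a rough sketch of that pattern: the field and enum names below are inferred from the dump() code quoted earlier in this review (QuantizationSpecType::kRaw, q->uuid, _q->type_), and the base-class layout is an assumption.

#include <memory>

// Illustrative shapes only, not the real declarations.
enum class QuantizationSpecTypeSketch { kNone, kRaw /* kSymPerTensor, ... */ };

struct QuantizationSpecSketch {
  using ptr_t = std::shared_ptr<QuantizationSpecSketch>;
  QuantizationSpecTypeSketch type = QuantizationSpecTypeSketch::kNone;
  int uuid = 0;  // every spec carries a uuid, per the dump output
  virtual ~QuantizationSpecSketch() = default;
};

struct QuantizationSpecRawSketch final : QuantizationSpecSketch {
  int type_ = 0;  // stands in for the DataTypes value printed as Raw(type: ...)

  // Factory mirroring the create() convention the comment refers to.
  static std::shared_ptr<QuantizationSpecRawSketch> create(int dtype) {
    auto ret = std::make_shared<QuantizationSpecRawSketch>();
    ret->type = QuantizationSpecTypeSketch::kRaw;
    ret->type_ = dtype;
    return ret;
  }
};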

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)

293-311: Unused config variable except for model check.

The config variable is fetched at line 294 but only used at line 308 to check the model type. This is fine for now, but consider moving the config fetch inside the conditional or using it for other pattern configurations as well.

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (1)

16-185: LGTM!

The pattern class declarations follow a consistent structure with isMatch, rewrite, and static create factory methods. The short parameter names (op, p) are idiomatic for this codebase and can be safely ignored per static analysis.

@coderabbitai (coderabbitai bot) left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
mllm/compile/ir/linalg/Attribute.cpp (1)

26-126: Extract duplicated lambda to eliminate 240+ lines of repetition.

The gen_quant_spec_str lambda is defined identically in both LinalgIRQuantizatonAnnotationAttr::dump() and LinalgIRQuantizatonSpecAttr::dump(), resulting in significant code duplication. Extract this logic to a static helper function or free function in an anonymous namespace.

🔎 Proposed refactoring to eliminate duplication

Add a helper function before the class method implementations:

+namespace {
+std::string gen_quant_spec_str(const QuantizationSpec::ptr_t& q) {
+  std::stringstream ss;
+  ss << "QuantSpec(";
+  switch (q->type) {
+    case QuantizationSpecType::kSymPerTensor: {
+      auto _q = std::static_pointer_cast<QuantizationSpecSymPerTensor>(q);
+      ss << "SymPerTensor(";
+      ss << "quant_min: " << _q->quant_min;
+      ss << ", quant_max: " << _q->quant_max;
+      ss << ", quant_to_type: " << nameOfType(_q->quant_to_type);
+      ss << ", scale_type: " << nameOfType(_q->scale_type);
+      ss << ")";
+      break;
+    }
+    // ... (include all other cases)
+    case QuantizationSpecType::kNone: {
+      ss << "None()";
+      break;
+    }
+    case QuantizationSpecType::kRaw: {
+      auto _q = std::static_pointer_cast<QuantizationSpecRaw>(q);
+      ss << "Raw(";
+      ss << "type: " << nameOfType(_q->type_);
+      ss << ")";
+      break;
+    }
+  }
+  ss << ", uuid=" << q->uuid << ")";
+  return ss.str();
+}
+}  // namespace

Then replace both lambda definitions with calls to the helper:

 void LinalgIRQuantizatonAnnotationAttr::dump(IRPrinter& p) {
-  auto gen_quant_spec_str = [](const QuantizationSpec::ptr_t& q) -> std::string {
-    // ... 100 lines of duplicated code
-  };
-
   p.print("QuantAnnotation");
   // ... rest of method
 void LinalgIRQuantizatonSpecAttr::dump(IRPrinter& p) {
-  auto gen_quant_spec_str = [](const QuantizationSpec::ptr_t& q) -> std::string {
-    // ... 100 lines of duplicated code
-  };
-
   p.print("{}", gen_quant_spec_str(spec_));

Also applies to: 167-267

♻️ Duplicate comments (2)
mllm/compile/ir/linalg/Attribute.cpp (1)

251-254: Critical: Missing closing parenthesis in kNone case.

The kNone case outputs "None(" but breaks without emitting the closing parenthesis, producing malformed output like "None(, uuid=...)". The first occurrence of this lambda (line 111) correctly outputs "None()", but this second occurrence still has the bug.

🔎 Required fix
       case QuantizationSpecType::kNone: {
-        ss << "None(";
+        ss << "None()";
         break;
       }
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)

365-371: Assertion on rewrite() failure will crash the pass.

This was flagged in a past review. Using MLLM_RT_ASSERT(pattern.second->rewrite(...)) causes the pass to abort on a failed rewrite. Consider logging a warning and continuing to process other ops, or document that failing rewrites are considered fatal errors.

🧹 Nitpick comments (7)
mllm/compile/ir/linalg/Attribute.cpp (2)

144-150: Consider using size_t for the loop counter.

weight_idx is declared as int but is compared against annotation_.weights.size(), which returns size_t. While not causing a bug here, using size_t would avoid signed/unsigned comparison warnings and better express intent.

🔎 Suggested type adjustment
-  int weight_idx = 0;
+  size_t weight_idx = 0;
   for (const auto& [name, spec] : annotation_.weights) {

155-158: Unused ctx parameter in build methods.

All three build() methods accept an IRContext* ctx parameter but never use it. If this parameter is reserved for future functionality or API consistency, consider adding a comment or [[maybe_unused]] attribute to document the intent and suppress compiler warnings.

🔎 Example with attribute annotation
-LinalgIRQuantizatonAnnotationAttr::ptr_t LinalgIRQuantizatonAnnotationAttr::build(IRContext* ctx) {
+LinalgIRQuantizatonAnnotationAttr::ptr_t LinalgIRQuantizatonAnnotationAttr::build([[maybe_unused]] IRContext* ctx) {
   auto ret = std::make_shared<LinalgIRQuantizatonAnnotationAttr>();
   return ret;
 }

Also applies to: 272-281

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (3)

4-11: Missing standard library headers.

The header uses std::vector and std::string in the class declaration but doesn't include <vector> or <string>. While these might be transitively included through other headers, it's best practice to explicitly include what you use.

🔎 Proposed fix
 #pragma once

+#include <string>
 #include <unordered_map>
+#include <vector>

 #include "mllm/compile/ir/Node.hpp"
 #include "mllm/compile/passes/Pass.hpp"

194-207: Consider explicitly deleting copy and move operations.

The class defines a custom constructor and destructor but doesn't explicitly define or delete copy/move constructors and assignment operators. Since this pass holds registered patterns that likely shouldn't be duplicated, explicitly deleting these operations clarifies intent and prevents accidental copies.

🔎 Proposed fix
 class LLMQuantRecipePass final : public ir::Pass {
  public:
   LLMQuantRecipePass();
+  LLMQuantRecipePass(const LLMQuantRecipePass&) = delete;
+  LLMQuantRecipePass& operator=(const LLMQuantRecipePass&) = delete;
+  LLMQuantRecipePass(LLMQuantRecipePass&&) = delete;
+  LLMQuantRecipePass& operator=(LLMQuantRecipePass&&) = delete;

   ~LLMQuantRecipePass() override = default;

14-18: Add documentation for the utility function.

Per coding guidelines, public APIs should have clear documentation. This function name is descriptive but quite long—a brief comment explaining when/why to use it and what conditions must hold would help maintainers.

🔎 Example documentation
 //===----------------------------------------------------------------------===//
 // Utility functions
 //===----------------------------------------------------------------------===//
+/// Propagates the quantization spec from a single-input op's input to its single output,
+/// and sets the op's quant_recipe annotation attribute. Returns false if preconditions
+/// (single input/output, input has quant_recipe) are not met.
 bool shareQuantSpecSingleInputToSingleOutputAndSetOpQuantAnnoAttr(const ir::IRContext::ptr_t& ctx,
                                                                   const ir::linalg::LinalgIROp::ptr_t& op);
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)

224-275: EmbeddingOp itself is not marked with quant_recipe.

Line 272 sets quant_recipe only on node->nextOp() (the CastTypeOp), but the original node (EmbeddingOp) is not marked. While isMatch checks nextOp()->getAttr("quant_recipe") to avoid re-processing, this asymmetry means inspecting the EmbeddingOp alone won't reveal it's been processed. Consider marking both ops for consistency and easier debugging.

🔎 Proposed fix
   // Attach to quantize node
   node->nextOp()->setAttr("quant_recipe", annotation_attr);
+  // Mark embedding op as processed (can reuse annotation or create a marker attribute)
+  node->setAttr("quant_recipe", annotation_attr);

   return true;

377-377: Inconsistent return value.

Line 348 returns ir::PASS_RET_SUCCESS but line 377 returns the literal 0. Use the named constant consistently for clarity and maintainability.

🔎 Proposed fix
-  return 0;
+  return ir::PASS_RET_SUCCESS;
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eff1182 and 4f94986.

📒 Files selected for processing (3)
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/compile/ir/linalg/Attribute.cpp
🧰 Additional context used
📓 Path-based instructions (4)
(Same CodeRabbit inference-engine rules as in the first review: character-set/encoding requirements, TODO/FIXME format, production-quality C/C++/Python guidance, and public-API documentation/style guidance.)

Files:

  • mllm/compile/ir/linalg/Attribute.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
🧬 Code graph analysis (2)
mllm/compile/ir/linalg/Attribute.cpp (2)
mllm/core/DataTypes.hpp (1)
  • nameOfType (713-713)
mllm/compile/ir/linalg/Attribute.hpp (9)
  • p (318-318)
  • p (337-337)
  • spec (99-104)
  • spec (129-134)
  • spec (159-164)
  • spec (191-196)
  • spec (225-230)
  • spec (258-262)
  • spec (293-297)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (2)
mllm/compile/ir/linalg/Attribute.hpp (11)
  • ctx (320-320)
  • ctx (339-339)
  • ctx (341-341)
  • node (27-27)
  • node (27-27)
  • node (322-322)
  • node (322-322)
  • node (343-343)
  • node (343-343)
  • p (318-318)
  • p (337-337)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (1)
  • op (17-17)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp

[error] 14-14: 2 adjacent parameters of 'shareQuantSpecSingleInputToSingleOutputAndSetOpQuantAnnoAttr' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 15-15: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp

[error] lines 18, 25, 37, 51, 65, 79, 93, 107, 119, 131, 145, 159, 171, 185: parameter name 'op' is too short, expected at least 3 characters (readability-identifier-length,-warnings-as-errors)

[error] 194-194: class 'LLMQuantRecipePass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator (cppcoreguidelines-special-member-functions,-warnings-as-errors)

[error] 202-202: parameter name 'p' is too short, expected at least 3 characters (readability-identifier-length,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-x86
  • GitHub Check: build-macos
  • GitHub Check: build-android
🔇 Additional comments (2)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)

176-204: LGTM - Past review issue addressed.

Both branches now consistently set view_op->setAttr("quant_recipe", annotation_attr), resolving the previously flagged asymmetry.


380-383: patterns_ map usage requires codebase verification.

The review claims the patterns_ map is unused and suggests removal. However, this requires confirming whether the map is accessed in other methods, derived classes, or external code that depends on this class. Before removal, verify all usages of patterns_ across the codebase.

- Introduced new methods in QnnAOTNodeTensor for parsing tensor types, data types, and quantization parameters from IR.
- Updated QnnAOTNodeTensor constructor to accept a flag for static weight handling.
- Implemented QnnAOTGraph methods for adding operations and compiling the graph.
- Enhanced capture methods in QnnAOTEnv for managing AOT graphs and tensors.
- Refactored LLMQuantRecipePass to support recursive graph visiting and improved quantization pattern matching.
- Updated Elewise visitor patterns to compile operations and register them in the AOT graph.
- Removed obsolete methods and streamlined the interface for better clarity and maintainability.
@coderabbitai (coderabbitai bot) left a comment

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)

102-104: Factory method doesn't pass force_static_weight to constructor.

QnnAOTNodeTensor::create() accepts force_static_weight but doesn't forward it to the constructor.

🔎 Proposed fix
 static inline ptr_t create(const ir::tensor::TensorValue::ptr_t& v, bool force_static_weight = false) {
-    return std::make_shared<QnnAOTNodeTensor>(v);
+    return std::make_shared<QnnAOTNodeTensor>(v, force_static_weight);
   }

This fix should be applied in QnnWrappersAPI.hpp at lines 102-104.

♻️ Duplicate comments (3)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (3)

264-265: Missing defensive config validation before nested access.

Accessing config["quant_recipe"]["builtin_llm_pass"]["linear"] without checking if these keys exist can throw an exception. This was flagged in a previous review.
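
One defensive shape for this, assuming the config behaves like an nlohmann::json object with contains(); the actual config type is not shown in this review, so treat the API as an assumption.

#include <nlohmann/json.hpp>

// Verify every level before descending, instead of chaining operator[]
// on keys that may be absent.
bool get_linear_cfg_sketch(const nlohmann::json& config, nlohmann::json& out) {
  if (!config.contains("quant_recipe")) return false;
  const auto& recipe = config["quant_recipe"];
  if (!recipe.contains("builtin_llm_pass")) return false;
  const auto& pass_cfg = recipe["builtin_llm_pass"];
  if (!pass_cfg.contains("linear")) return false;
  out = pass_cfg["linear"];  // safe: all levels verified above
  return true;
}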


520-538: Missing defensive config validation for Qwen3 pattern registration.

Line 535 accesses nested config keys without validation. This was flagged in a previous review.


78-81: TODO stub rewrite() returns true, silently consuming matched ops.

This was flagged in a previous review.

🧹 Nitpick comments (5)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)

169-229: Tensor type parsing handles cases correctly but has a silent fallthrough.

The switch at line 173-174 for kTensorMemTypes_Start does nothing and falls through, leaving ret_qnn_tensor_type as QNN_TENSOR_TYPE_UNDEFINED. Consider adding a log or assertion if this is unexpected.

🔎 Optional: Add explicit handling
     case kTensorMemTypes_Start: {
+      MLLM_WARN("Unexpected kTensorMemTypes_Start in parseQnnTensorTypeFromIR");
       break;
     }
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (2)

127-131: Factory method initializes only op_name_, leaving other members uninitialized.

The create() method only sets op_name_. The static analysis correctly flags that name_, param_scalar, param_tensor, inputs, and outputs are not initialized. While the vectors default-construct to empty, name_ will also be empty, which may be intentional. Consider adding a default constructor that initializes all members.
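
One possible shape for that fix, sketched with placeholder member types since only the member names are quoted in this review.

#include <string>
#include <vector>

// Placeholder member types; the real QnnAOTNodeOperation uses QNN wrapper
// types. In-class initializers guarantee a well-defined empty state even
// when a factory such as create() sets only op_name_.
struct QnnAOTNodeOperationSketch {
  std::string op_name_;
  std::string name_;              // empty until the op is explicitly named
  std::vector<int> param_scalar;  // default-constructed empty
  std::vector<int> param_tensor;
  std::vector<int> inputs;
  std::vector<int> outputs;
};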


170-176: Public member variables in QnnAOTGraph.

The static analysis flags is_compiled_, op_node_, and all_tensors_ as public. While this may be intentional for convenience, consider using accessors for better encapsulation.

mllm/backends/qnn/aot/visitor/Base.hpp (1)

24-31: QnnAOTQuantRecipeBasePattern has no abstract methods.

The class only has default implementations returning false. Consider making it abstract or clarifying its purpose with a comment.
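
If the class is meant as an interface, a sketch under that assumption (the method names mirror the isMatch/rewrite pair used by the patterns elsewhere in this PR):

// Pure-virtual variant: derived patterns must implement both hooks.
class QnnAOTQuantRecipeBasePattern {
 public:
  virtual ~QnnAOTQuantRecipeBasePattern() = default;
  virtual bool isMatch(const ir::op_ptr_t& op) = 0;
  virtual bool rewrite(ir::IRWriter& writer, const ir::op_ptr_t& op) = 0;
};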

mllm/backends/qnn/aot/visitor/Elewise.cpp (1)

17-46: Implementation correctly uses new capture API.

The compile method properly:

  1. Retrieves environment context
  2. Validates required attributes
  3. Captures input/output tensors
  4. Creates and registers the QNN operation

However, consider adding null checks for the cast results on lines 33-35.

🔎 Optional: Add null checks for casts
   // Start to attach!
   auto i_0 = op->inputs().front()->cast_<ir::tensor::TensorValue>();
   auto i_1 = (*(std::next(op->inputs().begin())))->cast_<ir::tensor::TensorValue>();
   auto o_0 = op->outputs().front()->cast_<ir::tensor::TensorValue>();
+  MLLM_RETURN_FALSE_IF_NOT(i_0 && i_1 && o_0);
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4f94986 and 3ae3c54.

📒 Files selected for processing (8)
  • examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
  • examples/qwen3_qnn_aot/qwen3_qnn_aot.mir
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
🧠 Learnings (2)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Suggest adding unit tests for untested complex logic or edge cases.

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
🧬 Code graph analysis (5)
mllm/backends/qnn/aot/visitor/Base.hpp (3)
mllm/backends/qnn/aot/visitor/Elewise.hpp (2)
  • writer (16-16)
  • op (14-14)
mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/visitor/Elewise.cpp (4)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (16)
  • isMatch (73-76)
  • isMatch (73-73)
  • isMatch (86-92)
  • isMatch (86-86)
  • isMatch (119-122)
  • isMatch (119-119)
  • isMatch (132-135)
  • isMatch (132-132)
  • isMatch (169-172)
  • isMatch (169-169)
  • isMatch (182-185)
  • isMatch (182-182)
  • isMatch (195-198)
  • isMatch (195-195)
  • isMatch (208-211)
  • isMatch (208-208)
mllm/backends/qnn/aot/visitor/Elewise.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (5)
  • qnn_context_name (268-268)
  • qnn_context_name (270-271)
  • qnn_context_name (273-274)
  • create (47-49)
  • create (47-47)
mllm/backends/qnn/aot/visitor/Elewise.hpp (4)
mllm/backends/qnn/aot/visitor/Base.hpp (9)
  • writer (19-19)
  • writer (19-19)
  • writer (21-21)
  • writer (30-30)
  • writer (30-30)
  • op (17-17)
  • op (17-17)
  • op (28-28)
  • op (28-28)
mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (5)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (17)
  • QnnAOTNodeTensor (106-106)
  • v (102-104)
  • v (102-102)
  • v (109-109)
  • v (111-111)
  • v (113-113)
  • v (115-115)
  • name (151-151)
  • name (253-253)
  • name (255-255)
  • name (257-257)
  • qnn_op (170-170)
  • qnn_context_name (268-268)
  • qnn_context_name (270-271)
  • qnn_context_name (273-274)
  • create (47-49)
  • create (47-47)
mllm/compile/ir/Node.hpp (8)
  • name (302-302)
  • name (306-306)
  • op (468-468)
  • op (470-470)
  • create (265-300)
  • create (265-265)
  • create (369-401)
  • create (369-369)
mllm/backends/qnn/QNNUtils.hpp (4)
  • name (191-192)
  • name (195-196)
  • create (255-257)
  • create (255-255)
mllm/backends/qnn/aot/passes/AOTCompileContext.hpp (2)
  • env (26-26)
  • env (26-26)
mllm/backends/qnn/aot/visitor/Elewise.cpp (2)
  • compile (17-46)
  • compile (17-17)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (1)
mllm/backends/qnn/QNNUtils.hpp (2)
  • name (191-192)
  • name (195-196)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/visitor/Base.hpp

[error] 21-21: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Elewise.hpp

[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp

[error] 22-22: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)


[error] 24-24: variable name 'rw' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 49-49: 2 adjacent parameters of 'shareQuantSpecSingleInputToSingleOutputAndSetOpQuantAnnoAttr' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 50-50: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/QnnWrappersAPI.hpp

[error] 106-106: parameter name 'v' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 123-123: constructor does not initialize these fields: name_, op_name_, param_scalar, param_tensor, inputs, outputs

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 174-174: member variable 'is_compiled_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 175-175: member variable 'op_node_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 176-176: member variable 'all_tensors_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 271-271: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-android
  • GitHub Check: build-x86
  • GitHub Check: build-macos
🔇 Additional comments (10)
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (1)

179-180: Establishes consistent quantization handling for KV cache states.

Adding the de-quantization step for value_states aligns its precision handling with key_states (lines 172-173), ensuring both KV components undergo the documented "de-quantization and quantization again" pattern before concatenation. This quantization recipe handling is intentional for QNN AOT compatibility.

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)

21-42: LGTM for recursive graph visiting logic.

The recursive traversal correctly walks through subgraphs via CallGraphOp and applies pattern matching. The guard for existing quant_recipe prevents reprocessing.


49-68: Utility function logic is correct.

The helper properly validates preconditions, creates annotation attributes, and propagates quant specs from input to output.

mllm/backends/qnn/aot/QnnWrappersAPI.cpp (3)

772-808: Tensor capture and caching logic looks correct.

The method properly caches static tensors at context level and normal tensors at graph level, avoiding duplicate creation.
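
The shape of that two-level lookup, sketched with assumed container names (the PR's actual members may differ):

// Static weights are cached on the context so they can be shared across
// graphs; everything else is cached on the owning graph.
if (is_static) {
  auto& cache = contexts_[qnn_context_name]->static_tensors_;  // assumed map
  if (auto it = cache.find(v->name()); it != cache.end()) { return it->second; }
  return cache[v->name()] = QnnAOTNodeTensor::create(v, /*force_static_weight=*/true);
}
auto& cache = contexts_[qnn_context_name]->graphs_[g_name]->all_tensors_;  // assumed map
if (auto it = cache.find(v->name()); it != cache.end()) { return it->second; }
return cache[v->name()] = QnnAOTNodeTensor::create(v);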


141-167: Constructor properly initializes QNN tensor structure.

The v2 tensor initialization is comprehensive and handles the static weight case correctly.


231-321: Data type mapping is comprehensive.

The switch covers all expected data types with appropriate QNN mappings. The FIXME at line 278-279 is noted but acceptable for initial implementation.

mllm/backends/qnn/aot/QnnWrappersAPI.hpp (1)

267-276: New capture API methods are well-designed.

The new captureAOTGraph, captureAOTNodeOp, and captureQnnAOTNodeTensor methods provide a clean interface for building QNN graphs programmatically.

mllm/backends/qnn/aot/visitor/Elewise.hpp (1)

12-21: Interface update aligns with base class change.

The QnnAOTAddPattern now correctly implements the compile method from the updated QnnAOTBasePattern interface.

mllm/backends/qnn/aot/visitor/Elewise.cpp (1)

13-15: Match condition includes using_qnn attribute check.

The pattern only matches AddOps that have been marked for QNN processing, preventing unintended matches.

mllm/backends/qnn/aot/visitor/Base.hpp (1)

13-22: Interface change from addNode to compile is properly implemented across all derived classes.

The new compile(ir::IRWriter& writer, const ir::op_ptr_t& op) signature provides a cleaner interface that aligns with the writer-based compilation approach. All derived classes (e.g., QnnAOTAddPattern in Elewise.hpp) have been updated to implement this method, and there are no remaining references to the old interface.

Comment on lines 325 to 371
Qnn_QuantizeParams_t QnnAOTNodeTensor::parseQnnQuantizeParamFromIR(const ir::tensor::TensorValue::ptr_t& v) {
  Qnn_QuantizeParams_t ret = QNN_QUANTIZE_PARAMS_INIT;

  MLLM_RT_ASSERT(v->getAttr("quant_recipe"));
  auto quant_spec = v->getAttr("quant_recipe")->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;

  switch (quant_spec->type) {
    case ir::linalg::QuantizationSpecType::kRaw: {
      break;
    }
    case ir::linalg::QuantizationSpecType::kSymPerTensor: {
      auto cfg = std::static_pointer_cast<ir::linalg::QuantizationSpecSymPerTensor>(quant_spec);
      ret.encodingDefinition = QNN_DEFINITION_DEFINED;
      ret.quantizationEncoding = QNN_QUANTIZATION_ENCODING_SCALE_OFFSET;
      ret.scaleOffsetEncoding = Qnn_ScaleOffset_t{.scale = cfg->scale.item<float>(), .offset = 0};
      break;
    }
    case ir::linalg::QuantizationSpecType::kSymPerChannel: {
      auto cfg = std::static_pointer_cast<ir::linalg::QuantizationSpecSymPerChannel>(quant_spec);
      ret.encodingDefinition = QNN_DEFINITION_DEFINED;
      ret.quantizationEncoding = QNN_QUANTIZATION_ENCODING_AXIS_SCALE_OFFSET;
      ret.axisScaleOffsetEncoding = Qnn_AxisScaleOffset_t{
          .axis = cfg->ch_axis,
          .numScaleOffsets = (uint32_t)v->tensor_.shape()[cfg->ch_axis],
          .scaleOffset = nullptr,  // TODO make this correct.
      };
      break;
    }
    case ir::linalg::QuantizationSpecType::kSymPerBlock:
    case ir::linalg::QuantizationSpecType::kAsymPerTensor:
    case ir::linalg::QuantizationSpecType::kAsymPerChannel:
    case ir::linalg::QuantizationSpecType::kAsymPerBlock: {
      MLLM_ERROR_EXIT(ExitCode::kCoreError, "Can't handle [kSymPerBlock, kAsymPerTensor, kAsymPerChannel, kAsymPerBlock] type");
    }
    case ir::linalg::QuantizationSpecType::kLPBQ: {
      ret.encodingDefinition = QNN_DEFINITION_DEFINED;
      ret.quantizationEncoding = QNN_QUANTIZATION_ENCODING_BLOCKWISE_EXPANSION;
      ret.blockwiseExpansion = nullptr;  // TODO
      break;
    }
    default: {
      MLLM_ERROR_EXIT(ExitCode::kCoreError, "Can't handle kNone type");
    }
  }

  return ret;
}

⚠️ Potential issue | 🟠 Major

parseQnnQuantizeParamFromIR asserts quant_recipe exists but calling code may not guarantee this.

Line 328 uses MLLM_RT_ASSERT which will crash if quant_recipe is missing. Consider returning a default or handling gracefully.

🔎 Proposed fix: return default params if no quant_recipe
 Qnn_QuantizeParams_t QnnAOTNodeTensor::parseQnnQuantizeParamFromIR(const ir::tensor::TensorValue::ptr_t& v) {
   Qnn_QuantizeParams_t ret = QNN_QUANTIZE_PARAMS_INIT;

-  MLLM_RT_ASSERT(v->getAttr("quant_recipe"));
+  if (!v->getAttr("quant_recipe")) {
+    return ret;  // Return default params if no quant_recipe
+  }
   auto quant_spec = v->getAttr("quant_recipe")->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 325-371, the code
unconditionally asserts that v->getAttr("quant_recipe") exists which can crash
when callers don't provide it; instead check if v->getAttr("quant_recipe") is
null and if so return the default-initialized Qnn_QuantizeParams_t (ret)
immediately (optionally emit a debug/info log), otherwise cast and proceed as
existing; replace the MLLM_RT_ASSERT with a null-check and early return to
handle missing quant_recipe gracefully.

Comment on lines +453 to +459
bool QnnAOTGraph::compile() {
  if (is_compiled_) { return true; }
  // TODO

  is_compiled_ = true;
  return true;
}

⚠️ Potential issue | 🟡 Minor

compile() is a stub that only sets is_compiled_ to true.

The method doesn't perform actual compilation. This may cause issues if code expects compiled graphs.

Comment on lines +102 to +104
static inline ptr_t create(const ir::tensor::TensorValue::ptr_t& v, bool force_static_weight = false) {
  return std::make_shared<QnnAOTNodeTensor>(v);
}

⚠️ Potential issue | 🔴 Critical

Factory method doesn't forward force_static_weight to constructor.

The create() method accepts force_static_weight but ignores it when calling the constructor.

🔎 Proposed fix
   static inline ptr_t create(const ir::tensor::TensorValue::ptr_t& v, bool force_static_weight = false) {
-    return std::make_shared<QnnAOTNodeTensor>(v);
+    return std::make_shared<QnnAOTNodeTensor>(v, force_static_weight);
   }
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.hpp around lines 102 to 104, the static
factory method accepts force_static_weight but does not pass it to the
QnnAOTNodeTensor constructor; update the create() implementation to forward the
force_static_weight argument to the constructor (or call the correct constructor
overload that accepts it) so the created QnnAOTNodeTensor is constructed with
the intended static-weight behavior.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (8)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)

535-537: Add defensive validation for config access.

Line 535 directly accesses config["quant_recipe"]["builtin_llm_pass"]["model"] without validating key existence. nlohmann::json's operator[] does not guard missing keys: it silently inserts a null on a mutable object and is undefined behavior on a const one, so the lookup can misbehave or crash rather than fail cleanly. Other code in this codebase (AOTPipeline.cpp:17-18, MarkTensorIO.cpp:22) defensively uses .contains() before access.

🔎 Recommended defensive check
-  if (config["quant_recipe"]["builtin_llm_pass"]["model"] == "qwen3") {
+  if (config.contains("quant_recipe") && 
+      config["quant_recipe"].contains("builtin_llm_pass") &&
+      config["quant_recipe"]["builtin_llm_pass"].contains("model") &&
+      config["quant_recipe"]["builtin_llm_pass"]["model"] == "qwen3") {
     addPattern(LLMQuantRecipeQwen3AttentionPattern::create(), "qwen3_attention", 100);
   }

264-264: Add defensive validation for nested config access.

Line 264 directly accesses config["quant_recipe"]["builtin_llm_pass"]["linear"] without validating key existence. nlohmann::json's operator[] does not guard missing keys (it inserts a null, or is undefined behavior on const access). Other code in the codebase (AOTPipeline.cpp, MarkTensorIO.cpp) uses .contains() to defensively check keys before access.

🔎 Recommended defensive check
+  if (!config.contains("quant_recipe") || 
+      !config["quant_recipe"].contains("builtin_llm_pass") ||
+      !config["quant_recipe"]["builtin_llm_pass"].contains("linear")) {
+    MLLM_WARN("Missing required config keys for linear quantization");
+    return false;
+  }
   auto config = AOTCompileContext::getInstance().getConfig()["quant_recipe"]["builtin_llm_pass"]["linear"];
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (5)

141-168: Direct access to quant_recipe attribute without null check.

Line 144 unconditionally accesses v->getAttr("quant_recipe") and casts it without verifying the attribute exists. If quant_recipe is missing, this will crash. Additionally, line 154 calls parseQnnQuantizeParamFromIR(v) which also asserts on quant_recipe (line 329), creating redundant checks. Consider consolidating the attribute access and adding a null check at the beginning of the constructor.

🔎 Proposed fix: Check for quant_recipe once and handle gracefully
 QnnAOTNodeTensor::QnnAOTNodeTensor(const ir::tensor::TensorValue::ptr_t& v, bool force_static_weight) {
   name_ = v->name();
   mllm_tensor_ = v->tensor_;
-  quant_spec_ = v->getAttr("quant_recipe")->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;
+  auto quant_attr = v->getAttr("quant_recipe");
+  if (quant_attr) {
+    quant_spec_ = quant_attr->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;
+  }
   for (auto s : v->tensor_.shape()) { shape_.emplace_back(s); }

326-406: parseQnnQuantizeParamFromIR asserts quant_recipe exists but calling code may not guarantee this.

Line 329 uses MLLM_RT_ASSERT which will crash if quant_recipe is missing. This is the same attribute accessed in the constructor (line 144). Consider returning a default Qnn_QuantizeParams_t if the attribute is missing, or checking at a higher level before construction.

🔎 Proposed fix: Return default params if no quant_recipe
 Qnn_QuantizeParams_t QnnAOTNodeTensor::parseQnnQuantizeParamFromIR(const ir::tensor::TensorValue::ptr_t& v) {
   Qnn_QuantizeParams_t ret = QNN_QUANTIZE_PARAMS_INIT;
 
-  MLLM_RT_ASSERT(v->getAttr("quant_recipe"));
+  if (!v->getAttr("quant_recipe")) {
+    return ret;  // Return default params if no quant_recipe
+  }
   auto quant_spec = v->getAttr("quant_recipe")->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;

466-486: addOperation has multiple TODOs and doesn't actually add the op to QNN.

The method constructs qnn_op_config but never calls the QNN interface to register the operation with the backend. Lines 477-483 contain TODOs for params, inputs, outputs, validations, and graph registration. This leaves operations in an incomplete state.

Would you like me to help implement the operation registration logic or open an issue to track this?
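
For reference, registration usually bottoms out in a graphAddNode call on the loaded interface; a hedged sketch (qnn_interface_, graph_handle_, and the logging call are assumptions here, not the PR's API):

// Assumes qnn_op_config's name/package/type, params, inputs, and outputs
// were already populated from this wrapper's state.
if (QNN_GRAPH_NO_ERROR != qnn_interface_.graphAddNode(graph_handle_, qnn_op_config)) {
  MLLM_WARN("graphAddNode failed");  // logging call assumed
  return false;
}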


488-494: compile() is a stub that only sets is_compiled_ to true.

The method doesn't perform actual graph compilation or validation. This may cause issues if downstream code expects a fully compiled graph.


795-798: captureAOTGraph always returns nullptr.

The stub implementation returns nullptr, which will cause null pointer dereferences in callers expecting a valid QnnAOTGraph::ptr_t.

mllm/backends/qnn/aot/QnnWrappersAPI.hpp (1)

103-105: Factory method doesn't forward force_static_weight to constructor.

The create() method accepts force_static_weight but ignores it when calling the constructor (line 104). This means the parameter has no effect, breaking the intended static weight functionality.

🔎 Proposed fix
   static inline ptr_t create(const ir::tensor::TensorValue::ptr_t& v, bool force_static_weight = false) {
-    return std::make_shared<QnnAOTNodeTensor>(v);
+    return std::make_shared<QnnAOTNodeTensor>(v, force_static_weight);
   }
🧹 Nitpick comments (4)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)

21-23: Name the anonymous function parameter for clarity.

Line 22's patterns_w_priority_ parameter is unnamed in the function signature, which reduces readability and violates the coding guidelines requiring all parameters to be named.

🔎 Proposed fix
-void recursiveVisitGraph(const ir::IRContext::ptr_t& ctx,
-                         const std::vector<std::pair<int, ir::Pattern::ptr_t>>& patterns_w_priority_,
-                         const ir::graph::SubGraphOp::ptr_t& sub_graph_ir) {
+void recursiveVisitGraph(const ir::IRContext::ptr_t& ctx,
+                         const std::vector<std::pair<int, ir::Pattern::ptr_t>>& patterns_with_priority,
+                         const ir::graph::SubGraphOp::ptr_t& sub_graph_ir) {

Then update references inside the function accordingly.

As per coding guidelines, Clang requires all parameters to be named.


284-297: Consider caching compiled regex patterns to avoid repeated compilation.

The loop compiles a regex for each config key on every Linear operation processed (line 288). For models with many Linear ops, this repeated compilation could impact performance.

🔎 Performance optimization

Consider caching compiled patterns in a static map initialized once, or compile patterns during pass construction rather than per-op processing.
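
A sketch of hoisting the compilation, assuming the linear config section maps regex strings to per-layer quant settings:

// Compile each configured regex once, e.g. at pattern construction.
std::vector<std::pair<std::regex, nlohmann::json>> compiled_rules;
for (const auto& [pattern_str, rule] : linear_cfg.items()) {
  compiled_rules.emplace_back(std::regex(pattern_str), rule);
}
// Hot path per Linear op: only the match runs, no re-compilation.
for (const auto& [re, rule] : compiled_rules) {
  if (std::regex_match(op_name, re)) { /* apply rule */ break; }
}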

mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)

807-843: Consider defensive checks for static weight determination.

Lines 814-816 determine __qnn_enable_static_weight based on memory type checks. The logic appears correct, but the compound condition (v->tensor_.memType() <= kParams_End && v->tensor_.memType() >= kParams_Start) assumes the enum ordering is contiguous. If the enum definition changes or additional memory types are inserted, this could break.

🔎 More robust static weight check
   // Check if this value want static qnn weight. The static qnn weight will be shared through one context in diff graphs!
-  if (v->tensor_.memType() == kGlobal || (v->tensor_.memType() <= kParams_End && v->tensor_.memType() >= kParams_Start)) {
+  bool is_param_type = (v->tensor_.memType() == kParamsMMAP || 
+                        v->tensor_.memType() == kParamsNormal);
+  if (v->tensor_.memType() == kGlobal || is_param_type) {
     __qnn_enable_static_weight = true;
   }
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (1)

162-162: Consider encapsulating public data members.

Static analysis flags public members (name_ on line 162, is_compiled_, op_node_, all_tensors_ on lines 182-184) as violating encapsulation. While the current struct-like design may be intentional for AOT tooling, exposing internal state increases coupling and makes it harder to enforce invariants. If the codebase evolves toward stricter encapsulation, consider adding accessor methods.

Based on coding guidelines recommending adherence to best practices like the Google C++ Style Guide.

Also applies to: 182-184
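
If stricter encapsulation is adopted later, the change is mechanical; a sketch with assumed member types:

class QnnAOTGraph {
 public:
  bool isCompiled() const { return is_compiled_; }
  const std::vector<QnnAOTNodeOperation::ptr_t>& opNodes() const { return op_node_; }
 private:
  bool is_compiled_ = false;
  std::vector<QnnAOTNodeOperation::ptr_t> op_node_;  // element type assumed
};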

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3ae3c54 and 90fab85.

📒 Files selected for processing (3)
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
🧠 Learnings (3)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Suggest adding unit tests for untested complex logic or edge cases.

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp

[error] 22-22: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)


[error] 24-24: variable name 'rw' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 49-49: 2 adjacent parameters of 'shareQuantSpecSingleInputToSingleOutputAndSetOpQuantAnnoAttr' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 50-50: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/QnnWrappersAPI.hpp

[error] 107-107: parameter name 'v' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 128-128: constructor does not initialize these fields: name_, op_name_, param_scalar, param_tensor, inputs, outputs, unreachable_handle_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 174-174: constructor does not initialize these fields: op_node_, all_tensors_, graph_name_, belongs_context_name_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 182-182: member variable 'is_compiled_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 183-183: member variable 'op_node_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 184-184: member variable 'all_tensors_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 279-279: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-macos
  • GitHub Check: build-android
  • GitHub Check: build-x86
🔇 Additional comments (6)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)

28-33: Verify that aborting on rewrite failure is the intended behavior.

Line 30 uses MLLM_RT_ASSERT(pattern.second->rewrite(iw, some_op)), which will abort execution if any pattern's rewrite returns false. Given that several patterns have TODO stubs that may return true/false inconsistently, this could lead to unexpected crashes during development or runtime.

If rewrite failures should be non-fatal, consider logging a warning and continuing:

🔎 Alternative approach
           if (pattern.second->isMatch(some_op)) {
-            MLLM_RT_ASSERT(pattern.second->rewrite(iw, some_op));
+            if (!pattern.second->rewrite(iw, some_op)) {
+              MLLM_WARN("Pattern rewrite failed for op, continuing with next pattern");
+            } else {
-            break;
+              break;
+            }
           }

312-312: Handle asymmetric quantization if sym requirement is relaxed.

Line 312 returns false when sym is false: MLLM_RETURN_FALSE_IF_NOT(sym). If asymmetric quantization support is planned, this will require additional implementation.

Is asymmetric quantization planned for future releases? If so, consider adding a TODO comment indicating the enhancement, or document this limitation.

mllm/backends/qnn/aot/QnnWrappersAPI.cpp (2)

170-230: LGTM!

The tensor type parsing logic is comprehensive and correctly maps IR memory types to QNN tensor types. The attribute-based override mechanism (lines 225-227) appropriately prioritizes explicit graph input/output attributes over inferred types.


454-459: LGTM!

Simple getter/setter methods for operation naming. The fluent interface pattern (returning this from setName) is consistent with other methods in the class.

mllm/backends/qnn/aot/QnnWrappersAPI.hpp (2)

118-126: LGTM!

The refactored internal representation improves efficiency by caching frequently accessed fields (name_, mllm_tensor_, shape_) rather than repeatedly querying the IR. The unreachable_handle_ vector for memory management is a practical approach to ensure cleanup of dynamically allocated QNN structures.


132-136: LGTM!

The factory method for QnnAOTNodeOperation and the name accessors follow consistent patterns. The fluent interface (returning ptr_t from setName) enables method chaining, which is used throughout the codebase.

Also applies to: 156-159

Comment on lines +279 to +286
    // FIXME: Maybe error here.
    case kInt4: {
      ret = QNN_DATATYPE_SFIXED_POINT_4;
      break;
    }
    case kUInt4: {
      ret = QNN_DATATYPE_UFIXED_POINT_4;
      break;

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n mllm/backends/qnn/aot/QnnWrappersAPI.cpp | head -330 | tail -100

Repository: UbiquitousLearning/mllm

Length of output: 3274


🏁 Script executed:

rg "kInt4|kUInt4" --type cpp --type h -B 2 -A 2

Repository: UbiquitousLearning/mllm

Length of output: 1338


🏁 Script executed:

web_search
QNN QNN_DATATYPE_SFIXED_POINT_4 UFIXED_POINT_4 Qualcomm neural network fixed point 4-bit datatype

Repository: UbiquitousLearning/mllm

Length of output: 159


🌐 Web query:

QNN Qualcomm SFIXED_POINT_4 UFIXED_POINT_4 datatype specification

💡 Result:

Short answer: In Qualcomm QNN, QNN_DATATYPE_SFIXED_POINT_4 and QNN_DATATYPE_UFIXED_POINT_4 are 4‑bit signed/unsigned fixed‑point (quantized) tensor formats (two 4‑bit values are packed into one byte when packing is enabled). Representations and semantics:

  • SFIXED_POINT_4: 4‑bit signed two’s‑complement integer (range −8..+7) stored as fixed‑point quantized values (dequantize with per‑tensor / per‑channel scale ± zero‑point as provided by the model).
  • UFIXED_POINT_4: 4‑bit unsigned integer (range 0..15) stored as fixed‑point quantized values (dequantize with scale/zero‑point).
  • Packed storage: when 4‑bit packing is used, two 4‑bit elements are stored per byte (low nibble = first value, high nibble = second). When not packed the values may be stored using 8‑bit containers (i.e., as S/UFIXED_POINT_8). QNN toolchains provide flags such as --pack_4_bit_weights to enable packing.
    Sources: Qualcomm QNN toolchain notes and examples (discussion of QNN_DATATYPE_SFIXED_POINT_4 / UFIXED_POINT_4 and 4‑bit packing). [1][2]

Sources
[1] Qualcomm AI Hub / QAIRT job docs (shows QNN support for INT4/4‑bit quantization options).
[2] Implementation notes / community writeup describing 4‑bit packing and QNN_DATATYPE_SFIXED_POINT_4 / QNN_DATATYPE_UFIXED_POINT_4 (packing two 4‑bit values per byte and relation to 8‑bit storage).
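
To make the packed layout concrete, a small standalone C++ illustration (not mllm code):

#include <cstdint>

// Two 4-bit values per byte: low nibble = first element, high nibble = second.
inline uint8_t pack_u4(uint8_t first, uint8_t second) {
  return static_cast<uint8_t>((first & 0x0F) | ((second & 0x0F) << 4));
}

// Unpack with sign extension for SFIXED_POINT_4 (two's complement, range -8..7).
inline int8_t unpack_s4_lo(uint8_t byte) { return static_cast<int8_t>(byte << 4) >> 4; }
inline int8_t unpack_s4_hi(uint8_t byte) { return static_cast<int8_t>(byte) >> 4; }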


Remove the unnecessary FIXME comment on line 279.

The mapping of kInt4/kUInt4 to QNN_DATATYPE_SFIXED_POINT_4/QNN_DATATYPE_UFIXED_POINT_4 is correct per QNN documentation (4-bit signed and unsigned fixed-point formats respectively). The types are actively used in quantization recipes (e.g., "w4a16" in LLMQuantRecipePass.cpp), and the mapping follows the same pattern as the 8-bit and 16-bit quantized variants. The FIXME comment can be safely removed.

🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 279 to 286, remove the
unnecessary "// FIXME: Maybe error here." comment preceding the kInt4/kUInt4 cases;
the mapping to QNN_DATATYPE_SFIXED_POINT_4 and QNN_DATATYPE_UFIXED_POINT_4 is
correct so simply delete that FIXME line and keep the case branches as-is
without altering logic or indentation.

- Introduced `VectorInt16Attr` for handling vectors of int16_t attributes.
- Added new 4-bit integer types: `mllm_int4_t` and `mllm_uint4_t` with corresponding type info.
- Enhanced `Tensor` class with new methods for element-wise addition, subtraction, and multiplication that accept a `DataTypes` parameter.
- Updated element-wise operation handling to support `VectorInt16Attr` for int16 tensor constants.
- Ensured UUID generation for quantization specifications in `QuantizationSpecAsymPerBlock` and `QuantizationSpecLPBQ`.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

♻️ Duplicate comments (2)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)

795-798: captureAOTGraph returns nullptr—implement graph retrieval.

The captureAOTGraph method (lines 795-798) always returns nullptr. Callers expecting a valid QnnAOTGraph::ptr_t will encounter null pointer errors. Implement the method to look up the graph from contexts_[qnn_context_name]->graphs_[g_name] and return it, or document why this stub is acceptable.

🔎 Proposed fix
 QnnAOTGraph::ptr_t QnnAOTEnv::captureAOTGraph(const std::string& qnn_context_name, const std::string& g_name) {
-  // TODO
-  return nullptr;
+  MLLM_RT_ASSERT_EQ(contexts_.count(qnn_context_name), 1);
+  MLLM_RT_ASSERT_EQ(contexts_[qnn_context_name]->graphs_.count(g_name), 1);
+  return contexts_[qnn_context_name]->graphs_[g_name];
 }
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)

865-867: Add defensive validation for nested config key access.

Line 865 directly accesses nested config keys without validation: config["quant_recipe"]["builtin_llm_pass"]["model"]. nlohmann::json's operator[] does not guard missing keys: it silently inserts a null on a mutable object and is undefined behavior on a const one, neither of which fails cleanly. Other code in the codebase (e.g., AOTPipeline.cpp, MarkTensorIO.cpp) defensively checks keys with .contains() before access. Apply the same pattern here:

-  if (config["quant_recipe"]["builtin_llm_pass"]["model"] == "qwen3") {
+  if (config.contains("quant_recipe") 
+      && config["quant_recipe"].contains("builtin_llm_pass")
+      && config["quant_recipe"]["builtin_llm_pass"].contains("model")
+      && config["quant_recipe"]["builtin_llm_pass"]["model"] == "qwen3") {
     addPattern(LLMQuantRecipeQwen3AttentionPattern::create(), "qwen3_attention", 100);
   }
🧹 Nitpick comments (7)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)

837-837: Complete the Qwen3 Attention pattern implementation or document deferral.

Line 837's TODO comment indicates incomplete implementation. The pattern successfully identifies Q/K/V/O linear operations (lines 814-835) but performs no transformation. This creates a sophisticated stub that matches complex patterns but applies no quantization annotations, which may confuse maintainers.

Do you want me to open an issue to track completing this implementation, or should a comment be added explaining why transformation is deferred?


69-71: Static analyzer flags ret and spec as uninitialized, but both already default to nullptr.

Clang-tidy (cppcoreguidelines-init-variables) reports the declarations on lines 69 and 71 as uninitialized, yet both are written with an explicit = nullptr initializer. The warning appears to be a false positive here and can be safely ignored or suppressed.

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (1)

272-285: Add copy/move constructor/assignment operators or document the rule-of-zero.

The LLMQuantRecipePass class (line 272) defines a default destructor but lacks copy constructor, copy assignment operator, move constructor, and move assignment operator. Since the class manages STL containers with shared_ptr members, consider explicitly deleting copy operations and defaulting move operations to make ownership semantics clear.

🔎 Proposed fix
 class LLMQuantRecipePass final : public ir::Pass {
  public:
   LLMQuantRecipePass();
 
+  // Explicitly delete copy operations
+  LLMQuantRecipePass(const LLMQuantRecipePass&) = delete;
+  LLMQuantRecipePass& operator=(const LLMQuantRecipePass&) = delete;
+
+  // Explicitly default move operations
+  LLMQuantRecipePass(LLMQuantRecipePass&&) = default;
+  LLMQuantRecipePass& operator=(LLMQuantRecipePass&&) = default;
+
   ~LLMQuantRecipePass() override = default;
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (1)

194-195: Extract magic number to a named constant.

Line 194 uses the magic number -20 for the masking offset. This value appears to be a minimum attention score for masked positions. Extract it to a named constant to improve readability and maintainability.

🔎 Proposed fix
+  // Attention mask offset for masked positions
+  static constexpr float kMaskedAttentionOffset = -20.0f;
+
   // Masked Softmax
   auto attn_min = attn.min(-1, true);
-  float minus_value = -20;
-  attn = nn::functional::where(causal_mask.equal(0.f), attn, attn_min.add(minus_value, kInt16));
+  attn = nn::functional::where(causal_mask.equal(0.f), attn, attn_min.add(kMaskedAttentionOffset, kInt16));

As per coding guidelines: Use named constants instead of magic numbers.

mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)

488-494: Implement graph compilation or add tracking comment.

The compile() method (lines 488-494) only sets is_compiled_ flag without performing actual compilation. The TODO comment suggests this is incomplete. If compilation is deferred intentionally, document the expected behavior; otherwise, implement the QNN graph compilation via the backend interface.

Additional context

Typical QNN graph compilation involves calling qnn_interface_.graphFinalize() to optimize and prepare the graph for execution. If this is intentionally deferred to a later pass, add a detailed comment explaining the design decision.
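
A rough sketch of a finalizing compile(), assuming the class holds the loaded interface table and a graph handle (both names below are assumptions):

bool QnnAOTGraph::compile() {
  if (is_compiled_) { return true; }
  // graphFinalize lets the backend validate and optimize the built graph.
  if (QNN_GRAPH_NO_ERROR
      != qnn_interface_.graphFinalize(graph_handle_, /*profile=*/nullptr, /*signal=*/nullptr)) {
    return false;
  }
  is_compiled_ = true;
  return true;
}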

mllm/core/Tensor.hpp (1)

289-291: LGTM - new scalar arithmetic methods with explicit DataTypes parameter.

The new add, sub, and mul methods provide useful control over the data type of the scalar RHS operand, complementing the existing operator+, operator-, operator* methods.

Consider adding a div(float rhs, DataTypes data_type = kFloat32) method for symmetry with operator/(float) if this use case is anticipated.
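
If added, the declaration would mirror its siblings (hypothetical; not part of this PR):

// Scalar divide with an explicit RHS dtype, for symmetry with add/sub/mul.
Tensor div(float rhs, DataTypes data_type = kFloat32);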

mllm/core/Tensor.cpp (1)

219-226: Consider adding kInt16PerTensorSym support for consistency.

The sub() method's type switch omits kInt16PerTensorSym, while both add() (line 182) and mul() (line 268) include it. This matches the existing inconsistency in operator- (lines 198-205), but for consistency and completeness, consider adding this case unless it's intentionally excluded.

🔎 Proposed addition
     switch (data_type) {
       case kFloat32: *(rhs_tensor.ptr<float>()) = rhs; break;
       case kFloat16: *(rhs_tensor.ptr<half_float::half>()) = half_float::half(rhs); break;
       case kInt32: *(rhs_tensor.ptr<int32_t>()) = rhs; break;
       case kInt16: *(rhs_tensor.ptr<int16_t>()) = rhs; break;
       case kInt8: *(rhs_tensor.ptr<int8_t>()) = rhs; break;
+      case kInt16PerTensorSym: *(rhs_tensor.ptr<int16_t>()) = rhs; break;
       default: NYI("Type is not supported"); break;
     }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 90fab85 and 426b7c1.

📒 Files selected for processing (17)
  • examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
  • examples/qwen3_qnn_aot/qwen3_qnn_aot.mir
  • examples/qwen3_qnn_aot/qwen3_qnn_aot_quant_recipe.mir
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/compile/ir/GeneratedRTTIKind.hpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
  • mllm/compile/ir/builtin/Attribute.cpp
  • mllm/compile/ir/builtin/Attribute.hpp
  • mllm/compile/ir/linalg/Attribute.hpp
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/core/DataTypes.cpp
  • mllm/core/DataTypes.hpp
  • mllm/core/Tensor.cpp
  • mllm/core/Tensor.hpp
  • mllm/core/aops/ElewiseOps.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/compile/ir/GeneratedRTTIKind.hpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/compile/ir/builtin/Attribute.hpp
  • mllm/compile/ir/linalg/Attribute.hpp
  • mllm/core/DataTypes.hpp
  • mllm/core/aops/ElewiseOps.cpp
  • mllm/core/DataTypes.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/core/Tensor.hpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
  • mllm/core/Tensor.cpp
  • mllm/compile/ir/builtin/Attribute.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/compile/ir/builtin/Attribute.hpp
  • mllm/compile/ir/linalg/Attribute.hpp
  • mllm/core/DataTypes.hpp
  • mllm/core/aops/ElewiseOps.cpp
  • mllm/core/DataTypes.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/core/Tensor.hpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
  • mllm/core/Tensor.cpp
  • mllm/compile/ir/builtin/Attribute.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/core/aops/ElewiseOps.cpp
  • mllm/core/DataTypes.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/core/Tensor.cpp
  • mllm/compile/ir/builtin/Attribute.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/compile/ir/builtin/Attribute.hpp
  • mllm/compile/ir/linalg/Attribute.hpp
  • mllm/core/DataTypes.hpp
  • mllm/core/aops/ElewiseOps.cpp
  • mllm/core/DataTypes.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/core/Tensor.hpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
  • mllm/core/Tensor.cpp
  • mllm/compile/ir/builtin/Attribute.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
🧠 Learnings (3)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Suggest adding unit tests for untested complex logic or edge cases.

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
🧬 Code graph analysis (7)
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (2)
mllm/nn/Functional.hpp (1)
  • where (161-161)
mllm/nn/Functional.cpp (2)
  • where (203-206)
  • where (203-203)
mllm/compile/ir/builtin/Attribute.hpp (1)
mllm/compile/ir/builtin/Attribute.cpp (14)
  • VectorInt16Attr (107-107)
  • VectorInt16Attr (109-109)
  • data (23-23)
  • data (23-23)
  • data (39-39)
  • data (39-39)
  • data (53-53)
  • data (53-53)
  • data (85-85)
  • data (85-85)
  • data (99-99)
  • data (99-99)
  • data (113-113)
  • data (113-113)
mllm/compile/ir/linalg/Attribute.hpp (1)
mllm/compile/ir/linalg/Attribute.cpp (3)
  • LinalgIRQuantizatonSpecAttr (160-160)
  • LinalgIRQuantizatonSpecAttr (162-162)
  • LinalgIRQuantizatonSpecAttr (164-164)
mllm/core/Tensor.hpp (1)
mllm/core/Tensor.cpp (16)
  • rhs (126-128)
  • rhs (126-126)
  • rhs (130-132)
  • rhs (130-130)
  • rhs (134-136)
  • rhs (134-134)
  • rhs (138-140)
  • rhs (138-138)
  • rhs (148-168)
  • rhs (148-148)
  • rhs (192-211)
  • rhs (192-192)
  • rhs (234-254)
  • rhs (234-234)
  • rhs (278-297)
  • rhs (278-278)
mllm/core/Tensor.cpp (4)
mllm/core/Tensor.hpp (17)
  • rhs (272-272)
  • rhs (273-273)
  • rhs (274-274)
  • rhs (275-275)
  • rhs (278-278)
  • rhs (283-283)
  • rhs (284-284)
  • rhs (285-285)
  • rhs (286-286)
  • rhs (289-289)
  • rhs (290-290)
  • rhs (291-291)
  • rhs (296-296)
  • rhs (298-298)
  • rhs (300-300)
  • rhs (302-302)
  • device (399-399)
mllm/engine/Context.hpp (1)
  • device (31-31)
mllm/core/TensorViewImpl.cpp (2)
  • device (10-10)
  • device (10-10)
mllm/engine/Context.cpp (2)
  • instance (14-17)
  • instance (14-14)
mllm/compile/ir/builtin/Attribute.cpp (1)
mllm/compile/ir/builtin/Attribute.hpp (15)
  • VectorInt16Attr (158-158)
  • VectorInt16Attr (160-160)
  • VectorInt16Attr (162-162)
  • p (42-42)
  • p (82-82)
  • p (100-100)
  • p (120-120)
  • p (142-142)
  • p (164-164)
  • data (40-40)
  • data (62-62)
  • data (80-80)
  • data (124-124)
  • data (146-146)
  • data (168-168)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (1)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)
  • LLMQuantRecipePass (845-868)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp

[error] 23-23: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)


[error] 26-26: variable name 'rw' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 67-67: 2 adjacent parameters of 'genSimpleQuantizationSpecAttr' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 69-69: variable 'ret' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 71-71: variable 'spec' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 114-114: 2 adjacent parameters of 'shareQuantSpecSingleInputToSingleOutputAndSetOpQuantAnnoAttr' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 115-115: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 135-135: 2 adjacent parameters of 'noSharingSingleInAndSingleOutQuantAnnoAttr' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 135-135: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 145-145: variable 'quant_spec_i0' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 146-146: variable 'quant_spec_o0' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp

[error] 24-24: 2 adjacent parameters of 'makeRotaryPosEmbedding' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 194-194: 20 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

mllm/compile/ir/builtin/Attribute.hpp

[error] 154-154: class 'VectorInt16Attr' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 164-164: parameter name 'p' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/compile/ir/linalg/Attribute.hpp

[error] 329-329: class 'LinalgIRQuantizatonSpecAttr' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 341-341: parameter name 'p' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/compile/ir/NodeRTTIClassOfImpl.hpp

[error] 340-340: function-like macro 'RTTI_RK_ATTR_LINALGIRATTR_QUANTIZATIONSPEC_IMPL' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 369-369: function-like macro 'RTTI_RK_ATTR_BUILTINIRATTR_VECTORINT16ATTR_IMPL' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp

[error] 28-28: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 30-30: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 37-37: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 51-51: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 63-63: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 77-77: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 91-91: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 103-103: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 115-115: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 129-129: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 143-143: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 157-157: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 171-171: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 185-185: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 197-197: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 209-209: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 223-223: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 237-237: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 249-249: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 263-263: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 272-272: class 'LLMQuantRecipePass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 280-280: parameter name 'p' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-android
  • GitHub Check: build-macos
  • GitHub Check: build-x86
🔇 Additional comments (8)
mllm/compile/ir/NodeRTTIClassOfImpl.hpp (1)

340-342: LGTM - new RTTI macros follow established patterns.

The new macros for QuantizationSpec and VectorInt16Attr are consistent with the existing RTTI infrastructure. Since this is auto-generated code, the static analysis hints about preferring constexpr template functions can be safely disregarded.

Also applies to: 369-371
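
As a reading aid, here is an assumed LLVM-style shape for such a classof macro. The generated RTTI_*_IMPL bodies are not shown in this PR, so everything below (names, kind ids) is illustrative only:

// The generated macro bodies are not visible in this PR; this is
// only an assumed LLVM-style classof pattern, not the real code.
#include <cstdio>

struct Node { int kind; };

#define DEFINE_CLASSOF_IMPL(KIND)                   \
  static bool classof(const Node* node) {           \
    return node->kind == (KIND);                    \
  }

struct QuantizationSpecAttr : Node {
  static constexpr int kKind = 42;  // hypothetical kind id
  DEFINE_CLASSOF_IMPL(kKind)
};

int main() {
  Node n{42};
  std::printf("classof matches: %d\n", QuantizationSpecAttr::classof(&n));
}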

mllm/compile/ir/builtin/Attribute.cpp (1)

107-119: LGTM - VectorInt16Attr implementation follows established pattern.

The implementation mirrors VectorFP32Attr consistently, including constructors, dump, data() accessor, and build() factory method.

mllm/compile/ir/builtin/Attribute.hpp (1)

154-174: LGTM - VectorInt16Attr declaration follows existing patterns.

The class mirrors VectorFP32Attr exactly. The static analysis warnings about Rule of Five and short parameter name p can be disregarded since:

  1. The same pattern is used by VectorFP32Attr and other attribute classes - the shared_ptr semantics of DEFINE_SPECIFIC_IR_CLASS handle proper copying.
  2. The parameter name p for IRPrinter is used consistently across all dump methods in this file.
mllm/core/aops/ElewiseOps.cpp (1)

42-54: LGTM - type-specific constant attribute creation.

The switch statement appropriately handles kFloat32 and kInt16 constant types with corresponding attribute classes, and logs a warning for unsupported types. This provides a clear path for extending support to additional types in the future.
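
For readers outside the codebase, a toy model of that dispatch shape (std::variant stands in for the real VectorFP32Attr/VectorInt16Attr IR attribute classes; makeConstantAttr is a hypothetical helper):

#include <cstdint>
#include <cstdio>
#include <memory>
#include <variant>
#include <vector>

enum DataTypes { kFloat32, kInt16, kInt8 };
using Attr = std::variant<std::vector<float>, std::vector<int16_t>>;

// Create a typed constant attribute, warning on unsupported types,
// mirroring the structure described in the review comment above.
std::unique_ptr<Attr> makeConstantAttr(DataTypes t) {
  switch (t) {
    case kFloat32: return std::make_unique<Attr>(std::vector<float>{1.0F});
    case kInt16:   return std::make_unique<Attr>(std::vector<int16_t>{1});
    default:
      std::fprintf(stderr, "unsupported constant type %d\n", static_cast<int>(t));
      return nullptr;  // caller decides how to recover
  }
}

int main() {
  auto ok = makeConstantAttr(kInt16);
  std::printf("int16 attr: %s\n", ok ? "created" : "skipped");
  auto bad = makeConstantAttr(kInt8);  // exercises the warning path
  std::printf("int8 attr: %s\n", bad ? "created" : "skipped");
}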

mllm/core/DataTypes.hpp (1)

588-593: LGTM - type info registrations and macros for 4-bit types.

The MLLM_DEFINE_BASIC_TYPE_INFO, MLLM_DEFINE_SELF_TYPE_INFO, and type alias macros are correctly defined for the new 4-bit types, with appropriate min/max values (signed: -8 to 7, unsigned: 0 to 15).

Also applies to: 707-708, 771-772
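
A quick standalone sanity check of those ranges (the constant names below are illustrative, not the project's macros):

#include <cstdio>

// Illustrative 4-bit bounds; the project derives these through its
// MLLM_DEFINE_BASIC_TYPE_INFO machinery rather than bare constants.
constexpr int kInt4Min = -8;   // -(2^3), two's complement
constexpr int kInt4Max = 7;    //  2^3 - 1
constexpr int kUInt4Min = 0;
constexpr int kUInt4Max = 15;  //  2^4 - 1

int main() {
  static_assert(kInt4Max - kInt4Min == 15, "signed 4-bit spans 16 values");
  static_assert(kUInt4Max - kUInt4Min == 15, "unsigned 4-bit spans 16 values");
  std::puts("4-bit ranges consistent");
  return 0;
}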

mllm/compile/ir/linalg/Attribute.hpp (3)

65-77: LGTM - QuantizationSpecRaw follows established spec pattern.

The new QuantizationSpecRaw struct correctly inherits from QuantizationSpec, sets the type to kRaw, and generates a UUID. The factory methods (create(DataTypes) and create()) mirror the pattern used by other spec types.
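
A minimal sketch of that create-plus-UUID factory pattern; the class names mirror the PR, while the counter-based id generator and member layout are stand-ins:

#include <atomic>
#include <cstdint>
#include <cstdio>
#include <memory>

// Each create() stamps a fresh unique id, so two specs never share
// an identity. The atomic counter stands in for the real UUID helper.
struct QuantizationSpec {
  enum Kind { kRaw } kind = kRaw;
  uint64_t uuid = 0;
  virtual ~QuantizationSpec() = default;
};

struct QuantizationSpecRaw : QuantizationSpec {
  static std::shared_ptr<QuantizationSpecRaw> create() {
    static std::atomic<uint64_t> next{1};
    auto spec = std::make_shared<QuantizationSpecRaw>();
    spec->kind = kRaw;
    spec->uuid = next.fetch_add(1);  // real code: generate a UUID
    return spec;
  }
};

int main() {
  auto a = QuantizationSpecRaw::create();
  auto b = QuantizationSpecRaw::create();
  std::printf("distinct ids: %d\n", static_cast<int>(a->uuid != b->uuid));
}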


247-247: LGTM - UUID generation added to QuantizationSpecAsymPerBlock and QuantizationSpecLPBQ.

Adding UUID generation to the create() methods ensures these specs have unique identifiers, consistent with other quantization spec types.

Also applies to: 262-262, 283-283, 299-299


329-348: LGTM - LinalgIRQuantizatonSpecAttr follows existing attribute patterns.

The new class mirrors LinalgIRQuantizatonAnnotationAttr in structure. The static analysis warnings about Rule of Five and short parameter name p can be disregarded per established codebase conventions (same pattern used by other IR attribute classes).

Comment on lines +466 to +486
void QnnAOTGraph::addOperation(const QnnAOTNodeOperation::ptr_t& qnn_op) {
auto env = AOTCompileContext::getInstance().getEnv();
auto qnn_interface = env->getFuncSymbol().qnn_interface_;

Qnn_OpConfig_t qnn_op_config = QNN_OPCONFIG_INIT;
qnn_op_config.version = QNN_OPCONFIG_VERSION_1;
qnn_op_config.v1 = QNN_OPCONFIG_V1_INIT;
qnn_op_config.v1.name = qnn_op->name_.c_str();
qnn_op_config.v1.packageName = qnn_op->package_name_.c_str();
qnn_op_config.v1.typeName = qnn_op->op_name_.c_str();

// TODO PARAMs
// TODO Inputs
// TODO Outputs

// TODO node validations

// TODO add node to graph.

op_node_.insert({qnn_op->getName(), qnn_op});
}

⚠️ Potential issue | 🟠 Major

Complete the QNN operation addition or document the stub.

The addOperation method (lines 466-486) constructs a Qnn_OpConfig_t and populates basic fields but has TODO comments for critical parts (lines 477-483): parameters, inputs, outputs, validation, and actually adding the node to the QNN graph. The method stores the operation locally but never registers it with the QNN backend, making this a non-functional stub.

Would you like me to help implement the QNN operation addition logic, or should this be tracked in a separate issue?

🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 466-486, implement the
TODOs: build the full Qnn_OpConfig_t by converting the qnn_op's parameters into
QNN parameter entries, map its inputs and outputs into QNN tensor
descriptors/array entries, and set any required versioned fields (sizes/counts)
instead of leaving them empty; perform node-level validations (required fields
present, compatible shapes/dtypes) and return/raise errors on failure; call the
QNN backend API via qnn_interface (use the appropriate graph add/register
function on the interface, pass the populated Qnn_OpConfig_t and capture the
resulting node handle/ID), check the backend return status and log or propagate
errors, and only then store the operation in op_node_ (or update the stored
entry with the backend node handle). Ensure any temporary C strings or buffers
referenced by Qnn_OpConfig_t remain valid for the call (allocate/copy if
necessary) and free them on error/cleanup.

Comment on lines +100 to +101
CASE(kInt4)
CASE(kUInt4)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check if lanesOfType is called with kInt4/kUInt4 anywhere
rg -n 'lanesOfType.*kInt4|lanesOfType.*kUInt4' --type=cpp

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

# Check generic usages of lanesOfType that might pass kInt4/kUInt4
rg -n 'lanesOfType\s*\(' --type=cpp -B 2 -A 3

Repository: UbiquitousLearning/mllm

Length of output: 13068


🏁 Script executed:

# Examine the actual lanesOfType function implementation and surrounding context
fd DataTypes.cpp --type f | xargs cat -n | sed -n '1,200p'

Repository: UbiquitousLearning/mllm

Length of output: 5688


🏁 Script executed:

# Check if kInt4/kUInt4 are used elsewhere in the codebase in ways that might call lanesOfType
rg -n 'kInt4|kUInt4' --type=cpp -B 1 -A 1 | head -100

Repository: UbiquitousLearning/mllm

Length of output: 1941


Missing kInt4/kUInt4 cases in lanesOfType function.

The new 4-bit types are added to bytesOfType and nameOfType, but lanesOfType (lines 9–56) lacks corresponding CASE entries. Since these types have MllmDataTypeInfo specializations defined (DataTypes.hpp lines 707–708), calling lanesOfType(kInt4) or lanesOfType(kUInt4) will trigger the NYI("Unknown data type") path instead of returning valid lane counts.

Add CASE(kInt4) and CASE(kUInt4) to lanesOfType to maintain consistency and prevent runtime failures.

🤖 Prompt for AI Agents
In mllm/core/DataTypes.cpp around lines 100–101, the lanesOfType function is
missing CASE(kInt4) and CASE(kUInt4) entries; add CASE(kInt4) and CASE(kUInt4)
to the switch so those enum values return the correct lane counts (use the same
pattern as the other integer TYPE cases and reference the MllmDataTypeInfo
specializations defined in DataTypes.hpp for the 4-bit types) to avoid falling
through to NYI("Unknown data type").
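
A toy rendering of the suggested fix, assuming (as for the other scalar types) that the 4-bit entries should report one lane; the real function dispatches through a CASE macro over MllmDataTypeInfo rather than literal returns:

#include <cstdint>
#include <cstdio>

enum DataTypes { kFloat32, kInt8, kInt4, kUInt4 };

// Scalar types report a single lane; the new 4-bit cases follow the
// same pattern as the other integer types instead of hitting NYI.
int32_t lanesOfType(DataTypes t) {
  switch (t) {
    case kFloat32: return 1;
    case kInt8:    return 1;
    case kInt4:    return 1;  // newly added 4-bit scalar case
    case kUInt4:   return 1;  // newly added 4-bit scalar case
  }
  std::fprintf(stderr, "Unknown data type\n");  // NYI path in the real code
  return -1;
}

int main() { std::printf("lanes(kInt4) = %d\n", lanesOfType(kInt4)); }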

Comment on lines +170 to +190
Tensor Tensor::add(float rhs, DataTypes data_type) {
auto rhs_tensor = Tensor::empty({1}, data_type, device()).alloc();
if (device() != kCPU) {
Context::instance().buildOpAndSubmitTask(
OpTypes::kFill, aops::FillOpOptions{.type = aops::FillOpTypes::kSpecific, .value = rhs}, {rhs_tensor});
} else {
switch (data_type) {
case kFloat32: *(rhs_tensor.ptr<float>()) = rhs; break;
case kFloat16: *(rhs_tensor.ptr<half_float::half>()) = half_float::half(rhs); break;
case kInt32: *(rhs_tensor.ptr<int32_t>()) = rhs; break;
case kInt16: *(rhs_tensor.ptr<int16_t>()) = rhs; break;
case kInt8: *(rhs_tensor.ptr<int8_t>()) = rhs; break;
case kInt16PerTensorSym: *(rhs_tensor.ptr<int16_t>()) = rhs; break;
default: NYI("Type is not supported"); break;
}
}
auto opts = aops::AddOpOptions{};
opts.setInputsConstant(0, 0);
opts.setInputsConstant(1, 1);
return Context::instance().buildOpAndSubmitTask(OpTypes::kAdd, opts, {*this, rhs_tensor})[0];
}

🛠️ Refactor suggestion | 🟠 Major

Consider extracting common scalar operation logic to reduce duplication.

The add(), sub(), and mul() methods duplicate significant logic from their corresponding operators (operator+, operator-, operator*). Each method repeats ~21 lines of tensor creation, type switching, and option configuration.

Consider refactoring to extract a common helper function that handles scalar-to-tensor conversion with configurable type, then use it in both the operators and these new methods.

🔎 Example refactoring approach
// Helper to convert scalar to tensor with specified type
Tensor Tensor::scalarToTensor(float value, DataTypes data_type) {
  auto rhs_tensor = Tensor::empty({1}, data_type, device()).alloc();
  if (device() != kCPU) {
    Context::instance().buildOpAndSubmitTask(
        OpTypes::kFill, aops::FillOpOptions{.type = aops::FillOpTypes::kSpecific, .value = value}, {rhs_tensor});
  } else {
    switch (data_type) {
      case kFloat32: *(rhs_tensor.ptr<float>()) = value; break;
      case kFloat16: *(rhs_tensor.ptr<half_float::half>()) = half_float::half(value); break;
      case kInt32: *(rhs_tensor.ptr<int32_t>()) = value; break;
      case kInt16: *(rhs_tensor.ptr<int16_t>()) = value; break;
      case kInt8: *(rhs_tensor.ptr<int8_t>()) = value; break;
      case kInt16PerTensorSym: *(rhs_tensor.ptr<int16_t>()) = value; break;
      default: NYI("Type is not supported"); break;
    }
  }
  return rhs_tensor;
}

Tensor Tensor::add(float rhs, DataTypes data_type) {
  auto rhs_tensor = scalarToTensor(rhs, data_type);
  auto opts = aops::AddOpOptions{};
  opts.setInputsConstant(0, 0);
  opts.setInputsConstant(1, 1);
  return Context::instance().buildOpAndSubmitTask(OpTypes::kAdd, opts, {*this, rhs_tensor})[0];
}

Then apply the same pattern to sub(), mul(), and potentially update the operator overloads to use this helper.

Also applies to: 213-232, 256-276


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, find the file and confirm the code location
find . -name "Tensor.cpp" -type f | head -5

Repository: UbiquitousLearning/mllm

Length of output: 89


🏁 Script executed:

# Search for usage patterns of these methods with explicit data_type arguments
rg -n '\.add\(.*,\s*k[A-Z]' --type=cpp -A 2 -B 2

Repository: UbiquitousLearning/mllm

Length of output: 592


🏁 Script executed:

# Look for similar sub() and mul() methods
rg -n '\.sub\(.*,\s*k[A-Z]|\.mul\(.*,\s*k[A-Z]' --type=cpp -A 2 -B 2

Repository: UbiquitousLearning/mllm

Length of output: 463


🏁 Script executed:

# Check operation backend definitions for type handling
rg -n 'AddOp|SubOp|MulOp' mllm --type=cpp | head -20

Repository: UbiquitousLearning/mllm

Length of output: 2168


🏁 Script executed:

# Look for validation logic in Tensor class
rg -n 'buildOpAndSubmitTask.*Add|buildOpAndSubmitTask.*Sub|buildOpAndSubmitTask.*Mul' --type=cpp -A 5 -B 2

Repository: UbiquitousLearning/mllm

Length of output: 6330


🏁 Script executed:

# Check if there's any type validation or casting logic
rg -n 'dtype\(\)|data_type' mllm/core/Tensor.cpp | head -20

Repository: UbiquitousLearning/mllm

Length of output: 1083


🏁 Script executed:

# Find and examine AddOp, SubOp, MulOp implementations
find . -name "*Op.hpp" -o -name "*Op.cpp" | grep -E "Add|Sub|Mul" | head -10

Repository: UbiquitousLearning/mllm

Length of output: 446


🏁 Script executed:

# Look at the operation backend headers/implementations
find mllm/core/aops -name "*.hpp" -o -name "*.cpp" | head -20

Repository: UbiquitousLearning/mllm

Length of output: 646


🏁 Script executed:

# Check if there's type compatibility logic in the operation system
rg -n 'typeCompatible|validateType|checkType|dtype.*dtype' mllm/core/aops --type=cpp --type=hpp | head -20

Repository: UbiquitousLearning/mllm

Length of output: 94


🏁 Script executed:

# Look for any type casting or broadcasting in arithmetic operations
rg -n 'cast|broadcast|promote.*type' mllm/core/aops --type=cpp --type=hpp -i | head -20

Repository: UbiquitousLearning/mllm

Length of output: 94


🏁 Script executed:

# Examine the full context of how add method is declared in header
rg -n 'Tensor::add\(float' mllm/core --type=hpp -B 2 -A 5

Repository: UbiquitousLearning/mllm

Length of output: 94


🏁 Script executed:

# Check the actual file size and see more context around these methods
wc -l mllm/core/Tensor.cpp

Repository: UbiquitousLearning/mllm

Length of output: 91


🏁 Script executed:

# Find AddOp, SubOp, MulOp implementations
find mllm/core/aops -name "*AddOp*" -o -name "*SubOp*" -o -name "*MulOp*"

Repository: UbiquitousLearning/mllm

Length of output: 122


🏁 Script executed:

# Look at the add method declaration in Tensor.h
find mllm -name "Tensor.h" -o -name "Tensor.hpp" | xargs cat | grep -A 5 "Tensor add"

Repository: UbiquitousLearning/mllm

Length of output: 357


🏁 Script executed:

# Check if there are any type validation or compatibility checks in the tensor operations
rg 'dtype\(\).*!=.*data_type|data_type.*!=.*dtype' mllm --type-list | head -5

Repository: UbiquitousLearning/mllm

Length of output: 456


🏁 Script executed:

# Look for type compatibility in the backend operations
find mllm -name "*.h" -o -name "*.hpp" | xargs grep -l "AddOp\|SubOp\|MulOp" 2>/dev/null | head -5

Repository: UbiquitousLearning/mllm

Length of output: 272


🏁 Script executed:

# Examine the AddOp implementation
find mllm/core/aops -name "AddOp.cpp" -o -name "AddOp.h"

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

# Get the full context of the add() method and nearby methods
sed -n '148,280p' mllm/core/Tensor.cpp

Repository: UbiquitousLearning/mllm

Length of output: 6076


🏁 Script executed:

# Check the ElewiseOps implementations to see if they handle mixed types
head -100 mllm/backends/cpu/ops/ElewiseOps.hpp

Repository: UbiquitousLearning/mllm

Length of output: 3068


🏁 Script executed:

# Look for any type broadcasting or casting logic in the elewise operations
grep -n 'dtype\|type.*cast\|promote' mllm/backends/cpu/ops/ElewiseOps.hpp | head -20

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

# Check if there's documentation about type compatibility in AddOp, SubOp, MulOp
find mllm/core/aops -name "*.hpp" | xargs grep -l "AddOp\|SubOp\|MulOp"

Repository: UbiquitousLearning/mllm

Length of output: 124


🏁 Script executed:

# Look at the actual implementation or at least the class definition
sed -n '1,50p' mllm/core/aops/MatMulOp.hpp

Repository: UbiquitousLearning/mllm

Length of output: 1710


🏁 Script executed:

# Search for any tests that show valid type combinations for add/sub/mul
find . -name "*test*" -type f | xargs grep -l "\.add\(.*k[A-Z]\|\.sub\(.*k[A-Z]\|\.mul\(.*k[A-Z]" 2>/dev/null | head -5

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

# Check if there's a pattern for mixed-type operations - look at the Qwen example more carefully
sed -n '188,200p' examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp

Repository: UbiquitousLearning/mllm

Length of output: 555


🏁 Script executed:

# Look at the ElewiseOps.hpp to see AddOp, SubOp, MulOp base class definitions
head -200 mllm/core/aops/ElewiseOps.hpp

Repository: UbiquitousLearning/mllm

Length of output: 3196


🏁 Script executed:

# Check CPU implementation of AddOp to see if there's type handling
find mllm/backends/cpu/ops -name "*.cpp" | xargs grep -l "CPUAddOp\|CPUSubOp\|CPUMulOp" | head -1 | xargs head -300

Repository: UbiquitousLearning/mllm

Length of output: 12943


🏁 Script executed:

# Look for any comments or documentation about type compatibility
rg -n 'type.*compati|mixed.*type|broadcast.*type' mllm/core --type=cpp | head -10

Repository: UbiquitousLearning/mllm

Length of output: 208


🏁 Script executed:

# Check if operations validate input tensor types before execution
rg -n 'dtype.*==\|dtype.*!=\|type.*mismatch' mllm/backends/cpu/ops --type=cpp | head -20

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

# Look at the actual CPU Add operation implementation
find mllm/backends/cpu -name "*.cpp" | xargs grep -A 30 "void CPUAddOp::forward"

Repository: UbiquitousLearning/mllm

Length of output: 2883


🏁 Script executed:

# Check how output dtype is determined in the operation context
rg -n 'buildOpAndSubmitTask.*Add|buildOpAndSubmitTask.*Sub|buildOpAndSubmitTask.*Mul' mllm/core --type=cpp -B 5 -A 10 | head -50

Repository: UbiquitousLearning/mllm

Length of output: 3179


🏁 Script executed:

# Look for how output tensor dtype is set when operations are executed
rg -n 'output\.dtype\(\)|dtype.*output' mllm/core/aops/ElewiseOps.cpp | head -20

Repository: UbiquitousLearning/mllm

Length of output: 178


🏁 Script executed:

# Check if there's any type validation or conversion in the BaseOp or Context when executing operations
rg -n 'dtype.*mismatch|type.*check|validateInput' mllm/core --type=cpp | head -20

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

# Look at how the output tensor is created to see what dtype it gets
sed -n '100,150p' mllm/core/aops/ElewiseOps.cpp

Repository: UbiquitousLearning/mllm

Length of output: 1503


🏁 Script executed:

# Check how output dtype is determined more carefully - look at the macro definition
sed -n '1,120p' mllm/core/aops/ElewiseOps.cpp

Repository: UbiquitousLearning/mllm

Length of output: 9353


🏁 Script executed:

# Verify the actual output type by looking at the full macro
grep -A 20 '__MLLM_ELEWISE_OP_IMPL\|__MLLM_ELEWISE_OP_DEFINE' mllm/core/aops/ElewiseOps.cpp | head -60

Repository: UbiquitousLearning/mllm

Length of output: 2887


🏁 Script executed:

# Check what happens when inputs have mismatched dtypes - look for any type casting
rg -n 'cast.*dtype|dtype.*cast|input.*cast' mllm/backends/cpu/ops/ElewiseOps.cpp | head -20

Repository: UbiquitousLearning/mllm

Length of output: 713


🏁 Script executed:

# See if there's validation in the Context that checks input compatibility
find mllm/core -name "Context*" | head -3

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

# Check if there are any tests showing mixed-type add/sub/mul operations
find . -path "*/test*" -name "*.cpp" -o -name "*.hpp" | xargs grep -l "add.*kInt\|mul.*kFloat\|sub.*kInt" 2>/dev/null

Repository: UbiquitousLearning/mllm

Length of output: 875


🏁 Script executed:

# Let me understand the exact problem - when add() uses different data_type, what happens at the backend
# Check if there's any type casting or validation when inputs have different dtypes
rg -n 'input0\.ptr.*input1\.ptr|inputs\[0\]\.ptr.*inputs\[1\]\.ptr' mllm/backends/cpu/ops/ElewiseOps.cpp | head -10

Repository: UbiquitousLearning/mllm

Length of output: 1330


🏁 Script executed:

# Look more carefully at what happens when input types don't match
sed -n '200,250p' mllm/backends/cpu/ops/ElewiseOps.cpp

Repository: UbiquitousLearning/mllm

Length of output: 2334


🏁 Script executed:

# Check if there are type compatibility assertions or checks
rg -n 'assert.*dtype|CHECK.*dtype|validate.*dtype' mllm --type=cpp | head -10

Repository: UbiquitousLearning/mllm

Length of output: 181


🏁 Script executed:

# See if the Qwen example actually uses the result of the mixed-type operation
sed -n '188,200p' examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp

Repository: UbiquitousLearning/mllm

Length of output: 555


🏁 Script executed:

# Let me trace exactly what happens in the Qwen3 case - what is attn_min's dtype?
sed -n '185,198p' examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp

Repository: UbiquitousLearning/mllm

Length of output: 627


🏁 Script executed:

# Check what type scale_ is in the Qwen3 model
grep -n 'scale_' examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp | head -10

Repository: UbiquitousLearning/mllm

Length of output: 179


🏁 Script executed:

# Now verify the critical issue - if input1 is int16 but output is float, what happens?
# The backend will call input1.ptr<float>() on an int16 tensor - this is unsafe casting
# Let's confirm the reshape macro determines output dtype as inputs[0].dtype()
sed -n '28,50p' mllm/core/aops/ElewiseOps.cpp

Repository: UbiquitousLearning/mllm

Length of output: 1991


🏁 Script executed:

# Check if there's any type validation before the backend operation is called
rg -n 'input.*dtype.*==\|inputs\[.*\]\.dtype' mllm/core/Tensor.cpp | grep -A 2 -B 2 'buildOpAndSubmitTask.*Add\|buildOpAndSubmitTask.*Sub\|buildOpAndSubmitTask.*Mul'

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

# Look at what happens when we have a float tensor and call .add(scalar, kInt16)
# Does it actually work or fail? Check CPU backend implementation more carefully
sed -n '140,160p' mllm/backends/cpu/ops/ElewiseOps.cpp

Repository: UbiquitousLearning/mllm

Length of output: 1390


🏁 Script executed:

# Verify the critical type safety issue - when input1 is int16 but we're casting to float
# Let's see what happens when we have float input0 and int16 input1 being added
sed -n '140,180p' mllm/backends/cpu/ops/ElewiseOps.cpp

Repository: UbiquitousLearning/mllm

Length of output: 2432


🏁 Script executed:

# Check if there's any backend that properly handles mixed-type operations
find mllm/backends -name "ElewiseOps.cpp" -o -name "ElewiseOps.hpp" | head -5

Repository: UbiquitousLearning/mllm

Length of output: 140


🏁 Script executed:

# Look for any OpenCL or other backend implementations that might handle mixed types
find mllm/backends -path "*/ops/Elewise*" -type f | head -10

Repository: UbiquitousLearning/mllm

Length of output: 218


🏁 Script executed:

# Search for any documentation or comments about type compatibility in mixed operations
rg -n 'mixed.*type|type.*mismatch|type.*compat' mllm --type=cpp --type=hpp | head -20

Repository: UbiquitousLearning/mllm

Length of output: 94


🏁 Script executed:

# Critical check: verify what happens when add() is called with a different data_type
# Let's trace through the exact scenario: float tensor with .add(scalar, kInt16)
# The output will be float (from reshape using inputs[0].dtype())
# But input1 will be int16
# In CPU backend, when dtype=kFloat32, it does input1.ptr<mllm_fp32_t>() on an int16 tensor
# This is unsafe! Let's verify by looking at the actual case

python3 << 'EOF'
# Let's trace what happens in the Qwen3 example:
# attn_min is float32 (output of min operation)
# attn_min.add(minus_value, kInt16) creates:
#   - input0 = attn_min (float32)
#   - input1 = rhs_tensor (int16, shape {1})
#   - output = Tensor::empty with dtype = inputs[0].dtype() = float32
# 
# In CPU backend forward() for AddOp:
#   - dtype = output.dtype() = float32
#   - Executes case kFloat32:
#   - Calls input1.ptr<mllm_fp32_t>() on a tensor that is actually int16
#   - This is undefined behavior / type safety violation

print("Identified critical issue:")
print("- add/sub/mul with different data_type allows mixed-type operations")
print("- Output dtype is always LHS dtype (inputs[0].dtype())")
print("- Backend assumes input1 has same dtype as output")
print("- input1.ptr<float>() called on int16 data = TYPE SAFETY BUG")
EOF

Repository: UbiquitousLearning/mllm

Length of output: 153


Mixed-type scalar operations contain a critical type safety bug where the backend unsafely casts mismatched input types.

When add(), sub(), or mul() are called with a data_type parameter differing from the tensor's dtype, the operation framework creates a type mismatch:

  • Output dtype is forced to match the LHS tensor (via inputs[0].dtype() in the reshape macro)
  • Input1 is created with the specified data_type parameter
  • The CPU backend's forward() method switches on output dtype and directly casts both inputs to that type without validation

This causes undefined behavior. For instance, float_tensor.add(scalar, kInt16) results in:

  • input0: float32
  • input1: int16
  • output: float32

When the backend processes the float32 case, it calls input1.ptr<mllm_fp32_t>() on an int16 tensor, violating type safety.

This bug is triggered in the Qwen3 model where attn_min.add(minus_value, kInt16) mixes float32 and int16 types.

Required fixes:

  1. Validate type compatibility in the reshape macro (reject mismatches or auto-cast consistently)
  2. Ensure backend implementations only cast between compatible types
  3. Add assertions or checks before unsafe pointer casts
🤖 Prompt for AI Agents
In mllm/core/Tensor.cpp around lines 170-190, the scalar add implementation
creates rhs tensors with an explicit data_type that can differ from the LHS
tensor dtype, which leads to unsafe backend pointer casts and UB; fix by (1)
validating that the requested data_type matches the LHS dtype or performing an
explicit, safe cast/conversion of the scalar into the LHS dtype before creating
rhs_tensor (i.e., always allocate rhs_tensor with the same dtype as *this and
convert rhs into that representation), (2) update the reshape/op construction so
input/output dtypes are consistent (reject mixed-type ops or canonicalize types
before building ops), and (3) add runtime checks/assertions in CPU/backend
forward paths to assert input tensor dtypes match expected output dtype before
performing ptr<T>() casts so mismatches fail early rather than causing UB.
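
To illustrate fix (3), a minimal sketch of failing fast on dtype mismatch before any ptr<T>() reinterpretation; Tensor and the plain asserts here are toy stand-ins for the real backend's MLLM_RT_ASSERT-based checks:

#include <cassert>
#include <cstdio>

enum DataTypes { kFloat32, kInt16 };

struct Tensor { DataTypes dtype; };

// Reject the float32 + int16 mix described above instead of
// silently casting int16 storage through a float pointer.
void addForward(const Tensor& lhs, const Tensor& rhs, const Tensor& out) {
  assert(lhs.dtype == out.dtype && "lhs/output dtype mismatch");
  assert(rhs.dtype == out.dtype && "rhs/output dtype mismatch");
  std::puts("dtypes consistent; safe to dispatch on out.dtype");
}

int main() {
  Tensor a{kFloat32}, b{kFloat32}, o{kFloat32};
  addForward(a, b, o);  // OK
  // Tensor c{kInt16}; addForward(a, c, o);  // would trip the assert
}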

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
CMakeLists.txt (1)

263-264: LGTM! Consider adding a brief comment.

The FlatBuffers integration is configured correctly:

  • Tests are disabled to avoid unnecessary builds
  • EXCLUDE_FROM_ALL ensures it's only built when needed by dependent targets

Optionally, consider adding a brief comment explaining the purpose:

🔎 Suggested documentation
+# FlatBuffers for IR serialization
 set(FLATBUFFERS_BUILD_TESTS OFF)
 add_subdirectory(third_party/flatbuffers EXCLUDE_FROM_ALL)
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 426b7c1 and c065a81.

📒 Files selected for processing (5)
  • .gitmodules
  • CMakeLists.txt
  • mllm/CMakeLists.txt
  • third_party/flatbuffers
  • tools/mllm-ir-netron/README.md
✅ Files skipped from review due to trivial changes (1)
  • third_party/flatbuffers
🧰 Additional context used
📓 Path-based instructions (2)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/CMakeLists.txt
{mllm,mllm-cli,pymllm}/**/*.{toml,json,txt,md}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

When suggesting new dependencies, evaluate their maturity, licensing, and maintenance status. Prefer well-established and actively maintained libraries.

Files:

  • mllm/CMakeLists.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-android
  • GitHub Check: build-macos
  • GitHub Check: build-x86
🔇 Additional comments (2)
.gitmodules (1)

40-42: No issues with the FlatBuffers target name. The code correctly uses flatbuffers (lowercase), which is the proper CMake target exported by FlatBuffers when included via add_subdirectory(third_party/flatbuffers) in the root CMakeLists.txt. The namespaced target flatbuffers::flatbuffers does not exist in FlatBuffers—only FlatBuffers::FlatBuffers (CamelCase) is provided as an alias for find_package usage, which is not relevant here.

Likely an incorrect or invalid review comment.

mllm/CMakeLists.txt (1)

35-35: Remove unused FlatBuffers dependency or initialize the git submodule and use correct target name.

The target name flatbuffers is incorrect for the current build configuration. When using add_subdirectory(third_party/flatbuffers), the CMake target should be FlatBuffers (the interface alias exported by FlatBuffers' CMakeLists.txt). Additionally, the third_party/flatbuffers git submodule is currently empty and will cause the build to fail. More importantly, FlatBuffers is not used anywhere in the codebase—no includes or references exist—making this a dead dependency that should be removed entirely.

Owner

@UbiquitousLearning UbiquitousLearning left a comment

LGTM

- Added a function `cloneQuantizationSpecType` to clone quantization specifications without duplicating scale data.
- Updated `LLMQuantRecipeRMSNormPattern` to ensure weight quantization methods match input types without sharing.
- Modified `RMSNormOp` to register parameters in the IR context if they do not already exist.
- Changed `mllm_uint4_t` structure to use `uint8_t` instead of `int8_t` for data representation.
- Added multiple TODO comments in `QnnAOTNodeTensor` and `QnnAOTEnv` regarding the handling of constant values and static weights.
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

♻️ Duplicate comments (6)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)

684-685: Add defensive config validation to prevent runtime exceptions.

Lines 684-685 directly access nested config keys without validation. The nlohmann::json library throws exceptions if keys don't exist. Follow the defensive pattern used elsewhere in the codebase (e.g., AOTPipeline.cpp, MarkTensorIO.cpp) by checking key existence with .contains() before access.

🔎 Proposed fix
-  auto config = AOTCompileContext::getInstance().getConfig()["quant_recipe"]["builtin_llm_pass"]["linear"];
-  auto use_config = config["fallback"];
+  if (!AOTCompileContext::getInstance().getConfig().contains("quant_recipe") ||
+      !AOTCompileContext::getInstance().getConfig()["quant_recipe"].contains("builtin_llm_pass") ||
+      !AOTCompileContext::getInstance().getConfig()["quant_recipe"]["builtin_llm_pass"].contains("linear")) {
+    MLLM_WARN("LinearPattern: quant_recipe.builtin_llm_pass.linear config not found");
+    return false;
+  }
+  auto config = AOTCompileContext::getInstance().getConfig()["quant_recipe"]["builtin_llm_pass"]["linear"];
+  if (!config.contains("fallback")) {
+    MLLM_WARN("LinearPattern: fallback config not found");
+    return false;
+  }
+  auto use_config = config["fallback"];
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (5)

286-293: FIXME comment should be removed (duplicate concern).

The mapping of kInt4/kUInt4 to QNN_DATATYPE_SFIXED_POINT_4/QNN_DATATYPE_UFIXED_POINT_4 is correct per QNN documentation. The FIXME comment on line 286 is misleading and should be removed.

As noted in a previous review, the 4-bit quantized types are actively used (e.g., "w4a16" quantization recipes), and the mapping follows the same pattern as 8-bit and 16-bit variants.
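
A toy rendering of that mapping; the QNN_DATATYPE_* values below are local stand-in constants, whereas the real wrapper takes them from the QNN SDK headers:

#include <cstdint>
#include <cstdio>

enum DataTypes { kInt4, kUInt4, kInt8, kUInt8 };
enum QnnDataType : uint32_t {
  QNN_DATATYPE_SFIXED_POINT_4,
  QNN_DATATYPE_UFIXED_POINT_4,
  QNN_DATATYPE_SFIXED_POINT_8,
  QNN_DATATYPE_UFIXED_POINT_8,
};

// Map mllm quantized integer types onto QNN fixed-point types; the
// 4-bit cases follow the same pattern as the 8-bit (and 16-bit) ones.
QnnDataType toQnnQuantType(DataTypes t) {
  switch (t) {
    case kInt4:  return QNN_DATATYPE_SFIXED_POINT_4;
    case kUInt4: return QNN_DATATYPE_UFIXED_POINT_4;
    case kInt8:  return QNN_DATATYPE_SFIXED_POINT_8;
    case kUInt8: return QNN_DATATYPE_UFIXED_POINT_8;
  }
  return QNN_DATATYPE_SFIXED_POINT_8;  // unreachable for valid enum values
}

int main() {
  std::printf("kInt4 -> %u\n", static_cast<unsigned>(toQnnQuantType(kInt4)));
}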


336-336: Assertion on quant_recipe should be replaced with graceful handling (duplicate concern).

Line 336 uses MLLM_RT_ASSERT which will crash if quant_recipe is missing. A previous review suggested checking for null and returning default parameters instead.

This is the same issue flagged in previous reviews. Consider implementing the suggested fix from the past review comment.


473-493: addOperation implementation incomplete (duplicate concern).

The method constructs a Qnn_OpConfig_t but has TODO comments for critical parts (lines 484-490): parameters, inputs, outputs, validation, and actually adding the node to the QNN graph. Line 492 only stores the operation locally without registering it with the QNN backend.

This is the same issue flagged in previous reviews. The method is currently non-functional as a stub.


495-501: compile() is a stub (duplicate concern).

The method only sets is_compiled_ to true without performing actual compilation. This may cause issues if code expects compiled graphs.

This was flagged in previous reviews. Consider tracking this with a TODO comment or implementing the compilation logic.


802-805: captureAOTGraph returns nullptr (duplicate concern).

The stub implementation returns nullptr, which will cause null pointer issues for callers expecting a valid graph.

This was flagged in previous reviews.

🧹 Nitpick comments (5)
mllm/core/DataTypes.hpp (2)

331-334: Add documentation for the 4-bit signed integer type.

Unlike the GGUF quantization types above (lines 111-194), this struct lacks documentation explaining its purpose and usage. Consider adding a Doxygen comment clarifying how the 4-bit value is stored within the 8-bit field and any usage constraints.

🔎 Suggested documentation
+/**
+ * @struct mllm_int4_t
+ * @brief 4-bit signed integer type.
+ * 
+ * Represents a 4-bit signed integer value stored in an 8-bit field.
+ * Valid range: -8 to 7.
+ */
 typedef struct {  // NOLINT
   int8_t data;
 } mllm_int4_t;

As per coding guidelines, public APIs should have clear docstrings.


336-339: Previous issue resolved; consider adding documentation.

The previous concern about using int8_t instead of uint8_t has been addressed—line 337 now correctly uses uint8_t for the unsigned 4-bit type. However, like mllm_int4_t above, this struct would benefit from a Doxygen comment explaining its purpose and valid range (0 to 15).

🔎 Suggested documentation
+/**
+ * @struct mllm_uint4_t
+ * @brief 4-bit unsigned integer type.
+ * 
+ * Represents a 4-bit unsigned integer value stored in an 8-bit field.
+ * Valid range: 0 to 15.
+ */
 typedef struct {  // NOLINT
   uint8_t data;
 } mllm_uint4_t;

As per coding guidelines, public APIs should have clear docstrings.

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (1)

275-288: Consider explicitly defining or deleting special member functions.

The class declares a default destructor but doesn't explicitly declare copy/move constructors or assignment operators. While the default behavior is correct for the member types used, explicitly deleting these operations would make the intent clearer and prevent accidental copying of Pass instances, which is typically undesirable.

🔎 Proposed enhancement
 class LLMQuantRecipePass final : public ir::Pass {
  public:
   LLMQuantRecipePass();
 
   ~LLMQuantRecipePass() override = default;
+  
+  LLMQuantRecipePass(const LLMQuantRecipePass&) = delete;
+  LLMQuantRecipePass& operator=(const LLMQuantRecipePass&) = delete;
+  LLMQuantRecipePass(LLMQuantRecipePass&&) = default;
+  LLMQuantRecipePass& operator=(LLMQuantRecipePass&&) = default;
 
   uint8_t run(const ir::node_ptr_t& op) override;
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (2)

142-147: Remove duplicate TODO comments.

Six identical TODO comments clutter the code. Consolidate to a single comment at line 142.

🔎 Proposed fix
-  // TODO Constant value should also use Static!!! And they can be pruned
-  // TODO Constant value should also use Static!!! And they can be pruned
-  // TODO Constant value should also use Static!!! And they can be pruned
-  // TODO Constant value should also use Static!!! And they can be pruned
-  // TODO Constant value should also use Static!!! And they can be pruned
-  // TODO Constant value should also use Static!!! And they can be pruned
+  // TODO: Constant values should also use Static and can be pruned

816-821: Remove duplicate TODO comments.

Six identical TODO comments clutter the code (same issue as lines 142-147). Consolidate to a single comment at line 816.

🔎 Proposed fix
-  // TODO Constant value should also use Static!!! And they can be pruned
-  // TODO Constant value should also use Static!!! And they can be pruned
-  // TODO Constant value should also use Static!!! And they can be pruned
-  // TODO Constant value should also use Static!!! And they can be pruned
-  // TODO Constant value should also use Static!!! And they can be pruned
-  // TODO Constant value should also use Static!!! And they can be pruned
+  // TODO: Constant values should also use Static and can be pruned
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c065a81 and f3dcbf8.

📒 Files selected for processing (6)
  • examples/qwen3_qnn_aot/qwen3_qnn_aot.mir
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/core/DataTypes.hpp
  • mllm/core/aops/RMSNormOp.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/core/aops/RMSNormOp.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/core/DataTypes.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/core/aops/RMSNormOp.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/core/DataTypes.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/core/aops/RMSNormOp.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/core/aops/RMSNormOp.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/core/DataTypes.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
🧠 Learnings (4)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Suggest adding unit tests for untested complex logic or edge cases.

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Use named constants instead of magic numbers.

Applied to files:

  • mllm/core/DataTypes.hpp
🧬 Code graph analysis (3)
mllm/core/aops/RMSNormOp.cpp (2)
mllm/core/aops/RMSNormOp.hpp (1)
  • weight_ (34-34)
mllm/core/aops/EmbeddingOp.cpp (1)
  • guard (54-54)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (3)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (13)
  • QnnAOTNodeTensor (107-107)
  • v (103-105)
  • v (103-103)
  • v (110-110)
  • v (112-112)
  • v (114-114)
  • v (116-116)
  • name (156-156)
  • name (261-261)
  • name (263-263)
  • name (265-265)
  • create (48-50)
  • create (48-48)
mllm/compile/ir/linalg/Attribute.hpp (3)
  • type (68-74)
  • type (68-68)
  • create (76-76)
mllm/core/DataTypes.hpp (1)
  • nameOfType (733-733)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (1)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)
  • LLMQuantRecipePass (941-964)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp

[error] 23-23: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)


[error] 26-26: variable name 'rw' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 67-67: 2 adjacent parameters of 'genSimpleQuantizationSpecAttr' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 69-69: variable 'ret' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 71-71: variable 'spec' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 114-114: 2 adjacent parameters of 'shareQuantSpecSingleInputToSingleOutputAndSetOpQuantAnnoAttr' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 115-115: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 135-135: 2 adjacent parameters of 'noSharingSingleInAndSingleOutQuantAnnoAttr' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 135-135: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 145-145: variable 'quant_spec_i0' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 146-146: variable 'quant_spec_o0' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 169-169: 2 adjacent parameters of 'cloneQuantizationSpecType' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 174-174: variable 'cloned_spec' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp

[error] 28-28: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 30-30: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 40-40: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 54-54: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 66-66: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 80-80: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 94-94: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 106-106: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 118-118: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 132-132: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 146-146: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 160-160: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 174-174: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 188-188: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 200-200: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 212-212: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 226-226: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 240-240: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 252-252: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 266-266: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 275-275: class 'LLMQuantRecipePass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 283-283: parameter name 'p' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-x86
  • GitHub Check: build-macos
  • GitHub Check: build-android
🔇 Additional comments (4)
mllm/core/aops/RMSNormOp.cpp (1)

9-10: LGTM! Required includes for new IR registration logic.

These includes provide the necessary types (ir::tensor::RegisterOp and ir::graph::SubGraphOp) used in the parameter registration block below.

mllm/core/DataTypes.hpp (3)

588-593: LGTM!

The 4-bit type information is correctly defined with accurate ranges: signed 4-bit (-8 to 7) and unsigned 4-bit (0 to 15). The implementation follows the established pattern used for other basic types in the file.
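For reference, the cited bounds follow directly from two's-complement arithmetic; a minimal standalone sketch (hypothetical — not the file's actual type-info template machinery) that verifies them at compile time:

#include <cstdint>

// Hypothetical standalone check of the 4-bit ranges cited above; the real
// DataTypes.hpp uses its own type-info template pattern.
constexpr int kInt4Min = -(1 << 3);      // two's complement: -2^(4-1) = -8
constexpr int kInt4Max = (1 << 3) - 1;   // 2^(4-1) - 1 = 7
constexpr int kUInt4Min = 0;
constexpr int kUInt4Max = (1 << 4) - 1;  // 2^4 - 1 = 15

static_assert(kInt4Min == -8 && kInt4Max == 7, "signed 4-bit range");
static_assert(kUInt4Min == 0 && kUInt4Max == 15, "unsigned 4-bit range");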


707-708: LGTM!

The self-type-info registrations correctly map the 4-bit enum values to their corresponding C++ types, following the established pattern used throughout the file.


771-772: LGTM!

The macro definitions for 4-bit types follow the established naming convention and maintain consistency with the existing compatibility layer, even though lines 737-739 indicate these macros are deprecated.

break;
}
default: {
NYI("Only support [int16, int8, bf16, f16, sf32] for now.");

⚠️ Potential issue | 🟡 Minor

Fix typo in data type list.

Line 439 contains "sf32" which should be "f32" to match the standard float32 type abbreviation.

🔎 Proposed fix
         default: {
-          NYI("Only support [int16, int8, bf16, f16, sf32] for now.");
+          NYI("Only support [int16, int8, bf16, f16, f32] for now.");
         }
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp around line 439, the
error message lists "sf32" but it should be the standard "f32"; update the
string literal to replace "sf32" with "f32" so the message reads "Only support
[int16, int8, bf16, f16, f32] for now." and ensure no other occurrences of
"sf32" remain in this file.

Comment on lines +444 to +446
MLLM_WARN("LLMQuantRecipeEqualPattern Only support constant Value as second inputs right now. Pls send us a issue or PR "
"if you want to compare two normal tensor(rather than static-tensor).");
return false;

⚠️ Potential issue | 🟡 Minor

Correct the pattern name in warning message.

The warning message references "LLMQuantRecipeEqualPattern" but this code is in LLMQuantRecipeElementwisePattern::rewrite. This copy-paste error could confuse debugging.

🔎 Proposed fix
     } else {
-      MLLM_WARN("LLMQuantRecipeEqualPattern Only support constant Value as second inputs right now. Pls send us a issue or PR "
+      MLLM_WARN("LLMQuantRecipeElementwisePattern Only support constant Value as second inputs right now. Pls send us a issue or PR "
                 "if you want to compare two normal tensor(rather than static-tensor).");
       return false;
     }
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp around lines 444-446, the
warning message incorrectly references "LLMQuantRecipeEqualPattern" (copy-paste
error) while inside LLMQuantRecipeElementwisePattern::rewrite; update the string
to reference "LLMQuantRecipeElementwisePattern" (or the exact class/function
name used in this file) so the log correctly identifies the emitting pattern,
keeping the rest of the message unchanged.

break;
}
default: {
NYI("Only support [int16, int8, bf16, f16, f32] for now.");

⚠️ Potential issue | 🟡 Minor

Use consistent type names in the supported-types list.

Line 597 lists "f32", which does not match the type names used in the switch cases above; the supported-types message should use the same naming convention as the other patterns.

🔎 Proposed fix

Ensure the data type list matches the switch cases above (lines 585-591) which include kFloat32, kFloat16, kBFloat16, etc.

🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp around line 597, the NYI
message lists "f32" which is inconsistent with the type names used earlier in
the switch (e.g., kFloat32, kFloat16, kBFloat16); update the message to use the
matching standard name (e.g., "float32") so the supported-types list reads
consistently (for example: [int16, int8, bf16, f16, float32]).

Comment on lines +727 to +732
if (use_config["method"] == "LPBQ") {
// Unpack
std::string precision = use_config["precision"];
bool sym = use_config["sym"];
int block_size = use_config["block_size"];
MLLM_RETURN_FALSE_IF_NOT(sym);

⚠️ Potential issue | 🟠 Major

Add defensive validation for nested config values.

Lines 727-732 access nested config values (use_config["method"], use_config["precision"], etc.) without checking if these keys exist. Add defensive checks before accessing these values to prevent runtime exceptions.

🔎 Proposed fix
-    if (use_config["method"] == "LPBQ") {
+    if (!use_config.contains("method") || !use_config.contains("precision") || !use_config.contains("sym") || !use_config.contains("block_size")) {
+      MLLM_WARN("LinearPattern: incomplete config for linear operation");
+      return false;
+    }
+    if (use_config["method"] == "LPBQ") {
       // Unpack
       std::string precision = use_config["precision"];
       bool sym = use_config["sym"];
       int block_size = use_config["block_size"];
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp around lines 727-732, the
code reads nested config keys (use_config["method"], ["precision"], ["sym"],
["block_size"]) without verifying those keys exist or have expected types; add
defensive validation before each access: check use_config.contains("method") and
that it is a string and equals "LPBQ" (or handle missing), then check
contains("precision") (string), contains("sym") (bool) and
contains("block_size") (int), and use MLLM_RETURN_FALSE_IF_NOT or equivalent to
fail early with clear messages if any key is missing or has the wrong type; if
appropriate provide safe defaults or short-circuit the LPBQ branch when
validations fail.

Comment on lines +961 to +963
if (config["quant_recipe"]["builtin_llm_pass"]["model"] == "qwen3") {
addPattern(LLMQuantRecipeQwen3AttentionPattern::create(), "qwen3_attention", 100);
}

⚠️ Potential issue | 🟠 Major

Add defensive config validation for conditional pattern registration.

Line 961 directly accesses nested config keys without validation: config["quant_recipe"]["builtin_llm_pass"]["model"]. The nlohmann::json library throws exceptions if keys don't exist. Use defensive checking with .contains() before access, as noted in past review comments and consistent with patterns used elsewhere in the codebase.

🔎 Proposed fix
-  if (config["quant_recipe"]["builtin_llm_pass"]["model"] == "qwen3") {
+  if (config.contains("quant_recipe") && 
+      config["quant_recipe"].contains("builtin_llm_pass") &&
+      config["quant_recipe"]["builtin_llm_pass"].contains("model") &&
+      config["quant_recipe"]["builtin_llm_pass"]["model"] == "qwen3") {
     addPattern(LLMQuantRecipeQwen3AttentionPattern::create(), "qwen3_attention", 100);
   }
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp around lines 961-963, the
code directly indexes config["quant_recipe"]["builtin_llm_pass"]["model"] which
can throw if any key is missing; add defensive checks before registering the
qwen3 pattern: verify config.contains("quant_recipe") &&
config["quant_recipe"].is_object() &&
config["quant_recipe"].contains("builtin_llm_pass") &&
config["quant_recipe"]["builtin_llm_pass"].is_object() &&
config["quant_recipe"]["builtin_llm_pass"].contains("model") (or use a safe
value lookup) and only compare the model string if present and of string type;
if missing, skip registration (or use a sensible default) to avoid exceptions.
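Since this comment and the LPBQ one above ask for the same guard, a shared lookup helper may be worth considering. A minimal sketch, assuming the config is an nlohmann::json object (jsonStringAt is a hypothetical name; contains(), is_object(), is_string(), at(), and get<T>() are standard nlohmann::json APIs):

#include <initializer_list>
#include <nlohmann/json.hpp>
#include <string>

// Hypothetical helper: walks a key path and returns `def` when any key along
// the path is missing or the leaf is not a string, instead of throwing.
static std::string jsonStringAt(const nlohmann::json& root,
                                std::initializer_list<const char*> path,
                                const std::string& def = "") {
  const nlohmann::json* cur = &root;
  for (const char* key : path) {
    if (!cur->is_object() || !cur->contains(key)) { return def; }
    cur = &cur->at(key);
  }
  return cur->is_string() ? cur->get<std::string>() : def;
}

// Usage for the registration above:
// if (jsonStringAt(config, {"quant_recipe", "builtin_llm_pass", "model"}) == "qwen3") {
//   addPattern(LLMQuantRecipeQwen3AttentionPattern::create(), "qwen3_attention", 100);
// }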


name_ = v->name();
mllm_tensor_ = v->tensor_;
quant_spec_ = v->getAttr("quant_recipe")->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;

⚠️ Potential issue | 🔴 Critical

Unsafe access to quant_recipe attribute.

Line 151 directly accesses the quant_recipe attribute without checking if it exists. If the attribute is missing, this will cause a null pointer dereference before the assertion on line 336 in parseQnnQuantizeParamFromIR.

🔎 Proposed fix
   name_ = v->name();
   mllm_tensor_ = v->tensor_;
-  quant_spec_ = v->getAttr("quant_recipe")->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;
+  if (auto attr = v->getAttr("quant_recipe")) {
+    quant_spec_ = attr->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;
+  }
   for (auto s : v->tensor_.shape()) { shape_.emplace_back(s); }
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around line 151, the code directly
dereferences the "quant_recipe" attribute which can be absent; modify the code
to first check for the attribute's presence (e.g., use
v->hasAttr("quant_recipe") or capture getAttr("quant_recipe") into a variable
and check for null) before calling cast_ and accessing spec_; if the attribute
is missing, return or throw a clear error (or set a default) so you avoid a null
dereference and ensure the later assertion in parseQnnQuantizeParamFromIR is not
hit unexpectedly.


// Prepare data
auto num_scale_offsets = (uint32_t)v->tensor_.size(cfg->ch_axis);
Qnn_ScaleOffset_t* scale_array = (Qnn_ScaleOffset_t*)malloc(sizeof(Qnn_ScaleOffset_t) * num_scale_offsets);

⚠️ Potential issue | 🔴 Critical

Missing null checks after malloc calls.

Lines 355, 384, and 393 call malloc without checking if allocation succeeded. If allocation fails, subsequent code will dereference null pointers, causing crashes.

🔎 Proposed fix

Add null checks after each malloc:

       Qnn_ScaleOffset_t* scale_array = (Qnn_ScaleOffset_t*)malloc(sizeof(Qnn_ScaleOffset_t) * num_scale_offsets);
+      if (!scale_array) {
+        MLLM_ERROR_EXIT(ExitCode::kCoreError, "Failed to allocate memory for scale_array");
+      }
       MLLM_RT_ASSERT_EQ(num_scale_offsets, cfg->scale.size(0));

Apply similar checks at lines 384 and 393.

Also applies to: 384-384, 393-393

🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 355, 384 and 393, the
code calls malloc without checking for NULL which can lead to dereferencing null
on allocation failure; after each malloc call check the returned pointer for
NULL, log or propagate an appropriate error, free any previously allocated
memory as needed, and return or set an error status instead of continuing
execution so no null pointer is dereferenced.
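Because the same check is needed at three call sites, a small allocation wrapper could avoid repetition. A sketch that assumes the MLLM_ERROR_EXIT macro and ExitCode::kCoreError shown in the proposed fix above (checkedMalloc itself is a hypothetical name):

#include <cstddef>
#include <cstdlib>

// Hypothetical wrapper: exits with a clear message on allocation failure so
// callers never dereference a null pointer. MLLM_ERROR_EXIT / ExitCode are
// assumed from the proposed fix above, not verified against the codebase.
template <typename T>
static T* checkedMalloc(std::size_t count, const char* what) {
  auto* ptr = static_cast<T*>(std::malloc(sizeof(T) * count));
  if (ptr == nullptr) { MLLM_ERROR_EXIT(ExitCode::kCoreError, what); }
  return ptr;
}

// Usage at line 355 (and similarly at lines 384 and 393):
// auto* scale_array = checkedMalloc<Qnn_ScaleOffset_t>(num_scale_offsets,
//                                                      "Failed to allocate scale_array");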

Comment on lines +34 to +39

// Register Params
if (weight_ && !ir_ctx->lookupSymbolTable(getName() + ".weight")) {
ir::IRWriterGuard guard(ir_ctx, ir_ctx->lookupSymbolTable("init")->cast_<ir::graph::SubGraphOp>()->getTopRegion());
ir_ctx->create<ir::tensor::RegisterOp>(ir_ctx->create<ir::tensor::TensorValue>(weight_));
}

⚠️ Potential issue | 🔴 Critical

Add null-safety checks to prevent potential crash.

Line 37 chains multiple pointer operations without null checks:

  • lookupSymbolTable("init") could return nullptr if the "init" subgraph doesn't exist
  • cast_<ir::graph::SubGraphOp>() could fail if the type is incorrect
  • getTopRegion() could return nullptr

Any failure in this chain will cause undefined behavior or a crash.

🔎 Proposed fix with defensive checks
   // Register Params
   if (weight_ && !ir_ctx->lookupSymbolTable(getName() + ".weight")) {
-    ir::IRWriterGuard guard(ir_ctx, ir_ctx->lookupSymbolTable("init")->cast_<ir::graph::SubGraphOp>()->getTopRegion());
+    auto init_symbol = ir_ctx->lookupSymbolTable("init");
+    if (!init_symbol) {
+      NYI("init subgraph not found in IR context");
+    }
+    auto init_subgraph = init_symbol->cast_<ir::graph::SubGraphOp>();
+    if (!init_subgraph) {
+      NYI("init symbol is not a SubGraphOp");
+    }
+    auto init_region = init_subgraph->getTopRegion();
+    if (!init_region) {
+      NYI("init subgraph has no top region");
+    }
+    ir::IRWriterGuard guard(ir_ctx, init_region);
     ir_ctx->create<ir::tensor::RegisterOp>(ir_ctx->create<ir::tensor::TensorValue>(weight_));
   }
🤖 Prompt for AI Agents
In mllm/core/aops/RMSNormOp.cpp around lines 34-39, the chained calls
ir_ctx->lookupSymbolTable("init")->cast_<ir::graph::SubGraphOp>()->getTopRegion()
can dereference null/invalid pointers; add null-safety checks: first store the
result of lookupSymbolTable("init") in a local variable and verify it's
non-null, then attempt the cast and verify the cast succeeded (or use a
safe/dynamic cast API), then check the returned region is non-null before
constructing the IRWriterGuard; if any check fails, skip registration or
log/handle the error gracefully instead of proceeding to call methods on null.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
pymllm/backends/qualcomm/transformers/static_qwen3.py (1)

1-3: Module appears incomplete with only unused imports.

This file contains only import statements with no implementation, docstring, or module-level documentation. Please verify whether this is intentional (e.g., a stub for future implementation) or if implementation is missing from this PR.

If this is a placeholder, consider adding:

  1. A module-level docstring explaining the intended purpose
  2. A TODO comment indicating what needs to be implemented

If you'd like, I can help generate the implementation structure based on the quantization patterns mentioned in the PR objectives. Would you like me to open an issue to track the completion of this module?

pyproject.toml (1)

31-32: Add version constraints for torch and torchao.

Both torch and torchao are used in quantization features (particularly in pymllm/backends/qualcomm/transformers/static_qwen3.py for pt2e quantization) without version constraints, which can lead to reproducibility issues and unexpected breaking changes. PyTorch is particularly known for introducing breaking changes between major versions, and torchao's quantization functionality is version-sensitive.

Consider pinning to specific versions or specifying minimum versions, similar to other dependencies in this file (e.g., apache-tvm-ffi == 0.1.0b4, pyyaml >= 6.0.2). At minimum: torch >= 2.0 and torchao >= 0.1.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f3dcbf8 and bacdc16.

📒 Files selected for processing (5)
  • pymllm/backends/qualcomm/transformers/README.md
  • pymllm/backends/qualcomm/transformers/__init__.py
  • pymllm/backends/qualcomm/transformers/static_qwen3.py
  • pymllm/backends/qualcomm/transformers/train.py
  • pyproject.toml
✅ Files skipped from review due to trivial changes (1)
  • pymllm/backends/qualcomm/transformers/README.md
🧰 Additional context used
📓 Path-based instructions (5)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x80–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x80–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • pymllm/backends/qualcomm/transformers/static_qwen3.py
{mllm,mllm-cli,pymllm}/**/*.{sh,py}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

If a file starts with a shebang, it must be the first line and contain only the interpreter path and optional single argument, encoded in UTF-8.

Files:

  • pymllm/backends/qualcomm/transformers/static_qwen3.py
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • pymllm/backends/qualcomm/transformers/static_qwen3.py
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • pymllm/backends/qualcomm/transformers/static_qwen3.py
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • pymllm/backends/qualcomm/transformers/static_qwen3.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-macos
  • GitHub Check: build-x86
  • GitHub Check: build-android

@chenghuaWang chenghuaWang merged commit 7c7c10c into UbiquitousLearning:main Dec 29, 2025
4 checks passed