Conversation

@chenghuaWang (Collaborator) commented Dec 24, 2025

  • Added LLMQuantRecipePass to handle quantization recipes for various operations.
  • Introduced multiple pattern classes (Index, Elementwise, Transpose, Concat, Repeat, MatMul, Equal, Where, Softmax, Linear, View, Embedding, Qwen3 Attention) to match and rewrite specific Linalg operations.
  • Removed obsolete QuantPrecisionsCfgPass files.
  • Created SplitLLMGraphPass with a placeholder for future graph splitting logic.
  • Enhanced quantization attributes with detailed specifications and improved dumping functionality.
  • Updated RTTI generation to include new quantization spec classes.
  • Added utility macros for error handling in common operations.

Summary by CodeRabbit

  • New Features

    • Full LLM quantization recipe and richer LLM quant config; new passes to merge/split LLM head graphs.
    • Backend AOT graph/tensor capture with improved tensor quantization handling and QNN integration.
    • Typed scalar Tensor ops (add/sub/mul with explicit data-type).
    • Added public 4-bit integer types and a 16-bit vector attribute.
  • Refactor

    • Unified backend visitor/pattern interface and IR/RTTI/attribute changes to support quantization specs and new pass flow.
  • Chores

    • Enabled extra example modules, added FlatBuffers submodule, added Python deps and docs for Qualcomm backend.


chenghuaWang and others added 3 commits December 24, 2025 06:43
@coderabbitai (coderabbitai bot, Contributor) commented Dec 24, 2025

📝 Walkthrough

Adds a pattern-driven LLM quantization-recipe pass, two new QNN AOT passes (merge/split LLM head), makes the AOT pipeline config-driven, extends IR quantization RTTI/attributes and numeric types (4-bit, VectorInt16, QuantizationSpec), updates Region ownership types, refactors QNN AOT wrappers/visitor APIs, and updates build/examples to include FlatBuffers and LLM config changes.

Changes

  • AOT pipeline & new passes
    Files: mllm/backends/qnn/aot/passes/AOTPipeline.cpp, mllm/backends/qnn/aot/passes/LLMQuantRecipePass.{cpp,hpp}, mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.{cpp,hpp}, mllm/backends/qnn/aot/passes/SplitLLMGraphPass.{cpp,hpp}
    Add a config-gated pipeline (runs LLMQuantRecipePass when quant_recipe.llm_recipe is set), implement LLMQuantRecipePass (many pattern match/rewrite rules), add MergeLLMHeadIntoMainGraphPass (move/rewire lm_head), and add a placeholder SplitLLMGraphPass.

  • QNN AOT wrappers & env API
    Files: mllm/backends/qnn/aot/QnnWrappersAPI.{hpp,cpp}
    Major API/state changes: QnnAOTNodeTensor gains force_static_weight and parsing helpers; parse helpers accept an IR tensor; QnnAOTNodeOperation gains set/get naming; QnnAOTGraph addOperation/compile semantics changed; QnnAOTEnv gains captureAOTGraph/captureAOTNodeOp/captureQnnAOTNodeTensor and getFuncSymbol.

  • Visitor interface & elementwise pattern
    Files: mllm/backends/qnn/aot/visitor/Base.hpp, .../Elewise.{cpp,hpp}
    Replace virtual addNode(...) with compile(IRWriter&, op); refactor the elementwise Add pattern to construct a QnnAOTNodeOperation, attach captured tensors, validate context/graph, and register via QnnAOTEnv; remove QnnAOTAddQuantRecipePattern.

  • IR quantization attrs & RTTI
    Files: mllm/compile/ir/linalg/Attribute.{hpp,cpp}, mllm/compile/ir/GeneratedRTTIKind.hpp, mllm/compile/ir/NodeRTTIClassOfImpl.hpp, mllm/compile/ir/rtti_kind_gen.py
    Introduce QuantizationSpec (including kRaw) and the new LinalgIRQuantizatonSpecAttr with builders and rich dump/stringification; update RTTI enums/macros and the generator to include the new attributes.

  • IR ownership & removal semantics
    Files: mllm/compile/ir/Node.hpp, mllm/compile/ir/Node.cpp
    Region::inputs()/outputs() now use std::list<node_weak_ptr_t> (instead of val_ptr_t); IRWriter::removeOp clears op->belongsTo() when removing ops.

  • IR builtin attrs & VectorInt16
    Files: mllm/compile/ir/builtin/Attribute.{hpp,cpp}, mllm/compile/ir/NodeRTTIClassOfImpl.hpp
    Add VectorInt16Attr with builder, data accessor, and dump; add the corresponding RTTI macro.

  • DataTypes & 4-bit types
    Files: mllm/core/DataTypes.{hpp,cpp}
    Add DataTypes kInt4/kUInt4 and mllm_int4_t/mllm_uint4_t types, register basic/self type infos, and update bytesOfType and nameOfType.

  • Tensor scalar ops
    Files: mllm/core/Tensor.{hpp,cpp}
    Add typed scalar Tensor methods add/sub/mul(float rhs, DataTypes) and corresponding overloads (see the sketch after this table).

  • Elementwise constant handling
    Files: mllm/core/aops/ElewiseOps.cpp
    When a constant input is present, select VectorFP32Attr for kFloat32 or VectorInt16Attr for kInt16; warn on unsupported constant types.

  • RMSNorm trace registration
    Files: mllm/core/aops/RMSNormOp.cpp
    Register the runtime parameter (weight) in the symbol table during trace when needed and emit RegisterOp in the init subgraph.

  • QNN visitor lowering changes
    Files: mllm/backends/qnn/aot/visitor/Elewise.cpp
    New compile flow: retrieve the AOTCompileContext, validate quant_recipe/graph/context, build and name a QnnAOTNodeOperation, attach captured tensors, and register the op via QnnAOTEnv->captureAOTNodeOp.

  • Utilities & macros
    Files: mllm/utils/Common.hpp
    Add MLLM_RETURN_FALSE_IF_NOT and MLLM_RETURN_FALSE_IF macros for early-false returns.

  • Examples & build config
    Files: examples/CMakeLists.txt, examples/qwen3_qnn_aot/compile.cpp, examples/qwen3_qnn_aot/qnn_aot_cfg.json, examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp, CMakeLists.txt, mllm/CMakeLists.txt, .gitmodules, third_party/flatbuffers
    Re-enable example subdirs, remove the explicit qnn_aot_env.createContext call in compile.cpp, restructure qnn_aot_cfg.json to builtin_llm_pass/kvs/fallbacks, adjust attention quant/dequant and masked-softmax code, and add the FlatBuffers submodule and link flatbuffers.
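
To make the new typed scalar ops concrete, here is a minimal usage sketch. It assumes the add/sub/mul(float, DataTypes) signatures listed above and that the enumerators are spelled kFloat32/kInt16 as elsewhere in this review; the call sites themselves are hypothetical.

#include "mllm/core/Tensor.hpp"     // Tensor, per the cohort table above
#include "mllm/core/DataTypes.hpp"  // DataTypes (kFloat32, kInt16)

using namespace mllm;

// Hypothetical call sites: each op takes the scalar rhs plus an explicit
// DataTypes, so the constant is materialized in the requested precision
// (cf. the VectorFP32Attr vs. VectorInt16Attr selection in ElewiseOps.cpp).
void typed_scalar_ops_sketch(Tensor& x) {
  auto a = x.add(1.0f, kFloat32);
  auto b = x.sub(2.0f, kInt16);
  auto c = x.mul(0.5f, kFloat32);
  (void)a; (void)b; (void)c;
}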

Sequence Diagram(s)

sequenceDiagram
  participant CLI as User/CLI
  participant AOT as AOTPipeline
  participant Pass as LLMQuantRecipePass
  participant Merge as MergeLLMHeadPass
  participant Env as QnnAOTEnv
  participant Graph as QnnAOTGraph

  CLI->>AOT: request compile (config with quant_recipe.llm_recipe=true)
  AOT->>Pass: run LLMQuantRecipePass on IR
  Note right of Pass: pattern-match & annotate ops with QuantizationSpec attrs
  AOT->>Merge: run MergeLLMHeadIntoMainGraphPass
  Merge->>AOT: relocate lm_head and rewire returns
  AOT->>Env: lowering invokes captureAOTGraph / captureQnnAOTNodeTensor / captureAOTNodeOp
  Env->>Graph: register tensors & ops into graph
  Graph->>Env: compile graph (set is_compiled_)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Suggested reviewers

  • liang1232018
  • oreomaker
  • yirongjie

Poem

"I hop through IR by moonlit code,
I stitch the passes, shift the node.
Specs whisper, heads find their place,
Tensors hop and join the race.
A rabbit's patch—compiled with grace. 🐇"

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 15.71%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (2 passed)

  • Title check: ✅ Passed. The title accurately describes the main change: implementing LLMQuantRecipePass and associated patterns for quantization in the QNN AOT backend.
  • Description check: ✅ Passed. The description is mostly complete, listing major changes (the LLMQuantRecipePass implementation, pattern classes, and quantization enhancements); however, it lacks specifics on why these changes are necessary and their expected impact.
✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai (coderabbitai bot) left a comment

Actionable comments posted: 4

♻️ Duplicate comments (1)
mllm/compile/ir/linalg/Attribute.cpp (1)

251-254: Same parenthesis issue for kNone in the second lambda.

This kNone case has the same missing closing parenthesis issue as the first lambda.

🧹 Nitpick comments (10)
mllm/utils/Common.hpp (1)

57-61: Consider adding brief documentation for the new macros.

The new early-return guard macros are clear and follow existing patterns in the codebase. However, adding a brief comment explaining their purpose and typical usage would improve developer ergonomics.

📝 Suggested documentation
+// Early-return guards for boolean predicates
 #define MLLM_RETURN_FALSE_IF_NOT(x) \
   if (!(x)) { return false; }
 
 #define MLLM_RETURN_FALSE_IF(x) \
   if ((x)) { return false; }

Note: Static analysis suggests using constexpr template functions instead of macros, but this is a stylistic preference. The macro approach is consistent with existing codebase conventions (e.g., NYI, RT_ASSERT, FORCE_INLINE) and is appropriate for simple control-flow guards.
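
For illustration, a minimal sketch of the intended usage, assuming the macros expand exactly as shown in the diff above; the matcher function and its parameters are hypothetical.

#include "mllm/utils/Common.hpp"  // MLLM_RETURN_FALSE_IF_NOT / MLLM_RETURN_FALSE_IF

// Hypothetical matcher: each guard replaces an explicit
// "if (...) { return false; }" block with a one-line early return.
bool is_match_sketch(int num_inputs, int num_outputs, bool already_annotated) {
  MLLM_RETURN_FALSE_IF_NOT(num_inputs == 1);   // fail unless exactly one input
  MLLM_RETURN_FALSE_IF_NOT(num_outputs == 1);  // fail unless exactly one output
  MLLM_RETURN_FALSE_IF(already_annotated);     // fail if already processed
  return true;
}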

mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp (2)

25-30: Consider breaking early after finding CallGraphOp.

The walk asserts call_main_graph_op == nullptr on each iteration, meaning it expects exactly one CallGraphOp. Consider returning WALK_BREAK after finding the first one to avoid unnecessary iteration.

🔎 Suggested optimization
   writer.walk<ir::graph::CallGraphOp>(
       [&](ir::IRWriter& /*writer*/, const ir::graph::CallGraphOp::ptr_t& call_op) -> ir::IRWriter::WalkResult {
         MLLM_RT_ASSERT_EQ(call_main_graph_op, nullptr);
         call_main_graph_op = call_op;
-        return ir::IRWriter::WalkResult::WALK_CONTINUE;
+        return ir::IRWriter::WalkResult::WALK_BREAK;
       });

82-106: Redundant assertions can be consolidated.

Lines 83-84 and 98-99 perform identical assertions (inputs().size() == 1 and outputs().size() == 1). Since the assertions at lines 83-84 already validate these invariants, the second set at lines 98-99 is redundant.

🔎 Suggested consolidation
   // Make sure the output is right
   {
     MLLM_RT_ASSERT_EQ(llm_head_op->inputs().size(), 1);
     MLLM_RT_ASSERT_EQ(llm_head_op->outputs().size(), 1);
     auto llm_head_input = llm_head_op->inputs().front();
     auto llm_head_output = llm_head_op->outputs().front();
     auto& graph_region_outs = main_graph->outputs();

     // Replace llm_head_input with llm_head_output in graph outputs
     std::replace(graph_region_outs.begin(), graph_region_outs.end(), llm_head_input, llm_head_output);
     std::replace(return_op->inputs().begin(), return_op->inputs().end(), llm_head_input, llm_head_output);
     std::replace(main_graph->getTopRegion()->outputs().begin(), main_graph->getTopRegion()->outputs().end(), llm_head_input,
                  llm_head_output);
-  }
 
-  // Make sure the call_graph_op's output is right
-  {
-    MLLM_RT_ASSERT_EQ(llm_head_op->inputs().size(), 1);
-    MLLM_RT_ASSERT_EQ(llm_head_op->outputs().size(), 1);
-    auto llm_head_input = llm_head_op->inputs().front();
-    auto llm_head_output = llm_head_op->outputs().front();
-    auto& graph_region_outs = call_main_graph_op->outputs();
-
-    // Replace llm_head_input with llm_head_output in call graph outputs
-    std::replace(graph_region_outs.begin(), graph_region_outs.end(), llm_head_input, llm_head_output);
+    // Also update call_graph_op's outputs
+    auto& call_graph_outs = call_main_graph_op->outputs();
+    std::replace(call_graph_outs.begin(), call_graph_outs.end(), llm_head_input, llm_head_output);
   }
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (2)

21-21: Unused variable: writer is created but never used.

The IRWriter instance is constructed but not utilized in the current placeholder implementation. Either remove it until the graph splitting logic is implemented, or mark it as [[maybe_unused]] to suppress compiler warnings.

🔎 Suggested fix
   auto module_op = op->cast_<ir::ModuleOp>();
-  auto writer = ir::IRWriter(getCtx(), module_op->getTopRegion());
+  [[maybe_unused]] auto writer = ir::IRWriter(getCtx(), module_op->getTopRegion());

   // TODO: Implement graph splitting logic here

Or simply remove the line until it's needed.


23-23: Placeholder TODO noted.

The pass is a placeholder for future graph-splitting logic. The TODO comment follows the coding guidelines format.

Would you like me to open an issue to track the implementation of this graph splitting logic?

mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (1)

11-18: Consider explicitly deleting copy/move operations for Rule of 5 compliance.

Per cppcoreguidelines-special-member-functions, defining a destructor typically implies copy/move should also be addressed. Since passes are meant to be used via shared_ptr from the factory, explicitly deleting copy/move operations clarifies intent and prevents accidental slicing.

🔎 Proposed fix
 class SplitLLMGraphPass final : public ir::Pass {
  public:
   SplitLLMGraphPass() = default;
+  SplitLLMGraphPass(const SplitLLMGraphPass&) = delete;
+  SplitLLMGraphPass& operator=(const SplitLLMGraphPass&) = delete;
+  SplitLLMGraphPass(SplitLLMGraphPass&&) = delete;
+  SplitLLMGraphPass& operator=(SplitLLMGraphPass&&) = delete;
 
   ~SplitLLMGraphPass() override = default;
 
   uint8_t run(const ir::node_ptr_t& op) override;
 };
mllm/compile/ir/linalg/Attribute.cpp (1)

166-270: Extract duplicated gen_quant_spec_str lambda into a shared utility.

The gen_quant_spec_str lambda is duplicated verbatim in both LinalgIRQuantizatonAnnotationAttr::dump (lines 26-126) and LinalgIRQuantizatonSpecAttr::dump (lines 167-267). Consider extracting this into a free function or static method to avoid code duplication and simplify future maintenance.

mllm/compile/ir/linalg/Attribute.hpp (1)

325-344: Consider applying Rule of 5 for consistency.

Per static analysis, this class defines a non-default destructor but lacks explicit copy/move operations. While the base class may handle this, explicitly deleting them would clarify intent. The short parameter name p in dump() is consistent with other dump methods in the codebase, so that hint can be ignored.

🔎 Proposed fix
 class LinalgIRQuantizatonSpecAttr final : public LinalgIRAttr {
  public:
   QuantizationSpec::ptr_t spec_;
 
   DEFINE_SPECIFIC_IR_CLASS(LinalgIRQuantizatonSpecAttr);
 
   ~LinalgIRQuantizatonSpecAttr() override;
+  LinalgIRQuantizatonSpecAttr(const LinalgIRQuantizatonSpecAttr&) = delete;
+  LinalgIRQuantizatonSpecAttr& operator=(const LinalgIRQuantizatonSpecAttr&) = delete;
 
   LinalgIRQuantizatonSpecAttr();
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)

23-26: Multiple TODO patterns are placeholders.

These patterns match but don't perform actual rewrites. Ensure these are tracked for completion before the quantization pipeline is used in production.

Would you like me to open issues to track completion of these TODO rewrite implementations?

Also applies to: 39-42, 52-55, 65-68, 78-81, 91-94, 104-107, 117-120, 130-133, 143-146

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (1)

187-200: Consider explicit copy/move deletion for Rule of 5 compliance.

Similar to other pass classes, LLMQuantRecipePass defines a destructor without explicit copy/move operations. Since passes are used via shared_ptr, explicitly deleting these operations clarifies intent.

🔎 Proposed fix
 class LLMQuantRecipePass final : public ir::Pass {
  public:
   LLMQuantRecipePass();
+  LLMQuantRecipePass(const LLMQuantRecipePass&) = delete;
+  LLMQuantRecipePass& operator=(const LLMQuantRecipePass&) = delete;
 
   ~LLMQuantRecipePass() override = default;
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4cbe15 and eff1182.

📒 Files selected for processing (22)
  • examples/CMakeLists.txt
  • examples/qwen3_qnn_aot/compile.cpp
  • examples/qwen3_qnn_aot/qnn_aot_cfg.json
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp
  • mllm/backends/qnn/aot/passes/QuantPrecisionsCfgPass.cpp
  • mllm/backends/qnn/aot/passes/QuantPrecisionsCfgPass.hpp
  • mllm/backends/qnn/aot/passes/SplitGraphPass.cpp
  • mllm/backends/qnn/aot/passes/SplitGraphPass.hpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp
  • mllm/compile/ir/GeneratedRTTIKind.hpp
  • mllm/compile/ir/Node.cpp
  • mllm/compile/ir/Node.hpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
  • mllm/compile/ir/linalg/Attribute.cpp
  • mllm/compile/ir/linalg/Attribute.hpp
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/utils/Common.hpp
💤 Files with no reviewable changes (1)
  • examples/qwen3_qnn_aot/compile.cpp
🧰 Additional context used
📓 Path-based instructions (5)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/utils/Common.hpp
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
  • mllm/compile/ir/GeneratedRTTIKind.hpp
  • mllm/compile/ir/Node.hpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp
  • mllm/compile/ir/linalg/Attribute.hpp
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/compile/ir/Node.cpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
  • mllm/compile/ir/linalg/Attribute.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/utils/Common.hpp
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
  • mllm/compile/ir/GeneratedRTTIKind.hpp
  • mllm/compile/ir/Node.hpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp
  • mllm/compile/ir/linalg/Attribute.hpp
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/compile/ir/Node.cpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
  • mllm/compile/ir/linalg/Attribute.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/utils/Common.hpp
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
  • mllm/compile/ir/GeneratedRTTIKind.hpp
  • mllm/compile/ir/Node.hpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp
  • mllm/compile/ir/linalg/Attribute.hpp
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/compile/ir/Node.cpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
  • mllm/compile/ir/linalg/Attribute.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
{mllm,mllm-cli,pymllm}/**/*.{sh,py}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

If a file starts with a shebang, it must be the first line and contain only the interpreter path and optional single argument, encoded in UTF-8.

Files:

  • mllm/compile/ir/rtti_kind_gen.py
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/compile/ir/Node.cpp
  • mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp
  • mllm/compile/ir/linalg/Attribute.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
🧠 Learnings (3)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Ensure functions that can fail return appropriate error codes or raise exceptions.

Applied to files:

  • mllm/utils/Common.hpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Use named constants instead of magic numbers.

Applied to files:

  • mllm/utils/Common.hpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Add comments for complex algorithms or non-obvious logic.

Applied to files:

  • examples/CMakeLists.txt
🧬 Code graph analysis (9)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (3)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (13)
  • op (18, 30, 44, 58, 72, 86, 100, 112, 124, 138, 152, 164, 178)
mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp (1)
  • op (17-17)
mllm/compile/ir/Node.hpp (2)
  • op (468-468)
  • op (470-470)
mllm/compile/ir/linalg/Attribute.hpp (1)
mllm/compile/ir/linalg/Attribute.cpp (3)
  • LinalgIRQuantizatonSpecAttr (160-160)
  • LinalgIRQuantizatonSpecAttr (162-162)
  • LinalgIRQuantizatonSpecAttr (164-164)
mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp (2)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (1)
  • op (17-17)
mllm/compile/ir/Node.hpp (2)
  • op (468-468)
  • op (470-470)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (4)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (43)
  • op (18, 30, 44, 58, 72, 86, 100, 112, 124, 138, 152, 164, 178, 193)
  • writer (20, 32, 46, 60, 74, 88, 102, 114, 126, 140, 154, 166, 180)
  • LLMQuantRecipePass (189, 191)
  • p (195)
  • make_shared (22, 34-36, 48-50, 62-64, 76-78, 90-92, 104, 116, 128-130, 142-144, 156, 168-170, 182-184)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (1)
  • op (17-17)
mllm/compile/ir/linalg/Attribute.hpp (9)
  • node (27-27)
  • node (27-27)
  • node (322-322)
  • node (322-322)
  • node (343-343)
  • node (343-343)
  • create (76-76)
  • p (318-318)
  • p (337-337)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (2)
  • run (16-26)
  • run (16-16)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.cpp (4)
mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp (2)
  • run (16-109)
  • run (16-16)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (1)
  • op (17-17)
mllm/compile/ir/Node.hpp (4)
  • op (468-468)
  • op (470-470)
  • IRWriter (354-354)
  • IRWriter (355-355)
mllm/compile/ir/Node.cpp (1)
  • IRWriter (290-291)
mllm/compile/ir/linalg/Attribute.cpp (2)
mllm/core/DataTypes.hpp (1)
  • nameOfType (713-713)
mllm/compile/ir/linalg/Attribute.hpp (12)
  • p (318-318)
  • p (337-337)
  • spec (99-104)
  • spec (129-134)
  • spec (159-164)
  • spec (191-196)
  • spec (225-230)
  • spec (258-262)
  • spec (293-297)
  • ctx (320-320)
  • ctx (339-339)
  • ctx (341-341)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (1)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)
  • LLMQuantRecipePass (293-311)
mllm/backends/qnn/aot/passes/AOTPipeline.cpp (4)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.cpp (2)
  • createMarkQnnGraphPass (68-68)
  • createMarkQnnGraphPass (68-68)
mllm/backends/qnn/aot/passes/OpNamingPass.cpp (2)
  • createOpNamingPass (88-88)
  • createOpNamingPass (88-88)
mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.cpp (2)
  • createMergeLLMHeadIntoMainGraphPass (111-111)
  • createMergeLLMHeadIntoMainGraphPass (111-111)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)
  • createLLMQuantRecipePass (364-364)
  • createLLMQuantRecipePass (364-364)
mllm/compile/ir/NodeRTTIClassOfImpl.hpp (1)
mllm/compile/ir/GeneratedRTTIKind.hpp (2)
  • v (161-161)
  • v (161-161)
🪛 Clang (14.0.6)
mllm/utils/Common.hpp

[error] 57-57: function-like macro 'MLLM_RETURN_FALSE_IF_NOT' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 60-60: function-like macro 'MLLM_RETURN_FALSE_IF' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)

mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp

[error] 11-11: class 'SplitLLMGraphPass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/compile/ir/linalg/Attribute.hpp

[error] 325-325: class 'LinalgIRQuantizatonSpecAttr' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 337-337: parameter name 'p' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp

[error] 11-11: class 'MergeLLMHeadIntoMainGraphPass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp

[error] lines 18, 30, 44, 58, 72, 86, 100, 112, 124, 138, 152, 164, 178: parameter name 'op' is too short, expected at least 3 characters (readability-identifier-length,-warnings-as-errors)

[error] 187-187: class 'LLMQuantRecipePass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator (cppcoreguidelines-special-member-functions,-warnings-as-errors)

[error] 195-195: parameter name 'p' is too short, expected at least 3 characters (readability-identifier-length,-warnings-as-errors)

mllm/compile/ir/NodeRTTIClassOfImpl.hpp

[error] 340-340: function-like macro 'RTTI_RK_ATTR_LINALGIRATTR_QUANTIZATIONSPEC_IMPL' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-macos
  • GitHub Check: build-android
  • GitHub Check: build-x86
🔇 Additional comments (13)
examples/CMakeLists.txt (1)

6-7: The subdirectories are properly configured. Both examples/minicpm_o and examples/minicpm4 exist with valid CMakeLists.txt files and appropriate source code, confirming that the add_subdirectory() calls are valid.

mllm/compile/ir/GeneratedRTTIKind.hpp (1)

139-139: LGTM!

The new RK_Attr_LinalgIRAttr_QuantizationSpec enum value is correctly positioned between QuantizationAnnotation and Last, maintaining the RTTI hierarchy ordering required for classof range checks.

mllm/compile/ir/rtti_kind_gen.py (1)

219-219: LGTM!

The QuantizationSpec attribute derivation is correctly added alongside QuantizationAnnotation, ensuring the RTTI generator produces the corresponding enum value and macro in the generated files.

mllm/compile/ir/Node.cpp (2)

63-65: LGTM!

The implementation correctly aligns with the header changes, returning std::list<node_weak_ptr_t>& for both inputs() and outputs().


298-311: Good addition: clearing belongsTo on op removal.

The addition of op->belongsTo() = nullptr ensures proper cleanup of the parent relationship when an op is completely removed. This is intentional—removeOp is used for deletion with full cleanup, while removeOpWithoutEdgeCut is used for moving ops between regions where the parent reference will be updated by the subsequent insert function (e.g., insertOpAtLast, insertOpAtPos).
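
To make the two removal flavors concrete, a sketch only: the method names come from this comment, but the parameter lists are assumptions.

#include "mllm/compile/ir/Node.hpp"  // ir::IRWriter, ir::node_ptr_t

using namespace mllm;

// removeOp deletes with full cleanup (edges cut and, after this PR,
// op->belongsTo() reset to nullptr). removeOpWithoutEdgeCut detaches only,
// leaving the parent to be updated by the subsequent insert call.
void removal_semantics_sketch(ir::IRWriter& writer, const ir::node_ptr_t& to_delete,
                              const ir::node_ptr_t& to_move) {
  writer.removeOp(to_delete);              // delete: edges cut, parent cleared
  writer.removeOpWithoutEdgeCut(to_move);  // detach for a move
  writer.insertOpAtLast(to_move);          // hypothetical signature; re-parents the op
}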

mllm/compile/ir/NodeRTTIClassOfImpl.hpp (1)

340-342: LGTM!

The new RTTI_RK_ATTR_LINALGIRATTR_QUANTIZATIONSPEC_IMPL macro follows the established pattern in this auto-generated file. While static analysis suggests using constexpr template functions instead of macros (which is generally good advice), this would require refactoring the entire RTTI generation system and is outside the scope of this change.

mllm/backends/qnn/aot/passes/MergeLLMHeadIntoMainGraphPass.hpp (2)

11-18: LGTM!

The pass class follows the established pattern for IR passes in this codebase. Regarding the static analysis hint about the rule of five: since this class has no data members and is primarily used via shared_ptr through the factory function, the default-generated copy/move operations are safe.


17-17: No action needed: return types are already consistent across all passes.

Verification shows that all passes in the codebase—including MergeLLMHeadIntoMainGraphPass and SplitLLMGraphPass—consistently use uint8_t as the return type for the run() method. The SplitLLMGraphPass.hpp file at line 17 contains uint8_t run(const ir::node_ptr_t& op) override;, which is identical in style to the method in question. No inconsistency exists.

Likely an incorrect or invalid review comment.

mllm/compile/ir/Node.hpp (1)

125-127: Breaking API change is properly implemented.

The Region::inputs() and Region::outputs() methods return std::list<node_weak_ptr_t>& and all usages throughout the codebase have been updated to work with this type. Callers properly iterate over the lists, assign them directly, or explicitly cast to Val via ->cast_<ir::Val>() when the underlying value is needed. No incompatibilities remain.
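
A sketch of the consumption pattern this implies: lock() each weak pointer, then cast when the value is needed. Beyond Region::inputs(), node_weak_ptr_t, and cast_<ir::Val>(), the names here are assumptions.

#include "mllm/compile/ir/Node.hpp"  // ir::Region, ir::Val, node_weak_ptr_t

using namespace mllm;

// Iterate a region's inputs under the new std::list<node_weak_ptr_t> type:
// entries may have expired, so lock() before use, then cast to Val when the
// underlying value is needed, matching the caller pattern described above.
void walk_region_inputs_sketch(ir::Region& region) {
  for (auto& weak_node : region.inputs()) {
    if (auto node = weak_node.lock()) {
      auto val = node->cast_<ir::Val>();  // explicit cast, as in existing callers
      (void)val;
    }
  }
}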

examples/qwen3_qnn_aot/qnn_aot_cfg.json (1)

18-46: LGTM!

The new builtin_llm_pass structure is well-organized with explicit configuration for model type, lm_head, linear, and kv_cache quantization settings. The per-field descriptors with method, symmetry, and precision provide clear and maintainable configuration.

mllm/compile/ir/linalg/Attribute.hpp (1)

65-77: LGTM!

The QuantizationSpecRaw struct follows the established pattern of other quantization spec types with proper UUID assignment and factory methods.
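
As a rough sketch of that pattern: the field and enum names below are inferred from the dump() code quoted earlier in this review (QuantizationSpecType::kRaw, q->uuid, _q->type_), and the base-class layout is an assumption.

#include <memory>

// Illustrative shapes only, not the real declarations.
enum class QuantizationSpecTypeSketch { kNone, kRaw /* kSymPerTensor, ... */ };

struct QuantizationSpecSketch {
  using ptr_t = std::shared_ptr<QuantizationSpecSketch>;
  QuantizationSpecTypeSketch type = QuantizationSpecTypeSketch::kNone;
  int uuid = 0;  // every spec carries a uuid, per the dump output
  virtual ~QuantizationSpecSketch() = default;
};

struct QuantizationSpecRawSketch final : QuantizationSpecSketch {
  int type_ = 0;  // stands in for the DataTypes value printed as Raw(type: ...)

  // Factory mirroring the create() convention the comment refers to.
  static std::shared_ptr<QuantizationSpecRawSketch> create(int dtype) {
    auto ret = std::make_shared<QuantizationSpecRawSketch>();
    ret->type = QuantizationSpecTypeSketch::kRaw;
    ret->type_ = dtype;
    return ret;
  }
};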

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)

293-311: Unused config variable except for model check.

The config variable is fetched at line 294 but only used at line 308 to check the model type. This is fine for now, but consider moving the config fetch inside the conditional or using it for other pattern configurations as well.

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (1)

16-185: LGTM!

The pattern class declarations follow a consistent structure with isMatch, rewrite, and static create factory methods. The short parameter names (op, p) are idiomatic for this codebase and can be safely ignored per static analysis.

@coderabbitai (coderabbitai bot) left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
mllm/compile/ir/linalg/Attribute.cpp (1)

26-126: Extract duplicated lambda to eliminate 240+ lines of repetition.

The gen_quant_spec_str lambda is defined identically in both LinalgIRQuantizatonAnnotationAttr::dump() and LinalgIRQuantizatonSpecAttr::dump(), resulting in significant code duplication. Extract this logic to a static helper function or free function in an anonymous namespace.

🔎 Proposed refactoring to eliminate duplication

Add a helper function before the class method implementations:

+namespace {
+std::string gen_quant_spec_str(const QuantizationSpec::ptr_t& q) {
+  std::stringstream ss;
+  ss << "QuantSpec(";
+  switch (q->type) {
+    case QuantizationSpecType::kSymPerTensor: {
+      auto _q = std::static_pointer_cast<QuantizationSpecSymPerTensor>(q);
+      ss << "SymPerTensor(";
+      ss << "quant_min: " << _q->quant_min;
+      ss << ", quant_max: " << _q->quant_max;
+      ss << ", quant_to_type: " << nameOfType(_q->quant_to_type);
+      ss << ", scale_type: " << nameOfType(_q->scale_type);
+      ss << ")";
+      break;
+    }
+    // ... (include all other cases)
+    case QuantizationSpecType::kNone: {
+      ss << "None()";
+      break;
+    }
+    case QuantizationSpecType::kRaw: {
+      auto _q = std::static_pointer_cast<QuantizationSpecRaw>(q);
+      ss << "Raw(";
+      ss << "type: " << nameOfType(_q->type_);
+      ss << ")";
+      break;
+    }
+  }
+  ss << ", uuid=" << q->uuid << ")";
+  return ss.str();
+}
+}  // namespace

Then replace both lambda definitions with calls to the helper:

 void LinalgIRQuantizatonAnnotationAttr::dump(IRPrinter& p) {
-  auto gen_quant_spec_str = [](const QuantizationSpec::ptr_t& q) -> std::string {
-    // ... 100 lines of duplicated code
-  };
-
   p.print("QuantAnnotation");
   // ... rest of method
 void LinalgIRQuantizatonSpecAttr::dump(IRPrinter& p) {
-  auto gen_quant_spec_str = [](const QuantizationSpec::ptr_t& q) -> std::string {
-    // ... 100 lines of duplicated code
-  };
-
   p.print("{}", gen_quant_spec_str(spec_));

Also applies to: 167-267

♻️ Duplicate comments (2)
mllm/compile/ir/linalg/Attribute.cpp (1)

251-254: Critical: Missing closing parenthesis in kNone case.

The kNone case outputs "None(" but breaks without emitting the closing parenthesis, producing malformed output like "None(, uuid=...)". The first occurrence of this lambda (line 111) correctly outputs "None()", but this second occurrence still has the bug.

🔎 Required fix
       case QuantizationSpecType::kNone: {
-        ss << "None(";
+        ss << "None()";
         break;
       }
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)

365-371: Assertion on rewrite() failure will crash the pass.

This was flagged in a past review. Using MLLM_RT_ASSERT(pattern.second->rewrite(...)) causes the pass to abort on a failed rewrite. Consider logging a warning and continuing to process other ops, or document that failing rewrites are considered fatal errors.

🧹 Nitpick comments (7)
mllm/compile/ir/linalg/Attribute.cpp (2)

144-150: Consider using size_t for the loop counter.

weight_idx is declared as int but is compared against annotation_.weights.size(), which returns size_t. While not causing a bug here, using size_t would avoid signed/unsigned comparison warnings and better express intent.

🔎 Suggested type adjustment
-  int weight_idx = 0;
+  size_t weight_idx = 0;
   for (const auto& [name, spec] : annotation_.weights) {

155-158: Unused ctx parameter in build methods.

All three build() methods accept an IRContext* ctx parameter but never use it. If this parameter is reserved for future functionality or API consistency, consider adding a comment or [[maybe_unused]] attribute to document the intent and suppress compiler warnings.

🔎 Example with attribute annotation
-LinalgIRQuantizatonAnnotationAttr::ptr_t LinalgIRQuantizatonAnnotationAttr::build(IRContext* ctx) {
+LinalgIRQuantizatonAnnotationAttr::ptr_t LinalgIRQuantizatonAnnotationAttr::build([[maybe_unused]] IRContext* ctx) {
   auto ret = std::make_shared<LinalgIRQuantizatonAnnotationAttr>();
   return ret;
 }

Also applies to: 272-281

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (3)

4-11: Missing standard library headers.

The header uses std::vector and std::string in the class declaration but doesn't include <vector> or <string>. While these might be transitively included through other headers, it's best practice to explicitly include what you use.

🔎 Proposed fix
 #pragma once

+#include <string>
 #include <unordered_map>
+#include <vector>

 #include "mllm/compile/ir/Node.hpp"
 #include "mllm/compile/passes/Pass.hpp"

194-207: Consider explicitly deleting copy and move operations.

The class defines a custom constructor and destructor but doesn't explicitly define or delete copy/move constructors and assignment operators. Since this pass holds registered patterns that likely shouldn't be duplicated, explicitly deleting these operations clarifies intent and prevents accidental copies.

🔎 Proposed fix
 class LLMQuantRecipePass final : public ir::Pass {
  public:
   LLMQuantRecipePass();
+  LLMQuantRecipePass(const LLMQuantRecipePass&) = delete;
+  LLMQuantRecipePass& operator=(const LLMQuantRecipePass&) = delete;
+  LLMQuantRecipePass(LLMQuantRecipePass&&) = delete;
+  LLMQuantRecipePass& operator=(LLMQuantRecipePass&&) = delete;

   ~LLMQuantRecipePass() override = default;

14-18: Add documentation for the utility function.

Per coding guidelines, public APIs should have clear documentation. This function name is descriptive but quite long—a brief comment explaining when/why to use it and what conditions must hold would help maintainers.

🔎 Example documentation
 //===----------------------------------------------------------------------===//
 // Utility functions
 //===----------------------------------------------------------------------===//
+/// Propagates the quantization spec from a single-input op's input to its single output,
+/// and sets the op's quant_recipe annotation attribute. Returns false if preconditions
+/// (single input/output, input has quant_recipe) are not met.
 bool shareQuantSpecSingleInputToSingleOutputAndSetOpQuantAnnoAttr(const ir::IRContext::ptr_t& ctx,
                                                                   const ir::linalg::LinalgIROp::ptr_t& op);
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)

224-275: EmbeddingOp itself is not marked with quant_recipe.

Line 272 sets quant_recipe only on node->nextOp() (the CastTypeOp), but the original node (EmbeddingOp) is not marked. While isMatch checks nextOp()->getAttr("quant_recipe") to avoid re-processing, this asymmetry means inspecting the EmbeddingOp alone won't reveal it's been processed. Consider marking both ops for consistency and easier debugging.

🔎 Proposed fix
   // Attach to quantize node
   node->nextOp()->setAttr("quant_recipe", annotation_attr);
+  // Mark embedding op as processed (can reuse annotation or create a marker attribute)
+  node->setAttr("quant_recipe", annotation_attr);

   return true;

377-377: Inconsistent return value.

Line 348 returns ir::PASS_RET_SUCCESS but line 377 returns the literal 0. Use the named constant consistently for clarity and maintainability.

🔎 Proposed fix
-  return 0;
+  return ir::PASS_RET_SUCCESS;
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eff1182 and 4f94986.

📒 Files selected for processing (3)
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/compile/ir/linalg/Attribute.cpp
🧰 Additional context used
📓 Path-based instructions (4)
(Same CodeRabbit inference-engine rules as in the first review: character-set/encoding requirements, TODO/FIXME format, production-quality C/C++/Python guidance, and public-API documentation/style guidance.)

Files:

  • mllm/compile/ir/linalg/Attribute.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
🧬 Code graph analysis (2)
mllm/compile/ir/linalg/Attribute.cpp (2)
mllm/core/DataTypes.hpp (1)
  • nameOfType (713-713)
mllm/compile/ir/linalg/Attribute.hpp (9)
  • p (318-318)
  • p (337-337)
  • spec (99-104)
  • spec (129-134)
  • spec (159-164)
  • spec (191-196)
  • spec (225-230)
  • spec (258-262)
  • spec (293-297)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (2)
mllm/compile/ir/linalg/Attribute.hpp (11)
  • ctx (320-320)
  • ctx (339-339)
  • ctx (341-341)
  • node (27-27)
  • node (27-27)
  • node (322-322)
  • node (322-322)
  • node (343-343)
  • node (343-343)
  • p (318-318)
  • p (337-337)
mllm/backends/qnn/aot/passes/SplitLLMGraphPass.hpp (1)
  • op (17-17)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp

[error] 14-14: 2 adjacent parameters of 'shareQuantSpecSingleInputToSingleOutputAndSetOpQuantAnnoAttr' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 15-15: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp

[error] lines 18, 25, 37, 51, 65, 79, 93, 107, 119, 131, 145, 159, 171, 185: parameter name 'op' is too short, expected at least 3 characters (readability-identifier-length,-warnings-as-errors)

[error] 194-194: class 'LLMQuantRecipePass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator (cppcoreguidelines-special-member-functions,-warnings-as-errors)

[error] 202-202: parameter name 'p' is too short, expected at least 3 characters (readability-identifier-length,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-x86
  • GitHub Check: build-macos
  • GitHub Check: build-android
🔇 Additional comments (2)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)

176-204: LGTM - Past review issue addressed.

Both branches now consistently set view_op->setAttr("quant_recipe", annotation_attr), resolving the previously flagged asymmetry.


380-383: patterns_ map usage requires codebase verification.

The review claims the patterns_ map is unused and suggests removal. However, this requires confirming whether the map is accessed in other methods, derived classes, or external code that depends on this class. Before removal, verify all usages of patterns_ across the codebase.

- Introduced new methods in QnnAOTNodeTensor for parsing tensor types, data types, and quantization parameters from IR.
- Updated QnnAOTNodeTensor constructor to accept a flag for static weight handling.
- Implemented QnnAOTGraph methods for adding operations and compiling the graph.
- Enhanced capture methods in QnnAOTEnv for managing AOT graphs and tensors.
- Refactored LLMQuantRecipePass to support recursive graph visiting and improved quantization pattern matching.
- Updated Elewise visitor patterns to compile operations and register them in the AOT graph.
- Removed obsolete methods and streamlined the interface for better clarity and maintainability.
@coderabbitai (coderabbitai bot) left a comment

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)

102-104: Factory method doesn't pass force_static_weight to constructor.

QnnAOTNodeTensor::create() accepts force_static_weight but doesn't forward it to the constructor.

🔎 Proposed fix
 static inline ptr_t create(const ir::tensor::TensorValue::ptr_t& v, bool force_static_weight = false) {
-    return std::make_shared<QnnAOTNodeTensor>(v);
+    return std::make_shared<QnnAOTNodeTensor>(v, force_static_weight);
   }

This fix should be applied in QnnWrappersAPI.hpp at lines 102-104.

♻️ Duplicate comments (3)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (3)

264-265: Missing defensive config validation before nested access.

Accessing config["quant_recipe"]["builtin_llm_pass"]["linear"] without checking if these keys exist can throw an exception. This was flagged in a previous review.
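
One defensive shape for this, assuming the config behaves like an nlohmann::json object with contains(); the actual config type is not shown in this review, so treat the API as an assumption.

#include <nlohmann/json.hpp>

// Verify every level before descending, instead of chaining operator[]
// on keys that may be absent.
bool get_linear_cfg_sketch(const nlohmann::json& config, nlohmann::json& out) {
  if (!config.contains("quant_recipe")) return false;
  const auto& recipe = config["quant_recipe"];
  if (!recipe.contains("builtin_llm_pass")) return false;
  const auto& pass_cfg = recipe["builtin_llm_pass"];
  if (!pass_cfg.contains("linear")) return false;
  out = pass_cfg["linear"];  // safe: all levels verified above
  return true;
}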


520-538: Missing defensive config validation for Qwen3 pattern registration.

Line 535 accesses nested config keys without validation. This was flagged in a previous review.


78-81: TODO stub rewrite() returns true, silently consuming matched ops.

This was flagged in a previous review.

🧹 Nitpick comments (5)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)

169-229: Tensor type parsing handles cases correctly but has a silent fallthrough.

The switch at line 173-174 for kTensorMemTypes_Start does nothing and falls through, leaving ret_qnn_tensor_type as QNN_TENSOR_TYPE_UNDEFINED. Consider adding a log or assertion if this is unexpected.

🔎 Optional: Add explicit handling
     case kTensorMemTypes_Start: {
+      MLLM_WARN("Unexpected kTensorMemTypes_Start in parseQnnTensorTypeFromIR");
       break;
     }
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (2)

127-131: Factory method initializes only op_name_, leaving other members uninitialized.

The create() method only sets op_name_. The static analysis correctly flags that name_, param_scalar, param_tensor, inputs, and outputs are not initialized. While the vectors default-construct to empty, name_ will also be empty, which may be intentional. Consider adding a default constructor that initializes all members.
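
One possible shape for that fix, sketched with placeholder member types since only the member names are quoted in this review.

#include <string>
#include <vector>

// Placeholder member types; the real QnnAOTNodeOperation uses QNN wrapper
// types. In-class initializers guarantee a well-defined empty state even
// when a factory such as create() sets only op_name_.
struct QnnAOTNodeOperationSketch {
  std::string op_name_;
  std::string name_;              // empty until the op is explicitly named
  std::vector<int> param_scalar;  // default-constructed empty
  std::vector<int> param_tensor;
  std::vector<int> inputs;
  std::vector<int> outputs;
};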


170-176: Public member variables in QnnAOTGraph.

The static analysis flags is_compiled_, op_node_, and all_tensors_ as public. While this may be intentional for convenience, consider using accessors for better encapsulation.

mllm/backends/qnn/aot/visitor/Base.hpp (1)

24-31: QnnAOTQuantRecipeBasePattern has no abstract methods.

The class only has default implementations returning false. Consider making it abstract or clarifying its purpose with a comment.
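
If the class is meant as an interface, a sketch under that assumption (the method names mirror the isMatch/rewrite pair used by the patterns elsewhere in this PR):

// Pure-virtual variant: derived patterns must implement both hooks.
class QnnAOTQuantRecipeBasePattern {
 public:
  virtual ~QnnAOTQuantRecipeBasePattern() = default;
  virtual bool isMatch(const ir::op_ptr_t& op) = 0;
  virtual bool rewrite(ir::IRWriter& writer, const ir::op_ptr_t& op) = 0;
};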

mllm/backends/qnn/aot/visitor/Elewise.cpp (1)

17-46: Implementation correctly uses new capture API.

The compile method properly:

  1. Retrieves environment context
  2. Validates required attributes
  3. Captures input/output tensors
  4. Creates and registers the QNN operation

However, consider adding null checks for the cast results on lines 33-35.

🔎 Optional: Add null checks for casts
   // Start to attach!
   auto i_0 = op->inputs().front()->cast_<ir::tensor::TensorValue>();
   auto i_1 = (*(std::next(op->inputs().begin())))->cast_<ir::tensor::TensorValue>();
   auto o_0 = op->outputs().front()->cast_<ir::tensor::TensorValue>();
+  MLLM_RETURN_FALSE_IF_NOT(i_0 && i_1 && o_0);
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4f94986 and 3ae3c54.

📒 Files selected for processing (8)
  • examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
  • examples/qwen3_qnn_aot/qwen3_qnn_aot.mir
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/aot/visitor/Base.hpp
  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/visitor/Elewise.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/visitor/Elewise.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
🧠 Learnings (2)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Suggest adding unit tests for untested complex logic or edge cases.

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
🧬 Code graph analysis (5)
mllm/backends/qnn/aot/visitor/Base.hpp (3)
mllm/backends/qnn/aot/visitor/Elewise.hpp (2)
  • writer (16-16)
  • op (14-14)
mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/visitor/Elewise.cpp (4)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (16)
  • isMatch (73-76)
  • isMatch (73-73)
  • isMatch (86-92)
  • isMatch (86-86)
  • isMatch (119-122)
  • isMatch (119-119)
  • isMatch (132-135)
  • isMatch (132-132)
  • isMatch (169-172)
  • isMatch (169-169)
  • isMatch (182-185)
  • isMatch (182-182)
  • isMatch (195-198)
  • isMatch (195-195)
  • isMatch (208-211)
  • isMatch (208-208)
mllm/backends/qnn/aot/visitor/Elewise.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (5)
  • qnn_context_name (268-268)
  • qnn_context_name (270-271)
  • qnn_context_name (273-274)
  • create (47-49)
  • create (47-47)
mllm/backends/qnn/aot/visitor/Elewise.hpp (4)
mllm/backends/qnn/aot/visitor/Base.hpp (9)
  • writer (19-19)
  • writer (19-19)
  • writer (21-21)
  • writer (30-30)
  • writer (30-30)
  • op (17-17)
  • op (17-17)
  • op (28-28)
  • op (28-28)
mllm/backends/qnn/aot/passes/OpNamingPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/passes/MarkTensorIO.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (5)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (17)
  • QnnAOTNodeTensor (106-106)
  • v (102-104)
  • v (102-102)
  • v (109-109)
  • v (111-111)
  • v (113-113)
  • v (115-115)
  • name (151-151)
  • name (253-253)
  • name (255-255)
  • name (257-257)
  • qnn_op (170-170)
  • qnn_context_name (268-268)
  • qnn_context_name (270-271)
  • qnn_context_name (273-274)
  • create (47-49)
  • create (47-47)
mllm/compile/ir/Node.hpp (8)
  • name (302-302)
  • name (306-306)
  • op (468-468)
  • op (470-470)
  • create (265-300)
  • create (265-265)
  • create (369-401)
  • create (369-369)
mllm/backends/qnn/QNNUtils.hpp (4)
  • name (191-192)
  • name (195-196)
  • create (255-257)
  • create (255-255)
mllm/backends/qnn/aot/passes/AOTCompileContext.hpp (2)
  • env (26-26)
  • env (26-26)
mllm/backends/qnn/aot/visitor/Elewise.cpp (2)
  • compile (17-46)
  • compile (17-17)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (1)
mllm/backends/qnn/QNNUtils.hpp (2)
  • name (191-192)
  • name (195-196)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/visitor/Base.hpp

[error] 21-21: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Elewise.hpp

[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp

[error] 22-22: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)


[error] 24-24: variable name 'rw' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 49-49: 2 adjacent parameters of 'shareQuantSpecSingleInputToSingleOutputAndSetOpQuantAnnoAttr' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 50-50: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/QnnWrappersAPI.hpp

[error] 106-106: parameter name 'v' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 123-123: constructor does not initialize these fields: name_, op_name_, param_scalar, param_tensor, inputs, outputs

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 174-174: member variable 'is_compiled_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 175-175: member variable 'op_node_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 176-176: member variable 'all_tensors_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 271-271: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-android
  • GitHub Check: build-x86
  • GitHub Check: build-macos
🔇 Additional comments (10)
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (1)

179-180: Establishes consistent quantization handling for KV cache states.

Adding the de-quantization step for value_states aligns its precision handling with key_states (lines 172-173), ensuring both KV components undergo the documented "de-quantization and quantization again" pattern before concatenation. This quantization recipe handling is intentional for QNN AOT compatibility.

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)

21-42: LGTM for recursive graph visiting logic.

The recursive traversal correctly walks through subgraphs via CallGraphOp and applies pattern matching. The guard for existing quant_recipe prevents reprocessing.


49-68: Utility function logic is correct.

The helper properly validates preconditions, creates annotation attributes, and propagates quant specs from input to output.

mllm/backends/qnn/aot/QnnWrappersAPI.cpp (3)

772-808: Tensor capture and caching logic looks correct.

The method properly caches static tensors at context level and normal tensors at graph level, avoiding duplicate creation.
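
The shape of that two-level lookup, sketched with assumed container names (the PR's actual members may differ):

// Static weights are cached on the context so they can be shared across
// graphs; everything else is cached on the owning graph.
if (is_static) {
  auto& cache = contexts_[qnn_context_name]->static_tensors_;  // assumed map
  if (auto it = cache.find(v->name()); it != cache.end()) { return it->second; }
  return cache[v->name()] = QnnAOTNodeTensor::create(v, /*force_static_weight=*/true);
}
auto& cache = contexts_[qnn_context_name]->graphs_[g_name]->all_tensors_;  // assumed map
if (auto it = cache.find(v->name()); it != cache.end()) { return it->second; }
return cache[v->name()] = QnnAOTNodeTensor::create(v);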


141-167: Constructor properly initializes QNN tensor structure.

The v2 tensor initialization is comprehensive and handles the static weight case correctly.


231-321: Data type mapping is comprehensive.

The switch covers all expected data types with appropriate QNN mappings. The FIXME at line 278-279 is noted but acceptable for initial implementation.

mllm/backends/qnn/aot/QnnWrappersAPI.hpp (1)

267-276: New capture API methods are well-designed.

The new captureAOTGraph, captureAOTNodeOp, and captureQnnAOTNodeTensor methods provide a clean interface for building QNN graphs programmatically.

mllm/backends/qnn/aot/visitor/Elewise.hpp (1)

12-21: Interface update aligns with base class change.

The QnnAOTAddPattern now correctly implements the compile method from the updated QnnAOTBasePattern interface.

mllm/backends/qnn/aot/visitor/Elewise.cpp (1)

13-15: Match condition includes using_qnn attribute check.

The pattern only matches AddOps that have been marked for QNN processing, preventing unintended matches.

mllm/backends/qnn/aot/visitor/Base.hpp (1)

13-22: Interface change from addNode to compile is properly implemented across all derived classes.

The new compile(ir::IRWriter& writer, const ir::op_ptr_t& op) signature provides a cleaner interface that aligns with the writer-based compilation approach. All derived classes (e.g., QnnAOTAddPattern in Elewise.hpp) have been updated to implement this method, and there are no remaining references to the old interface.

Comment on lines 325 to 371
Qnn_QuantizeParams_t QnnAOTNodeTensor::parseQnnQuantizeParamFromIR(const ir::tensor::TensorValue::ptr_t& v) {
  Qnn_QuantizeParams_t ret = QNN_QUANTIZE_PARAMS_INIT;

  MLLM_RT_ASSERT(v->getAttr("quant_recipe"));
  auto quant_spec = v->getAttr("quant_recipe")->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;

  switch (quant_spec->type) {
    case ir::linalg::QuantizationSpecType::kRaw: {
      break;
    }
    case ir::linalg::QuantizationSpecType::kSymPerTensor: {
      auto cfg = std::static_pointer_cast<ir::linalg::QuantizationSpecSymPerTensor>(quant_spec);
      ret.encodingDefinition = QNN_DEFINITION_DEFINED;
      ret.quantizationEncoding = QNN_QUANTIZATION_ENCODING_SCALE_OFFSET;
      ret.scaleOffsetEncoding = Qnn_ScaleOffset_t{.scale = cfg->scale.item<float>(), .offset = 0};
      break;
    }
    case ir::linalg::QuantizationSpecType::kSymPerChannel: {
      auto cfg = std::static_pointer_cast<ir::linalg::QuantizationSpecSymPerChannel>(quant_spec);
      ret.encodingDefinition = QNN_DEFINITION_DEFINED;
      ret.quantizationEncoding = QNN_QUANTIZATION_ENCODING_AXIS_SCALE_OFFSET;
      ret.axisScaleOffsetEncoding = Qnn_AxisScaleOffset_t{
          .axis = cfg->ch_axis,
          .numScaleOffsets = (uint32_t)v->tensor_.shape()[cfg->ch_axis],
          .scaleOffset = nullptr,  // TODO make this correct.
      };
      break;
    }
    case ir::linalg::QuantizationSpecType::kSymPerBlock:
    case ir::linalg::QuantizationSpecType::kAsymPerTensor:
    case ir::linalg::QuantizationSpecType::kAsymPerChannel:
    case ir::linalg::QuantizationSpecType::kAsymPerBlock: {
      MLLM_ERROR_EXIT(ExitCode::kCoreError, "Can't handle [kSymPerBlock, kAsymPerTensor, kAsymPerChannel, kAsymPerBlock] type");
    }
    case ir::linalg::QuantizationSpecType::kLPBQ: {
      ret.encodingDefinition = QNN_DEFINITION_DEFINED;
      ret.quantizationEncoding = QNN_QUANTIZATION_ENCODING_BLOCKWISE_EXPANSION;
      ret.blockwiseExpansion = nullptr;  // TODO
      break;
    }
    default: {
      MLLM_ERROR_EXIT(ExitCode::kCoreError, "Can't handle kNone type");
    }
  }

  return ret;
}

⚠️ Potential issue | 🟠 Major

parseQnnQuantizeParamFromIR asserts quant_recipe exists but calling code may not guarantee this.

Line 328 uses MLLM_RT_ASSERT which will crash if quant_recipe is missing. Consider returning a default or handling gracefully.

🔎 Proposed fix: return default params if no quant_recipe
 Qnn_QuantizeParams_t QnnAOTNodeTensor::parseQnnQuantizeParamFromIR(const ir::tensor::TensorValue::ptr_t& v) {
   Qnn_QuantizeParams_t ret = QNN_QUANTIZE_PARAMS_INIT;

-  MLLM_RT_ASSERT(v->getAttr("quant_recipe"));
+  if (!v->getAttr("quant_recipe")) {
+    return ret;  // Return default params if no quant_recipe
+  }
   auto quant_spec = v->getAttr("quant_recipe")->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 325-371, the code
unconditionally asserts that v->getAttr("quant_recipe") exists which can crash
when callers don't provide it; instead check if v->getAttr("quant_recipe") is
null and if so return the default-initialized Qnn_QuantizeParams_t (ret)
immediately (optionally emit a debug/info log), otherwise cast and proceed as
existing; replace the MLLM_RT_ASSERT with a null-check and early return to
handle missing quant_recipe gracefully.

Comment on lines +453 to +459
bool QnnAOTGraph::compile() {
  if (is_compiled_) { return true; }
  // TODO

  is_compiled_ = true;
  return true;
}

⚠️ Potential issue | 🟡 Minor

compile() is a stub that only sets is_compiled_ to true.

The method doesn't perform actual compilation. This may cause issues if code expects compiled graphs.

Comment on lines +102 to +104
static inline ptr_t create(const ir::tensor::TensorValue::ptr_t& v, bool force_static_weight = false) {
  return std::make_shared<QnnAOTNodeTensor>(v);
}

⚠️ Potential issue | 🔴 Critical

Factory method doesn't forward force_static_weight to constructor.

The create() method accepts force_static_weight but ignores it when calling the constructor.

🔎 Proposed fix
   static inline ptr_t create(const ir::tensor::TensorValue::ptr_t& v, bool force_static_weight = false) {
-    return std::make_shared<QnnAOTNodeTensor>(v);
+    return std::make_shared<QnnAOTNodeTensor>(v, force_static_weight);
   }
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.hpp around lines 102 to 104, the static
factory method accepts force_static_weight but does not pass it to the
QnnAOTNodeTensor constructor; update the create() implementation to forward the
force_static_weight argument to the constructor (or call the correct constructor
overload that accepts it) so the created QnnAOTNodeTensor is constructed with
the intended static-weight behavior.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (8)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)

535-537: Add defensive validation for config access.

Line 535 directly accesses config["quant_recipe"]["builtin_llm_pass"]["model"] without validating key existence. nlohmann::json's operator[] does not guard missing keys: it silently inserts a null on a mutable object and is undefined behavior on a const one, so the lookup can misbehave or crash rather than fail cleanly. Other code in this codebase (AOTPipeline.cpp:17-18, MarkTensorIO.cpp:22) defensively uses .contains() before access.

🔎 Recommended defensive check
-  if (config["quant_recipe"]["builtin_llm_pass"]["model"] == "qwen3") {
+  if (config.contains("quant_recipe") && 
+      config["quant_recipe"].contains("builtin_llm_pass") &&
+      config["quant_recipe"]["builtin_llm_pass"].contains("model") &&
+      config["quant_recipe"]["builtin_llm_pass"]["model"] == "qwen3") {
     addPattern(LLMQuantRecipeQwen3AttentionPattern::create(), "qwen3_attention", 100);
   }

264-264: Add defensive validation for nested config access.

Line 264 directly accesses config["quant_recipe"]["builtin_llm_pass"]["linear"] without validating key existence. nlohmann::json's operator[] does not guard missing keys (it inserts a null, or is undefined behavior on const access). Other code in the codebase (AOTPipeline.cpp, MarkTensorIO.cpp) uses .contains() to defensively check keys before access.

🔎 Recommended defensive check
+  if (!config.contains("quant_recipe") || 
+      !config["quant_recipe"].contains("builtin_llm_pass") ||
+      !config["quant_recipe"]["builtin_llm_pass"].contains("linear")) {
+    MLLM_WARN("Missing required config keys for linear quantization");
+    return false;
+  }
   auto config = AOTCompileContext::getInstance().getConfig()["quant_recipe"]["builtin_llm_pass"]["linear"];
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (5)

141-168: Direct access to quant_recipe attribute without null check.

Line 144 unconditionally accesses v->getAttr("quant_recipe") and casts it without verifying the attribute exists. If quant_recipe is missing, this will crash. Additionally, line 154 calls parseQnnQuantizeParamFromIR(v) which also asserts on quant_recipe (line 329), creating redundant checks. Consider consolidating the attribute access and adding a null check at the beginning of the constructor.

🔎 Proposed fix: Check for quant_recipe once and handle gracefully
 QnnAOTNodeTensor::QnnAOTNodeTensor(const ir::tensor::TensorValue::ptr_t& v, bool force_static_weight) {
   name_ = v->name();
   mllm_tensor_ = v->tensor_;
-  quant_spec_ = v->getAttr("quant_recipe")->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;
+  auto quant_attr = v->getAttr("quant_recipe");
+  if (quant_attr) {
+    quant_spec_ = quant_attr->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;
+  }
   for (auto s : v->tensor_.shape()) { shape_.emplace_back(s); }

326-406: parseQnnQuantizeParamFromIR asserts quant_recipe exists but calling code may not guarantee this.

Line 329 uses MLLM_RT_ASSERT which will crash if quant_recipe is missing. This is the same attribute accessed in the constructor (line 144). Consider returning a default Qnn_QuantizeParams_t if the attribute is missing, or checking at a higher level before construction.

🔎 Proposed fix: Return default params if no quant_recipe
 Qnn_QuantizeParams_t QnnAOTNodeTensor::parseQnnQuantizeParamFromIR(const ir::tensor::TensorValue::ptr_t& v) {
   Qnn_QuantizeParams_t ret = QNN_QUANTIZE_PARAMS_INIT;
 
-  MLLM_RT_ASSERT(v->getAttr("quant_recipe"));
+  if (!v->getAttr("quant_recipe")) {
+    return ret;  // Return default params if no quant_recipe
+  }
   auto quant_spec = v->getAttr("quant_recipe")->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;

466-486: addOperation has multiple TODOs and doesn't actually add the op to QNN.

The method constructs qnn_op_config but never calls the QNN interface to register the operation with the backend. Lines 477-483 contain TODOs for params, inputs, outputs, validations, and graph registration. This leaves operations in an incomplete state.

Would you like me to help implement the operation registration logic or open an issue to track this?
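
For reference, registration usually bottoms out in a graphAddNode call on the loaded interface; a hedged sketch (qnn_interface_, graph_handle_, and the logging call are assumptions here, not the PR's API):

// Assumes qnn_op_config's name/package/type, params, inputs, and outputs
// were already populated from this wrapper's state.
if (QNN_GRAPH_NO_ERROR != qnn_interface_.graphAddNode(graph_handle_, qnn_op_config)) {
  MLLM_WARN("graphAddNode failed");  // logging call assumed
  return false;
}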


488-494: compile() is a stub that only sets is_compiled_ to true.

The method doesn't perform actual graph compilation or validation. This may cause issues if downstream code expects a fully compiled graph.


795-798: captureAOTGraph always returns nullptr.

The stub implementation returns nullptr, which will cause null pointer dereferences in callers expecting a valid QnnAOTGraph::ptr_t.

mllm/backends/qnn/aot/QnnWrappersAPI.hpp (1)

103-105: Factory method doesn't forward force_static_weight to constructor.

The create() method accepts force_static_weight but ignores it when calling the constructor (line 104). This means the parameter has no effect, breaking the intended static weight functionality.

🔎 Proposed fix
   static inline ptr_t create(const ir::tensor::TensorValue::ptr_t& v, bool force_static_weight = false) {
-    return std::make_shared<QnnAOTNodeTensor>(v);
+    return std::make_shared<QnnAOTNodeTensor>(v, force_static_weight);
   }
🧹 Nitpick comments (4)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)

21-23: Name the anonymous function parameter for clarity.

Line 22's patterns_w_priority_ parameter is unnamed in the function signature, which reduces readability and violates the coding guidelines requiring all parameters to be named.

🔎 Proposed fix
-void recursiveVisitGraph(const ir::IRContext::ptr_t& ctx,
-                         const std::vector<std::pair<int, ir::Pattern::ptr_t>>& patterns_w_priority_,
-                         const ir::graph::SubGraphOp::ptr_t& sub_graph_ir) {
+void recursiveVisitGraph(const ir::IRContext::ptr_t& ctx,
+                         const std::vector<std::pair<int, ir::Pattern::ptr_t>>& patterns_with_priority,
+                         const ir::graph::SubGraphOp::ptr_t& sub_graph_ir) {

Then update references inside the function accordingly.

As per coding guidelines, Clang requires all parameters to be named.


284-297: Consider caching compiled regex patterns to avoid repeated compilation.

The loop compiles a regex for each config key on every Linear operation processed (line 288). For models with many Linear ops, this repeated compilation could impact performance.

🔎 Performance optimization

Consider caching compiled patterns in a static map initialized once, or compile patterns during pass construction rather than per-op processing.
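
A sketch of hoisting the compilation, assuming the linear config section maps regex strings to per-layer quant settings:

// Compile each configured regex once, e.g. at pattern construction.
std::vector<std::pair<std::regex, nlohmann::json>> compiled_rules;
for (const auto& [pattern_str, rule] : linear_cfg.items()) {
  compiled_rules.emplace_back(std::regex(pattern_str), rule);
}
// Hot path per Linear op: only the match runs, no re-compilation.
for (const auto& [re, rule] : compiled_rules) {
  if (std::regex_match(op_name, re)) { /* apply rule */ break; }
}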

mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)

807-843: Consider defensive checks for static weight determination.

Lines 814-816 determine __qnn_enable_static_weight based on memory type checks. The logic appears correct, but the compound condition (v->tensor_.memType() <= kParams_End && v->tensor_.memType() >= kParams_Start) assumes the enum ordering is contiguous. If the enum definition changes or additional memory types are inserted, this could break.

🔎 More robust static weight check
   // Check if this value want static qnn weight. The static qnn weight will be shared through one context in diff graphs!
-  if (v->tensor_.memType() == kGlobal || (v->tensor_.memType() <= kParams_End && v->tensor_.memType() >= kParams_Start)) {
+  bool is_param_type = (v->tensor_.memType() == kParamsMMAP || 
+                        v->tensor_.memType() == kParamsNormal);
+  if (v->tensor_.memType() == kGlobal || is_param_type) {
     __qnn_enable_static_weight = true;
   }
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (1)

162-162: Consider encapsulating public data members.

Static analysis flags public members (name_ on line 162, is_compiled_, op_node_, all_tensors_ on lines 182-184) as violating encapsulation. While the current struct-like design may be intentional for AOT tooling, exposing internal state increases coupling and makes it harder to enforce invariants. If the codebase evolves toward stricter encapsulation, consider adding accessor methods.

Based on coding guidelines recommending adherence to best practices like the Google C++ Style Guide.

Also applies to: 182-184
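
If stricter encapsulation is adopted later, the change is mechanical; a sketch with assumed member types:

class QnnAOTGraph {
 public:
  bool isCompiled() const { return is_compiled_; }
  const std::vector<QnnAOTNodeOperation::ptr_t>& opNodes() const { return op_node_; }
 private:
  bool is_compiled_ = false;
  std::vector<QnnAOTNodeOperation::ptr_t> op_node_;  // element type assumed
};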

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3ae3c54 and 90fab85.

📒 Files selected for processing (3)
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
🧠 Learnings (3)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Suggest adding unit tests for untested complex logic or edge cases.

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp

[error] 22-22: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)


[error] 24-24: variable name 'rw' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 49-49: 2 adjacent parameters of 'shareQuantSpecSingleInputToSingleOutputAndSetOpQuantAnnoAttr' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 50-50: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/QnnWrappersAPI.hpp

[error] 107-107: parameter name 'v' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 128-128: constructor does not initialize these fields: name_, op_name_, param_scalar, param_tensor, inputs, outputs, unreachable_handle_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 174-174: constructor does not initialize these fields: op_node_, all_tensors_, graph_name_, belongs_context_name_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 182-182: member variable 'is_compiled_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 183-183: member variable 'op_node_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 184-184: member variable 'all_tensors_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 279-279: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-macos
  • GitHub Check: build-android
  • GitHub Check: build-x86
🔇 Additional comments (6)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)

28-33: Verify that aborting on rewrite failure is the intended behavior.

Line 30 uses MLLM_RT_ASSERT(pattern.second->rewrite(iw, some_op)), which will abort execution if any pattern's rewrite returns false. Given that several patterns have TODO stubs that may return true/false inconsistently, this could lead to unexpected crashes during development or runtime.

If rewrite failures should be non-fatal, consider logging a warning and continuing:

🔎 Alternative approach
           if (pattern.second->isMatch(some_op)) {
-            MLLM_RT_ASSERT(pattern.second->rewrite(iw, some_op));
+            if (!pattern.second->rewrite(iw, some_op)) {
+              MLLM_WARN("Pattern rewrite failed for op, continuing with next pattern");
+            } else {
-            break;
+              break;
+            }
           }

312-312: Handle asymmetric quantization if sym requirement is relaxed.

Line 312 returns false when sym is false: MLLM_RETURN_FALSE_IF_NOT(sym). If asymmetric quantization support is planned, this will require additional implementation.

Is asymmetric quantization planned for future releases? If so, consider adding a TODO comment indicating the enhancement, or document this limitation.

mllm/backends/qnn/aot/QnnWrappersAPI.cpp (2)

170-230: LGTM!

The tensor type parsing logic is comprehensive and correctly maps IR memory types to QNN tensor types. The attribute-based override mechanism (lines 225-227) appropriately prioritizes explicit graph input/output attributes over inferred types.


454-459: LGTM!

Simple getter/setter methods for operation naming. The fluent interface pattern (returning this from setName) is consistent with other methods in the class.

mllm/backends/qnn/aot/QnnWrappersAPI.hpp (2)

118-126: LGTM!

The refactored internal representation improves efficiency by caching frequently accessed fields (name_, mllm_tensor_, shape_) rather than repeatedly querying the IR. The unreachable_handle_ vector for memory management is a practical approach to ensure cleanup of dynamically allocated QNN structures.


132-136: LGTM!

The factory method for QnnAOTNodeOperation and the name accessors follow consistent patterns. The fluent interface (returning ptr_t from setName) enables method chaining, which is used throughout the codebase.

Also applies to: 156-159

Comment on lines +279 to +286
    // FIXME: Maybe error here.
    case kInt4: {
      ret = QNN_DATATYPE_SFIXED_POINT_4;
      break;
    }
    case kUInt4: {
      ret = QNN_DATATYPE_UFIXED_POINT_4;
      break;

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n mllm/backends/qnn/aot/QnnWrappersAPI.cpp | head -330 | tail -100

Repository: UbiquitousLearning/mllm

Length of output: 3274


🏁 Script executed:

rg "kInt4|kUInt4" --type cpp --type h -B 2 -A 2

Repository: UbiquitousLearning/mllm

Length of output: 1338


🏁 Script executed:

web_search
QNN QNN_DATATYPE_SFIXED_POINT_4 UFIXED_POINT_4 Qualcomm neural network fixed point 4-bit datatype

Repository: UbiquitousLearning/mllm

Length of output: 159


🌐 Web query:

QNN Qualcomm SFIXED_POINT_4 UFIXED_POINT_4 datatype specification

💡 Result:

Short answer: In Qualcomm QNN, QNN_DATATYPE_SFIXED_POINT_4 and QNN_DATATYPE_UFIXED_POINT_4 are 4‑bit signed/unsigned fixed‑point (quantized) tensor formats (two 4‑bit values are packed into one byte when packing is enabled). Representations and semantics:

  • SFIXED_POINT_4: 4‑bit signed two’s‑complement integer (range −8..+7) stored as fixed‑point quantized values (dequantize with per‑tensor / per‑channel scale ± zero‑point as provided by the model).
  • UFIXED_POINT_4: 4‑bit unsigned integer (range 0..15) stored as fixed‑point quantized values (dequantize with scale/zero‑point).
  • Packed storage: when 4‑bit packing is used, two 4‑bit elements are stored per byte (low nibble = first value, high nibble = second). When not packed the values may be stored using 8‑bit containers (i.e., as S/UFIXED_POINT_8). QNN toolchains provide flags such as --pack_4_bit_weights to enable packing.
    Sources: Qualcomm QNN toolchain notes and examples (discussion of QNN_DATATYPE_SFIXED_POINT_4 / UFIXED_POINT_4 and 4‑bit packing). [1][2]

Sources
[1] Qualcomm AI Hub / QAIRT job docs (shows QNN support for INT4/4‑bit quantization options).
[2] Implementation notes / community writeup describing 4‑bit packing and QNN_DATATYPE_SFIXED_POINT_4 / QNN_DATATYPE_UFIXED_POINT_4 (packing two 4‑bit values per byte and relation to 8‑bit storage).
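
To make the packed layout concrete, a small standalone C++ illustration (not mllm code):

#include <cstdint>

// Two 4-bit values per byte: low nibble = first element, high nibble = second.
inline uint8_t pack_u4(uint8_t first, uint8_t second) {
  return static_cast<uint8_t>((first & 0x0F) | ((second & 0x0F) << 4));
}

// Unpack with sign extension for SFIXED_POINT_4 (two's complement, range -8..7).
inline int8_t unpack_s4_lo(uint8_t byte) { return static_cast<int8_t>(byte << 4) >> 4; }
inline int8_t unpack_s4_hi(uint8_t byte) { return static_cast<int8_t>(byte) >> 4; }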


Remove the unnecessary FIXME comment on line 279.

The mapping of kInt4/kUInt4 to QNN_DATATYPE_SFIXED_POINT_4/QNN_DATATYPE_UFIXED_POINT_4 is correct per QNN documentation (4-bit signed and unsigned fixed-point formats respectively). The types are actively used in quantization recipes (e.g., "w4a16" in LLMQuantRecipePass.cpp), and the mapping follows the same pattern as the 8-bit and 16-bit quantized variants. The FIXME comment can be safely removed.

🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 279 to 286, remove the
unnecessary "// FIXME: Maybe error here." comment preceding the kInt4/kUInt4 cases;
the mapping to QNN_DATATYPE_SFIXED_POINT_4 and QNN_DATATYPE_UFIXED_POINT_4 is
correct so simply delete that FIXME line and keep the case branches as-is
without altering logic or indentation.

- Introduced `VectorInt16Attr` for handling vectors of int16_t attributes.
- Added new 4-bit integer types: `mllm_int4_t` and `mllm_uint4_t` with corresponding type info.
- Enhanced `Tensor` class with new methods for element-wise addition, subtraction, and multiplication that accept a `DataTypes` parameter.
- Updated element-wise operation handling to support `VectorInt16Attr` for int16 tensor constants.
- Ensured UUID generation for quantization specifications in `QuantizationSpecAsymPerBlock` and `QuantizationSpecLPBQ`.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

♻️ Duplicate comments (2)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)

795-798: captureAOTGraph returns nullptr—implement graph retrieval.

The captureAOTGraph method (lines 795-798) always returns nullptr. Callers expecting a valid QnnAOTGraph::ptr_t will encounter null pointer errors. Implement the method to look up the graph from contexts_[qnn_context_name]->graphs_[g_name] and return it, or document why this stub is acceptable.

🔎 Proposed fix
 QnnAOTGraph::ptr_t QnnAOTEnv::captureAOTGraph(const std::string& qnn_context_name, const std::string& g_name) {
-  // TODO
-  return nullptr;
+  MLLM_RT_ASSERT_EQ(contexts_.count(qnn_context_name), 1);
+  MLLM_RT_ASSERT_EQ(contexts_[qnn_context_name]->graphs_.count(g_name), 1);
+  return contexts_[qnn_context_name]->graphs_[g_name];
 }
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)

865-867: Add defensive validation for nested config key access.

Line 865 directly accesses nested config keys without validation: config["quant_recipe"]["builtin_llm_pass"]["model"]. nlohmann::json's operator[] does not guard missing keys: it silently inserts a null on a mutable object and is undefined behavior on a const one, neither of which fails cleanly. Other code in the codebase (e.g., AOTPipeline.cpp, MarkTensorIO.cpp) defensively checks keys with .contains() before access. Apply the same pattern here:

-  if (config["quant_recipe"]["builtin_llm_pass"]["model"] == "qwen3") {
+  if (config.contains("quant_recipe") 
+      && config["quant_recipe"].contains("builtin_llm_pass")
+      && config["quant_recipe"]["builtin_llm_pass"].contains("model")
+      && config["quant_recipe"]["builtin_llm_pass"]["model"] == "qwen3") {
     addPattern(LLMQuantRecipeQwen3AttentionPattern::create(), "qwen3_attention", 100);
   }
🧹 Nitpick comments (7)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)

837-837: Complete the Qwen3 Attention pattern implementation or document deferral.

Line 837's TODO comment indicates incomplete implementation. The pattern successfully identifies Q/K/V/O linear operations (lines 814-835) but performs no transformation. This creates a sophisticated stub that matches complex patterns but applies no quantization annotations, which may confuse maintainers.

Do you want me to open an issue to track completing this implementation, or should a comment be added explaining why transformation is deferred?


69-71: Static analyzer flags ret and spec as uninitialized, but both already default to nullptr.

Clang-tidy (cppcoreguidelines-init-variables) reports the declarations on lines 69 and 71 as uninitialized, yet both are written with an explicit = nullptr initializer. The warning appears to be a false positive here and can be safely ignored or suppressed.

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (1)

272-285: Add copy/move constructor/assignment operators or document the rule-of-zero.

The LLMQuantRecipePass class (line 272) defines a default destructor but lacks copy constructor, copy assignment operator, move constructor, and move assignment operator. Since the class manages STL containers with shared_ptr members, consider explicitly deleting copy operations and defaulting move operations to make ownership semantics clear.

🔎 Proposed fix
 class LLMQuantRecipePass final : public ir::Pass {
  public:
   LLMQuantRecipePass();
 
+  // Explicitly delete copy operations
+  LLMQuantRecipePass(const LLMQuantRecipePass&) = delete;
+  LLMQuantRecipePass& operator=(const LLMQuantRecipePass&) = delete;
+
+  // Explicitly default move operations
+  LLMQuantRecipePass(LLMQuantRecipePass&&) = default;
+  LLMQuantRecipePass& operator=(LLMQuantRecipePass&&) = default;
+
   ~LLMQuantRecipePass() override = default;
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (1)

194-195: Extract magic number to a named constant.

Line 194 uses the magic number -20 for the masking offset. This value appears to be a minimum attention score for masked positions. Extract it to a named constant to improve readability and maintainability.

🔎 Proposed fix
+  // Attention mask offset for masked positions
+  static constexpr float kMaskedAttentionOffset = -20.0f;
+
   // Masked Softmax
   auto attn_min = attn.min(-1, true);
-  float minus_value = -20;
-  attn = nn::functional::where(causal_mask.equal(0.f), attn, attn_min.add(minus_value, kInt16));
+  attn = nn::functional::where(causal_mask.equal(0.f), attn, attn_min.add(kMaskedAttentionOffset, kInt16));

As per coding guidelines: Use named constants instead of magic numbers.

mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)

488-494: Implement graph compilation or add tracking comment.

The compile() method (lines 488-494) only sets is_compiled_ flag without performing actual compilation. The TODO comment suggests this is incomplete. If compilation is deferred intentionally, document the expected behavior; otherwise, implement the QNN graph compilation via the backend interface.

Additional context

Typical QNN graph compilation involves calling qnn_interface_.graphFinalize() to optimize and prepare the graph for execution. If this is intentionally deferred to a later pass, add a detailed comment explaining the design decision.
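
A rough sketch of a finalizing compile(), assuming the class holds the loaded interface table and a graph handle (both names below are assumptions):

bool QnnAOTGraph::compile() {
  if (is_compiled_) { return true; }
  // graphFinalize lets the backend validate and optimize the built graph.
  if (QNN_GRAPH_NO_ERROR
      != qnn_interface_.graphFinalize(graph_handle_, /*profile=*/nullptr, /*signal=*/nullptr)) {
    return false;
  }
  is_compiled_ = true;
  return true;
}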

mllm/core/Tensor.hpp (1)

289-291: LGTM - new scalar arithmetic methods with explicit DataTypes parameter.

The new add, sub, and mul methods provide useful control over the data type of the scalar RHS operand, complementing the existing operator+, operator-, operator* methods.

Consider adding a div(float rhs, DataTypes data_type = kFloat32) method for symmetry with operator/(float) if this use case is anticipated.
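
If added, the declaration would mirror its siblings (hypothetical; not part of this PR):

// Scalar divide with an explicit RHS dtype, for symmetry with add/sub/mul.
Tensor div(float rhs, DataTypes data_type = kFloat32);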

mllm/core/Tensor.cpp (1)

219-226: Consider adding kInt16PerTensorSym support for consistency.

The sub() method's type switch omits kInt16PerTensorSym, while both add() (line 182) and mul() (line 268) include it. This matches the existing inconsistency in operator- (lines 198-205), but for consistency and completeness, consider adding this case unless it's intentionally excluded.

🔎 Proposed addition
     switch (data_type) {
       case kFloat32: *(rhs_tensor.ptr<float>()) = rhs; break;
       case kFloat16: *(rhs_tensor.ptr<half_float::half>()) = half_float::half(rhs); break;
       case kInt32: *(rhs_tensor.ptr<int32_t>()) = rhs; break;
       case kInt16: *(rhs_tensor.ptr<int16_t>()) = rhs; break;
       case kInt8: *(rhs_tensor.ptr<int8_t>()) = rhs; break;
+      case kInt16PerTensorSym: *(rhs_tensor.ptr<int16_t>()) = rhs; break;
       default: NYI("Type is not supported"); break;
     }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 90fab85 and 426b7c1.

📒 Files selected for processing (17)
  • examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
  • examples/qwen3_qnn_aot/qwen3_qnn_aot.mir
  • examples/qwen3_qnn_aot/qwen3_qnn_aot_quant_recipe.mir
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/compile/ir/GeneratedRTTIKind.hpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
  • mllm/compile/ir/builtin/Attribute.cpp
  • mllm/compile/ir/builtin/Attribute.hpp
  • mllm/compile/ir/linalg/Attribute.hpp
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/core/DataTypes.cpp
  • mllm/core/DataTypes.hpp
  • mllm/core/Tensor.cpp
  • mllm/core/Tensor.hpp
  • mllm/core/aops/ElewiseOps.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
  • mllm/compile/ir/rtti_kind_gen.py
  • mllm/compile/ir/GeneratedRTTIKind.hpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/compile/ir/builtin/Attribute.hpp
  • mllm/compile/ir/linalg/Attribute.hpp
  • mllm/core/DataTypes.hpp
  • mllm/core/aops/ElewiseOps.cpp
  • mllm/core/DataTypes.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/core/Tensor.hpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
  • mllm/core/Tensor.cpp
  • mllm/compile/ir/builtin/Attribute.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/compile/ir/builtin/Attribute.hpp
  • mllm/compile/ir/linalg/Attribute.hpp
  • mllm/core/DataTypes.hpp
  • mllm/core/aops/ElewiseOps.cpp
  • mllm/core/DataTypes.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/core/Tensor.hpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
  • mllm/core/Tensor.cpp
  • mllm/compile/ir/builtin/Attribute.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/core/aops/ElewiseOps.cpp
  • mllm/core/DataTypes.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/core/Tensor.cpp
  • mllm/compile/ir/builtin/Attribute.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/compile/ir/builtin/Attribute.hpp
  • mllm/compile/ir/linalg/Attribute.hpp
  • mllm/core/DataTypes.hpp
  • mllm/core/aops/ElewiseOps.cpp
  • mllm/core/DataTypes.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/core/Tensor.hpp
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp
  • mllm/core/Tensor.cpp
  • mllm/compile/ir/builtin/Attribute.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
🧠 Learnings (3)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Suggest adding unit tests for untested complex logic or edge cases.

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
🧬 Code graph analysis (7)
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (2)
mllm/nn/Functional.hpp (1)
  • where (161-161)
mllm/nn/Functional.cpp (2)
  • where (203-206)
  • where (203-203)
mllm/compile/ir/builtin/Attribute.hpp (1)
mllm/compile/ir/builtin/Attribute.cpp (14)
  • VectorInt16Attr (107-107)
  • VectorInt16Attr (109-109)
  • data (23-23)
  • data (23-23)
  • data (39-39)
  • data (39-39)
  • data (53-53)
  • data (53-53)
  • data (85-85)
  • data (85-85)
  • data (99-99)
  • data (99-99)
  • data (113-113)
  • data (113-113)
mllm/compile/ir/linalg/Attribute.hpp (1)
mllm/compile/ir/linalg/Attribute.cpp (3)
  • LinalgIRQuantizatonSpecAttr (160-160)
  • LinalgIRQuantizatonSpecAttr (162-162)
  • LinalgIRQuantizatonSpecAttr (164-164)
mllm/core/Tensor.hpp (1)
mllm/core/Tensor.cpp (16)
  • rhs (126-128)
  • rhs (126-126)
  • rhs (130-132)
  • rhs (130-130)
  • rhs (134-136)
  • rhs (134-134)
  • rhs (138-140)
  • rhs (138-138)
  • rhs (148-168)
  • rhs (148-148)
  • rhs (192-211)
  • rhs (192-192)
  • rhs (234-254)
  • rhs (234-234)
  • rhs (278-297)
  • rhs (278-278)
mllm/core/Tensor.cpp (4)
mllm/core/Tensor.hpp (17)
  • rhs (272-272)
  • rhs (273-273)
  • rhs (274-274)
  • rhs (275-275)
  • rhs (278-278)
  • rhs (283-283)
  • rhs (284-284)
  • rhs (285-285)
  • rhs (286-286)
  • rhs (289-289)
  • rhs (290-290)
  • rhs (291-291)
  • rhs (296-296)
  • rhs (298-298)
  • rhs (300-300)
  • rhs (302-302)
  • device (399-399)
mllm/engine/Context.hpp (1)
  • device (31-31)
mllm/core/TensorViewImpl.cpp (2)
  • device (10-10)
  • device (10-10)
mllm/engine/Context.cpp (2)
  • instance (14-17)
  • instance (14-14)
mllm/compile/ir/builtin/Attribute.cpp (1)
mllm/compile/ir/builtin/Attribute.hpp (15)
  • VectorInt16Attr (158-158)
  • VectorInt16Attr (160-160)
  • VectorInt16Attr (162-162)
  • p (42-42)
  • p (82-82)
  • p (100-100)
  • p (120-120)
  • p (142-142)
  • p (164-164)
  • data (40-40)
  • data (62-62)
  • data (80-80)
  • data (124-124)
  • data (146-146)
  • data (168-168)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (1)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)
  • LLMQuantRecipePass (845-868)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp

[error] 23-23: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)


[error] 26-26: variable name 'rw' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 67-67: 2 adjacent parameters of 'genSimpleQuantizationSpecAttr' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 69-69: variable 'ret' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 71-71: variable 'spec' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 114-114: 2 adjacent parameters of 'shareQuantSpecSingleInputToSingleOutputAndSetOpQuantAnnoAttr' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 115-115: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 135-135: 2 adjacent parameters of 'noSharingSingleInAndSingleOutQuantAnnoAttr' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 135-135: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 145-145: variable 'quant_spec_i0' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 146-146: variable 'quant_spec_o0' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp

[error] 24-24: 2 adjacent parameters of 'makeRotaryPosEmbedding' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 194-194: 20 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

mllm/compile/ir/builtin/Attribute.hpp

[error] 154-154: class 'VectorInt16Attr' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 164-164: parameter name 'p' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/compile/ir/linalg/Attribute.hpp

[error] 329-329: class 'LinalgIRQuantizatonSpecAttr' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 341-341: parameter name 'p' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/compile/ir/NodeRTTIClassOfImpl.hpp

[error] 340-340: function-like macro 'RTTI_RK_ATTR_LINALGIRATTR_QUANTIZATIONSPEC_IMPL' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)


[error] 369-369: function-like macro 'RTTI_RK_ATTR_BUILTINIRATTR_VECTORINT16ATTR_IMPL' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp

[error] 28-28: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 30-30: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 37-37: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 51-51: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 63-63: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 77-77: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 91-91: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 103-103: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 115-115: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 129-129: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 143-143: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 157-157: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 171-171: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 185-185: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 197-197: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 209-209: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 223-223: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 237-237: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 249-249: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 263-263: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 272-272: class 'LLMQuantRecipePass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 280-280: parameter name 'p' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-android
  • GitHub Check: build-macos
  • GitHub Check: build-x86
🔇 Additional comments (8)
mllm/compile/ir/NodeRTTIClassOfImpl.hpp (1)

340-342: LGTM - new RTTI macros follow established patterns.

The new macros for QuantizationSpec and VectorInt16Attr are consistent with the existing RTTI infrastructure. Since this is auto-generated code, the static analysis hints about preferring constexpr template functions can be safely disregarded.

Also applies to: 369-371
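
As a reading aid, here is an assumed LLVM-style shape for such a classof macro. The generated RTTI_*_IMPL bodies are not shown in this PR, so everything below (names, kind ids) is illustrative only:

// The generated macro bodies are not visible in this PR; this is
// only an assumed LLVM-style classof pattern, not the real code.
#include <cstdio>

struct Node { int kind; };

#define DEFINE_CLASSOF_IMPL(KIND)                   \
  static bool classof(const Node* node) {           \
    return node->kind == (KIND);                    \
  }

struct QuantizationSpecAttr : Node {
  static constexpr int kKind = 42;  // hypothetical kind id
  DEFINE_CLASSOF_IMPL(kKind)
};

int main() {
  Node n{42};
  std::printf("classof matches: %d\n", QuantizationSpecAttr::classof(&n));
}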

mllm/compile/ir/builtin/Attribute.cpp (1)

107-119: LGTM - VectorInt16Attr implementation follows established pattern.

The implementation mirrors VectorFP32Attr consistently, including constructors, dump, data() accessor, and build() factory method.

mllm/compile/ir/builtin/Attribute.hpp (1)

154-174: LGTM - VectorInt16Attr declaration follows existing patterns.

The class mirrors VectorFP32Attr exactly. The static analysis warnings about Rule of Five and short parameter name p can be disregarded since:

  1. The same pattern is used by VectorFP32Attr and other attribute classes - the shared_ptr semantics of DEFINE_SPECIFIC_IR_CLASS handle proper copying.
  2. The parameter name p for IRPrinter is used consistently across all dump methods in this file.
mllm/core/aops/ElewiseOps.cpp (1)

42-54: LGTM - type-specific constant attribute creation.

The switch statement appropriately handles kFloat32 and kInt16 constant types with corresponding attribute classes, and logs a warning for unsupported types. This provides a clear path for extending support to additional types in the future.
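
For readers outside the codebase, a toy model of that dispatch shape (std::variant stands in for the real VectorFP32Attr/VectorInt16Attr IR attribute classes; makeConstantAttr is a hypothetical helper):

#include <cstdint>
#include <cstdio>
#include <memory>
#include <variant>
#include <vector>

enum DataTypes { kFloat32, kInt16, kInt8 };
using Attr = std::variant<std::vector<float>, std::vector<int16_t>>;

// Create a typed constant attribute, warning on unsupported types,
// mirroring the structure described in the review comment above.
std::unique_ptr<Attr> makeConstantAttr(DataTypes t) {
  switch (t) {
    case kFloat32: return std::make_unique<Attr>(std::vector<float>{1.0F});
    case kInt16:   return std::make_unique<Attr>(std::vector<int16_t>{1});
    default:
      std::fprintf(stderr, "unsupported constant type %d\n", static_cast<int>(t));
      return nullptr;  // caller decides how to recover
  }
}

int main() {
  auto ok = makeConstantAttr(kInt16);
  std::printf("int16 attr: %s\n", ok ? "created" : "skipped");
  auto bad = makeConstantAttr(kInt8);  // exercises the warning path
  std::printf("int8 attr: %s\n", bad ? "created" : "skipped");
}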

mllm/core/DataTypes.hpp (1)

588-593: LGTM - type info registrations and macros for 4-bit types.

The MLLM_DEFINE_BASIC_TYPE_INFO, MLLM_DEFINE_SELF_TYPE_INFO, and type alias macros are correctly defined for the new 4-bit types, with appropriate min/max values (signed: -8 to 7, unsigned: 0 to 15).

Also applies to: 707-708, 771-772
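
A quick standalone sanity check of those ranges (the constant names below are illustrative, not the project's macros):

#include <cstdio>

// Illustrative 4-bit bounds; the project derives these through its
// MLLM_DEFINE_BASIC_TYPE_INFO machinery rather than bare constants.
constexpr int kInt4Min = -8;   // -(2^3), two's complement
constexpr int kInt4Max = 7;    //  2^3 - 1
constexpr int kUInt4Min = 0;
constexpr int kUInt4Max = 15;  //  2^4 - 1

int main() {
  static_assert(kInt4Max - kInt4Min == 15, "signed 4-bit spans 16 values");
  static_assert(kUInt4Max - kUInt4Min == 15, "unsigned 4-bit spans 16 values");
  std::puts("4-bit ranges consistent");
  return 0;
}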

mllm/compile/ir/linalg/Attribute.hpp (3)

65-77: LGTM - QuantizationSpecRaw follows established spec pattern.

The new QuantizationSpecRaw struct correctly inherits from QuantizationSpec, sets the type to kRaw, and generates a UUID. The factory methods (create(DataTypes) and create()) mirror the pattern used by other spec types.
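
A minimal sketch of that create-plus-UUID factory pattern; the class names mirror the PR, while the counter-based id generator and member layout are stand-ins:

#include <atomic>
#include <cstdint>
#include <cstdio>
#include <memory>

// Each create() stamps a fresh unique id, so two specs never share
// an identity. The atomic counter stands in for the real UUID helper.
struct QuantizationSpec {
  enum Kind { kRaw } kind = kRaw;
  uint64_t uuid = 0;
  virtual ~QuantizationSpec() = default;
};

struct QuantizationSpecRaw : QuantizationSpec {
  static std::shared_ptr<QuantizationSpecRaw> create() {
    static std::atomic<uint64_t> next{1};
    auto spec = std::make_shared<QuantizationSpecRaw>();
    spec->kind = kRaw;
    spec->uuid = next.fetch_add(1);  // real code: generate a UUID
    return spec;
  }
};

int main() {
  auto a = QuantizationSpecRaw::create();
  auto b = QuantizationSpecRaw::create();
  std::printf("distinct ids: %d\n", static_cast<int>(a->uuid != b->uuid));
}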


247-247: LGTM - UUID generation added to QuantizationSpecAsymPerBlock and QuantizationSpecLPBQ.

Adding UUID generation to the create() methods ensures these specs have unique identifiers, consistent with other quantization spec types.

Also applies to: 262-262, 283-283, 299-299


329-348: LGTM - LinalgIRQuantizatonSpecAttr follows existing attribute patterns.

The new class mirrors LinalgIRQuantizatonAnnotationAttr in structure. The static analysis warnings about Rule of Five and short parameter name p can be disregarded per established codebase conventions (same pattern used by other IR attribute classes).

Comment on lines +466 to +486
void QnnAOTGraph::addOperation(const QnnAOTNodeOperation::ptr_t& qnn_op) {
auto env = AOTCompileContext::getInstance().getEnv();
auto qnn_interface = env->getFuncSymbol().qnn_interface_;

Qnn_OpConfig_t qnn_op_config = QNN_OPCONFIG_INIT;
qnn_op_config.version = QNN_OPCONFIG_VERSION_1;
qnn_op_config.v1 = QNN_OPCONFIG_V1_INIT;
qnn_op_config.v1.name = qnn_op->name_.c_str();
qnn_op_config.v1.packageName = qnn_op->package_name_.c_str();
qnn_op_config.v1.typeName = qnn_op->op_name_.c_str();

// TODO PARAMs
// TODO Inputs
// TODO Outputs

// TODO node validations

// TODO add node to graph.

op_node_.insert({qnn_op->getName(), qnn_op});
}

⚠️ Potential issue | 🟠 Major

Complete the QNN operation addition or document the stub.

The addOperation method (lines 466-486) constructs a Qnn_OpConfig_t and populates basic fields but has TODO comments for critical parts (lines 477-483): parameters, inputs, outputs, validation, and actually adding the node to the QNN graph. The method stores the operation locally but never registers it with the QNN backend, making this a non-functional stub.

Would you like me to help implement the QNN operation addition logic, or should this be tracked in a separate issue?

🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 466-486, implement the
TODOs: build the full Qnn_OpConfig_t by converting the qnn_op's parameters into
QNN parameter entries, map its inputs and outputs into QNN tensor
descriptors/array entries, and set any required versioned fields (sizes/counts)
instead of leaving them empty; perform node-level validations (required fields
present, compatible shapes/dtypes) and return/raise errors on failure; call the
QNN backend API via qnn_interface (use the appropriate graph add/register
function on the interface, pass the populated Qnn_OpConfig_t and capture the
resulting node handle/ID), check the backend return status and log or propagate
errors, and only then store the operation in op_node_ (or update the stored
entry with the backend node handle). Ensure any temporary C strings or buffers
referenced by Qnn_OpConfig_t remain valid for the call (allocate/copy if
necessary) and free them on error/cleanup.

Comment on lines +100 to +101
CASE(kInt4)
CASE(kUInt4)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check if lanesOfType is called with kInt4/kUInt4 anywhere
rg -n 'lanesOfType.*kInt4|lanesOfType.*kUInt4' --type=cpp

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

# Check generic usages of lanesOfType that might pass kInt4/kUInt4
rg -n 'lanesOfType\s*\(' --type=cpp -B 2 -A 3

Repository: UbiquitousLearning/mllm

Length of output: 13068


🏁 Script executed:

# Examine the actual lanesOfType function implementation and surrounding context
fd DataTypes.cpp --type f | xargs cat -n | sed -n '1,200p'

Repository: UbiquitousLearning/mllm

Length of output: 5688


🏁 Script executed:

# Check if kInt4/kUInt4 are used elsewhere in the codebase in ways that might call lanesOfType
rg -n 'kInt4|kUInt4' --type=cpp -B 1 -A 1 | head -100

Repository: UbiquitousLearning/mllm

Length of output: 1941


Missing kInt4/kUInt4 cases in lanesOfType function.

The new 4-bit types are added to bytesOfType and nameOfType, but lanesOfType (lines 9–56) lacks corresponding CASE entries. Since these types have MllmDataTypeInfo specializations defined (DataTypes.hpp lines 707–708), calling lanesOfType(kInt4) or lanesOfType(kUInt4) will trigger the NYI("Unknown data type") path instead of returning valid lane counts.

Add CASE(kInt4) and CASE(kUInt4) to lanesOfType to maintain consistency and prevent runtime failures.

🤖 Prompt for AI Agents
In mllm/core/DataTypes.cpp around lines 100–101, the lanesOfType function is
missing CASE(kInt4) and CASE(kUInt4) entries; add CASE(kInt4) and CASE(kUInt4)
to the switch so those enum values return the correct lane counts (use the same
pattern as the other integer TYPE cases and reference the MllmDataTypeInfo
specializations defined in DataTypes.hpp for the 4-bit types) to avoid falling
through to NYI("Unknown data type").
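
A toy rendering of the suggested fix, assuming (as for the other scalar types) that the 4-bit entries should report one lane; the real function dispatches through a CASE macro over MllmDataTypeInfo rather than literal returns:

#include <cstdint>
#include <cstdio>

enum DataTypes { kFloat32, kInt8, kInt4, kUInt4 };

// Scalar types report a single lane; the new 4-bit cases follow the
// same pattern as the other integer types instead of hitting NYI.
int32_t lanesOfType(DataTypes t) {
  switch (t) {
    case kFloat32: return 1;
    case kInt8:    return 1;
    case kInt4:    return 1;  // newly added 4-bit scalar case
    case kUInt4:   return 1;  // newly added 4-bit scalar case
  }
  std::fprintf(stderr, "Unknown data type\n");  // NYI path in the real code
  return -1;
}

int main() { std::printf("lanes(kInt4) = %d\n", lanesOfType(kInt4)); }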

Comment on lines +170 to +190
Tensor Tensor::add(float rhs, DataTypes data_type) {
auto rhs_tensor = Tensor::empty({1}, data_type, device()).alloc();
if (device() != kCPU) {
Context::instance().buildOpAndSubmitTask(
OpTypes::kFill, aops::FillOpOptions{.type = aops::FillOpTypes::kSpecific, .value = rhs}, {rhs_tensor});
} else {
switch (data_type) {
case kFloat32: *(rhs_tensor.ptr<float>()) = rhs; break;
case kFloat16: *(rhs_tensor.ptr<half_float::half>()) = half_float::half(rhs); break;
case kInt32: *(rhs_tensor.ptr<int32_t>()) = rhs; break;
case kInt16: *(rhs_tensor.ptr<int16_t>()) = rhs; break;
case kInt8: *(rhs_tensor.ptr<int8_t>()) = rhs; break;
case kInt16PerTensorSym: *(rhs_tensor.ptr<int16_t>()) = rhs; break;
default: NYI("Type is not supported"); break;
}
}
auto opts = aops::AddOpOptions{};
opts.setInputsConstant(0, 0);
opts.setInputsConstant(1, 1);
return Context::instance().buildOpAndSubmitTask(OpTypes::kAdd, opts, {*this, rhs_tensor})[0];
}

🛠️ Refactor suggestion | 🟠 Major

Consider extracting common scalar operation logic to reduce duplication.

The add(), sub(), and mul() methods duplicate significant logic from their corresponding operators (operator+, operator-, operator*). Each method repeats ~21 lines of tensor creation, type switching, and option configuration.

Consider refactoring to extract a common helper function that handles scalar-to-tensor conversion with configurable type, then use it in both the operators and these new methods.

🔎 Example refactoring approach
// Helper to convert scalar to tensor with specified type
Tensor Tensor::scalarToTensor(float value, DataTypes data_type) {
  auto rhs_tensor = Tensor::empty({1}, data_type, device()).alloc();
  if (device() != kCPU) {
    Context::instance().buildOpAndSubmitTask(
        OpTypes::kFill, aops::FillOpOptions{.type = aops::FillOpTypes::kSpecific, .value = value}, {rhs_tensor});
  } else {
    switch (data_type) {
      case kFloat32: *(rhs_tensor.ptr<float>()) = value; break;
      case kFloat16: *(rhs_tensor.ptr<half_float::half>()) = half_float::half(value); break;
      case kInt32: *(rhs_tensor.ptr<int32_t>()) = value; break;
      case kInt16: *(rhs_tensor.ptr<int16_t>()) = value; break;
      case kInt8: *(rhs_tensor.ptr<int8_t>()) = value; break;
      case kInt16PerTensorSym: *(rhs_tensor.ptr<int16_t>()) = value; break;
      default: NYI("Type is not supported"); break;
    }
  }
  return rhs_tensor;
}

Tensor Tensor::add(float rhs, DataTypes data_type) {
  auto rhs_tensor = scalarToTensor(rhs, data_type);
  auto opts = aops::AddOpOptions{};
  opts.setInputsConstant(0, 0);
  opts.setInputsConstant(1, 1);
  return Context::instance().buildOpAndSubmitTask(OpTypes::kAdd, opts, {*this, rhs_tensor})[0];
}

Then apply the same pattern to sub(), mul(), and potentially update the operator overloads to use this helper.

Also applies to: 213-232, 256-276


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, find the file and confirm the code location
find . -name "Tensor.cpp" -type f | head -5

Repository: UbiquitousLearning/mllm

Length of output: 89


🏁 Script executed:

# Search for usage patterns of these methods with explicit data_type arguments
rg -n '\.add\(.*,\s*k[A-Z]' --type=cpp -A 2 -B 2

Repository: UbiquitousLearning/mllm

Length of output: 592


🏁 Script executed:

# Look for similar sub() and mul() methods
rg -n '\.sub\(.*,\s*k[A-Z]|\.mul\(.*,\s*k[A-Z]' --type=cpp -A 2 -B 2

Repository: UbiquitousLearning/mllm

Length of output: 463


🏁 Script executed:

# Check operation backend definitions for type handling
rg -n 'AddOp|SubOp|MulOp' mllm --type=cpp | head -20

Repository: UbiquitousLearning/mllm

Length of output: 2168


🏁 Script executed:

# Look for validation logic in Tensor class
rg -n 'buildOpAndSubmitTask.*Add|buildOpAndSubmitTask.*Sub|buildOpAndSubmitTask.*Mul' --type=cpp -A 5 -B 2

Repository: UbiquitousLearning/mllm

Length of output: 6330


🏁 Script executed:

# Check if there's any type validation or casting logic
rg -n 'dtype\(\)|data_type' mllm/core/Tensor.cpp | head -20

Repository: UbiquitousLearning/mllm

Length of output: 1083


🏁 Script executed:

# Find and examine AddOp, SubOp, MulOp implementations
find . -name "*Op.hpp" -o -name "*Op.cpp" | grep -E "Add|Sub|Mul" | head -10

Repository: UbiquitousLearning/mllm

Length of output: 446


🏁 Script executed:

# Look at the operation backend headers/implementations
find mllm/core/aops -name "*.hpp" -o -name "*.cpp" | head -20

Repository: UbiquitousLearning/mllm

Length of output: 646


🏁 Script executed:

# Check if there's type compatibility logic in the operation system
rg -n 'typeCompatible|validateType|checkType|dtype.*dtype' mllm/core/aops --type=cpp --type=hpp | head -20

Repository: UbiquitousLearning/mllm

Length of output: 94


🏁 Script executed:

# Look for any type casting or broadcasting in arithmetic operations
rg -n 'cast|broadcast|promote.*type' mllm/core/aops --type=cpp --type=hpp -i | head -20

Repository: UbiquitousLearning/mllm

Length of output: 94


🏁 Script executed:

# Examine the full context of how add method is declared in header
rg -n 'Tensor::add\(float' mllm/core --type=hpp -B 2 -A 5

Repository: UbiquitousLearning/mllm

Length of output: 94


🏁 Script executed:

# Check the actual file size and see more context around these methods
wc -l mllm/core/Tensor.cpp

Repository: UbiquitousLearning/mllm

Length of output: 91


🏁 Script executed:

# Find AddOp, SubOp, MulOp implementations
find mllm/core/aops -name "*AddOp*" -o -name "*SubOp*" -o -name "*MulOp*"

Repository: UbiquitousLearning/mllm

Length of output: 122


🏁 Script executed:

# Look at the add method declaration in Tensor.h
find mllm -name "Tensor.h" -o -name "Tensor.hpp" | xargs cat | grep -A 5 "Tensor add"

Repository: UbiquitousLearning/mllm

Length of output: 357


🏁 Script executed:

# Check if there are any type validation or compatibility checks in the tensor operations
rg 'dtype\(\).*!=.*data_type|data_type.*!=.*dtype' mllm --type-list | head -5

Repository: UbiquitousLearning/mllm

Length of output: 456


🏁 Script executed:

# Look for type compatibility in the backend operations
find mllm -name "*.h" -o -name "*.hpp" | xargs grep -l "AddOp\|SubOp\|MulOp" 2>/dev/null | head -5

Repository: UbiquitousLearning/mllm

Length of output: 272


🏁 Script executed:

# Examine the AddOp implementation
find mllm/core/aops -name "AddOp.cpp" -o -name "AddOp.h"

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

# Get the full context of the add() method and nearby methods
sed -n '148,280p' mllm/core/Tensor.cpp

Repository: UbiquitousLearning/mllm

Length of output: 6076


🏁 Script executed:

# Check the ElewiseOps implementations to see if they handle mixed types
head -100 mllm/backends/cpu/ops/ElewiseOps.hpp

Repository: UbiquitousLearning/mllm

Length of output: 3068


🏁 Script executed:

# Look for any type broadcasting or casting logic in the elewise operations
grep -n 'dtype\|type.*cast\|promote' mllm/backends/cpu/ops/ElewiseOps.hpp | head -20

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

# Check if there's documentation about type compatibility in AddOp, SubOp, MulOp
find mllm/core/aops -name "*.hpp" | xargs grep -l "AddOp\|SubOp\|MulOp"

Repository: UbiquitousLearning/mllm

Length of output: 124


🏁 Script executed:

# Look at the actual implementation or at least the class definition
sed -n '1,50p' mllm/core/aops/MatMulOp.hpp

Repository: UbiquitousLearning/mllm

Length of output: 1710


🏁 Script executed:

# Search for any tests that show valid type combinations for add/sub/mul
find . -name "*test*" -type f | xargs grep -l "\.add\(.*k[A-Z]\|\.sub\(.*k[A-Z]\|\.mul\(.*k[A-Z]" 2>/dev/null | head -5

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

# Check if there's a pattern for mixed-type operations - look at the Qwen example more carefully
sed -n '188,200p' examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp

Repository: UbiquitousLearning/mllm

Length of output: 555


🏁 Script executed:

# Look at the ElewiseOps.hpp to see AddOp, SubOp, MulOp base class definitions
head -200 mllm/core/aops/ElewiseOps.hpp

Repository: UbiquitousLearning/mllm

Length of output: 3196


🏁 Script executed:

# Check CPU implementation of AddOp to see if there's type handling
find mllm/backends/cpu/ops -name "*.cpp" | xargs grep -l "CPUAddOp\|CPUSubOp\|CPUMulOp" | head -1 | xargs head -300

Repository: UbiquitousLearning/mllm

Length of output: 12943


🏁 Script executed:

# Look for any comments or documentation about type compatibility
rg -n 'type.*compati|mixed.*type|broadcast.*type' mllm/core --type=cpp | head -10

Repository: UbiquitousLearning/mllm

Length of output: 208


🏁 Script executed:

# Check if operations validate input tensor types before execution
rg -n 'dtype.*==\|dtype.*!=\|type.*mismatch' mllm/backends/cpu/ops --type=cpp | head -20

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

# Look at the actual CPU Add operation implementation
find mllm/backends/cpu -name "*.cpp" | xargs grep -A 30 "void CPUAddOp::forward"

Repository: UbiquitousLearning/mllm

Length of output: 2883


🏁 Script executed:

# Check how output dtype is determined in the operation context
rg -n 'buildOpAndSubmitTask.*Add|buildOpAndSubmitTask.*Sub|buildOpAndSubmitTask.*Mul' mllm/core --type=cpp -B 5 -A 10 | head -50

Repository: UbiquitousLearning/mllm

Length of output: 3179


🏁 Script executed:

# Look for how output tensor dtype is set when operations are executed
rg -n 'output\.dtype\(\)|dtype.*output' mllm/core/aops/ElewiseOps.cpp | head -20

Repository: UbiquitousLearning/mllm

Length of output: 178


🏁 Script executed:

# Check if there's any type validation or conversion in the BaseOp or Context when executing operations
rg -n 'dtype.*mismatch|type.*check|validateInput' mllm/core --type=cpp | head -20

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

# Look at how the output tensor is created to see what dtype it gets
sed -n '100,150p' mllm/core/aops/ElewiseOps.cpp

Repository: UbiquitousLearning/mllm

Length of output: 1503


🏁 Script executed:

# Check how output dtype is determined more carefully - look at the macro definition
sed -n '1,120p' mllm/core/aops/ElewiseOps.cpp

Repository: UbiquitousLearning/mllm

Length of output: 9353


🏁 Script executed:

# Verify the actual output type by looking at the full macro
grep -A 20 '__MLLM_ELEWISE_OP_IMPL\|__MLLM_ELEWISE_OP_DEFINE' mllm/core/aops/ElewiseOps.cpp | head -60

Repository: UbiquitousLearning/mllm

Length of output: 2887


🏁 Script executed:

# Check what happens when inputs have mismatched dtypes - look for any type casting
rg -n 'cast.*dtype|dtype.*cast|input.*cast' mllm/backends/cpu/ops/ElewiseOps.cpp | head -20

Repository: UbiquitousLearning/mllm

Length of output: 713


🏁 Script executed:

# See if there's validation in the Context that checks input compatibility
find mllm/core -name "Context*" | head -3

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

# Check if there are any tests showing mixed-type add/sub/mul operations
find . -path "*/test*" -name "*.cpp" -o -name "*.hpp" | xargs grep -l "add.*kInt\|mul.*kFloat\|sub.*kInt" 2>/dev/null

Repository: UbiquitousLearning/mllm

Length of output: 875


🏁 Script executed:

# Let me understand the exact problem - when add() uses different data_type, what happens at the backend
# Check if there's any type casting or validation when inputs have different dtypes
rg -n 'input0\.ptr.*input1\.ptr|inputs\[0\]\.ptr.*inputs\[1\]\.ptr' mllm/backends/cpu/ops/ElewiseOps.cpp | head -10

Repository: UbiquitousLearning/mllm

Length of output: 1330


🏁 Script executed:

# Look more carefully at what happens when input types don't match
sed -n '200,250p' mllm/backends/cpu/ops/ElewiseOps.cpp

Repository: UbiquitousLearning/mllm

Length of output: 2334


🏁 Script executed:

# Check if there are type compatibility assertions or checks
rg -n 'assert.*dtype|CHECK.*dtype|validate.*dtype' mllm --type=cpp | head -10

Repository: UbiquitousLearning/mllm

Length of output: 181


🏁 Script executed:

# See if the Qwen example actually uses the result of the mixed-type operation
sed -n '188,200p' examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp

Repository: UbiquitousLearning/mllm

Length of output: 555


🏁 Script executed:

# Let me trace exactly what happens in the Qwen3 case - what is attn_min's dtype?
sed -n '185,198p' examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp

Repository: UbiquitousLearning/mllm

Length of output: 627


🏁 Script executed:

# Check what type scale_ is in the Qwen3 model
grep -n 'scale_' examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp | head -10

Repository: UbiquitousLearning/mllm

Length of output: 179


🏁 Script executed:

# Now verify the critical issue - if input1 is int16 but output is float, what happens?
# The backend will call input1.ptr<float>() on an int16 tensor - this is unsafe casting
# Let's confirm the reshape macro determines output dtype as inputs[0].dtype()
sed -n '28,50p' mllm/core/aops/ElewiseOps.cpp

Repository: UbiquitousLearning/mllm

Length of output: 1991


🏁 Script executed:

# Check if there's any type validation before the backend operation is called
rg -n 'input.*dtype.*==\|inputs\[.*\]\.dtype' mllm/core/Tensor.cpp | grep -A 2 -B 2 'buildOpAndSubmitTask.*Add\|buildOpAndSubmitTask.*Sub\|buildOpAndSubmitTask.*Mul'

Repository: UbiquitousLearning/mllm

Length of output: 49


🏁 Script executed:

# Look at what happens when we have a float tensor and call .add(scalar, kInt16)
# Does it actually work or fail? Check CPU backend implementation more carefully
sed -n '140,160p' mllm/backends/cpu/ops/ElewiseOps.cpp

Repository: UbiquitousLearning/mllm

Length of output: 1390


🏁 Script executed:

# Verify the critical type safety issue - when input1 is int16 but we're casting to float
# Let's see what happens when we have float input0 and int16 input1 being added
sed -n '140,180p' mllm/backends/cpu/ops/ElewiseOps.cpp

Repository: UbiquitousLearning/mllm

Length of output: 2432


🏁 Script executed:

# Check if there's any backend that properly handles mixed-type operations
find mllm/backends -name "ElewiseOps.cpp" -o -name "ElewiseOps.hpp" | head -5

Repository: UbiquitousLearning/mllm

Length of output: 140


🏁 Script executed:

# Look for any OpenCL or other backend implementations that might handle mixed types
find mllm/backends -path "*/ops/Elewise*" -type f | head -10

Repository: UbiquitousLearning/mllm

Length of output: 218


🏁 Script executed:

# Search for any documentation or comments about type compatibility in mixed operations
rg -n 'mixed.*type|type.*mismatch|type.*compat' mllm --type=cpp --type=hpp | head -20

Repository: UbiquitousLearning/mllm

Length of output: 94


🏁 Script executed:

# Critical check: verify what happens when add() is called with a different data_type
# Let's trace through the exact scenario: float tensor with .add(scalar, kInt16)
# The output will be float (from reshape using inputs[0].dtype())
# But input1 will be int16
# In CPU backend, when dtype=kFloat32, it does input1.ptr<mllm_fp32_t>() on an int16 tensor
# This is unsafe! Let's verify by looking at the actual case

python3 << 'EOF'
# Let's trace what happens in the Qwen3 example:
# attn_min is float32 (output of min operation)
# attn_min.add(minus_value, kInt16) creates:
#   - input0 = attn_min (float32)
#   - input1 = rhs_tensor (int16, shape {1})
#   - output = Tensor::empty with dtype = inputs[0].dtype() = float32
# 
# In CPU backend forward() for AddOp:
#   - dtype = output.dtype() = float32
#   - Executes case kFloat32:
#   - Calls input1.ptr<mllm_fp32_t>() on a tensor that is actually int16
#   - This is undefined behavior / type safety violation

print("Identified critical issue:")
print("- add/sub/mul with different data_type allows mixed-type operations")
print("- Output dtype is always LHS dtype (inputs[0].dtype())")
print("- Backend assumes input1 has same dtype as output")
print("- input1.ptr<float>() called on int16 data = TYPE SAFETY BUG")
EOF

Repository: UbiquitousLearning/mllm

Length of output: 153


Mixed-type scalar operations contain a critical type safety bug where the backend unsafely casts mismatched input types.

When add(), sub(), or mul() are called with a data_type parameter differing from the tensor's dtype, the operation framework creates a type mismatch:

  • Output dtype is forced to match the LHS tensor (via inputs[0].dtype() in the reshape macro)
  • Input1 is created with the specified data_type parameter
  • The CPU backend's forward() method switches on output dtype and directly casts both inputs to that type without validation

This causes undefined behavior. For instance, float_tensor.add(scalar, kInt16) results in:

  • input0: float32
  • input1: int16
  • output: float32

When the backend processes the float32 case, it calls input1.ptr<mllm_fp32_t>() on an int16 tensor, violating type safety.

This bug is triggered in the Qwen3 model where attn_min.add(minus_value, kInt16) mixes float32 and int16 types.

Required fixes:

  1. Validate type compatibility in the reshape macro (reject mismatches or auto-cast consistently)
  2. Ensure backend implementations only cast between compatible types
  3. Add assertions or checks before unsafe pointer casts
🤖 Prompt for AI Agents
In mllm/core/Tensor.cpp around lines 170-190, the scalar add implementation
creates rhs tensors with an explicit data_type that can differ from the LHS
tensor dtype, which leads to unsafe backend pointer casts and UB; fix by (1)
validating that the requested data_type matches the LHS dtype or performing an
explicit, safe cast/conversion of the scalar into the LHS dtype before creating
rhs_tensor (i.e., always allocate rhs_tensor with the same dtype as *this and
convert rhs into that representation), (2) update the reshape/op construction so
input/output dtypes are consistent (reject mixed-type ops or canonicalize types
before building ops), and (3) add runtime checks/assertions in CPU/backend
forward paths to assert input tensor dtypes match expected output dtype before
performing ptr<T>() casts so mismatches fail early rather than causing UB.
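
To illustrate fix (3), a minimal sketch of failing fast on dtype mismatch before any ptr<T>() reinterpretation; Tensor and the plain asserts here are toy stand-ins for the real backend's MLLM_RT_ASSERT-based checks:

#include <cassert>
#include <cstdio>

enum DataTypes { kFloat32, kInt16 };

struct Tensor { DataTypes dtype; };

// Reject the float32 + int16 mix described above instead of
// silently casting int16 storage through a float pointer.
void addForward(const Tensor& lhs, const Tensor& rhs, const Tensor& out) {
  assert(lhs.dtype == out.dtype && "lhs/output dtype mismatch");
  assert(rhs.dtype == out.dtype && "rhs/output dtype mismatch");
  std::puts("dtypes consistent; safe to dispatch on out.dtype");
}

int main() {
  Tensor a{kFloat32}, b{kFloat32}, o{kFloat32};
  addForward(a, b, o);  // OK
  // Tensor c{kInt16}; addForward(a, c, o);  // would trip the assert
}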

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
CMakeLists.txt (1)

263-264: LGTM! Consider adding a brief comment.

The FlatBuffers integration is configured correctly:

  • Tests are disabled to avoid unnecessary builds
  • EXCLUDE_FROM_ALL ensures it's only built when needed by dependent targets

Optionally, consider adding a brief comment explaining the purpose:

🔎 Suggested documentation
+# FlatBuffers for IR serialization
 set(FLATBUFFERS_BUILD_TESTS OFF)
 add_subdirectory(third_party/flatbuffers EXCLUDE_FROM_ALL)
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 426b7c1 and c065a81.

📒 Files selected for processing (5)
  • .gitmodules
  • CMakeLists.txt
  • mllm/CMakeLists.txt
  • third_party/flatbuffers
  • tools/mllm-ir-netron/README.md
✅ Files skipped from review due to trivial changes (1)
  • third_party/flatbuffers
🧰 Additional context used
📓 Path-based instructions (2)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/CMakeLists.txt
{mllm,mllm-cli,pymllm}/**/*.{toml,json,txt,md}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

When suggesting new dependencies, evaluate their maturity, licensing, and maintenance status. Prefer well-established and actively maintained libraries.

Files:

  • mllm/CMakeLists.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-android
  • GitHub Check: build-macos
  • GitHub Check: build-x86
🔇 Additional comments (2)
.gitmodules (1)

40-42: No issues with the FlatBuffers target name. The code correctly uses flatbuffers (lowercase), which is the proper CMake target exported by FlatBuffers when included via add_subdirectory(third_party/flatbuffers) in the root CMakeLists.txt. The namespaced target flatbuffers::flatbuffers does not exist in FlatBuffers—only FlatBuffers::FlatBuffers (CamelCase) is provided as an alias for find_package usage, which is not relevant here.

Likely an incorrect or invalid review comment.

mllm/CMakeLists.txt (1)

35-35: Remove unused FlatBuffers dependency or initialize the git submodule and use correct target name.

The target name flatbuffers is incorrect for the current build configuration. When using add_subdirectory(third_party/flatbuffers), the CMake target should be FlatBuffers (the interface alias exported by FlatBuffers' CMakeLists.txt). Additionally, the third_party/flatbuffers git submodule is currently empty and will cause the build to fail. More importantly, FlatBuffers is not used anywhere in the codebase—no includes or references exist—making this a dead dependency that should be removed entirely.

Owner

@UbiquitousLearning UbiquitousLearning left a comment

LGTM

- Added a function `cloneQuantizationSpecType` to clone quantization specifications without duplicating scale data.
- Updated `LLMQuantRecipeRMSNormPattern` to ensure weight quantization methods match input types without sharing.
- Modified `RMSNormOp` to register parameters in the IR context if they do not already exist.
- Changed `mllm_uint4_t` structure to use `uint8_t` instead of `int8_t` for data representation.
- Added multiple TODO comments in `QnnAOTNodeTensor` and `QnnAOTEnv` regarding the handling of constant values and static weights.
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

♻️ Duplicate comments (6)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)

684-685: Add defensive config validation to prevent runtime exceptions.

Lines 684-685 directly access nested config keys without validation. The nlohmann::json library throws exceptions if keys don't exist. Follow the defensive pattern used elsewhere in the codebase (e.g., AOTPipeline.cpp, MarkTensorIO.cpp) by checking key existence with .contains() before access.

🔎 Proposed fix
-  auto config = AOTCompileContext::getInstance().getConfig()["quant_recipe"]["builtin_llm_pass"]["linear"];
-  auto use_config = config["fallback"];
+  if (!AOTCompileContext::getInstance().getConfig().contains("quant_recipe") ||
+      !AOTCompileContext::getInstance().getConfig()["quant_recipe"].contains("builtin_llm_pass") ||
+      !AOTCompileContext::getInstance().getConfig()["quant_recipe"]["builtin_llm_pass"].contains("linear")) {
+    MLLM_WARN("LinearPattern: quant_recipe.builtin_llm_pass.linear config not found");
+    return false;
+  }
+  auto config = AOTCompileContext::getInstance().getConfig()["quant_recipe"]["builtin_llm_pass"]["linear"];
+  if (!config.contains("fallback")) {
+    MLLM_WARN("LinearPattern: fallback config not found");
+    return false;
+  }
+  auto use_config = config["fallback"];
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (5)

286-293: FIXME comment should be removed (duplicate concern).

The mapping of kInt4/kUInt4 to QNN_DATATYPE_SFIXED_POINT_4/QNN_DATATYPE_UFIXED_POINT_4 is correct per QNN documentation. The FIXME comment on line 286 is misleading and should be removed.

As noted in a previous review, the 4-bit quantized types are actively used (e.g., "w4a16" quantization recipes), and the mapping follows the same pattern as 8-bit and 16-bit variants.
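
A toy rendering of that mapping; the QNN_DATATYPE_* values below are local stand-in constants, whereas the real wrapper takes them from the QNN SDK headers:

#include <cstdint>
#include <cstdio>

enum DataTypes { kInt4, kUInt4, kInt8, kUInt8 };
enum QnnDataType : uint32_t {
  QNN_DATATYPE_SFIXED_POINT_4,
  QNN_DATATYPE_UFIXED_POINT_4,
  QNN_DATATYPE_SFIXED_POINT_8,
  QNN_DATATYPE_UFIXED_POINT_8,
};

// Map mllm quantized integer types onto QNN fixed-point types; the
// 4-bit cases follow the same pattern as the 8-bit (and 16-bit) ones.
QnnDataType toQnnQuantType(DataTypes t) {
  switch (t) {
    case kInt4:  return QNN_DATATYPE_SFIXED_POINT_4;
    case kUInt4: return QNN_DATATYPE_UFIXED_POINT_4;
    case kInt8:  return QNN_DATATYPE_SFIXED_POINT_8;
    case kUInt8: return QNN_DATATYPE_UFIXED_POINT_8;
  }
  return QNN_DATATYPE_SFIXED_POINT_8;  // unreachable for valid enum values
}

int main() {
  std::printf("kInt4 -> %u\n", static_cast<unsigned>(toQnnQuantType(kInt4)));
}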


336-336: Assertion on quant_recipe should be replaced with graceful handling (duplicate concern).

Line 336 uses MLLM_RT_ASSERT which will crash if quant_recipe is missing. A previous review suggested checking for null and returning default parameters instead.

This is the same issue flagged in previous reviews. Consider implementing the suggested fix from the past review comment.


473-493: addOperation implementation incomplete (duplicate concern).

The method constructs a Qnn_OpConfig_t but has TODO comments for critical parts (lines 484-490): parameters, inputs, outputs, validation, and actually adding the node to the QNN graph. Line 492 only stores the operation locally without registering it with the QNN backend.

This is the same issue flagged in previous reviews. The method is currently non-functional as a stub.


495-501: compile() is a stub (duplicate concern).

The method only sets is_compiled_ to true without performing actual compilation. This may cause issues if code expects compiled graphs.

This was flagged in previous reviews. Consider tracking this with a TODO comment or implementing the compilation logic.


802-805: captureAOTGraph returns nullptr (duplicate concern).

The stub implementation returns nullptr, which will cause null pointer issues for callers expecting a valid graph.

This was flagged in previous reviews.

🧹 Nitpick comments (5)
mllm/core/DataTypes.hpp (2)

331-334: Add documentation for the 4-bit signed integer type.

Unlike the GGUF quantization types above (lines 111-194), this struct lacks documentation explaining its purpose and usage. Consider adding a Doxygen comment clarifying how the 4-bit value is stored within the 8-bit field and any usage constraints.

🔎 Suggested documentation
+/**
+ * @struct mllm_int4_t
+ * @brief 4-bit signed integer type.
+ * 
+ * Represents a 4-bit signed integer value stored in an 8-bit field.
+ * Valid range: -8 to 7.
+ */
 typedef struct {  // NOLINT
   int8_t data;
 } mllm_int4_t;

As per coding guidelines, public APIs should have clear docstrings.


336-339: Previous issue resolved; consider adding documentation.

The previous concern about using int8_t instead of uint8_t has been addressed—line 337 now correctly uses uint8_t for the unsigned 4-bit type. However, like mllm_int4_t above, this struct would benefit from a Doxygen comment explaining its purpose and valid range (0 to 15).

🔎 Suggested documentation
+/**
+ * @struct mllm_uint4_t
+ * @brief 4-bit unsigned integer type.
+ * 
+ * Represents a 4-bit unsigned integer value stored in an 8-bit field.
+ * Valid range: 0 to 15.
+ */
 typedef struct {  // NOLINT
   uint8_t data;
 } mllm_uint4_t;

As per coding guidelines, public APIs should have clear docstrings.

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (1)

275-288: Consider explicitly defining or deleting special member functions.

The class declares a default destructor but doesn't explicitly declare copy/move constructors or assignment operators. While the default behavior is correct for the member types used, explicitly deleting these operations would make the intent clearer and prevent accidental copying of Pass instances, which is typically undesirable.

🔎 Proposed enhancement
 class LLMQuantRecipePass final : public ir::Pass {
  public:
   LLMQuantRecipePass();
 
   ~LLMQuantRecipePass() override = default;
+  
+  LLMQuantRecipePass(const LLMQuantRecipePass&) = delete;
+  LLMQuantRecipePass& operator=(const LLMQuantRecipePass&) = delete;
+  LLMQuantRecipePass(LLMQuantRecipePass&&) = default;
+  LLMQuantRecipePass& operator=(LLMQuantRecipePass&&) = default;
 
   uint8_t run(const ir::node_ptr_t& op) override;
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (2)

142-147: Remove duplicate TODO comments.

Six identical TODO comments clutter the code. Consolidate to a single comment at line 142.

🔎 Proposed fix
-  // TODO Constant value should also use Static!!! And they can be pruned
-  // TODO Constant value should also use Static!!! And they can be pruned
-  // TODO Constant value should also use Static!!! And they can be pruned
-  // TODO Constant value should also use Static!!! And they can be pruned
-  // TODO Constant value should also use Static!!! And they can be pruned
-  // TODO Constant value should also use Static!!! And they can be pruned
+  // TODO: Constant values should also use Static and can be pruned

816-821: Remove duplicate TODO comments.

Six identical TODO comments clutter the code (same issue as lines 142-147). Consolidate to a single comment at line 816.

🔎 Proposed fix
-  // TODO Constant value should also use Static!!! And they can be pruned
-  // TODO Constant value should also use Static!!! And they can be pruned
-  // TODO Constant value should also use Static!!! And they can be pruned
-  // TODO Constant value should also use Static!!! And they can be pruned
-  // TODO Constant value should also use Static!!! And they can be pruned
-  // TODO Constant value should also use Static!!! And they can be pruned
+  // TODO: Constant values should also use Static and can be pruned
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c065a81 and f3dcbf8.

📒 Files selected for processing (6)
  • examples/qwen3_qnn_aot/qwen3_qnn_aot.mir
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/core/DataTypes.hpp
  • mllm/core/aops/RMSNormOp.cpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/core/aops/RMSNormOp.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/core/DataTypes.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/core/aops/RMSNormOp.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/core/DataTypes.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/core/aops/RMSNormOp.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/core/aops/RMSNormOp.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/core/DataTypes.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
🧠 Learnings (4)
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh} : TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Suggest adding unit tests for untested complex logic or edge cases.

Applied to files:

  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
📚 Learning: 2025-11-25T07:26:06.575Z
Learnt from: CR
Repo: UbiquitousLearning/mllm PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T07:26:06.575Z
Learning: Applies to {mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi} : Use named constants instead of magic numbers.

Applied to files:

  • mllm/core/DataTypes.hpp
🧬 Code graph analysis (3)
mllm/core/aops/RMSNormOp.cpp (2)
mllm/core/aops/RMSNormOp.hpp (1)
  • weight_ (34-34)
mllm/core/aops/EmbeddingOp.cpp (1)
  • guard (54-54)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (3)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (13)
  • QnnAOTNodeTensor (107-107)
  • v (103-105)
  • v (103-103)
  • v (110-110)
  • v (112-112)
  • v (114-114)
  • v (116-116)
  • name (156-156)
  • name (261-261)
  • name (263-263)
  • name (265-265)
  • create (48-50)
  • create (48-48)
mllm/compile/ir/linalg/Attribute.hpp (3)
  • type (68-74)
  • type (68-68)
  • create (76-76)
mllm/core/DataTypes.hpp (1)
  • nameOfType (733-733)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp (1)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (1)
  • LLMQuantRecipePass (941-964)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp

[error] 23-23: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)


[error] 26-26: variable name 'rw' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 67-67: 2 adjacent parameters of 'genSimpleQuantizationSpecAttr' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 69-69: variable 'ret' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 71-71: variable 'spec' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 114-114: 2 adjacent parameters of 'shareQuantSpecSingleInputToSingleOutputAndSetOpQuantAnnoAttr' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 115-115: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 135-135: 2 adjacent parameters of 'noSharingSingleInAndSingleOutQuantAnnoAttr' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 135-135: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 145-145: variable 'quant_spec_i0' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 146-146: variable 'quant_spec_o0' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 169-169: 2 adjacent parameters of 'cloneQuantizationSpecType' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 174-174: variable 'cloned_spec' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp

[error] 28-28: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 30-30: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 40-40: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 54-54: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 66-66: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 80-80: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 94-94: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 106-106: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 118-118: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 132-132: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 146-146: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 160-160: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 174-174: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 188-188: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 200-200: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 212-212: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 226-226: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 240-240: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 252-252: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 266-266: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 275-275: class 'LLMQuantRecipePass' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 283-283: parameter name 'p' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-x86
  • GitHub Check: build-macos
  • GitHub Check: build-android
🔇 Additional comments (4)
mllm/core/aops/RMSNormOp.cpp (1)

9-10: LGTM! Required includes for new IR registration logic.

These includes provide the necessary types (ir::tensor::RegisterOp and ir::graph::SubGraphOp) used in the parameter registration block below.

mllm/core/DataTypes.hpp (3)

588-593: LGTM!

The 4-bit type information is correctly defined with accurate ranges: signed 4-bit (-8 to 7) and unsigned 4-bit (0 to 15). The implementation follows the established pattern used for other basic types in the file.
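For reference, the cited bounds follow directly from two's-complement arithmetic; a minimal standalone sketch (hypothetical — not the file's actual type-info template machinery) that verifies them at compile time:

#include <cstdint>

// Hypothetical standalone check of the 4-bit ranges cited above; the real
// DataTypes.hpp uses its own type-info template pattern.
constexpr int kInt4Min = -(1 << 3);      // two's complement: -2^(4-1) = -8
constexpr int kInt4Max = (1 << 3) - 1;   // 2^(4-1) - 1 = 7
constexpr int kUInt4Min = 0;
constexpr int kUInt4Max = (1 << 4) - 1;  // 2^4 - 1 = 15

static_assert(kInt4Min == -8 && kInt4Max == 7, "signed 4-bit range");
static_assert(kUInt4Min == 0 && kUInt4Max == 15, "unsigned 4-bit range");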


707-708: LGTM!

The self-type-info registrations correctly map the 4-bit enum values to their corresponding C++ types, following the established pattern used throughout the file.


771-772: LGTM!

The macro definitions for 4-bit types follow the established naming convention and maintain consistency with the existing compatibility layer, even though lines 737-739 indicate these macros are deprecated.

break;
}
default: {
NYI("Only support [int16, int8, bf16, f16, sf32] for now.");

⚠️ Potential issue | 🟡 Minor

Fix typo in data type list.

Line 439 contains "sf32" which should be "f32" to match the standard float32 type abbreviation.

🔎 Proposed fix
         default: {
-          NYI("Only support [int16, int8, bf16, f16, sf32] for now.");
+          NYI("Only support [int16, int8, bf16, f16, f32] for now.");
         }
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp around line 439, the
error message lists "sf32" but it should be the standard "f32"; update the
string literal to replace "sf32" with "f32" so the message reads "Only support
[int16, int8, bf16, f16, f32] for now." and ensure no other occurrences of
"sf32" remain in this file.

Comment on lines +444 to +446
MLLM_WARN("LLMQuantRecipeEqualPattern Only support constant Value as second inputs right now. Pls send us a issue or PR "
"if you want to compare two normal tensor(rather than static-tensor).");
return false;

⚠️ Potential issue | 🟡 Minor

Correct the pattern name in warning message.

The warning message references "LLMQuantRecipeEqualPattern" but this code is in LLMQuantRecipeElementwisePattern::rewrite. This copy-paste error could confuse debugging.

🔎 Proposed fix
     } else {
-      MLLM_WARN("LLMQuantRecipeEqualPattern Only support constant Value as second inputs right now. Pls send us a issue or PR "
+      MLLM_WARN("LLMQuantRecipeElementwisePattern Only support constant Value as second inputs right now. Pls send us a issue or PR "
                 "if you want to compare two normal tensor(rather than static-tensor).");
       return false;
     }
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp around lines 444-446, the
warning message incorrectly references "LLMQuantRecipeEqualPattern" (copy-paste
error) while inside LLMQuantRecipeElementwisePattern::rewrite; update the string
to reference "LLMQuantRecipeElementwisePattern" (or the exact class/function
name used in this file) so the log correctly identifies the emitting pattern,
keeping the rest of the message unchanged.

break;
}
default: {
NYI("Only support [int16, int8, bf16, f16, f32] for now.");

⚠️ Potential issue | 🟡 Minor

Use consistent type names in the supported-types list.

Line 597 lists "f32", which does not match the type names used in the switch cases above; the supported-types message should use the same naming convention as the other patterns.

🔎 Proposed fix

Ensure the data type list matches the switch cases above (lines 585-591) which include kFloat32, kFloat16, kBFloat16, etc.

🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp around line 597, the NYI
message lists "f32" which is inconsistent with the type names used earlier in
the switch (e.g., kFloat32, kFloat16, kBFloat16); update the message to use the
matching standard name (e.g., "float32") so the supported-types list reads
consistently (for example: [int16, int8, bf16, f16, float32]).

Comment on lines +727 to +732
if (use_config["method"] == "LPBQ") {
// Unpack
std::string precision = use_config["precision"];
bool sym = use_config["sym"];
int block_size = use_config["block_size"];
MLLM_RETURN_FALSE_IF_NOT(sym);

⚠️ Potential issue | 🟠 Major

Add defensive validation for nested config values.

Lines 727-732 access nested config values (use_config["method"], use_config["precision"], etc.) without checking if these keys exist. Add defensive checks before accessing these values to prevent runtime exceptions.

🔎 Proposed fix
-    if (use_config["method"] == "LPBQ") {
+    if (!use_config.contains("method") || !use_config.contains("precision") || !use_config.contains("sym") || !use_config.contains("block_size")) {
+      MLLM_WARN("LinearPattern: incomplete config for linear operation");
+      return false;
+    }
+    if (use_config["method"] == "LPBQ") {
       // Unpack
       std::string precision = use_config["precision"];
       bool sym = use_config["sym"];
       int block_size = use_config["block_size"];
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp around lines 727-732, the
code reads nested config keys (use_config["method"], ["precision"], ["sym"],
["block_size"]) without verifying those keys exist or have expected types; add
defensive validation before each access: check use_config.contains("method") and
that it is a string and equals "LPBQ" (or handle missing), then check
contains("precision") (string), contains("sym") (bool) and
contains("block_size") (int), and use MLLM_RETURN_FALSE_IF_NOT or equivalent to
fail early with clear messages if any key is missing or has the wrong type; if
appropriate provide safe defaults or short-circuit the LPBQ branch when
validations fail.

Comment on lines +961 to +963
if (config["quant_recipe"]["builtin_llm_pass"]["model"] == "qwen3") {
addPattern(LLMQuantRecipeQwen3AttentionPattern::create(), "qwen3_attention", 100);
}

⚠️ Potential issue | 🟠 Major

Add defensive config validation for conditional pattern registration.

Line 961 directly accesses nested config keys without validation: config["quant_recipe"]["builtin_llm_pass"]["model"]. The nlohmann::json library throws exceptions if keys don't exist. Use defensive checking with .contains() before access, as noted in past review comments and consistent with patterns used elsewhere in the codebase.

🔎 Proposed fix
-  if (config["quant_recipe"]["builtin_llm_pass"]["model"] == "qwen3") {
+  if (config.contains("quant_recipe") && 
+      config["quant_recipe"].contains("builtin_llm_pass") &&
+      config["quant_recipe"]["builtin_llm_pass"].contains("model") &&
+      config["quant_recipe"]["builtin_llm_pass"]["model"] == "qwen3") {
     addPattern(LLMQuantRecipeQwen3AttentionPattern::create(), "qwen3_attention", 100);
   }
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp around lines 961-963, the
code directly indexes config["quant_recipe"]["builtin_llm_pass"]["model"] which
can throw if any key is missing; add defensive checks before registering the
qwen3 pattern: verify config.contains("quant_recipe") &&
config["quant_recipe"].is_object() &&
config["quant_recipe"].contains("builtin_llm_pass") &&
config["quant_recipe"]["builtin_llm_pass"].is_object() &&
config["quant_recipe"]["builtin_llm_pass"].contains("model") (or use a safe
value lookup) and only compare the model string if present and of string type;
if missing, skip registration (or use a sensible default) to avoid exceptions.
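Since this comment and the LPBQ one above ask for the same guard, a shared lookup helper may be worth considering. A minimal sketch, assuming the config is an nlohmann::json object (jsonStringAt is a hypothetical name; contains(), is_object(), is_string(), at(), and get<T>() are standard nlohmann::json APIs):

#include <initializer_list>
#include <nlohmann/json.hpp>
#include <string>

// Hypothetical helper: walks a key path and returns `def` when any key along
// the path is missing or the leaf is not a string, instead of throwing.
static std::string jsonStringAt(const nlohmann::json& root,
                                std::initializer_list<const char*> path,
                                const std::string& def = "") {
  const nlohmann::json* cur = &root;
  for (const char* key : path) {
    if (!cur->is_object() || !cur->contains(key)) { return def; }
    cur = &cur->at(key);
  }
  return cur->is_string() ? cur->get<std::string>() : def;
}

// Usage for the registration above:
// if (jsonStringAt(config, {"quant_recipe", "builtin_llm_pass", "model"}) == "qwen3") {
//   addPattern(LLMQuantRecipeQwen3AttentionPattern::create(), "qwen3_attention", 100);
// }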


name_ = v->name();
mllm_tensor_ = v->tensor_;
quant_spec_ = v->getAttr("quant_recipe")->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;

⚠️ Potential issue | 🔴 Critical

Unsafe access to quant_recipe attribute.

Line 151 directly accesses the quant_recipe attribute without checking if it exists. If the attribute is missing, this will cause a null pointer dereference before the assertion on line 336 in parseQnnQuantizeParamFromIR.

🔎 Proposed fix
   name_ = v->name();
   mllm_tensor_ = v->tensor_;
-  quant_spec_ = v->getAttr("quant_recipe")->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;
+  if (auto attr = v->getAttr("quant_recipe")) {
+    quant_spec_ = attr->cast_<ir::linalg::LinalgIRQuantizatonSpecAttr>()->spec_;
+  }
   for (auto s : v->tensor_.shape()) { shape_.emplace_back(s); }
🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around line 151, the code directly
dereferences the "quant_recipe" attribute which can be absent; modify the code
to first check for the attribute's presence (e.g., use
v->hasAttr("quant_recipe") or capture getAttr("quant_recipe") into a variable
and check for null) before calling cast_ and accessing spec_; if the attribute
is missing, return or throw a clear error (or set a default) so you avoid a null
dereference and ensure the later assertion in parseQnnQuantizeParamFromIR is not
hit unexpectedly.


// Prepare data
auto num_scale_offsets = (uint32_t)v->tensor_.size(cfg->ch_axis);
Qnn_ScaleOffset_t* scale_array = (Qnn_ScaleOffset_t*)malloc(sizeof(Qnn_ScaleOffset_t) * num_scale_offsets);

⚠️ Potential issue | 🔴 Critical

Missing null checks after malloc calls.

Lines 355, 384, and 393 call malloc without checking if allocation succeeded. If allocation fails, subsequent code will dereference null pointers, causing crashes.

🔎 Proposed fix

Add null checks after each malloc:

       Qnn_ScaleOffset_t* scale_array = (Qnn_ScaleOffset_t*)malloc(sizeof(Qnn_ScaleOffset_t) * num_scale_offsets);
+      if (!scale_array) {
+        MLLM_ERROR_EXIT(ExitCode::kCoreError, "Failed to allocate memory for scale_array");
+      }
       MLLM_RT_ASSERT_EQ(num_scale_offsets, cfg->scale.size(0));

Apply similar checks at lines 384 and 393.

Also applies to: 384-384, 393-393

🤖 Prompt for AI Agents
In mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 355, 384 and 393, the
code calls malloc without checking for NULL which can lead to dereferencing null
on allocation failure; after each malloc call check the returned pointer for
NULL, log or propagate an appropriate error, free any previously allocated
memory as needed, and return or set an error status instead of continuing
execution so no null pointer is dereferenced.
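Because the same check is needed at three call sites, a small allocation wrapper could avoid repetition. A sketch that assumes the MLLM_ERROR_EXIT macro and ExitCode::kCoreError shown in the proposed fix above (checkedMalloc itself is a hypothetical name):

#include <cstddef>
#include <cstdlib>

// Hypothetical wrapper: exits with a clear message on allocation failure so
// callers never dereference a null pointer. MLLM_ERROR_EXIT / ExitCode are
// assumed from the proposed fix above, not verified against the codebase.
template <typename T>
static T* checkedMalloc(std::size_t count, const char* what) {
  auto* ptr = static_cast<T*>(std::malloc(sizeof(T) * count));
  if (ptr == nullptr) { MLLM_ERROR_EXIT(ExitCode::kCoreError, what); }
  return ptr;
}

// Usage at line 355 (and similarly at lines 384 and 393):
// auto* scale_array = checkedMalloc<Qnn_ScaleOffset_t>(num_scale_offsets,
//                                                      "Failed to allocate scale_array");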

Comment on lines +34 to +39

// Register Params
if (weight_ && !ir_ctx->lookupSymbolTable(getName() + ".weight")) {
ir::IRWriterGuard guard(ir_ctx, ir_ctx->lookupSymbolTable("init")->cast_<ir::graph::SubGraphOp>()->getTopRegion());
ir_ctx->create<ir::tensor::RegisterOp>(ir_ctx->create<ir::tensor::TensorValue>(weight_));
}

⚠️ Potential issue | 🔴 Critical

Add null-safety checks to prevent potential crash.

Line 37 chains multiple pointer operations without null checks:

  • lookupSymbolTable("init") could return nullptr if the "init" subgraph doesn't exist
  • cast_<ir::graph::SubGraphOp>() could fail if the type is incorrect
  • getTopRegion() could return nullptr

Any failure in this chain will cause undefined behavior or a crash.

🔎 Proposed fix with defensive checks
   // Register Params
   if (weight_ && !ir_ctx->lookupSymbolTable(getName() + ".weight")) {
-    ir::IRWriterGuard guard(ir_ctx, ir_ctx->lookupSymbolTable("init")->cast_<ir::graph::SubGraphOp>()->getTopRegion());
+    auto init_symbol = ir_ctx->lookupSymbolTable("init");
+    if (!init_symbol) {
+      NYI("init subgraph not found in IR context");
+    }
+    auto init_subgraph = init_symbol->cast_<ir::graph::SubGraphOp>();
+    if (!init_subgraph) {
+      NYI("init symbol is not a SubGraphOp");
+    }
+    auto init_region = init_subgraph->getTopRegion();
+    if (!init_region) {
+      NYI("init subgraph has no top region");
+    }
+    ir::IRWriterGuard guard(ir_ctx, init_region);
     ir_ctx->create<ir::tensor::RegisterOp>(ir_ctx->create<ir::tensor::TensorValue>(weight_));
   }
🤖 Prompt for AI Agents
In mllm/core/aops/RMSNormOp.cpp around lines 34-39, the chained calls
ir_ctx->lookupSymbolTable("init")->cast_<ir::graph::SubGraphOp>()->getTopRegion()
can dereference null/invalid pointers; add null-safety checks: first store the
result of lookupSymbolTable("init") in a local variable and verify it's
non-null, then attempt the cast and verify the cast succeeded (or use a
safe/dynamic cast API), then check the returned region is non-null before
constructing the IRWriterGuard; if any check fails, skip registration or
log/handle the error gracefully instead of proceeding to call methods on null.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
pymllm/backends/qualcomm/transformers/static_qwen3.py (1)

1-3: Module appears incomplete with only unused imports.

This file contains only import statements with no implementation, docstring, or module-level documentation. Please verify whether this is intentional (e.g., a stub for future implementation) or if implementation is missing from this PR.

If this is a placeholder, consider adding:

  1. A module-level docstring explaining the intended purpose
  2. A TODO comment indicating what needs to be implemented

If you'd like, I can help generate the implementation structure based on the quantization patterns mentioned in the PR objectives. Would you like me to open an issue to track the completion of this module?

pyproject.toml (1)

31-32: Add version constraints for torch and torchao.

Both torch and torchao are used in quantization features (particularly in pymllm/backends/qualcomm/transformers/static_qwen3.py for pt2e quantization) without version constraints, which can lead to reproducibility issues and unexpected breaking changes. PyTorch is particularly known for introducing breaking changes between major versions, and torchao's quantization functionality is version-sensitive.

Consider pinning to specific versions or specifying minimum versions, similar to other dependencies in this file (e.g., apache-tvm-ffi == 0.1.0b4, pyyaml >= 6.0.2). At minimum: torch >= 2.0 and torchao >= 0.1.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f3dcbf8 and bacdc16.

📒 Files selected for processing (5)
  • pymllm/backends/qualcomm/transformers/README.md
  • pymllm/backends/qualcomm/transformers/__init__.py
  • pymllm/backends/qualcomm/transformers/static_qwen3.py
  • pymllm/backends/qualcomm/transformers/train.py
  • pyproject.toml
✅ Files skipped from review due to trivial changes (1)
  • pymllm/backends/qualcomm/transformers/README.md
🧰 Additional context used
📓 Path-based instructions (5)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x80–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x80–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • pymllm/backends/qualcomm/transformers/static_qwen3.py
{mllm,mllm-cli,pymllm}/**/*.{sh,py}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

If a file starts with a shebang, it must be the first line and contain only the interpreter path and optional single argument, encoded in UTF-8.

Files:

  • pymllm/backends/qualcomm/transformers/static_qwen3.py
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • pymllm/backends/qualcomm/transformers/static_qwen3.py
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • pymllm/backends/qualcomm/transformers/static_qwen3.py
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • pymllm/backends/qualcomm/transformers/static_qwen3.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-macos
  • GitHub Check: build-x86
  • GitHub Check: build-android

@chenghuaWang chenghuaWang merged commit 7c7c10c into UbiquitousLearning:main Dec 29, 2025
4 checks passed