Conversation

@oreomaker (Collaborator) commented Jan 7, 2026

Please check Guidelines for Contributing.

Summary by CodeRabbit

  • New Features

    • Added Gather support end-to-end (API, CPU backend, and QNN AOT lowering) plus many new QNN AOT patterns (Linear, RMSNorm, MatMul, Concat, Transpose, Softmax, Repeat, Where, Equal) to broaden op coverage.
    • Added public gather functional API.
    • Added QNN tensor debug utility for inspection.
  • Bug Fixes

    • Corrected QNN quantization datatype mappings and improved handling for per-channel and LPBQ quantization; relaxed some runtime assertions.


@coderabbitai bot (Contributor) commented Jan 7, 2026

📝 Walkthrough

Adds Gather operation support end-to-end (core aops, functional API, CPU op) and extensive QNN AOT lowering/quantization plumbing; refactors QNN utilities and updates the qwen3 example to use nn::functional::gather for embedding lookups.

Changes

Cohort / File(s) Summary
Core Gather Operation
mllm/core/aops/GatherOp.hpp, mllm/core/aops/GatherOp.cpp
New GatherOp and GatherOpOptions (dim). Implements trace, reshape (shape inference), forward stub (NYI).
Functional API
mllm/nn/Functional.hpp, mllm/nn/Functional.cpp
Adds mllm::nn::functional::gather(const Tensor&, int dim, const Tensor&) wrapper calling GatherOp; a usage sketch follows this table.
CPU Backend Gather
mllm/backends/cpu/ops/GatherOp.hpp, mllm/backends/cpu/ops/GatherOp.cpp, mllm/backends/cpu/CPUBackend.cpp
Adds CPUGatherOp implementation and factory; registers CPUGatherOpFactory in CPUBackend; forward copies per-element (index validation, negative indices, dtype checks).
QNN Utilities & Wrappers
mllm/backends/qnn/QNNUtils.hpp, mllm/backends/qnn/QNNUtils.cpp, mllm/backends/qnn/aot/QnnWrappersAPI.hpp, mllm/backends/qnn/aot/QnnWrappersAPI.cpp
Maps host integer dtypes to fixed-point QNN types; removes/relaxes assertions on AOT paths; adds debug __printQnnTensor; implements setupComplexTensorQuantization for per-channel and LPBQ handling; adjusts quant parsing.
QNN AOT Visitor Patterns
mllm/backends/qnn/aot/visitor/*{Gather,RMSNorm,Linear,Concat,Equal,Matmul,Repeat,Softmax,Transpose,Where}*.{hpp,cpp}
Many new visitor patterns that lower linalg ops to QNN AOT nodes (validate attrs, capture tensors, emit params). Gather, RMSNorm, Linear have denser logic (axis/weight reshape/LPBQ).
QNN AOT Passes & Quant Recipe
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp, mllm/backends/qnn/aot/passes/LLMQuantRecipePass.{cpp,hpp}
Registers new patterns (Gather, RMSNorm, Linear) in lowering pass; adds GatherPattern to quant-recipe pass and tweaks LPBQ block-axis arg for linear pattern. Note: duplicate GatherPattern declaration appears in header.
Index Visitor Update
mllm/backends/qnn/aot/visitor/Index.cpp
Adjusted indices emission for Gather (removed extra flag) and moved axis param inline when constructing QNN node.
IR Declarations
mllm/compile/ir/linalg/Op.hpp
Adds forward declaration(s) for GatherOp (duplicate declarations present).
Qwen3 Example
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
Replaced mutable local position_ids with a const reference; changed RoPE embedding indexing to use nn::functional::gather on sliced QDQ_ROPE outputs.
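
A minimal usage sketch of the new functional API (the "Functional API" row above). The header path and gather signature come from this table; the helper name, tensor shapes, and call pattern are illustrative assumptions, not code lifted from the PR:

#include "mllm/nn/Functional.hpp"  // declares mllm::nn::functional::gather

// Hypothetical embedding lookup in the spirit of the qwen3 example change:
// gather rows of a [vocab, hidden] table along dim 0 using integer token ids.
// Output shape follows GatherOp's inference: the dims before `dim`, then the
// indices shape, then the remaining trailing dims.
inline mllm::Tensor embeddingLookup(const mllm::Tensor& table, const mllm::Tensor& token_ids) {
  return mllm::nn::functional::gather(table, /*dim=*/0, token_ids);
}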

Sequence Diagram(s)

sequenceDiagram
  participant User as User Code
  participant NN as nn::functional::gather
  participant AOp as aops::GatherOp
  participant CPU as CPUGatherOp
  participant QNNLower as QNN AOT Lowering/Runtime

  Note over User,NN: High-level gather call
  User->>NN: gather(x, dim, indices)
  NN->>AOp: build/trace GatherOp(dim, inputs)
  alt runtime path: CPU execution
    AOp->>CPU: CPUGatherOp::forward(inputs)
    CPU-->>User: output tensor
  else runtime path: QNN AOT lowering
    AOp->>QNNLower: Lower to QNN Gather node (emit inputs, axis param)
    QNNLower-->>User: output prepared for QNN runtime
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Suggested reviewers

  • liang1232018
  • yirongjie

Poem

🐰 I hopped through tensors, dims aligned,

From core to CPU and QNN combined.
I gathered indices, tiny and spry,
Sliced RoPEs danced, embeddings fly.
A joyful twitch — new ops on the run!

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed

❌ Failed checks (2 warnings)

  • Description check ⚠️ Warning — The PR description is essentially empty, containing only a template link with no substantive information about the implementation, motivation, or testing. Resolution: add a detailed description covering the QNN AOT lowering passes, the operations supported, key technical details, and any testing performed.
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 5.08%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (1 passed)

  • Title check ✅ Passed — The title clearly summarizes the main change: adding QNN AOT lowering passes to the Qualcomm backend, which aligns with the comprehensive set of file additions and changes throughout the PR.



@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
mllm/backends/qnn/QNNUtils.cpp (1)

306-328: Fix semantic inconsistency in kInt16 type mapping.

The mapping of kInt16 → QNN_DATATYPE_UFIXED_POINT_16 is semantically incorrect. Signed integer types should not map to unsigned fixed-point types. This contradicts the quantized variant kInt16PerTensorSym → QNN_DATATYPE_SFIXED_POINT_16 used elsewhere in the same function (line 377), which correctly maps signed to signed fixed-point.

Change line 311 from QNN_DATATYPE_UFIXED_POINT_16 to QNN_DATATYPE_SFIXED_POINT_16 to maintain sign consistency.

The quantization parameter initialization for these types is properly handled via createQuantizeParams() and scale propagation is implemented correctly throughout the codebase.
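
A self-contained sketch of the sign-consistent mapping this comment asks for (the HostDType enum and function name are illustrative; the QNN_DATATYPE_* values come from the QNN SDK's QnnTypes.h):

#include "QnnTypes.h"  // QNN SDK header defining Qnn_DataType_t

enum class HostDType { kInt16, kInt16PerTensorSym };  // illustrative subset

inline Qnn_DataType_t toQnnDType(HostDType dt) {
  switch (dt) {
    case HostDType::kInt16:              // was QNN_DATATYPE_UFIXED_POINT_16 in the PR
    case HostDType::kInt16PerTensorSym:  // already QNN_DATATYPE_SFIXED_POINT_16
      return QNN_DATATYPE_SFIXED_POINT_16;  // signed host int -> signed fixed-point
  }
  return QNN_DATATYPE_UNDEFINED;
}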

🤖 Fix all issues with AI agents
In @mllm/backends/cpu/ops/GatherOp.cpp:
- Around line 57-59: In GatherOp.cpp the loop silently skips copies when "idx <
0 || idx >= dim_size", leaving parts of the output uninitialized; change the
Gather implementation to zero-initialize the output buffer/tensor (the
destination written by the gather) before the loop and, when an index is
out-of-bounds, either log a warning including the offending "idx" and "dim_size"
or raise an explicit error instead of continuing; ensure you reference the same
variables ("idx", "dim_size") and the output buffer in your changes so the
zero-fill happens once before the loop and all invalid-index cases are handled
with a clear log/error.

In @mllm/backends/qnn/aot/QnnWrappersAPI.cpp:
- Around line 174-178: The blockwise expansion sets blockwise_expansion.axis =
cfg->ch_axis but incorrectly computes blockwise_expansion.numBlocksPerAxis using
v->tensor_.size(1); change the calculation to use the configured channel axis
like the other LPBQ fields (i.e., compute numBlocksPerAxis as
v->tensor_.size(cfg->ch_axis)) so the axis and block count are consistent when
cfg->ch_axis is not 1; update the code around Qnn_BlockwiseExpansion_t
blockwise_expansion and setBlockwiseQuantization usage to use
v->tensor_.size(cfg->ch_axis).

In @mllm/backends/qnn/aot/visitor/Concat.cpp:
- Around line 43-61: NormalizeAxis may produce an out-of-bounds or negative
value; after computing rank from first_input->tensor_.shape() and normalizing
axis (axis += rank), add explicit validation that 0 <= axis && axis < rank and
handle the error (throw an exception or log and abort) before proceeding, and
only then call QNNParamScalarWrapper::create("axis", (uint32_t)axis) to avoid
unsigned wraparound; update the Concat creation flow around
QnnAOTNodeOperation::create("Concat")/emplaceParamScalar to perform this check
early.

In @mllm/backends/qnn/aot/visitor/Gather.cpp:
- Around line 49-50: The dynamic_cast of gather_op->getAOp() to
mllm::aops::GatherOp* can return nullptr and is dereferenced immediately; change
the code to first assign the result to a local pointer (e.g., auto* gatherAOp =
dynamic_cast<mllm::aops::GatherOp*>(gather_op->getAOp())), check if gatherAOp is
nullptr, handle the error path (log via the existing logger, return/throw an
appropriate error, or abort the visitor) and only then read
gatherAOp->options().dim and call emplaceParamScalar, so you avoid undefined
behavior when the cast fails.

In @mllm/backends/qnn/aot/visitor/Linear.cpp:
- Around line 57-58: Ensure the tensor has at least 3 dimensions before
accessing shape()[0..2] in the reshape call: check the weight tensor's
rank/shape size (e.g., via tensor_.dim() or tensor_.shape().size()) in the code
around Linear.cpp where weight_val->tensor_ is being viewed, and if it is
smaller than 3 either return/throw a clear error (or log and skip) with context
(function/class name) or handle the alternative shape case; only perform the
view(...) when the validation passes to avoid out-of-bounds access.

In @mllm/backends/qnn/aot/visitor/Repeat.cpp:
- Line 45: Validate options.repeat_times before casting: check that int
repeat_times = options.repeat_times is > 0 (and optionally within a sane upper
bound) and return or throw a clear error if not; only then cast to uint32_t for
the QNN Tile call. Update both places where you cast repeat_times to uint32_t
(the initial assignment and the later usage around lines 54-55) to use the
validated value and emit a descriptive error mentioning
options.repeat_times/repeat_times so negative, zero, or excessively large values
cannot be silently converted into huge unsigned integers.

In @mllm/backends/qnn/aot/visitor/RMSNorm.cpp:
- Around line 40-45: The casted TensorValue pointers obtained for weight, i_0,
and o_0 (from
writer.getContext()->lookupSymbolTable(...)->outputs().front()->cast_<ir::tensor::TensorValue>()
and op->inputs()/outputs() casts) are used without null checks; add explicit
nullptr checks after each cast (for weight, i_0, and o_0) and handle failures by
returning early or throwing/logging a clear error so the function does not
dereference a null pointer — locate the casts in RMSNorm.cpp and guard each
cast_<ir::tensor::TensorValue>() result before further use.

In @mllm/backends/qnn/aot/visitor/Softmax.cpp:
- Around line 32-33: Validate that op has at least one input and one output and
that the front() elements and their casts succeed before using them: check
op->inputs().empty() and op->outputs().empty(), ensure op->inputs().front() and
op->outputs().front() are non-null, then perform the
cast_<ir::tensor::TensorValue>() and verify the returned pointers (input and
output) are non-null; if any check fails, return or raise an appropriate
error/log instead of dereferencing.
- Around line 42-48: The code adjusts a possibly-negative axis but never
verifies it falls into [0, rank-1], which can cause wraparound when later cast
to unsigned; after computing rank and applying "if (axis < 0) { axis += rank; }"
(referencing real_softmax_op->options().axis, axis, input->tensor_.shape(), and
rank), validate that axis >= 0 && axis < rank and handle invalid values (e.g.,
return an error/throw an exception or log and abort the op) rather than
proceeding; update the Softmax visitor to check this bound immediately after the
negative-axis adjustment and produce a clear error/early-exit if out of range.

In @mllm/backends/qnn/aot/visitor/Transpose.cpp:
- Around line 45-52: The permutation logic in Transpose.cpp uses options.dim0
and options.dim1 directly as indices into perm, which can be negative; normalize
both dims to positive equivalents (e.g., if dim < 0 then dim += rank) before
using them. Update the block that builds perm (after computing rank and perm
vector) to normalize options.dim0 and options.dim1 into local ints (or size_t)
and then perform the std::swap using those normalized indices so negative
dimensions behave the same as in CPU TransposeOp and other QNN AOT visitors
(refer to input->tensor_.shape(), rank, perm, and options.dim0/options.dim1).

In @mllm/backends/qnn/aot/visitor/Where.cpp:
- Around line 31-34: The code in Where.cpp assumes three inputs when calling
std::next on op->inputs() (accessing condition, true_input, false_input); first
validate the input count by checking op->inputs().size() (or equivalent) is >= 3
and handle the error path (throw, return, or log) before dereferencing
iterators; update the block that creates condition, true_input, and false_input
to guard against fewer inputs and use safe iterator/index access only after the
size check so std::next calls cannot go out of bounds.

In @mllm/backends/qnn/QNNUtils.cpp:
- Around line 310-312: The case handling for kInt16 incorrectly maps the signed
16-bit integer to QNN_DATATYPE_UFIXED_POINT_16 (unsigned); update the mapping in
the switch (case kInt16) to use the signed 16-bit fixed-point/type variant
(e.g., QNN_DATATYPE_FIXED_POINT_16 or the correct QNN signed-16 enum value used
elsewhere in the codebase) so that kInt16 maps to the signed QNN datatype
instead of QNN_DATATYPE_UFIXED_POINT_16.
- Line 618: The function name uses a reserved leading double underscore
(__printQnnTensor); rename it to a non-reserved identifier (e.g., printQnnTensor
or Qnn_PrintTensor) and update its declaration in qnnutils.hpp and any call
sites to the new name, ensuring linkage/extern "C" attributes remain unchanged;
also run a project-wide search for __printQnnTensor to fix all references and
rebuild to verify no symbol mismatches.
- Around line 453-454: Replace the commented-out device assertion with proper
conditional validation: keep the existing runtime check for non-AOT tensors by
reinstating MLLM_RT_ASSERT(tensor.device() == kQNN) when type !=
QNN_TENSOR_TYPE_STATIC, and add an explicit check for the AOT/static case (e.g.,
assert tensor.device() == kCPU or allow kCPU/kQNN as appropriate) to validate
expected placement; locate the assertion around the variables type,
QNN_TENSOR_TYPE_STATIC and tensor.device() in QNNUtils.cpp and implement the
conditional MLLM_RT_ASSERT branches to handle both runtime and AOT scenarios.

In @mllm/backends/qnn/QNNUtils.hpp:
- Line 91: The function name __printQnnTensor uses a double-underscore prefix
which is reserved; rename it to a non-reserved identifier (e.g., printQnnTensor
or debugPrintQnnTensor) in the declaration and update the matching definition
and all call sites to the new name (ensure Qnn_Tensor_t reference stays
unchanged and update any header/implementation pairs and tests that reference
__printQnnTensor).

In @mllm/compile/ir/linalg/Op.hpp:
- Line 84: Remove the duplicate forward declaration of GatherOp: locate the
second occurrence of "class GatherOp;" (the one at the later position) and
delete it so only the original forward declaration remains (leave the first
"class GatherOp;" intact).
🧹 Nitpick comments (8)
mllm/backends/qnn/aot/visitor/Index.cpp (2)

42-76: Consider adding inline comments and axis bounds validation.

The axis calculation logic is complex and would benefit from inline comments explaining the algorithm. Additionally, consider validating that the calculated axis is within the bounds of the input tensor's rank to catch configuration errors early.

Suggested improvements

Add inline comments:

// Iterate through index specifications to find the gather axis
// and extract indices tensor. Increment axis for each dimension
// until we find vector or tensor indices.
for (const auto& blob : indices_list) {
  // Skip dimensions with full slice (kAll, kAll)
  if (blob.slice_indices_.has_value()) {
    // ... existing code
  }
  // ... rest of logic
}

Add axis validation after line 81:

if (axis >= input->getType()->cast_<ir::tensor::TensorType>()->getShape().size()) {
  MLLM_ERROR("Calculated axis " + std::to_string(axis) + " exceeds input tensor rank");
  return false;
}
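
The same normalize-then-validate shape recurs across the agent prompts above (Concat, Softmax, and Transpose axes; Repeat's repeat_times) and this suggestion. A minimal sketch, assuming each visitor can fail its rewrite by returning false; the helper name is hypothetical:

#include <cstdint>

// Map a possibly-negative axis into [0, rank) and report validity.
// Rejecting out-of-range values here prevents unsigned wraparound at the
// later (uint32_t) cast when emitting the QNN scalar param.
inline bool normalizeAxis(int rank, int& axis) {
  if (axis < 0) { axis += rank; }
  return axis >= 0 && axis < rank;
}

// Call-site shape (illustrative):
//   int axis = real_softmax_op->options().axis;
//   if (!normalizeAxis(rank, axis)) { MLLM_ERROR("axis out of range"); return false; }
//   qnn_op_node->emplaceParamScalar(QNNParamScalarWrapper::create("axis", (uint32_t)axis));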

14-16: Add function documentation.

Both isMatch and rewrite functions lack documentation. Per coding guidelines, public APIs should have clear comments explaining purpose, parameters, returns, and errors. This is especially important for pattern implementations that are part of the AOT lowering infrastructure.

Suggested documentation
/// Checks if the operation matches the QNN AOT Index pattern.
/// @param op The operation to check
/// @return true if the operation is an IndexOp with QNN attributes
bool QnnAOTIndexPattern::isMatch(const mllm::ir::op_ptr_t& op) {
  return op->isa_<mllm::ir::linalg::IndexOp>() && (op->getAttr("using_qnn") != nullptr);
}

/// Rewrites an IndexOp to QNN Gather operation.
/// Extracts the gather axis and indices from the IndexOp and emits
/// a QNN Gather node to the AOT graph.
/// @param writer IR writer for creating new operations
/// @param op The IndexOp to rewrite
/// @return true if rewrite succeeded, false otherwise
/// @note QNN Gather only supports gathering on a single axis
bool QnnAOTIndexPattern::rewrite(ir::IRWriter& writer, const ir::op_ptr_t& op) {
  // ... existing implementation
}

Based on coding guidelines.

Also applies to: 18-98

mllm/backends/qnn/QNNUtils.cpp (1)

681-682: Improve formatting in debug output string.

Add spaces after colons for better readability in the blockwise expansion debug output.

🔎 Proposed fix
-      quant_str = "Blockwise Expansion (axis:" + std::to_string(t.quantizeParams.blockwiseExpansion->axis)
-                  + ", blockSize:" + std::to_string(t.quantizeParams.blockwiseExpansion->numBlocksPerAxis) + ")";
+      quant_str = "Blockwise Expansion (axis: " + std::to_string(t.quantizeParams.blockwiseExpansion->axis)
+                  + ", blockSize: " + std::to_string(t.quantizeParams.blockwiseExpansion->numBlocksPerAxis) + ")";
mllm/backends/qnn/aot/visitor/Softmax.cpp (1)

43-43: Extract beta as a named constant or clarify the comment.

The beta value is hardcoded to 1.0f. Unlike axis, which is retrieved from real_softmax_op->options(), beta is not configurable and is a fixed requirement of the QNN Softmax API. The comment "Default beta" is misleading. Consider extracting it as a named constant (e.g., constexpr float QNN_SOFTMAX_BETA = 1.0f;) at file scope for clarity and maintainability, or update the comment to reflect that it's a fixed QNN parameter.
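
A sketch of the named-constant variant; the constant name follows the suggestion above, while its placement and the "beta" param name are assumptions about the surrounding visitor code:

// File scope in Softmax.cpp: QNN Softmax requires beta == 1.0, so name it
// rather than passing a magic number at the emit site.
constexpr float QNN_SOFTMAX_BETA = 1.0F;

// Later, when building the node (illustrative):
// qnn_op_node->emplaceParamScalar(QNNParamScalarWrapper::create("beta", QNN_SOFTMAX_BETA));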

mllm/backends/qnn/aot/visitor/Equal.cpp (1)

18-52: LGTM! Solid implementation following established patterns.

The rewrite implementation correctly:

  • Validates the operation cast and required attributes
  • Captures inputs/outputs through the AOT environment
  • Creates and registers a QNN ElementWiseEqual node
  • Follows the same structure as other AOT patterns (Gather, MatMul, Where, etc.)
Optional: Minor style consistency suggestion

Line 34 uses std::next(op->inputs().begin()) while other patterns like MatMul explicitly use std::next(op->inputs().begin(), 1). Though functionally equivalent, explicit parameters improve consistency.

-  auto input1 = (*std::next(op->inputs().begin()))->cast_<ir::tensor::TensorValue>();
+  auto input1 = (*std::next(op->inputs().begin(), 1))->cast_<ir::tensor::TensorValue>();
mllm/backends/qnn/aot/visitor/Gather.cpp (1)

32-37: Consider adding null checks for tensor casts.

The cast_<ir::tensor::TensorValue>() results at lines 33, 34, and 37 are used without validation. While similar patterns in the codebase also omit these checks, adding them would improve robustness.

🔎 Proposed fix
  // Inputs
  auto table = op->inputs().front()->cast_<ir::tensor::TensorValue>();
  auto indices = (*std::next(op->inputs().begin()))->cast_<ir::tensor::TensorValue>();

  // Output
  auto output = op->outputs().front()->cast_<ir::tensor::TensorValue>();
+
+  if (!table || !indices || !output) {
+    MLLM_ERROR("Failed to cast inputs/output to TensorValue");
+    return false;
+  }
mllm/core/aops/GatherOp.hpp (1)

24-24: Consider adding [[nodiscard]] to the accessor.

The options() accessor returns a const reference that should typically not be discarded. Adding [[nodiscard]] would help catch accidental misuse.

🔎 Proposed fix
-  inline const GatherOpOptions& options() const { return options_; }
+  [[nodiscard]] inline const GatherOpOptions& options() const { return options_; }
mllm/core/aops/GatherOp.cpp (1)

36-40: reserve underestimates final vector size.

new_shape.reserve(dim) only reserves space for dim elements, but the final size is dim + indices_shape.size() + (shape.size() - dim - 1). Consider reserving the correct final size to avoid reallocations:

  std::vector<int32_t> new_shape;
- new_shape.reserve(dim);
+ new_shape.reserve(dim + indices_shape.size() + (shape.size() - dim - 1));
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9202124 and 000e12b.

📒 Files selected for processing (37)
  • examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
  • mllm/backends/cpu/CPUBackend.cpp
  • mllm/backends/cpu/ops/GatherOp.cpp
  • mllm/backends/cpu/ops/GatherOp.hpp
  • mllm/backends/qnn/QNNUtils.cpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/backends/qnn/aot/visitor/Concat.cpp
  • mllm/backends/qnn/aot/visitor/Concat.hpp
  • mllm/backends/qnn/aot/visitor/Equal.cpp
  • mllm/backends/qnn/aot/visitor/Equal.hpp
  • mllm/backends/qnn/aot/visitor/Gather.cpp
  • mllm/backends/qnn/aot/visitor/Gather.hpp
  • mllm/backends/qnn/aot/visitor/Index.cpp
  • mllm/backends/qnn/aot/visitor/Linear.cpp
  • mllm/backends/qnn/aot/visitor/Linear.hpp
  • mllm/backends/qnn/aot/visitor/Matmul.cpp
  • mllm/backends/qnn/aot/visitor/Matmul.hpp
  • mllm/backends/qnn/aot/visitor/RMSNorm.cpp
  • mllm/backends/qnn/aot/visitor/RMSNorm.hpp
  • mllm/backends/qnn/aot/visitor/Repeat.cpp
  • mllm/backends/qnn/aot/visitor/Repeat.hpp
  • mllm/backends/qnn/aot/visitor/Softmax.cpp
  • mllm/backends/qnn/aot/visitor/Softmax.hpp
  • mllm/backends/qnn/aot/visitor/Transpose.cpp
  • mllm/backends/qnn/aot/visitor/Transpose.hpp
  • mllm/backends/qnn/aot/visitor/Where.cpp
  • mllm/backends/qnn/aot/visitor/Where.hpp
  • mllm/compile/ir/linalg/Op.hpp
  • mllm/core/aops/GatherOp.cpp
  • mllm/core/aops/GatherOp.hpp
  • mllm/nn/Functional.cpp
  • mllm/nn/Functional.hpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/aot/visitor/Repeat.cpp
  • mllm/backends/qnn/aot/visitor/Transpose.hpp
  • mllm/backends/cpu/CPUBackend.cpp
  • mllm/backends/qnn/aot/visitor/Equal.hpp
  • mllm/core/aops/GatherOp.hpp
  • mllm/backends/qnn/aot/visitor/Linear.hpp
  • mllm/backends/qnn/aot/visitor/Concat.hpp
  • mllm/backends/qnn/aot/visitor/Where.cpp
  • mllm/backends/qnn/aot/visitor/Gather.cpp
  • mllm/backends/qnn/aot/visitor/Softmax.cpp
  • mllm/backends/qnn/aot/visitor/RMSNorm.cpp
  • mllm/backends/qnn/aot/visitor/Matmul.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/backends/qnn/aot/visitor/Gather.hpp
  • mllm/backends/qnn/aot/visitor/RMSNorm.hpp
  • mllm/nn/Functional.cpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/cpu/ops/GatherOp.hpp
  • mllm/core/aops/GatherOp.cpp
  • mllm/backends/cpu/ops/GatherOp.cpp
  • mllm/backends/qnn/aot/visitor/Repeat.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/nn/Functional.hpp
  • mllm/backends/qnn/aot/visitor/Softmax.hpp
  • mllm/backends/qnn/aot/visitor/Equal.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/visitor/Transpose.cpp
  • mllm/backends/qnn/aot/visitor/Linear.cpp
  • mllm/backends/qnn/aot/visitor/Index.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
  • mllm/backends/qnn/aot/visitor/Concat.cpp
  • mllm/backends/qnn/QNNUtils.cpp
  • mllm/backends/qnn/aot/visitor/Where.hpp
  • mllm/compile/ir/linalg/Op.hpp
  • mllm/backends/qnn/aot/visitor/Matmul.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/aot/visitor/Repeat.cpp
  • mllm/backends/qnn/aot/visitor/Transpose.hpp
  • mllm/backends/cpu/CPUBackend.cpp
  • mllm/backends/qnn/aot/visitor/Equal.hpp
  • mllm/core/aops/GatherOp.hpp
  • mllm/backends/qnn/aot/visitor/Linear.hpp
  • mllm/backends/qnn/aot/visitor/Concat.hpp
  • mllm/backends/qnn/aot/visitor/Where.cpp
  • mllm/backends/qnn/aot/visitor/Gather.cpp
  • mllm/backends/qnn/aot/visitor/Softmax.cpp
  • mllm/backends/qnn/aot/visitor/RMSNorm.cpp
  • mllm/backends/qnn/aot/visitor/Matmul.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/backends/qnn/aot/visitor/Gather.hpp
  • mllm/backends/qnn/aot/visitor/RMSNorm.hpp
  • mllm/nn/Functional.cpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/cpu/ops/GatherOp.hpp
  • mllm/core/aops/GatherOp.cpp
  • mllm/backends/cpu/ops/GatherOp.cpp
  • mllm/backends/qnn/aot/visitor/Repeat.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/nn/Functional.hpp
  • mllm/backends/qnn/aot/visitor/Softmax.hpp
  • mllm/backends/qnn/aot/visitor/Equal.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/visitor/Transpose.cpp
  • mllm/backends/qnn/aot/visitor/Linear.cpp
  • mllm/backends/qnn/aot/visitor/Index.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
  • mllm/backends/qnn/aot/visitor/Concat.cpp
  • mllm/backends/qnn/QNNUtils.cpp
  • mllm/backends/qnn/aot/visitor/Where.hpp
  • mllm/compile/ir/linalg/Op.hpp
  • mllm/backends/qnn/aot/visitor/Matmul.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/visitor/Repeat.cpp
  • mllm/backends/cpu/CPUBackend.cpp
  • mllm/backends/qnn/aot/visitor/Where.cpp
  • mllm/backends/qnn/aot/visitor/Gather.cpp
  • mllm/backends/qnn/aot/visitor/Softmax.cpp
  • mllm/backends/qnn/aot/visitor/RMSNorm.cpp
  • mllm/nn/Functional.cpp
  • mllm/core/aops/GatherOp.cpp
  • mllm/backends/cpu/ops/GatherOp.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/visitor/Equal.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/visitor/Transpose.cpp
  • mllm/backends/qnn/aot/visitor/Linear.cpp
  • mllm/backends/qnn/aot/visitor/Index.cpp
  • mllm/backends/qnn/aot/visitor/Concat.cpp
  • mllm/backends/qnn/QNNUtils.cpp
  • mllm/backends/qnn/aot/visitor/Matmul.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/aot/visitor/Repeat.cpp
  • mllm/backends/qnn/aot/visitor/Transpose.hpp
  • mllm/backends/cpu/CPUBackend.cpp
  • mllm/backends/qnn/aot/visitor/Equal.hpp
  • mllm/core/aops/GatherOp.hpp
  • mllm/backends/qnn/aot/visitor/Linear.hpp
  • mllm/backends/qnn/aot/visitor/Concat.hpp
  • mllm/backends/qnn/aot/visitor/Where.cpp
  • mllm/backends/qnn/aot/visitor/Gather.cpp
  • mllm/backends/qnn/aot/visitor/Softmax.cpp
  • mllm/backends/qnn/aot/visitor/RMSNorm.cpp
  • mllm/backends/qnn/aot/visitor/Matmul.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/backends/qnn/aot/visitor/Gather.hpp
  • mllm/backends/qnn/aot/visitor/RMSNorm.hpp
  • mllm/nn/Functional.cpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/cpu/ops/GatherOp.hpp
  • mllm/core/aops/GatherOp.cpp
  • mllm/backends/cpu/ops/GatherOp.cpp
  • mllm/backends/qnn/aot/visitor/Repeat.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/nn/Functional.hpp
  • mllm/backends/qnn/aot/visitor/Softmax.hpp
  • mllm/backends/qnn/aot/visitor/Equal.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/visitor/Transpose.cpp
  • mllm/backends/qnn/aot/visitor/Linear.cpp
  • mllm/backends/qnn/aot/visitor/Index.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
  • mllm/backends/qnn/aot/visitor/Concat.cpp
  • mllm/backends/qnn/QNNUtils.cpp
  • mllm/backends/qnn/aot/visitor/Where.hpp
  • mllm/compile/ir/linalg/Op.hpp
  • mllm/backends/qnn/aot/visitor/Matmul.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
🧬 Code graph analysis (21)
mllm/backends/qnn/aot/visitor/Repeat.cpp (6)
mllm/backends/qnn/aot/visitor/Index.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-98)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Concat.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-71)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Equal.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-52)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Gather.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-56)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Matmul.cpp (4)
  • isMatch (13-15)
  • isMatch (13-13)
  • rewrite (17-51)
  • rewrite (17-17)
mllm/backends/qnn/aot/visitor/Where.cpp (4)
  • isMatch (13-15)
  • isMatch (13-13)
  • rewrite (17-53)
  • rewrite (17-17)
mllm/backends/qnn/aot/visitor/Transpose.hpp (9)
mllm/backends/qnn/aot/visitor/Concat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Equal.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Gather.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Matmul.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/RMSNorm.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Repeat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Softmax.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Where.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Index.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Equal.hpp (9)
mllm/backends/qnn/aot/visitor/Concat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Gather.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Matmul.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/RMSNorm.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Repeat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Softmax.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Transpose.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Where.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Index.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/core/aops/GatherOp.hpp (1)
mllm/backends/cpu/ops/GatherOp.hpp (3)
  • options (20-22)
  • options (20-20)
  • inputs (15-15)
mllm/backends/qnn/aot/visitor/Linear.hpp (5)
mllm/backends/qnn/aot/visitor/Matmul.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/RMSNorm.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Softmax.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Elewise.hpp (4)
  • op (14-14)
  • op (25-25)
  • writer (16-16)
  • writer (27-27)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/visitor/Concat.hpp (9)
mllm/backends/qnn/aot/visitor/Equal.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Gather.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Matmul.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/RMSNorm.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Repeat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Softmax.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Transpose.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Where.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Index.hpp (2)
  • op (14-14)
  • writer (16-16)
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (1)
mllm/nn/Functional.hpp (1)
  • gather (165-165)
mllm/backends/qnn/aot/visitor/Softmax.cpp (8)
mllm/backends/qnn/aot/visitor/Concat.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-71)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Equal.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-52)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Gather.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-56)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Matmul.cpp (4)
  • isMatch (13-15)
  • isMatch (13-13)
  • rewrite (17-51)
  • rewrite (17-17)
mllm/backends/qnn/aot/visitor/RMSNorm.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-66)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Repeat.cpp (4)
  • isMatch (15-17)
  • isMatch (15-15)
  • rewrite (19-84)
  • rewrite (19-19)
mllm/backends/qnn/aot/visitor/Transpose.cpp (4)
  • isMatch (15-17)
  • isMatch (15-15)
  • rewrite (19-76)
  • rewrite (19-19)
mllm/backends/qnn/aot/visitor/Where.cpp (4)
  • isMatch (13-15)
  • isMatch (13-13)
  • rewrite (17-53)
  • rewrite (17-17)
mllm/backends/qnn/aot/visitor/RMSNorm.cpp (8)
mllm/backends/qnn/aot/visitor/Concat.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-71)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Equal.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-52)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Gather.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-56)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Matmul.cpp (4)
  • isMatch (13-15)
  • isMatch (13-13)
  • rewrite (17-51)
  • rewrite (17-17)
mllm/backends/qnn/aot/visitor/Repeat.cpp (4)
  • isMatch (15-17)
  • isMatch (15-15)
  • rewrite (19-84)
  • rewrite (19-19)
mllm/backends/qnn/aot/visitor/Softmax.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-69)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Transpose.cpp (4)
  • isMatch (15-17)
  • isMatch (15-15)
  • rewrite (19-76)
  • rewrite (19-19)
mllm/backends/qnn/aot/visitor/Where.cpp (4)
  • isMatch (13-15)
  • isMatch (13-13)
  • rewrite (17-53)
  • rewrite (17-17)
mllm/backends/qnn/aot/visitor/Gather.hpp (8)
mllm/backends/qnn/aot/visitor/Concat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Equal.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Matmul.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/RMSNorm.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Repeat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Softmax.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Transpose.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Where.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/nn/Functional.cpp (2)
mllm/core/Tensor.cpp (1)
  • instance (427-427)
mllm/backends/qnn/aot/visitor/Gather.hpp (1)
  • kGather (18-20)
mllm/backends/cpu/ops/GatherOp.hpp (1)
mllm/core/aops/GatherOp.hpp (2)
  • inputs (20-20)
  • inputs (22-22)
mllm/core/aops/GatherOp.cpp (2)
mllm/core/aops/GatherOp.hpp (5)
  • GatherOp (16-16)
  • trace_context (18-18)
  • inputs (20-20)
  • inputs (22-22)
  • options_ (24-24)
mllm/backends/cpu/ops/GatherOp.hpp (3)
  • options (20-22)
  • options (20-20)
  • inputs (15-15)
mllm/backends/cpu/ops/GatherOp.cpp (3)
mllm/backends/cpu/ops/GatherOp.hpp (4)
  • CPUGatherOp (13-13)
  • options (20-22)
  • options (20-20)
  • inputs (15-15)
mllm/core/aops/GatherOp.cpp (2)
  • forward (21-23)
  • forward (21-21)
mllm/core/aops/GatherOp.hpp (2)
  • inputs (20-20)
  • inputs (22-22)
mllm/backends/qnn/aot/visitor/Repeat.hpp (9)
mllm/backends/qnn/aot/visitor/Concat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Equal.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Gather.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Matmul.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/RMSNorm.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Softmax.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Transpose.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Where.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Index.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (7)
  • v (47-49)
  • v (47-47)
  • v (56-56)
  • v (58-58)
  • v (60-60)
  • v (62-62)
  • v (65-65)
mllm/backends/qnn/aot/visitor/Softmax.hpp (11)
mllm/backends/qnn/aot/visitor/Linear.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Concat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Equal.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Gather.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Matmul.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/RMSNorm.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Repeat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Transpose.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Where.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Elewise.hpp (4)
  • op (14-14)
  • op (25-25)
  • writer (16-16)
  • writer (27-27)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/visitor/Equal.cpp (8)
mllm/backends/qnn/aot/visitor/Index.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-98)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Concat.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-71)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Gather.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-56)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Matmul.cpp (4)
  • isMatch (13-15)
  • isMatch (13-13)
  • rewrite (17-51)
  • rewrite (17-17)
mllm/backends/qnn/aot/visitor/RMSNorm.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-66)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Repeat.cpp (4)
  • isMatch (15-17)
  • isMatch (15-15)
  • rewrite (19-84)
  • rewrite (19-19)
mllm/backends/qnn/aot/visitor/Transpose.cpp (4)
  • isMatch (15-17)
  • isMatch (15-15)
  • rewrite (19-76)
  • rewrite (19-19)
mllm/backends/qnn/aot/visitor/Where.cpp (4)
  • isMatch (13-15)
  • isMatch (13-13)
  • rewrite (17-53)
  • rewrite (17-17)
mllm/backends/qnn/aot/visitor/Linear.cpp (2)
mllm/backends/qnn/aot/visitor/Linear.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (5)
  • qnn_context_name (219-219)
  • qnn_context_name (221-222)
  • qnn_context_name (224-225)
  • input (84-84)
  • output (86-86)
mllm/backends/qnn/aot/visitor/Matmul.cpp (8)
mllm/backends/qnn/aot/visitor/Concat.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-71)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Equal.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-52)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Gather.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-56)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Repeat.cpp (4)
  • isMatch (15-17)
  • isMatch (15-15)
  • rewrite (19-84)
  • rewrite (19-19)
mllm/backends/qnn/aot/visitor/Softmax.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-69)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Transpose.cpp (4)
  • isMatch (15-17)
  • isMatch (15-15)
  • rewrite (19-76)
  • rewrite (19-19)
mllm/backends/qnn/aot/visitor/Where.cpp (4)
  • isMatch (13-15)
  • isMatch (13-13)
  • rewrite (17-53)
  • rewrite (17-17)
mllm/backends/qnn/aot/visitor/Matmul.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)
mllm/compile/ir/linalg/Attribute.hpp (7)
  • node (27-27)
  • node (27-27)
  • node (327-327)
  • node (327-327)
  • node (348-348)
  • node (348-348)
  • create (77-77)
mllm/compile/ir/builtin/Attribute.hpp (10)
  • node (24-24)
  • node (24-24)
  • node (44-44)
  • node (44-44)
  • node (64-64)
  • node (64-64)
  • node (84-84)
  • node (84-84)
  • node (104-104)
  • node (104-104)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/visitor/Transpose.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Equal.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/core/aops/GatherOp.hpp

[error] 24-24: function 'options' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)


[error] 27-27: member variable 'options_' has protected visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Linear.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Concat.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Matmul.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp

[error] 304-304: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Gather.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/RMSNorm.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/nn/Functional.cpp

[error] 215-215: 3 adjacent parameters of 'gather' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

mllm/backends/qnn/QNNUtils.hpp

[error] 91-91: declaration uses identifier '__printQnnTensor', which is a reserved identifier

(bugprone-reserved-identifier,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Repeat.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Softmax.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/QnnWrappersAPI.hpp

[error] 65-65: parameter name 'v' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/QNNUtils.cpp

[error] 618-618: declaration uses identifier '__printQnnTensor', which is a reserved identifier

(bugprone-reserved-identifier,-warnings-as-errors)


[error] 628-628: variable name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 630-630: variable 'tensor_type' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 640-640: variable 'dtype_str' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 666-666: variable 'shape_str' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 673-673: variable 'quant_str' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 675-675: repeated branch in conditional chain

(bugprone-branch-clone,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Where.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-android
  • GitHub Check: build-macos
  • GitHub Check: build-x86

Comment on lines +57 to +59
if (idx < 0) idx += dim_size;

if (idx < 0 || idx >= dim_size) { continue; }

⚠️ Potential issue | 🟠 Major

Silent skip of out-of-bounds indices may leave output uninitialized.

When idx < 0 || idx >= dim_size after negative index adjustment, the code silently skips the copy, leaving that portion of the output tensor uninitialized. This could result in garbage values or undefined behavior downstream.

Consider either:

  1. Logging a warning/error for out-of-bounds indices.
  2. Zero-initializing the output tensor before the loop.
  3. Raising an error for invalid indices.
🔎 Proposed fix: Log warning and zero-fill
-      if (idx < 0 || idx >= dim_size) { continue; }
+      if (idx < 0 || idx >= dim_size) {
+        MLLM_WARN("GatherOp: index {} out of bounds [0, {}), filling with zeros", idx, dim_size);
+        int64_t dst_offset = (o * indices_count + i) * inner_size * data_type_size;
+        std::memset(output_ptr + dst_offset, 0, inner_size * data_type_size);
+        continue;
+      }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

       if (idx < 0) idx += dim_size;
-      if (idx < 0 || idx >= dim_size) { continue; }
+      if (idx < 0 || idx >= dim_size) {
+        MLLM_WARN("GatherOp: index {} out of bounds [0, {}), filling with zeros", idx, dim_size);
+        int64_t dst_offset = (o * indices_count + i) * inner_size * data_type_size;
+        std::memset(output_ptr + dst_offset, 0, inner_size * data_type_size);
+        continue;
+      }
🤖 Prompt for AI Agents
In @mllm/backends/cpu/ops/GatherOp.cpp around lines 57 - 59, In GatherOp.cpp the
loop silently skips copies when "idx < 0 || idx >= dim_size", leaving parts of
the output uninitialized; change the Gather implementation to zero-initialize
the output buffer/tensor (the destination written by the gather) before the loop
and, when an index is out-of-bounds, either log a warning including the
offending "idx" and "dim_size" or raise an explicit error instead of continuing;
ensure you reference the same variables ("idx", "dim_size") and the output
buffer in your changes so the zero-fill happens once before the loop and all
invalid-index cases are handled with a clear log/error.

Comment on lines 174 to 178
   Qnn_BlockwiseExpansion_t blockwise_expansion;
   blockwise_expansion.axis = cfg->ch_axis;
   blockwise_expansion.scaleOffsets = nullptr;  // Will be set by setBlockwiseQuantization
-  blockwise_expansion.numBlocksPerAxis = v->tensor_.size(cfg->ch_axis) / cfg->block_size;
+  blockwise_expansion.numBlocksPerAxis = v->tensor_.size(1) / cfg->block_size;
   blockwise_expansion.blockScaleBitwidth = 12;  // 12 bits for 4 to 16 expansion

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, understand the structure of QuantizationSpecLPBQ and ch_axis
rg -n "ch_axis" --type cpp -B 2 -A 2 | head -80



🏁 Script executed:

# Check how blockwise_expansion is used and what axis/numBlocksPerAxis mean
rg -n "blockwise_expansion" --type cpp -B 3 -A 3 | head -100



🏁 Script executed:

# Look for LPBQ configuration and how ch_axis is set
rg -n "QuantizationSpecLPBQ\|kLPBQ" --type cpp -B 2 -A 5 | head -150



🏁 Script executed:

# Check the tensor dimensions in context of weight tensors
rg -n "v->tensor_\.size\(" --type cpp -B 2 -A 2 | head -100



🏁 Script executed:

# Search for where LPBQ ch_axis is set in LLMQuantRecipePass
rg -n "QuantizationSpecLPBQ::create" --type cpp -B 5 -A 15 | head -150



🏁 Script executed:

# Look for Linear layer quantization setup - typically ch_axis would be set there
rg -n "Linear\|MatMul" mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp -B 2 -A 8 | head -200



🏁 Script executed:

# Check what values ch_axis can have - look at initialization and usage patterns
rg -n "ch_axis\s*=" --type cpp -B 1 -A 1 | grep -v ".hpp" | head -100



🏁 Script executed:

# Found ch_axis = 0 at line 797. Let's verify if this is the only place LPBQ is created
rg -n "QuantizationSpecLPBQ::create" --type cpp -A 2 | head -50



🏁 Script executed:

# Check weight tensor dimensions - for Linear layers, shapes should be [out_features, in_features]
rg -n "shape\|size\(0\)|size\(1\)" mllm/backends/qnn/aot/QnnWrappersAPI.cpp | head -50



🏁 Script executed:

# Verify: does the code document what axis the blockwise quantization should be along?
rg -n "blockwise\|block_size" mllm/backends/qnn/aot/QnnWrappersAPI.cpp -B 2 -A 2 | head -80



🏁 Script executed:

# Let's check if there are any tests or comments explaining the relationship between ch_axis and tensor dims
rg -n "blockwise\|block_size" mllm/compile/ir/linalg/ -B 2 -A 2 | head -100



🏁 Script executed:

# Check the full LPBQ quantization spec to understand semantics
cat mllm/compile/ir/linalg/Attribute.hpp | grep -A 30 "struct QuantizationSpecLPBQ"



🏁 Script executed:

# Let's check the relationship more carefully - look at how scale_level_0_int is used
rg -n "scale_level_0_int" --type cpp -B 2 -A 2 | head -100



🏁 Script executed:

# Check dimensions of weight tensors in Linear quantization - what are size(0) and size(1)?
rg -n "Linear.*weight\|register.*weight" mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp -B 3 -A 3 | head -80



🏁 Script executed:

# Look for any documentation or schema about blockwise expansion axis semantics
rg -n "Qnn_BlockwiseExpansion" --type cpp -B 5 -A 10 | head -150



Mismatch between blockwise expansion axis and numBlocksPerAxis calculation.

blockwise_expansion.axis is set to cfg->ch_axis (line 175), but numBlocksPerAxis is computed using hardcoded v->tensor_.size(1) (line 177). At line 165, the code correctly uses v->tensor_.size(cfg->ch_axis) for other LPBQ fields; this pattern should be consistent. If ch_axis is 0 (as set in LLMQuantRecipePass.cpp:797), then numBlocksPerAxis should use size(0), not size(1):

Suggested fix
  blockwise_expansion.axis = cfg->ch_axis;
  blockwise_expansion.scaleOffsets = nullptr;  // Will be set by setBlockwiseQuantization
- blockwise_expansion.numBlocksPerAxis = v->tensor_.size(1) / cfg->block_size;
+ blockwise_expansion.numBlocksPerAxis = v->tensor_.size(cfg->ch_axis) / cfg->block_size;
🤖 Prompt for AI Agents
In @mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 174 - 178, The
blockwise expansion sets blockwise_expansion.axis = cfg->ch_axis but incorrectly
computes blockwise_expansion.numBlocksPerAxis using v->tensor_.size(1); change
the calculation to use the configured channel axis like the other LPBQ fields
(i.e., compute numBlocksPerAxis as v->tensor_.size(cfg->ch_axis)) so the axis
and block count are consistent when cfg->ch_axis is not 1; update the code
around Qnn_BlockwiseExpansion_t blockwise_expansion and setBlockwiseQuantization
usage to use v->tensor_.size(cfg->ch_axis).
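
For reference, a minimal standalone sketch of the invariant the review asks for: derive the block count from the same axis the expansion is declared on. FakeTensor, LpbqConfig, and numBlocksPerAxis here are hypothetical stand-ins, not the project's actual types.

#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the real tensor/config types.
struct FakeTensor {
  std::vector<int64_t> dims;
  int64_t size(int axis) const { return dims.at(axis); }
};
struct LpbqConfig {
  int ch_axis = 0;      // axis the per-channel scales run along
  int block_size = 32;  // elements per block along ch_axis
};

// Derive the block count from the SAME axis the expansion is declared on,
// so axis and numBlocksPerAxis can never disagree.
uint32_t numBlocksPerAxis(const FakeTensor& t, const LpbqConfig& cfg) {
  const int64_t dim = t.size(cfg.ch_axis);
  assert(dim % cfg.block_size == 0 && "axis length must be divisible by block_size");
  return static_cast<uint32_t>(dim / cfg.block_size);
}

int main() {
  FakeTensor w{{4096, 11008}};  // e.g. [out_features, in_features]
  LpbqConfig cfg;               // ch_axis = 0
  return numBlocksPerAxis(w, cfg) == 4096 / 32 ? 0 : 1;
}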

Comment on lines +43 to +61
// Handle negative axis
// We can use the first input to determine rank, assuming all inputs have same rank
auto first_input = op->inputs().front()->cast_<ir::tensor::TensorValue>();
auto input_shape = first_input->tensor_.shape();
int rank = input_shape.size();
if (axis < 0) { axis += rank; }

// Create QNN Op Node
auto qnn_op_node = QnnAOTNodeOperation::create("Concat");
qnn_op_node->setPackageName("qti.aisw");

// Add Inputs
for (auto& input_val : op->inputs()) {
  auto input = input_val->cast_<ir::tensor::TensorValue>();
  qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, input));
}

// Add Params
qnn_op_node->emplaceParamScalar(QNNParamScalarWrapper::create("axis", (uint32_t)axis));

🛠️ Refactor suggestion | 🟠 Major

Add bounds validation for the normalized axis.

After normalizing the negative axis, the code should verify that the resulting axis is within the valid range [0, rank). Additionally, before casting to uint32_t at line 61, explicitly validate that axis is non-negative so an out-of-range value cannot silently wrap around to a huge unsigned number.

🔎 Proposed fix to add axis bounds validation
  int rank = input_shape.size();
  if (axis < 0) { axis += rank; }
+ 
+ // Validate axis is in valid range
+ if (axis < 0 || axis >= rank) {
+   MLLM_ERROR("Axis " << axis << " is out of bounds for rank " << rank);
+   return false;
+ }

  // Create QNN Op Node
  auto qnn_op_node = QnnAOTNodeOperation::create("Concat");
🤖 Prompt for AI Agents
In @mllm/backends/qnn/aot/visitor/Concat.cpp around lines 43 - 61, the axis
normalization may produce an out-of-bounds or negative value; after computing rank from
first_input->tensor_.shape() and normalizing axis (axis += rank), add explicit
validation that 0 <= axis && axis < rank and handle the error (throw an
exception or log and abort) before proceeding, and only then call
QNNParamScalarWrapper::create("axis", (uint32_t)axis) to avoid unsigned
wraparound; update the Concat creation flow around
QnnAOTNodeOperation::create("Concat")/emplaceParamScalar to perform this check
early.
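
A self-contained sketch of the normalize-then-validate pattern suggested above; normalizeAxis is a hypothetical helper name, not an existing function in this codebase.

#include <cstdint>
#include <cstdio>
#include <optional>

// Normalize a possibly-negative axis into [0, rank) and reject anything
// still out of range, so the later cast to uint32_t can never wrap around.
std::optional<uint32_t> normalizeAxis(int axis, int rank) {
  if (axis < 0) { axis += rank; }  // e.g. -1 with rank 4 becomes 3
  if (axis < 0 || axis >= rank) { return std::nullopt; }
  return static_cast<uint32_t>(axis);
}

int main() {
  // -1 normalizes to 3; -5 is rejected instead of wrapping to ~4 billion.
  std::printf("%u %d\n", *normalizeAxis(-1, 4), normalizeAxis(-5, 4).has_value());
  return 0;
}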

Comment on lines +49 to +50
int axis = dynamic_cast<mllm::aops::GatherOp*>(gather_op->getAOp())->options().dim;
qnn_op_node->emplaceParamScalar(QNNParamScalarWrapper::create("axis", (int32_t)axis));

⚠️ Potential issue | 🟠 Major

Unchecked dynamic_cast may cause undefined behavior.

The dynamic_cast at line 49 can return nullptr if the cast fails, and directly accessing ->options().dim on a null pointer leads to undefined behavior.

🔎 Proposed fix: Add null check
  // Add scalar param axis
-  int axis = dynamic_cast<mllm::aops::GatherOp*>(gather_op->getAOp())->options().dim;
+  auto gather_aop = dynamic_cast<mllm::aops::GatherOp*>(gather_op->getAOp());
+  if (!gather_aop) {
+    MLLM_ERROR("Failed to cast to aops::GatherOp");
+    return false;
+  }
+  int axis = gather_aop->options().dim;
   qnn_op_node->emplaceParamScalar(QNNParamScalarWrapper::create("axis", (int32_t)axis));
📝 Committable suggestion

Suggested change
int axis = dynamic_cast<mllm::aops::GatherOp*>(gather_op->getAOp())->options().dim;
qnn_op_node->emplaceParamScalar(QNNParamScalarWrapper::create("axis", (int32_t)axis));
auto gather_aop = dynamic_cast<mllm::aops::GatherOp*>(gather_op->getAOp());
if (!gather_aop) {
  MLLM_ERROR("Failed to cast to aops::GatherOp");
  return false;
}
int axis = gather_aop->options().dim;
qnn_op_node->emplaceParamScalar(QNNParamScalarWrapper::create("axis", (int32_t)axis));
🤖 Prompt for AI Agents
In @mllm/backends/qnn/aot/visitor/Gather.cpp around lines 49 - 50, The
dynamic_cast of gather_op->getAOp() to mllm::aops::GatherOp* can return nullptr
and is dereferenced immediately; change the code to first assign the result to a
local pointer (e.g., auto* gatherAOp =
dynamic_cast<mllm::aops::GatherOp*>(gather_op->getAOp())), check if gatherAOp is
nullptr, handle the error path (log via the existing logger, return/throw an
appropriate error, or abort the visitor) and only then read
gatherAOp->options().dim and call emplaceParamScalar, so you avoid undefined
behavior when the cast fails.
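
The same guard, reduced to a compilable sketch with placeholder types; BaseOp and GatherLikeOp stand in for the real op hierarchy.

#include <cstdio>

struct BaseOp { virtual ~BaseOp() = default; };
struct GatherLikeOp : BaseOp {
  struct Options { int dim = 0; } opts;
  const Options& options() const { return opts; }
};

// dynamic_cast yields nullptr on failure, so bind it to a named pointer
// and bail out before dereferencing, which is the shape the review asks for.
bool readAxis(BaseOp* op, int& axis_out) {
  auto* gather = dynamic_cast<GatherLikeOp*>(op);
  if (gather == nullptr) {
    std::fprintf(stderr, "expected a gather-like op\n");
    return false;
  }
  axis_out = gather->options().dim;
  return true;
}

int main() {
  GatherLikeOp g;
  BaseOp plain;
  int axis = -1;
  return (readAxis(&g, axis) && !readAxis(&plain, axis)) ? 0 : 1;
}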

Comment on lines +57 to +58
weight_val->tensor_ = weight_val->tensor_.view(
{weight_val->tensor_.shape()[0], weight_val->tensor_.shape()[1] * weight_val->tensor_.shape()[2]});

⚠️ Potential issue | 🔴 Critical

Add shape validation before tensor reshaping.

The code accesses shape()[0], shape()[1], and shape()[2] without validating that the weight tensor has at least 3 dimensions. This could cause an out-of-bounds access if the tensor shape doesn't meet expectations.

🔎 Proposed fix to add shape validation
+  // Validate weight tensor has expected 3D shape
+  if (weight_val->tensor_.shape().size() < 3) {
+    MLLM_ERROR("Weight tensor must have at least 3 dimensions, got " + 
+               std::to_string(weight_val->tensor_.shape().size()));
+    return false;
+  }
+
   weight_val->tensor_ = weight_val->tensor_.to(kUInt8);
 
   weight_val->tensor_ = weight_val->tensor_.view(
       {weight_val->tensor_.shape()[0], weight_val->tensor_.shape()[1] * weight_val->tensor_.shape()[2]});
📝 Committable suggestion

Suggested change
weight_val->tensor_ = weight_val->tensor_.view(
{weight_val->tensor_.shape()[0], weight_val->tensor_.shape()[1] * weight_val->tensor_.shape()[2]});
// Validate weight tensor has expected 3D shape
if (weight_val->tensor_.shape().size() < 3) {
  MLLM_ERROR("Weight tensor must have at least 3 dimensions, got " +
             std::to_string(weight_val->tensor_.shape().size()));
  return false;
}

weight_val->tensor_ = weight_val->tensor_.to(kUInt8);

weight_val->tensor_ = weight_val->tensor_.view(
    {weight_val->tensor_.shape()[0], weight_val->tensor_.shape()[1] * weight_val->tensor_.shape()[2]});
🤖 Prompt for AI Agents
In @mllm/backends/qnn/aot/visitor/Linear.cpp around lines 57 - 58, Ensure the
tensor has at least 3 dimensions before accessing shape()[0..2] in the reshape
call: check the weight tensor's rank/shape size (e.g., via tensor_.dim() or
tensor_.shape().size()) in the code around Linear.cpp where weight_val->tensor_
is being viewed, and if it is smaller than 3 either return/throw a clear error
(or log and skip) with context (function/class name) or handle the alternative
shape case; only perform the view(...) when the validation passes to avoid
out-of-bounds access.
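
Distilled into a standalone sketch; collapseLast2Dims is a hypothetical helper operating on a raw shape vector rather than the project's Tensor type.

#include <cstdio>
#include <vector>

// Guarded 3D -> 2D collapse: only fold dims 1 and 2 together once we know
// the shape really has at least three entries.
bool collapseLast2Dims(std::vector<long long>& shape) {
  if (shape.size() < 3) {
    std::fprintf(stderr, "expected >= 3 dims, got %zu\n", shape.size());
    return false;
  }
  shape = {shape[0], shape[1] * shape[2]};
  return true;
}

int main() {
  std::vector<long long> ok{4096, 344, 32};  // e.g. [out, n_blocks, block_size]
  std::vector<long long> bad{4096, 11008};   // 2D weight: must be rejected
  return (collapseLast2Dims(ok) && !collapseLast2Dims(bad)) ? 0 : 1;
}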

Comment on lines 310 to 312
  case kInt16: {
-   ret = QNN_DATATYPE_INT_16;
+   ret = QNN_DATATYPE_UFIXED_POINT_16;
    break;

⚠️ Potential issue | 🔴 Critical

Critical: Incorrect type mapping for kInt16.

Line 311 maps kInt16 (signed 16-bit integer) to QNN_DATATYPE_UFIXED_POINT_16 (unsigned fixed-point 16-bit). This is a sign mismatch that will cause incorrect value interpretation and potential data corruption.

🔎 Proposed fix
     case kInt16: {
-      ret = QNN_DATATYPE_UFIXED_POINT_16;
+      ret = QNN_DATATYPE_SFIXED_POINT_16;
       break;
     }
🤖 Prompt for AI Agents
In @mllm/backends/qnn/QNNUtils.cpp around lines 310 - 312, The case handling for
kInt16 incorrectly maps the signed 16-bit integer to
QNN_DATATYPE_UFIXED_POINT_16 (unsigned); update the mapping in the switch (case
kInt16) to use the signed 16-bit fixed-point/type variant (e.g.,
QNN_DATATYPE_SFIXED_POINT_16 or the correct QNN signed-16 enum value used
elsewhere in the codebase) so that kInt16 maps to the signed QNN datatype
instead of QNN_DATATYPE_UFIXED_POINT_16.
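
To make the signedness rule concrete, a minimal sketch with local enum stand-ins; the real names live in QnnTypes.h and the mllm dtype headers, and this mapping table is illustrative rather than the project's actual one.

// Local stand-ins for the host and QNN datatype enums.
enum class HostDType { kInt8, kUInt8, kInt16, kUInt16 };
enum class QnnFixedPoint { SFIXED_8, UFIXED_8, SFIXED_16, UFIXED_16 };

// Signed host types must map to the signed fixed-point variants and
// unsigned to unsigned; crossing the two reinterprets the bit pattern
// of every quantized value (the bug flagged above).
QnnFixedPoint toQnnFixedPoint(HostDType t) {
  switch (t) {
    case HostDType::kInt8: return QnnFixedPoint::SFIXED_8;
    case HostDType::kUInt8: return QnnFixedPoint::UFIXED_8;
    case HostDType::kInt16: return QnnFixedPoint::SFIXED_16;  // not UFIXED_16
    case HostDType::kUInt16: return QnnFixedPoint::UFIXED_16;
  }
  return QnnFixedPoint::SFIXED_8;  // unreachable with valid input
}

int main() { return toQnnFixedPoint(HostDType::kInt16) == QnnFixedPoint::SFIXED_16 ? 0 : 1; }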

Comment on lines +453 to +454
// in AOT case, the tensor is all on CPU (TODO: handle this)
// if (type != QNN_TENSOR_TYPE_STATIC) { MLLM_RT_ASSERT(tensor.device() == kQNN); }

⚠️ Potential issue | 🟠 Major

Add proper device validation instead of commenting out the assertion.

The device type assertion has been commented out for AOT handling, but this removes an important safety check that could catch device placement errors. Instead of removing the check entirely, add conditional logic to handle both runtime and AOT cases properly.

🔎 Proposed fix
-  // in AOT case, the tensor is all on CPU (TODO: handle this)
-  // if (type != QNN_TENSOR_TYPE_STATIC) { MLLM_RT_ASSERT(tensor.device() == kQNN); }
+  // In AOT case, tensors may be on CPU; in runtime, non-static tensors must be on QNN
+  if (type != QNN_TENSOR_TYPE_STATIC) {
+    // TODO: Replace with proper AOT detection mechanism
+    bool isAOT = (tensor.device() == kCPU);
+    if (!isAOT) {
+      MLLM_RT_ASSERT(tensor.device() == kQNN);
+    }
+  }
🤖 Prompt for AI Agents
In @mllm/backends/qnn/QNNUtils.cpp around lines 453 - 454, Replace the
commented-out device assertion with proper conditional validation: keep the
existing runtime check for non-AOT tensors by reinstating
MLLM_RT_ASSERT(tensor.device() == kQNN) when type != QNN_TENSOR_TYPE_STATIC, and
add an explicit check for the AOT/static case (e.g., assert tensor.device() ==
kCPU or allow kCPU/kQNN as appropriate) to validate expected placement; locate
the assertion around the variables type, QNN_TENSOR_TYPE_STATIC and
tensor.device() in QNNUtils.cpp and implement the conditional MLLM_RT_ASSERT
branches to handle both runtime and AOT scenarios.

}
}

void __printQnnTensor(const Qnn_Tensor_t* tensor) {

⚠️ Potential issue | 🔴 Critical

Fix reserved identifier violation in function name.

The function name __printQnnTensor uses a leading double underscore, which is reserved for implementation use in C++. This violates the C++ standard and could conflict with compiler or standard library identifiers.

🔎 Proposed fix
-void __printQnnTensor(const Qnn_Tensor_t* tensor) {
+void printQnnTensor(const Qnn_Tensor_t* tensor) {

Don't forget to update the header declaration in mllm/backends/qnn/QNNUtils.hpp as well.

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Clang (14.0.6)

[error] 618-618: declaration uses identifier '__printQnnTensor', which is a reserved identifier

(bugprone-reserved-identifier,-warnings-as-errors)

🤖 Prompt for AI Agents
In @mllm/backends/qnn/QNNUtils.cpp at line 618, The function name uses a
reserved leading double underscore (__printQnnTensor); rename it to a
non-reserved identifier (e.g., printQnnTensor or Qnn_PrintTensor) and update its
declaration in QNNUtils.hpp and any call sites to the new name, ensuring
linkage/extern "C" attributes remain unchanged; also run a project-wide search
for __printQnnTensor to fix all references and rebuild to verify no symbol
mismatches.
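
For background, [lex.name] in the C++ standard reserves every identifier containing a double underscore, plus identifiers beginning with an underscore followed by an uppercase letter, for the implementation. A small sketch of a conforming rename, with printQnnTensorDebug as an example spelling only:

#include <cstdio>

// Reserved, do not declare in user code:
//   void __printQnnTensor(...);  // contains "__"
//   void _PrintTensor(...);      // leading '_' + uppercase letter
// A non-reserved spelling with the same intent:
void printQnnTensorDebug(const void* tensor) {
  std::printf("tensor @ %p\n", tensor);
}

int main() {
  int dummy = 0;
  printQnnTensorDebug(&dummy);
  return 0;
}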


bool freeQnnTensors(Qnn_Tensor_t*& tensors, uint32_t numTensors);

void __printQnnTensor(const Qnn_Tensor_t* tensor); // for debug use

⚠️ Potential issue | 🔴 Critical

Reserved identifier violation: rename the debug function.

The function name __printQnnTensor uses a double underscore prefix, which is reserved by the C++ standard. This can lead to undefined behavior or conflicts with compiler internals.

🔎 Proposed fix
-void __printQnnTensor(const Qnn_Tensor_t* tensor);  // for debug use
+void printQnnTensorDebug(const Qnn_Tensor_t* tensor);  // for debug use
📝 Committable suggestion

Suggested change
void __printQnnTensor(const Qnn_Tensor_t* tensor); // for debug use
void printQnnTensorDebug(const Qnn_Tensor_t* tensor); // for debug use
🧰 Tools
🪛 Clang (14.0.6)

[error] 91-91: declaration uses identifier '__printQnnTensor', which is a reserved identifier

(bugprone-reserved-identifier,-warnings-as-errors)

🤖 Prompt for AI Agents
In @mllm/backends/qnn/QNNUtils.hpp at line 91, The function name
__printQnnTensor uses a double-underscore prefix which is reserved; rename it to
a non-reserved identifier (e.g., printQnnTensor or debugPrintQnnTensor) in the
declaration and update the matching definition and all call sites to the new
name (ensure Qnn_Tensor_t reference stays unchanged and update any
header/implementation pairs and tests that reference __printQnnTensor).

@chenghuaWang chenghuaWang changed the title from Qnn aot to feat(qualcomm): Qnn AOT Lowering Passes on Jan 7, 2026
  qnnParam_.scalarParam.uint32Value = static_cast<uint32_t>(value);
} else if constexpr (std::is_same_v<T, int32_t>) {
  qnnParam_.scalarParam.dataType = QNN_DATATYPE_INT_32;
  qnnParam_.scalarParam.uint32Value = static_cast<int32_t>(value);
Collaborator:

qnnParam_.scalarParam.int32Value = static_cast<int32_t>(value);
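
The underlying hazard, shown with a stand-in for the scalar union (field names and tag values here are hypothetical): writing a signed value through the unsigned member reinterprets negative values, which the one-line correction above avoids.

#include <cstdint>
#include <type_traits>

// Stand-in for a tagged scalar union: the member written must match
// dataType, or negative values get reinterpreted as large unsigned ones.
struct Scalar {
  int dataType = 0;  // 0 = uint32, 1 = int32 (hypothetical tags)
  union {
    uint32_t uint32Value;
    int32_t int32Value;
  };
};

template <typename T>
Scalar makeScalar(T value) {
  Scalar s{};
  if constexpr (std::is_same_v<T, uint32_t>) {
    s.dataType = 0;
    s.uint32Value = value;
  } else if constexpr (std::is_same_v<T, int32_t>) {
    s.dataType = 1;
    s.int32Value = value;  // int32 member, matching the tag
  }
  return s;
}

int main() { return makeScalar<int32_t>(-1).int32Value == -1 ? 0 : 1; }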


LINALG_AOPS_DEFINE(EqualOp, EQUALOP);
LINALG_AOPS_DEFINE(WhereOp, WHEREOP);

Collaborator:

There is no corresponding LINALG_AOPS_DEFINE(GatherOp, GATHEROP);

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @mllm/backends/qnn/QNNUtils.hpp:
- Around line 91-92: The debug function name __printQnnTensor uses reserved
leading underscores; rename the function (e.g., printQnnTensor or
QnnPrintTensor) and update its declaration void __printQnnTensor(const
Qnn_Tensor_t* tensor), its definition, and every call site to the new identifier
so the prototype, implementation, and usages remain consistent and avoid
reserved-identifier UB.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 000e12b and c41efb9.

📒 Files selected for processing (1)
  • mllm/backends/qnn/QNNUtils.hpp
🧰 Additional context used
📓 Path-based instructions (3)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x80–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x80–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/QNNUtils.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/QNNUtils.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/QNNUtils.hpp
🪛 Clang (14.0.6)
mllm/backends/qnn/QNNUtils.hpp

[error] 91-91: declaration uses identifier '__printQnnTensor', which is a reserved identifier

(bugprone-reserved-identifier,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-android
  • GitHub Check: build-x86
  • GitHub Check: build-macos
🔇 Additional comments (1)
mllm/backends/qnn/QNNUtils.hpp (1)

281-286: LGTM! Proper separation of signed and unsigned integer handling.

The split into separate branches for uint32_t and int32_t correctly maps each type to its corresponding QNN datatype and struct field, ensuring type safety and proper handling of signed integers.

Comment on lines +91 to +92
void __printQnnTensor(const Qnn_Tensor_t* tensor); // for debug use


⚠️ Potential issue | 🔴 Critical

Reserved identifier violation: rename the debug function.

The function name __printQnnTensor uses a double underscore prefix, which is reserved by the C++ standard for compiler and standard library implementations. This can lead to undefined behavior or conflicts with compiler internals.

🔎 Proposed fix
-void __printQnnTensor(const Qnn_Tensor_t* tensor);  // for debug use
+void printQnnTensorDebug(const Qnn_Tensor_t* tensor);  // for debug use

Note: Ensure the function definition and all call sites are updated to match the new name.

📝 Committable suggestion

Suggested change
void __printQnnTensor(const Qnn_Tensor_t* tensor); // for debug use
void printQnnTensorDebug(const Qnn_Tensor_t* tensor); // for debug use
🧰 Tools
🪛 Clang (14.0.6)

[error] 91-91: declaration uses identifier '__printQnnTensor', which is a reserved identifier

(bugprone-reserved-identifier,-warnings-as-errors)

🤖 Prompt for AI Agents
In @mllm/backends/qnn/QNNUtils.hpp around lines 91 - 92, The debug function name
__printQnnTensor uses reserved leading underscores; rename the function (e.g.,
printQnnTensor or QnnPrintTensor) and update its declaration void
__printQnnTensor(const Qnn_Tensor_t* tensor), its definition, and every call
site to the new identifier so the prototype, implementation, and usages remain
consistent and avoid reserved-identifier UB.

@chenghuaWang chenghuaWang left a comment

LGTM

@chenghuaWang chenghuaWang merged commit 545677e into UbiquitousLearning:main Jan 7, 2026
4 checks passed