Conversation

@oreomaker (Collaborator) commented Jan 7, 2026

Please check Guidelines for Contributing.

Summary by CodeRabbit

  • New Features

    • Added Gather support end-to-end (API, CPU backend, and QNN AOT lowering) plus many new QNN AOT patterns (Linear, RMSNorm, MatMul, Concat, Transpose, Softmax, Repeat, Where, Equal) to broaden op coverage.
    • Added public gather functional API.
    • Added QNN tensor debug utility for inspection.
  • Bug Fixes

    • Corrected QNN quantization datatype mappings and improved handling for per-channel and LPBQ quantization; relaxed some runtime assertions.


@coderabbitai bot (Contributor) commented Jan 7, 2026

📝 Walkthrough

Adds Gather operation support end-to-end (core aops, functional API, CPU op) and extensive QNN AOT lowering/quantization plumbing; refactors QNN utilities and updates the qwen3 example to use nn::functional::gather for embedding lookups.

Changes

Cohort / File(s) Summary
Core Gather Operation
mllm/core/aops/GatherOp.hpp, mllm/core/aops/GatherOp.cpp
New GatherOp and GatherOpOptions (dim). Implements trace, reshape (shape inference), forward stub (NYI).
Functional API
mllm/nn/Functional.hpp, mllm/nn/Functional.cpp
Adds mllm::nn::functional::gather(const Tensor&, int dim, const Tensor&) wrapper calling GatherOp; a usage sketch follows this table.
CPU Backend Gather
mllm/backends/cpu/ops/GatherOp.hpp, mllm/backends/cpu/ops/GatherOp.cpp, mllm/backends/cpu/CPUBackend.cpp
Adds CPUGatherOp implementation and factory; registers CPUGatherOpFactory in CPUBackend; forward copies per-element (index validation, negative indices, dtype checks).
QNN Utilities & Wrappers
mllm/backends/qnn/QNNUtils.hpp, mllm/backends/qnn/QNNUtils.cpp, mllm/backends/qnn/aot/QnnWrappersAPI.hpp, mllm/backends/qnn/aot/QnnWrappersAPI.cpp
Maps host integer dtypes to fixed-point QNN types; removes/relaxes assertions on AOT paths; adds debug __printQnnTensor; implements setupComplexTensorQuantization for per-channel and LPBQ handling; adjusts quant parsing.
QNN AOT Visitor Patterns
mllm/backends/qnn/aot/visitor/*{Gather,RMSNorm,Linear,Concat,Equal,Matmul,Repeat,Softmax,Transpose,Where}*.{hpp,cpp}
Many new visitor patterns that lower linalg ops to QNN AOT nodes (validate attrs, capture tensors, emit params). Gather, RMSNorm, Linear have denser logic (axis/weight reshape/LPBQ).
QNN AOT Passes & Quant Recipe
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp, mllm/backends/qnn/aot/passes/LLMQuantRecipePass.{cpp,hpp}
Registers new patterns (Gather, RMSNorm, Linear) in lowering pass; adds GatherPattern to quant-recipe pass and tweaks LPBQ block-axis arg for linear pattern. Note: duplicate GatherPattern declaration appears in header.
Index Visitor Update
mllm/backends/qnn/aot/visitor/Index.cpp
Adjusted indices emission for Gather (removed extra flag) and moved axis param inline when constructing QNN node.
IR Declarations
mllm/compile/ir/linalg/Op.hpp
Adds forward declaration(s) for GatherOp (duplicate declarations present).
Qwen3 Example
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
Replaced mutable local position_ids with a const reference; changed RoPE embedding indexing to use nn::functional::gather on sliced QDQ_ROPE outputs.
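
A minimal usage sketch of the new functional API (the "Functional API" row above). The header path and gather signature come from this table; the helper name, tensor shapes, and call pattern are illustrative assumptions, not code lifted from the PR:

#include "mllm/nn/Functional.hpp"  // declares mllm::nn::functional::gather

// Hypothetical embedding lookup in the spirit of the qwen3 example change:
// gather rows of a [vocab, hidden] table along dim 0 using integer token ids.
// Output shape follows GatherOp's inference: the dims before `dim`, then the
// indices shape, then the remaining trailing dims.
inline mllm::Tensor embeddingLookup(const mllm::Tensor& table, const mllm::Tensor& token_ids) {
  return mllm::nn::functional::gather(table, /*dim=*/0, token_ids);
}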

Sequence Diagram(s)

sequenceDiagram
  participant User as User Code
  participant NN as nn::functional::gather
  participant AOp as aops::GatherOp
  participant CPU as CPUGatherOp
  participant QNNLower as QNN AOT Lowering/Runtime

  Note over User,NN: High-level gather call
  User->>NN: gather(x, dim, indices)
  NN->>AOp: build/trace GatherOp(dim, inputs)
  alt runtime path: CPU execution
    AOp->>CPU: CPUGatherOp::forward(inputs)
    CPU-->>User: output tensor
  else runtime path: QNN AOT lowering
    AOp->>QNNLower: Lower to QNN Gather node (emit inputs, axis param)
    QNNLower-->>User: output prepared for QNN runtime
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Suggested reviewers

  • liang1232018
  • yirongjie

Poem

🐰 I hopped through tensors, dims aligned,

From core to CPU and QNN combined.
I gathered indices, tiny and spry,
Sliced RoPEs danced, embeddings fly.
A joyful twitch — new ops on the run!

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed

❌ Failed checks (2 warnings)

  • Description check ⚠️ Warning — The PR description is essentially empty, containing only a template link with no substantive information about the implementation, motivation, or testing. Resolution: add a detailed description covering the QNN AOT lowering passes, the operations supported, key technical details, and any testing performed.
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 5.08%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (1 passed)

  • Title check ✅ Passed — The title clearly summarizes the main change: adding QNN AOT lowering passes to the Qualcomm backend, which aligns with the comprehensive set of file additions and changes throughout the PR.



@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
mllm/backends/qnn/QNNUtils.cpp (1)

306-328: Fix semantic inconsistency in kInt16 type mapping.

The mapping of kInt16 → QNN_DATATYPE_UFIXED_POINT_16 is semantically incorrect. Signed integer types should not map to unsigned fixed-point types. This contradicts the quantized variant kInt16PerTensorSym → QNN_DATATYPE_SFIXED_POINT_16 used elsewhere in the same function (line 377), which correctly maps signed to signed fixed-point.

Change line 311 from QNN_DATATYPE_UFIXED_POINT_16 to QNN_DATATYPE_SFIXED_POINT_16 to maintain sign consistency.

The quantization parameter initialization for these types is properly handled via createQuantizeParams() and scale propagation is implemented correctly throughout the codebase.
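
A self-contained sketch of the sign-consistent mapping this comment asks for (the HostDType enum and function name are illustrative; the QNN_DATATYPE_* values come from the QNN SDK's QnnTypes.h):

#include "QnnTypes.h"  // QNN SDK header defining Qnn_DataType_t

enum class HostDType { kInt16, kInt16PerTensorSym };  // illustrative subset

inline Qnn_DataType_t toQnnDType(HostDType dt) {
  switch (dt) {
    case HostDType::kInt16:              // was QNN_DATATYPE_UFIXED_POINT_16 in the PR
    case HostDType::kInt16PerTensorSym:  // already QNN_DATATYPE_SFIXED_POINT_16
      return QNN_DATATYPE_SFIXED_POINT_16;  // signed host int -> signed fixed-point
  }
  return QNN_DATATYPE_UNDEFINED;
}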

🤖 Fix all issues with AI agents
In @mllm/backends/cpu/ops/GatherOp.cpp:
- Around line 57-59: In GatherOp.cpp the loop silently skips copies when "idx <
0 || idx >= dim_size", leaving parts of the output uninitialized; change the
Gather implementation to zero-initialize the output buffer/tensor (the
destination written by the gather) before the loop and, when an index is
out-of-bounds, either log a warning including the offending "idx" and "dim_size"
or raise an explicit error instead of continuing; ensure you reference the same
variables ("idx", "dim_size") and the output buffer in your changes so the
zero-fill happens once before the loop and all invalid-index cases are handled
with a clear log/error.

In @mllm/backends/qnn/aot/QnnWrappersAPI.cpp:
- Around line 174-178: The blockwise expansion sets blockwise_expansion.axis =
cfg->ch_axis but incorrectly computes blockwise_expansion.numBlocksPerAxis using
v->tensor_.size(1); change the calculation to use the configured channel axis
like the other LPBQ fields (i.e., compute numBlocksPerAxis as
v->tensor_.size(cfg->ch_axis)) so the axis and block count are consistent when
cfg->ch_axis is not 1; update the code around Qnn_BlockwiseExpansion_t
blockwise_expansion and setBlockwiseQuantization usage to use
v->tensor_.size(cfg->ch_axis).

In @mllm/backends/qnn/aot/visitor/Concat.cpp:
- Around line 43-61: NormalizeAxis may produce an out-of-bounds or negative
value; after computing rank from first_input->tensor_.shape() and normalizing
axis (axis += rank), add explicit validation that 0 <= axis && axis < rank and
handle the error (throw an exception or log and abort) before proceeding, and
only then call QNNParamScalarWrapper::create("axis", (uint32_t)axis) to avoid
unsigned wraparound; update the Concat creation flow around
QnnAOTNodeOperation::create("Concat")/emplaceParamScalar to perform this check
early.

In @mllm/backends/qnn/aot/visitor/Gather.cpp:
- Around line 49-50: The dynamic_cast of gather_op->getAOp() to
mllm::aops::GatherOp* can return nullptr and is dereferenced immediately; change
the code to first assign the result to a local pointer (e.g., auto* gatherAOp =
dynamic_cast<mllm::aops::GatherOp*>(gather_op->getAOp())), check if gatherAOp is
nullptr, handle the error path (log via the existing logger, return/throw an
appropriate error, or abort the visitor) and only then read
gatherAOp->options().dim and call emplaceParamScalar, so you avoid undefined
behavior when the cast fails.

In @mllm/backends/qnn/aot/visitor/Linear.cpp:
- Around line 57-58: Ensure the tensor has at least 3 dimensions before
accessing shape()[0..2] in the reshape call: check the weight tensor's
rank/shape size (e.g., via tensor_.dim() or tensor_.shape().size()) in the code
around Linear.cpp where weight_val->tensor_ is being viewed, and if it is
smaller than 3 either return/throw a clear error (or log and skip) with context
(function/class name) or handle the alternative shape case; only perform the
view(...) when the validation passes to avoid out-of-bounds access.

In @mllm/backends/qnn/aot/visitor/Repeat.cpp:
- Line 45: Validate options.repeat_times before casting: check that int
repeat_times = options.repeat_times is > 0 (and optionally within a sane upper
bound) and return or throw a clear error if not; only then cast to uint32_t for
the QNN Tile call. Update both places where you cast repeat_times to uint32_t
(the initial assignment and the later usage around lines 54-55) to use the
validated value and emit a descriptive error mentioning
options.repeat_times/repeat_times so negative, zero, or excessively large values
cannot be silently converted into huge unsigned integers.

In @mllm/backends/qnn/aot/visitor/RMSNorm.cpp:
- Around line 40-45: The casted TensorValue pointers obtained for weight, i_0,
and o_0 (from
writer.getContext()->lookupSymbolTable(...)->outputs().front()->cast_<ir::tensor::TensorValue>()
and op->inputs()/outputs() casts) are used without null checks; add explicit
nullptr checks after each cast (for weight, i_0, and o_0) and handle failures by
returning early or throwing/logging a clear error so the function does not
dereference a null pointer — locate the casts in RMSNorm.cpp and guard each
cast_<ir::tensor::TensorValue>() result before further use.

In @mllm/backends/qnn/aot/visitor/Softmax.cpp:
- Around line 32-33: Validate that op has at least one input and one output and
that the front() elements and their casts succeed before using them: check
op->inputs().empty() and op->outputs().empty(), ensure op->inputs().front() and
op->outputs().front() are non-null, then perform the
cast_<ir::tensor::TensorValue>() and verify the returned pointers (input and
output) are non-null; if any check fails, return or raise an appropriate
error/log instead of dereferencing.
- Around line 42-48: The code adjusts a possibly-negative axis but never
verifies it falls into [0, rank-1], which can cause wraparound when later cast
to unsigned; after computing rank and applying "if (axis < 0) { axis += rank; }"
(referencing real_softmax_op->options().axis, axis, input->tensor_.shape(), and
rank), validate that axis >= 0 && axis < rank and handle invalid values (e.g.,
return an error/throw an exception or log and abort the op) rather than
proceeding; update the Softmax visitor to check this bound immediately after the
negative-axis adjustment and produce a clear error/early-exit if out of range.

In @mllm/backends/qnn/aot/visitor/Transpose.cpp:
- Around line 45-52: The permutation logic in Transpose.cpp uses options.dim0
and options.dim1 directly as indices into perm, which can be negative; normalize
both dims to positive equivalents (e.g., if dim < 0 then dim += rank) before
using them. Update the block that builds perm (after computing rank and perm
vector) to normalize options.dim0 and options.dim1 into local ints (or size_t)
and then perform the std::swap using those normalized indices so negative
dimensions behave the same as in CPU TransposeOp and other QNN AOT visitors
(refer to input->tensor_.shape(), rank, perm, and options.dim0/options.dim1).

In @mllm/backends/qnn/aot/visitor/Where.cpp:
- Around line 31-34: The code in Where.cpp assumes three inputs when calling
std::next on op->inputs() (accessing condition, true_input, false_input); first
validate the input count by checking op->inputs().size() (or equivalent) is >= 3
and handle the error path (throw, return, or log) before dereferencing
iterators; update the block that creates condition, true_input, and false_input
to guard against fewer inputs and use safe iterator/index access only after the
size check so std::next calls cannot go out of bounds.

In @mllm/backends/qnn/QNNUtils.cpp:
- Around line 310-312: The case handling for kInt16 incorrectly maps the signed
16-bit integer to QNN_DATATYPE_UFIXED_POINT_16 (unsigned); update the mapping in
the switch (case kInt16) to use the signed 16-bit fixed-point/type variant
(e.g., QNN_DATATYPE_FIXED_POINT_16 or the correct QNN signed-16 enum value used
elsewhere in the codebase) so that kInt16 maps to the signed QNN datatype
instead of QNN_DATATYPE_UFIXED_POINT_16.
- Line 618: The function name uses a reserved leading double underscore
(__printQnnTensor); rename it to a non-reserved identifier (e.g., printQnnTensor
or Qnn_PrintTensor) and update its declaration in qnnutils.hpp and any call
sites to the new name, ensuring linkage/extern "C" attributes remain unchanged;
also run a project-wide search for __printQnnTensor to fix all references and
rebuild to verify no symbol mismatches.
- Around line 453-454: Replace the commented-out device assertion with proper
conditional validation: keep the existing runtime check for non-AOT tensors by
reinstating MLLM_RT_ASSERT(tensor.device() == kQNN) when type !=
QNN_TENSOR_TYPE_STATIC, and add an explicit check for the AOT/static case (e.g.,
assert tensor.device() == kCPU or allow kCPU/kQNN as appropriate) to validate
expected placement; locate the assertion around the variables type,
QNN_TENSOR_TYPE_STATIC and tensor.device() in QNNUtils.cpp and implement the
conditional MLLM_RT_ASSERT branches to handle both runtime and AOT scenarios.

In @mllm/backends/qnn/QNNUtils.hpp:
- Line 91: The function name __printQnnTensor uses a double-underscore prefix
which is reserved; rename it to a non-reserved identifier (e.g., printQnnTensor
or debugPrintQnnTensor) in the declaration and update the matching definition
and all call sites to the new name (ensure Qnn_Tensor_t reference stays
unchanged and update any header/implementation pairs and tests that reference
__printQnnTensor).

In @mllm/compile/ir/linalg/Op.hpp:
- Line 84: Remove the duplicate forward declaration of GatherOp: locate the
second occurrence of "class GatherOp;" (the one at the later position) and
delete it so only the original forward declaration remains (leave the first
"class GatherOp;" intact).
🧹 Nitpick comments (8)
mllm/backends/qnn/aot/visitor/Index.cpp (2)

42-76: Consider adding inline comments and axis bounds validation.

The axis calculation logic is complex and would benefit from inline comments explaining the algorithm. Additionally, consider validating that the calculated axis is within the bounds of the input tensor's rank to catch configuration errors early.

Suggested improvements

Add inline comments:

// Iterate through index specifications to find the gather axis
// and extract indices tensor. Increment axis for each dimension
// until we find vector or tensor indices.
for (const auto& blob : indices_list) {
  // Skip dimensions with full slice (kAll, kAll)
  if (blob.slice_indices_.has_value()) {
    // ... existing code
  }
  // ... rest of logic
}

Add axis validation after line 81:

if (axis >= input->getType()->cast_<ir::tensor::TensorType>()->getShape().size()) {
  MLLM_ERROR("Calculated axis " + std::to_string(axis) + " exceeds input tensor rank");
  return false;
}
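
The same normalize-then-validate shape recurs across the agent prompts above (Concat, Softmax, and Transpose axes; Repeat's repeat_times) and this suggestion. A minimal sketch, assuming each visitor can fail its rewrite by returning false; the helper name is hypothetical:

#include <cstdint>

// Map a possibly-negative axis into [0, rank) and report validity.
// Rejecting out-of-range values here prevents unsigned wraparound at the
// later (uint32_t) cast when emitting the QNN scalar param.
inline bool normalizeAxis(int rank, int& axis) {
  if (axis < 0) { axis += rank; }
  return axis >= 0 && axis < rank;
}

// Call-site shape (illustrative):
//   int axis = real_softmax_op->options().axis;
//   if (!normalizeAxis(rank, axis)) { MLLM_ERROR("axis out of range"); return false; }
//   qnn_op_node->emplaceParamScalar(QNNParamScalarWrapper::create("axis", (uint32_t)axis));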

14-16: Add function documentation.

Both isMatch and rewrite functions lack documentation. Per coding guidelines, public APIs should have clear comments explaining purpose, parameters, returns, and errors. This is especially important for pattern implementations that are part of the AOT lowering infrastructure.

Suggested documentation
/// Checks if the operation matches the QNN AOT Index pattern.
/// @param op The operation to check
/// @return true if the operation is an IndexOp with QNN attributes
bool QnnAOTIndexPattern::isMatch(const mllm::ir::op_ptr_t& op) {
  return op->isa_<mllm::ir::linalg::IndexOp>() && (op->getAttr("using_qnn") != nullptr);
}

/// Rewrites an IndexOp to QNN Gather operation.
/// Extracts the gather axis and indices from the IndexOp and emits
/// a QNN Gather node to the AOT graph.
/// @param writer IR writer for creating new operations
/// @param op The IndexOp to rewrite
/// @return true if rewrite succeeded, false otherwise
/// @note QNN Gather only supports gathering on a single axis
bool QnnAOTIndexPattern::rewrite(ir::IRWriter& writer, const ir::op_ptr_t& op) {
  // ... existing implementation
}

Based on coding guidelines.

Also applies to: 18-98

mllm/backends/qnn/QNNUtils.cpp (1)

681-682: Improve formatting in debug output string.

Add spaces after colons for better readability in the blockwise expansion debug output.

🔎 Proposed fix
-      quant_str = "Blockwise Expansion (axis:" + std::to_string(t.quantizeParams.blockwiseExpansion->axis)
-                  + ", blockSize:" + std::to_string(t.quantizeParams.blockwiseExpansion->numBlocksPerAxis) + ")";
+      quant_str = "Blockwise Expansion (axis: " + std::to_string(t.quantizeParams.blockwiseExpansion->axis)
+                  + ", blockSize: " + std::to_string(t.quantizeParams.blockwiseExpansion->numBlocksPerAxis) + ")";
mllm/backends/qnn/aot/visitor/Softmax.cpp (1)

43-43: Extract beta as a named constant or clarify the comment.

The beta value is hardcoded to 1.0f. Unlike axis, which is retrieved from real_softmax_op->options(), beta is not configurable and is a fixed requirement of the QNN Softmax API. The comment "Default beta" is misleading. Consider extracting it as a named constant (e.g., constexpr float QNN_SOFTMAX_BETA = 1.0f;) at file scope for clarity and maintainability, or update the comment to reflect that it's a fixed QNN parameter.
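
A sketch of the named-constant variant; the constant name follows the suggestion above, while its placement and the "beta" param name are assumptions about the surrounding visitor code:

// File scope in Softmax.cpp: QNN Softmax requires beta == 1.0, so name it
// rather than passing a magic number at the emit site.
constexpr float QNN_SOFTMAX_BETA = 1.0F;

// Later, when building the node (illustrative):
// qnn_op_node->emplaceParamScalar(QNNParamScalarWrapper::create("beta", QNN_SOFTMAX_BETA));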

mllm/backends/qnn/aot/visitor/Equal.cpp (1)

18-52: LGTM! Solid implementation following established patterns.

The rewrite implementation correctly:

  • Validates the operation cast and required attributes
  • Captures inputs/outputs through the AOT environment
  • Creates and registers a QNN ElementWiseEqual node
  • Follows the same structure as other AOT patterns (Gather, MatMul, Where, etc.)
Optional: Minor style consistency suggestion

Line 34 uses std::next(op->inputs().begin()) while other patterns like MatMul explicitly use std::next(op->inputs().begin(), 1). Though functionally equivalent, explicit parameters improve consistency.

-  auto input1 = (*std::next(op->inputs().begin()))->cast_<ir::tensor::TensorValue>();
+  auto input1 = (*std::next(op->inputs().begin(), 1))->cast_<ir::tensor::TensorValue>();
mllm/backends/qnn/aot/visitor/Gather.cpp (1)

32-37: Consider adding null checks for tensor casts.

The cast_<ir::tensor::TensorValue>() results at lines 33, 34, and 37 are used without validation. While similar patterns in the codebase also omit these checks, adding them would improve robustness.

🔎 Proposed fix
  // Inputs
  auto table = op->inputs().front()->cast_<ir::tensor::TensorValue>();
  auto indices = (*std::next(op->inputs().begin()))->cast_<ir::tensor::TensorValue>();

  // Output
  auto output = op->outputs().front()->cast_<ir::tensor::TensorValue>();
+
+  if (!table || !indices || !output) {
+    MLLM_ERROR("Failed to cast inputs/output to TensorValue");
+    return false;
+  }
mllm/core/aops/GatherOp.hpp (1)

24-24: Consider adding [[nodiscard]] to the accessor.

The options() accessor returns a const reference that should typically not be discarded. Adding [[nodiscard]] would help catch accidental misuse.

🔎 Proposed fix
-  inline const GatherOpOptions& options() const { return options_; }
+  [[nodiscard]] inline const GatherOpOptions& options() const { return options_; }
mllm/core/aops/GatherOp.cpp (1)

36-40: reserve underestimates final vector size.

new_shape.reserve(dim) only reserves space for dim elements, but the final size is dim + indices_shape.size() + (shape.size() - dim - 1). Consider reserving the correct final size to avoid reallocations:

  std::vector<int32_t> new_shape;
- new_shape.reserve(dim);
+ new_shape.reserve(dim + indices_shape.size() + (shape.size() - dim - 1));
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9202124 and 000e12b.

📒 Files selected for processing (37)
  • examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp
  • mllm/backends/cpu/CPUBackend.cpp
  • mllm/backends/cpu/ops/GatherOp.cpp
  • mllm/backends/cpu/ops/GatherOp.hpp
  • mllm/backends/qnn/QNNUtils.cpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/backends/qnn/aot/visitor/Concat.cpp
  • mllm/backends/qnn/aot/visitor/Concat.hpp
  • mllm/backends/qnn/aot/visitor/Equal.cpp
  • mllm/backends/qnn/aot/visitor/Equal.hpp
  • mllm/backends/qnn/aot/visitor/Gather.cpp
  • mllm/backends/qnn/aot/visitor/Gather.hpp
  • mllm/backends/qnn/aot/visitor/Index.cpp
  • mllm/backends/qnn/aot/visitor/Linear.cpp
  • mllm/backends/qnn/aot/visitor/Linear.hpp
  • mllm/backends/qnn/aot/visitor/Matmul.cpp
  • mllm/backends/qnn/aot/visitor/Matmul.hpp
  • mllm/backends/qnn/aot/visitor/RMSNorm.cpp
  • mllm/backends/qnn/aot/visitor/RMSNorm.hpp
  • mllm/backends/qnn/aot/visitor/Repeat.cpp
  • mllm/backends/qnn/aot/visitor/Repeat.hpp
  • mllm/backends/qnn/aot/visitor/Softmax.cpp
  • mllm/backends/qnn/aot/visitor/Softmax.hpp
  • mllm/backends/qnn/aot/visitor/Transpose.cpp
  • mllm/backends/qnn/aot/visitor/Transpose.hpp
  • mllm/backends/qnn/aot/visitor/Where.cpp
  • mllm/backends/qnn/aot/visitor/Where.hpp
  • mllm/compile/ir/linalg/Op.hpp
  • mllm/core/aops/GatherOp.cpp
  • mllm/core/aops/GatherOp.hpp
  • mllm/nn/Functional.cpp
  • mllm/nn/Functional.hpp
🧰 Additional context used
📓 Path-based instructions (4)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x7F–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/aot/visitor/Repeat.cpp
  • mllm/backends/qnn/aot/visitor/Transpose.hpp
  • mllm/backends/cpu/CPUBackend.cpp
  • mllm/backends/qnn/aot/visitor/Equal.hpp
  • mllm/core/aops/GatherOp.hpp
  • mllm/backends/qnn/aot/visitor/Linear.hpp
  • mllm/backends/qnn/aot/visitor/Concat.hpp
  • mllm/backends/qnn/aot/visitor/Where.cpp
  • mllm/backends/qnn/aot/visitor/Gather.cpp
  • mllm/backends/qnn/aot/visitor/Softmax.cpp
  • mllm/backends/qnn/aot/visitor/RMSNorm.cpp
  • mllm/backends/qnn/aot/visitor/Matmul.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/backends/qnn/aot/visitor/Gather.hpp
  • mllm/backends/qnn/aot/visitor/RMSNorm.hpp
  • mllm/nn/Functional.cpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/cpu/ops/GatherOp.hpp
  • mllm/core/aops/GatherOp.cpp
  • mllm/backends/cpu/ops/GatherOp.cpp
  • mllm/backends/qnn/aot/visitor/Repeat.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/nn/Functional.hpp
  • mllm/backends/qnn/aot/visitor/Softmax.hpp
  • mllm/backends/qnn/aot/visitor/Equal.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/visitor/Transpose.cpp
  • mllm/backends/qnn/aot/visitor/Linear.cpp
  • mllm/backends/qnn/aot/visitor/Index.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
  • mllm/backends/qnn/aot/visitor/Concat.cpp
  • mllm/backends/qnn/QNNUtils.cpp
  • mllm/backends/qnn/aot/visitor/Where.hpp
  • mllm/compile/ir/linalg/Op.hpp
  • mllm/backends/qnn/aot/visitor/Matmul.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/aot/visitor/Repeat.cpp
  • mllm/backends/qnn/aot/visitor/Transpose.hpp
  • mllm/backends/cpu/CPUBackend.cpp
  • mllm/backends/qnn/aot/visitor/Equal.hpp
  • mllm/core/aops/GatherOp.hpp
  • mllm/backends/qnn/aot/visitor/Linear.hpp
  • mllm/backends/qnn/aot/visitor/Concat.hpp
  • mllm/backends/qnn/aot/visitor/Where.cpp
  • mllm/backends/qnn/aot/visitor/Gather.cpp
  • mllm/backends/qnn/aot/visitor/Softmax.cpp
  • mllm/backends/qnn/aot/visitor/RMSNorm.cpp
  • mllm/backends/qnn/aot/visitor/Matmul.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/backends/qnn/aot/visitor/Gather.hpp
  • mllm/backends/qnn/aot/visitor/RMSNorm.hpp
  • mllm/nn/Functional.cpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/cpu/ops/GatherOp.hpp
  • mllm/core/aops/GatherOp.cpp
  • mllm/backends/cpu/ops/GatherOp.cpp
  • mllm/backends/qnn/aot/visitor/Repeat.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/nn/Functional.hpp
  • mllm/backends/qnn/aot/visitor/Softmax.hpp
  • mllm/backends/qnn/aot/visitor/Equal.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/visitor/Transpose.cpp
  • mllm/backends/qnn/aot/visitor/Linear.cpp
  • mllm/backends/qnn/aot/visitor/Index.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
  • mllm/backends/qnn/aot/visitor/Concat.cpp
  • mllm/backends/qnn/QNNUtils.cpp
  • mllm/backends/qnn/aot/visitor/Where.hpp
  • mllm/compile/ir/linalg/Op.hpp
  • mllm/backends/qnn/aot/visitor/Matmul.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,py,pyi}: Prioritize production-ready code quality by evaluating time and space complexity of algorithms and data structures, and suggest more efficient alternatives for operations with high complexity (e.g., O(n^2) or worse) when feasible.
Avoid unnecessary object creation in loops or hot paths.
Check for proper error handling and resource cleanup (e.g., using try-finally, context managers, or RAII).
Ensure functions that can fail return appropriate error codes or raise exceptions.
Validate inputs for public APIs and critical internal functions.
Add comments for complex algorithms or non-obvious logic.
Identify potential security issues (e.g., buffer overflows, injection risks, insecure temporary files) and recommend using secure alternatives (e.g., parameterized queries, secure random generators).
Suggest adding unit tests for untested complex logic or edge cases.
Ensure code is testable by avoiding global state and using dependency injection.
Flag overly complex functions (e.g., high cyclomatic complexity) and suggest breaking them down.
Use named constants instead of magic numbers.
Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure.

Files:

  • mllm/backends/qnn/aot/visitor/Repeat.cpp
  • mllm/backends/cpu/CPUBackend.cpp
  • mllm/backends/qnn/aot/visitor/Where.cpp
  • mllm/backends/qnn/aot/visitor/Gather.cpp
  • mllm/backends/qnn/aot/visitor/Softmax.cpp
  • mllm/backends/qnn/aot/visitor/RMSNorm.cpp
  • mllm/nn/Functional.cpp
  • mllm/core/aops/GatherOp.cpp
  • mllm/backends/cpu/ops/GatherOp.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/visitor/Equal.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/visitor/Transpose.cpp
  • mllm/backends/qnn/aot/visitor/Linear.cpp
  • mllm/backends/qnn/aot/visitor/Index.cpp
  • mllm/backends/qnn/aot/visitor/Concat.cpp
  • mllm/backends/qnn/QNNUtils.cpp
  • mllm/backends/qnn/aot/visitor/Matmul.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/aot/visitor/Repeat.cpp
  • mllm/backends/qnn/aot/visitor/Transpose.hpp
  • mllm/backends/cpu/CPUBackend.cpp
  • mllm/backends/qnn/aot/visitor/Equal.hpp
  • mllm/core/aops/GatherOp.hpp
  • mllm/backends/qnn/aot/visitor/Linear.hpp
  • mllm/backends/qnn/aot/visitor/Concat.hpp
  • mllm/backends/qnn/aot/visitor/Where.cpp
  • mllm/backends/qnn/aot/visitor/Gather.cpp
  • mllm/backends/qnn/aot/visitor/Softmax.cpp
  • mllm/backends/qnn/aot/visitor/RMSNorm.cpp
  • mllm/backends/qnn/aot/visitor/Matmul.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/backends/qnn/aot/visitor/Gather.hpp
  • mllm/backends/qnn/aot/visitor/RMSNorm.hpp
  • mllm/nn/Functional.cpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/cpu/ops/GatherOp.hpp
  • mllm/core/aops/GatherOp.cpp
  • mllm/backends/cpu/ops/GatherOp.cpp
  • mllm/backends/qnn/aot/visitor/Repeat.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/nn/Functional.hpp
  • mllm/backends/qnn/aot/visitor/Softmax.hpp
  • mllm/backends/qnn/aot/visitor/Equal.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/visitor/Transpose.cpp
  • mllm/backends/qnn/aot/visitor/Linear.cpp
  • mllm/backends/qnn/aot/visitor/Index.cpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.hpp
  • mllm/backends/qnn/aot/visitor/Concat.cpp
  • mllm/backends/qnn/QNNUtils.cpp
  • mllm/backends/qnn/aot/visitor/Where.hpp
  • mllm/compile/ir/linalg/Op.hpp
  • mllm/backends/qnn/aot/visitor/Matmul.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
🧬 Code graph analysis (21)
mllm/backends/qnn/aot/visitor/Repeat.cpp (6)
mllm/backends/qnn/aot/visitor/Index.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-98)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Concat.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-71)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Equal.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-52)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Gather.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-56)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Matmul.cpp (4)
  • isMatch (13-15)
  • isMatch (13-13)
  • rewrite (17-51)
  • rewrite (17-17)
mllm/backends/qnn/aot/visitor/Where.cpp (4)
  • isMatch (13-15)
  • isMatch (13-13)
  • rewrite (17-53)
  • rewrite (17-17)
mllm/backends/qnn/aot/visitor/Transpose.hpp (9)
mllm/backends/qnn/aot/visitor/Concat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Equal.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Gather.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Matmul.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/RMSNorm.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Repeat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Softmax.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Where.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Index.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Equal.hpp (9)
mllm/backends/qnn/aot/visitor/Concat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Gather.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Matmul.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/RMSNorm.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Repeat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Softmax.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Transpose.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Where.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Index.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/core/aops/GatherOp.hpp (1)
mllm/backends/cpu/ops/GatherOp.hpp (3)
  • options (20-22)
  • options (20-20)
  • inputs (15-15)
mllm/backends/qnn/aot/visitor/Linear.hpp (5)
mllm/backends/qnn/aot/visitor/Matmul.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/RMSNorm.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Softmax.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Elewise.hpp (4)
  • op (14-14)
  • op (25-25)
  • writer (16-16)
  • writer (27-27)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/visitor/Concat.hpp (9)
mllm/backends/qnn/aot/visitor/Equal.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Gather.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Matmul.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/RMSNorm.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Repeat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Softmax.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Transpose.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Where.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Index.hpp (2)
  • op (14-14)
  • writer (16-16)
examples/qwen3_qnn_aot/modeling_qwen_qnn_aot.hpp (1)
mllm/nn/Functional.hpp (1)
  • gather (165-165)
mllm/backends/qnn/aot/visitor/Softmax.cpp (8)
mllm/backends/qnn/aot/visitor/Concat.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-71)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Equal.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-52)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Gather.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-56)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Matmul.cpp (4)
  • isMatch (13-15)
  • isMatch (13-13)
  • rewrite (17-51)
  • rewrite (17-17)
mllm/backends/qnn/aot/visitor/RMSNorm.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-66)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Repeat.cpp (4)
  • isMatch (15-17)
  • isMatch (15-15)
  • rewrite (19-84)
  • rewrite (19-19)
mllm/backends/qnn/aot/visitor/Transpose.cpp (4)
  • isMatch (15-17)
  • isMatch (15-15)
  • rewrite (19-76)
  • rewrite (19-19)
mllm/backends/qnn/aot/visitor/Where.cpp (4)
  • isMatch (13-15)
  • isMatch (13-13)
  • rewrite (17-53)
  • rewrite (17-17)
mllm/backends/qnn/aot/visitor/RMSNorm.cpp (8)
mllm/backends/qnn/aot/visitor/Concat.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-71)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Equal.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-52)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Gather.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-56)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Matmul.cpp (4)
  • isMatch (13-15)
  • isMatch (13-13)
  • rewrite (17-51)
  • rewrite (17-17)
mllm/backends/qnn/aot/visitor/Repeat.cpp (4)
  • isMatch (15-17)
  • isMatch (15-15)
  • rewrite (19-84)
  • rewrite (19-19)
mllm/backends/qnn/aot/visitor/Softmax.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-69)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Transpose.cpp (4)
  • isMatch (15-17)
  • isMatch (15-15)
  • rewrite (19-76)
  • rewrite (19-19)
mllm/backends/qnn/aot/visitor/Where.cpp (4)
  • isMatch (13-15)
  • isMatch (13-13)
  • rewrite (17-53)
  • rewrite (17-17)
mllm/backends/qnn/aot/visitor/Gather.hpp (8)
mllm/backends/qnn/aot/visitor/Concat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Equal.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Matmul.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/RMSNorm.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Repeat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Softmax.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Transpose.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Where.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/nn/Functional.cpp (2)
mllm/core/Tensor.cpp (1)
  • instance (427-427)
mllm/backends/qnn/aot/visitor/Gather.hpp (1)
  • kGather (18-20)
mllm/backends/cpu/ops/GatherOp.hpp (1)
mllm/core/aops/GatherOp.hpp (2)
  • inputs (20-20)
  • inputs (22-22)
mllm/core/aops/GatherOp.cpp (2)
mllm/core/aops/GatherOp.hpp (5)
  • GatherOp (16-16)
  • trace_context (18-18)
  • inputs (20-20)
  • inputs (22-22)
  • options_ (24-24)
mllm/backends/cpu/ops/GatherOp.hpp (3)
  • options (20-22)
  • options (20-20)
  • inputs (15-15)
mllm/backends/cpu/ops/GatherOp.cpp (3)
mllm/backends/cpu/ops/GatherOp.hpp (4)
  • CPUGatherOp (13-13)
  • options (20-22)
  • options (20-20)
  • inputs (15-15)
mllm/core/aops/GatherOp.cpp (2)
  • forward (21-23)
  • forward (21-21)
mllm/core/aops/GatherOp.hpp (2)
  • inputs (20-20)
  • inputs (22-22)
mllm/backends/qnn/aot/visitor/Repeat.hpp (9)
mllm/backends/qnn/aot/visitor/Concat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Equal.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Gather.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Matmul.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/RMSNorm.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Softmax.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Transpose.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Where.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Index.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (7)
  • v (47-49)
  • v (47-47)
  • v (56-56)
  • v (58-58)
  • v (60-60)
  • v (62-62)
  • v (65-65)
mllm/backends/qnn/aot/visitor/Softmax.hpp (11)
mllm/backends/qnn/aot/visitor/Linear.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Concat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Equal.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Gather.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Matmul.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/RMSNorm.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Repeat.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Transpose.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Where.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/visitor/Elewise.hpp (4)
  • op (14-14)
  • op (25-25)
  • writer (16-16)
  • writer (27-27)
mllm/backends/qnn/aot/passes/MarkQnnGraphPass.hpp (1)
  • op (17-17)
mllm/backends/qnn/aot/visitor/Equal.cpp (8)
mllm/backends/qnn/aot/visitor/Index.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-98)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Concat.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-71)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Gather.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-56)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Matmul.cpp (4)
  • isMatch (13-15)
  • isMatch (13-13)
  • rewrite (17-51)
  • rewrite (17-17)
mllm/backends/qnn/aot/visitor/RMSNorm.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-66)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Repeat.cpp (4)
  • isMatch (15-17)
  • isMatch (15-15)
  • rewrite (19-84)
  • rewrite (19-19)
mllm/backends/qnn/aot/visitor/Transpose.cpp (4)
  • isMatch (15-17)
  • isMatch (15-15)
  • rewrite (19-76)
  • rewrite (19-19)
mllm/backends/qnn/aot/visitor/Where.cpp (4)
  • isMatch (13-15)
  • isMatch (13-13)
  • rewrite (17-53)
  • rewrite (17-17)
mllm/backends/qnn/aot/visitor/Linear.cpp (2)
mllm/backends/qnn/aot/visitor/Linear.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/QnnWrappersAPI.hpp (5)
  • qnn_context_name (219-219)
  • qnn_context_name (221-222)
  • qnn_context_name (224-225)
  • input (84-84)
  • output (86-86)
mllm/backends/qnn/aot/visitor/Matmul.cpp (8)
mllm/backends/qnn/aot/visitor/Concat.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-71)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Equal.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-52)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Gather.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-56)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Repeat.cpp (4)
  • isMatch (15-17)
  • isMatch (15-15)
  • rewrite (19-84)
  • rewrite (19-19)
mllm/backends/qnn/aot/visitor/Softmax.cpp (4)
  • isMatch (14-16)
  • isMatch (14-14)
  • rewrite (18-69)
  • rewrite (18-18)
mllm/backends/qnn/aot/visitor/Transpose.cpp (4)
  • isMatch (15-17)
  • isMatch (15-15)
  • rewrite (19-76)
  • rewrite (19-19)
mllm/backends/qnn/aot/visitor/Where.cpp (4)
  • isMatch (13-15)
  • isMatch (13-13)
  • rewrite (17-53)
  • rewrite (17-17)
mllm/backends/qnn/aot/visitor/Matmul.hpp (2)
  • op (14-14)
  • writer (16-16)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp (2)
mllm/compile/ir/linalg/Attribute.hpp (7)
  • node (27-27)
  • node (27-27)
  • node (327-327)
  • node (327-327)
  • node (348-348)
  • node (348-348)
  • create (77-77)
mllm/compile/ir/builtin/Attribute.hpp (10)
  • node (24-24)
  • node (24-24)
  • node (44-44)
  • node (44-44)
  • node (64-64)
  • node (64-64)
  • node (84-84)
  • node (84-84)
  • node (104-104)
  • node (104-104)
🪛 Clang (14.0.6)
mllm/backends/qnn/aot/visitor/Transpose.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Equal.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/core/aops/GatherOp.hpp

[error] 24-24: function 'options' should be marked [[nodiscard]]

(modernize-use-nodiscard,-warnings-as-errors)


[error] 27-27: member variable 'options_' has protected visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Linear.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Concat.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Matmul.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp

[error] 304-304: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Gather.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/RMSNorm.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/nn/Functional.cpp

[error] 215-215: 3 adjacent parameters of 'gather' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

mllm/backends/qnn/QNNUtils.hpp

[error] 91-91: declaration uses identifier '__printQnnTensor', which is a reserved identifier

(bugprone-reserved-identifier,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Repeat.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Softmax.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/aot/QnnWrappersAPI.hpp

[error] 65-65: parameter name 'v' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/backends/qnn/QNNUtils.cpp

[error] 618-618: declaration uses identifier '__printQnnTensor', which is a reserved identifier

(bugprone-reserved-identifier,-warnings-as-errors)


[error] 628-628: variable name 't' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 630-630: variable 'tensor_type' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 640-640: variable 'dtype_str' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 666-666: variable 'shape_str' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 673-673: variable 'quant_str' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 675-675: repeated branch in conditional chain

(bugprone-branch-clone,-warnings-as-errors)

mllm/backends/qnn/aot/visitor/Where.hpp

[error] 14-14: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 16-16: parameter name 'op' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-android
  • GitHub Check: build-macos
  • GitHub Check: build-x86

Comment on lines +57 to +59
if (idx < 0) idx += dim_size;

if (idx < 0 || idx >= dim_size) { continue; }

⚠️ Potential issue | 🟠 Major

Silent skip of out-of-bounds indices may leave output uninitialized.

When idx < 0 || idx >= dim_size after negative index adjustment, the code silently skips the copy, leaving that portion of the output tensor uninitialized. This could result in garbage values or undefined behavior downstream.

Consider either:

  1. Logging a warning/error for out-of-bounds indices.
  2. Zero-initializing the output tensor before the loop.
  3. Raising an error for invalid indices.
🔎 Proposed fix: Log warning and zero-fill
-      if (idx < 0 || idx >= dim_size) { continue; }
+      if (idx < 0 || idx >= dim_size) {
+        MLLM_WARN("GatherOp: index {} out of bounds [0, {}), filling with zeros", idx, dim_size);
+        int64_t dst_offset = (o * indices_count + i) * inner_size * data_type_size;
+        std::memset(output_ptr + dst_offset, 0, inner_size * data_type_size);
+        continue;
+      }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

       if (idx < 0) idx += dim_size;
-      if (idx < 0 || idx >= dim_size) { continue; }
+      if (idx < 0 || idx >= dim_size) {
+        MLLM_WARN("GatherOp: index {} out of bounds [0, {}), filling with zeros", idx, dim_size);
+        int64_t dst_offset = (o * indices_count + i) * inner_size * data_type_size;
+        std::memset(output_ptr + dst_offset, 0, inner_size * data_type_size);
+        continue;
+      }
🤖 Prompt for AI Agents
In @mllm/backends/cpu/ops/GatherOp.cpp around lines 57 - 59, In GatherOp.cpp the
loop silently skips copies when "idx < 0 || idx >= dim_size", leaving parts of
the output uninitialized; change the Gather implementation to zero-initialize
the output buffer/tensor (the destination written by the gather) before the loop
and, when an index is out-of-bounds, either log a warning including the
offending "idx" and "dim_size" or raise an explicit error instead of continuing;
ensure you reference the same variables ("idx", "dim_size") and the output
buffer in your changes so the zero-fill happens once before the loop and all
invalid-index cases are handled with a clear log/error.

Comment on lines 174 to 178
   Qnn_BlockwiseExpansion_t blockwise_expansion;
   blockwise_expansion.axis = cfg->ch_axis;
   blockwise_expansion.scaleOffsets = nullptr;  // Will be set by setBlockwiseQuantization
-  blockwise_expansion.numBlocksPerAxis = v->tensor_.size(cfg->ch_axis) / cfg->block_size;
+  blockwise_expansion.numBlocksPerAxis = v->tensor_.size(1) / cfg->block_size;
   blockwise_expansion.blockScaleBitwidth = 12;  // 12 bits for 4 to 16 expansion

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, understand the structure of QuantizationSpecLPBQ and ch_axis
rg -n "ch_axis" --type cpp -B 2 -A 2 | head -80



🏁 Script executed:

# Check how blockwise_expansion is used and what axis/numBlocksPerAxis mean
rg -n "blockwise_expansion" --type cpp -B 3 -A 3 | head -100



🏁 Script executed:

# Look for LPBQ configuration and how ch_axis is set
rg -n "QuantizationSpecLPBQ\|kLPBQ" --type cpp -B 2 -A 5 | head -150



🏁 Script executed:

# Check the tensor dimensions in context of weight tensors
rg -n "v->tensor_\.size\(" --type cpp -B 2 -A 2 | head -100



🏁 Script executed:

# Search for where LPBQ ch_axis is set in LLMQuantRecipePass
rg -n "QuantizationSpecLPBQ::create" --type cpp -B 5 -A 15 | head -150



🏁 Script executed:

# Look for Linear layer quantization setup - typically ch_axis would be set there
rg -n "Linear\|MatMul" mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp -B 2 -A 8 | head -200



🏁 Script executed:

# Check what values ch_axis can have - look at initialization and usage patterns
rg -n "ch_axis\s*=" --type cpp -B 1 -A 1 | grep -v ".hpp" | head -100



🏁 Script executed:

# Found ch_axis = 0 at line 797. Let's verify if this is the only place LPBQ is created
rg -n "QuantizationSpecLPBQ::create" --type cpp -A 2 | head -50



🏁 Script executed:

# Check weight tensor dimensions - for Linear layers, shapes should be [out_features, in_features]
rg -n "shape\|size\(0\)|size\(1\)" mllm/backends/qnn/aot/QnnWrappersAPI.cpp | head -50



🏁 Script executed:

# Verify: does the code document what axis the blockwise quantization should be along?
rg -n "blockwise\|block_size" mllm/backends/qnn/aot/QnnWrappersAPI.cpp -B 2 -A 2 | head -80



🏁 Script executed:

# Let's check if there are any tests or comments explaining the relationship between ch_axis and tensor dims
rg -n "blockwise\|block_size" mllm/compile/ir/linalg/ -B 2 -A 2 | head -100



🏁 Script executed:

# Check the full LPBQ quantization spec to understand semantics
cat mllm/compile/ir/linalg/Attribute.hpp | grep -A 30 "struct QuantizationSpecLPBQ"



🏁 Script executed:

# Let's check the relationship more carefully - look at how scale_level_0_int is used
rg -n "scale_level_0_int" --type cpp -B 2 -A 2 | head -100



🏁 Script executed:

# Check dimensions of weight tensors in Linear quantization - what are size(0) and size(1)?
rg -n "Linear.*weight\|register.*weight" mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp -B 3 -A 3 | head -80



🏁 Script executed:

# Look for any documentation or schema about blockwise expansion axis semantics
rg -n "Qnn_BlockwiseExpansion" --type cpp -B 5 -A 10 | head -150



Mismatch between blockwise expansion axis and numBlocksPerAxis calculation.

blockwise_expansion.axis is set to cfg->ch_axis (line 175), but numBlocksPerAxis is computed using hardcoded v->tensor_.size(1) (line 177). At line 165, the code correctly uses v->tensor_.size(cfg->ch_axis) for other LPBQ fields; this pattern should be consistent. If ch_axis is 0 (as set in LLMQuantRecipePass.cpp:797), then numBlocksPerAxis should use size(0), not size(1):

Suggested fix
  blockwise_expansion.axis = cfg->ch_axis;
  blockwise_expansion.scaleOffsets = nullptr;  // Will be set by setBlockwiseQuantization
- blockwise_expansion.numBlocksPerAxis = v->tensor_.size(1) / cfg->block_size;
+ blockwise_expansion.numBlocksPerAxis = v->tensor_.size(cfg->ch_axis) / cfg->block_size;
🤖 Prompt for AI Agents
In @mllm/backends/qnn/aot/QnnWrappersAPI.cpp around lines 174 - 178, The
blockwise expansion sets blockwise_expansion.axis = cfg->ch_axis but incorrectly
computes blockwise_expansion.numBlocksPerAxis using v->tensor_.size(1); change
the calculation to use the configured channel axis like the other LPBQ fields
(i.e., compute numBlocksPerAxis as v->tensor_.size(cfg->ch_axis)) so the axis
and block count are consistent when cfg->ch_axis is not 1; update the code
around Qnn_BlockwiseExpansion_t blockwise_expansion and setBlockwiseQuantization
usage to use v->tensor_.size(cfg->ch_axis).
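
For reference, a minimal standalone sketch of the invariant the review asks for: derive the block count from the same axis the expansion is declared on. FakeTensor, LpbqConfig, and numBlocksPerAxis here are hypothetical stand-ins, not the project's actual types.

#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the real tensor/config types.
struct FakeTensor {
  std::vector<int64_t> dims;
  int64_t size(int axis) const { return dims.at(axis); }
};
struct LpbqConfig {
  int ch_axis = 0;      // axis the per-channel scales run along
  int block_size = 32;  // elements per block along ch_axis
};

// Derive the block count from the SAME axis the expansion is declared on,
// so axis and numBlocksPerAxis can never disagree.
uint32_t numBlocksPerAxis(const FakeTensor& t, const LpbqConfig& cfg) {
  const int64_t dim = t.size(cfg.ch_axis);
  assert(dim % cfg.block_size == 0 && "axis length must be divisible by block_size");
  return static_cast<uint32_t>(dim / cfg.block_size);
}

int main() {
  FakeTensor w{{4096, 11008}};  // e.g. [out_features, in_features]
  LpbqConfig cfg;               // ch_axis = 0
  return numBlocksPerAxis(w, cfg) == 4096 / 32 ? 0 : 1;
}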

Comment on lines +43 to +61
// Handle negative axis
// We can use the first input to determine rank, assuming all inputs have same rank
auto first_input = op->inputs().front()->cast_<ir::tensor::TensorValue>();
auto input_shape = first_input->tensor_.shape();
int rank = input_shape.size();
if (axis < 0) { axis += rank; }

// Create QNN Op Node
auto qnn_op_node = QnnAOTNodeOperation::create("Concat");
qnn_op_node->setPackageName("qti.aisw");

// Add Inputs
for (auto& input_val : op->inputs()) {
  auto input = input_val->cast_<ir::tensor::TensorValue>();
  qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, input));
}

// Add Params
qnn_op_node->emplaceParamScalar(QNNParamScalarWrapper::create("axis", (uint32_t)axis));

🛠️ Refactor suggestion | 🟠 Major

Add bounds validation for the normalized axis.

After normalizing the negative axis, the code should verify that the resulting axis is within the valid range [0, rank). Additionally, before casting to uint32_t at line 61, explicitly validate that axis is non-negative so an out-of-range value cannot silently wrap around to a huge unsigned number.

🔎 Proposed fix to add axis bounds validation
  int rank = input_shape.size();
  if (axis < 0) { axis += rank; }
+ 
+ // Validate axis is in valid range
+ if (axis < 0 || axis >= rank) {
+   MLLM_ERROR("Axis " << axis << " is out of bounds for rank " << rank);
+   return false;
+ }

  // Create QNN Op Node
  auto qnn_op_node = QnnAOTNodeOperation::create("Concat");
🤖 Prompt for AI Agents
In @mllm/backends/qnn/aot/visitor/Concat.cpp around lines 43 - 61, the axis
normalization may produce an out-of-bounds or negative value; after computing rank from
first_input->tensor_.shape() and normalizing axis (axis += rank), add explicit
validation that 0 <= axis && axis < rank and handle the error (throw an
exception or log and abort) before proceeding, and only then call
QNNParamScalarWrapper::create("axis", (uint32_t)axis) to avoid unsigned
wraparound; update the Concat creation flow around
QnnAOTNodeOperation::create("Concat")/emplaceParamScalar to perform this check
early.
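
A self-contained sketch of the normalize-then-validate pattern suggested above; normalizeAxis is a hypothetical helper name, not an existing function in this codebase.

#include <cstdint>
#include <cstdio>
#include <optional>

// Normalize a possibly-negative axis into [0, rank) and reject anything
// still out of range, so the later cast to uint32_t can never wrap around.
std::optional<uint32_t> normalizeAxis(int axis, int rank) {
  if (axis < 0) { axis += rank; }  // e.g. -1 with rank 4 becomes 3
  if (axis < 0 || axis >= rank) { return std::nullopt; }
  return static_cast<uint32_t>(axis);
}

int main() {
  // -1 normalizes to 3; -5 is rejected instead of wrapping to ~4 billion.
  std::printf("%u %d\n", *normalizeAxis(-1, 4), normalizeAxis(-5, 4).has_value());
  return 0;
}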

Comment on lines +49 to +50
int axis = dynamic_cast<mllm::aops::GatherOp*>(gather_op->getAOp())->options().dim;
qnn_op_node->emplaceParamScalar(QNNParamScalarWrapper::create("axis", (int32_t)axis));

⚠️ Potential issue | 🟠 Major

Unchecked dynamic_cast may cause undefined behavior.

The dynamic_cast at line 49 can return nullptr if the cast fails, and directly accessing ->options().dim on a null pointer leads to undefined behavior.

🔎 Proposed fix: Add null check
  // Add scalar param axis
-  int axis = dynamic_cast<mllm::aops::GatherOp*>(gather_op->getAOp())->options().dim;
+  auto gather_aop = dynamic_cast<mllm::aops::GatherOp*>(gather_op->getAOp());
+  if (!gather_aop) {
+    MLLM_ERROR("Failed to cast to aops::GatherOp");
+    return false;
+  }
+  int axis = gather_aop->options().dim;
   qnn_op_node->emplaceParamScalar(QNNParamScalarWrapper::create("axis", (int32_t)axis));
📝 Committable suggestion

Suggested change
int axis = dynamic_cast<mllm::aops::GatherOp*>(gather_op->getAOp())->options().dim;
qnn_op_node->emplaceParamScalar(QNNParamScalarWrapper::create("axis", (int32_t)axis));
auto gather_aop = dynamic_cast<mllm::aops::GatherOp*>(gather_op->getAOp());
if (!gather_aop) {
  MLLM_ERROR("Failed to cast to aops::GatherOp");
  return false;
}
int axis = gather_aop->options().dim;
qnn_op_node->emplaceParamScalar(QNNParamScalarWrapper::create("axis", (int32_t)axis));
🤖 Prompt for AI Agents
In @mllm/backends/qnn/aot/visitor/Gather.cpp around lines 49 - 50, The
dynamic_cast of gather_op->getAOp() to mllm::aops::GatherOp* can return nullptr
and is dereferenced immediately; change the code to first assign the result to a
local pointer (e.g., auto* gatherAOp =
dynamic_cast<mllm::aops::GatherOp*>(gather_op->getAOp())), check if gatherAOp is
nullptr, handle the error path (log via the existing logger, return/throw an
appropriate error, or abort the visitor) and only then read
gatherAOp->options().dim and call emplaceParamScalar, so you avoid undefined
behavior when the cast fails.
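
The same guard, reduced to a compilable sketch with placeholder types; BaseOp and GatherLikeOp stand in for the real op hierarchy.

#include <cstdio>

struct BaseOp { virtual ~BaseOp() = default; };
struct GatherLikeOp : BaseOp {
  struct Options { int dim = 0; } opts;
  const Options& options() const { return opts; }
};

// dynamic_cast yields nullptr on failure, so bind it to a named pointer
// and bail out before dereferencing, which is the shape the review asks for.
bool readAxis(BaseOp* op, int& axis_out) {
  auto* gather = dynamic_cast<GatherLikeOp*>(op);
  if (gather == nullptr) {
    std::fprintf(stderr, "expected a gather-like op\n");
    return false;
  }
  axis_out = gather->options().dim;
  return true;
}

int main() {
  GatherLikeOp g;
  BaseOp plain;
  int axis = -1;
  return (readAxis(&g, axis) && !readAxis(&plain, axis)) ? 0 : 1;
}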

Comment on lines +57 to +58
weight_val->tensor_ = weight_val->tensor_.view(
{weight_val->tensor_.shape()[0], weight_val->tensor_.shape()[1] * weight_val->tensor_.shape()[2]});

⚠️ Potential issue | 🔴 Critical

Add shape validation before tensor reshaping.

The code accesses shape()[0], shape()[1], and shape()[2] without validating that the weight tensor has at least 3 dimensions. This could cause an out-of-bounds access if the tensor shape doesn't meet expectations.

🔎 Proposed fix to add shape validation
+  // Validate weight tensor has expected 3D shape
+  if (weight_val->tensor_.shape().size() < 3) {
+    MLLM_ERROR("Weight tensor must have at least 3 dimensions, got " + 
+               std::to_string(weight_val->tensor_.shape().size()));
+    return false;
+  }
+
   weight_val->tensor_ = weight_val->tensor_.to(kUInt8);
 
   weight_val->tensor_ = weight_val->tensor_.view(
       {weight_val->tensor_.shape()[0], weight_val->tensor_.shape()[1] * weight_val->tensor_.shape()[2]});
📝 Committable suggestion

Suggested change
weight_val->tensor_ = weight_val->tensor_.view(
{weight_val->tensor_.shape()[0], weight_val->tensor_.shape()[1] * weight_val->tensor_.shape()[2]});
// Validate weight tensor has expected 3D shape
if (weight_val->tensor_.shape().size() < 3) {
  MLLM_ERROR("Weight tensor must have at least 3 dimensions, got " +
             std::to_string(weight_val->tensor_.shape().size()));
  return false;
}

weight_val->tensor_ = weight_val->tensor_.to(kUInt8);

weight_val->tensor_ = weight_val->tensor_.view(
    {weight_val->tensor_.shape()[0], weight_val->tensor_.shape()[1] * weight_val->tensor_.shape()[2]});
🤖 Prompt for AI Agents
In @mllm/backends/qnn/aot/visitor/Linear.cpp around lines 57 - 58, Ensure the
tensor has at least 3 dimensions before accessing shape()[0..2] in the reshape
call: check the weight tensor's rank/shape size (e.g., via tensor_.dim() or
tensor_.shape().size()) in the code around Linear.cpp where weight_val->tensor_
is being viewed, and if it is smaller than 3 either return/throw a clear error
(or log and skip) with context (function/class name) or handle the alternative
shape case; only perform the view(...) when the validation passes to avoid
out-of-bounds access.
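
Distilled into a standalone sketch; collapseLast2Dims is a hypothetical helper operating on a raw shape vector rather than the project's Tensor type.

#include <cstdio>
#include <vector>

// Guarded 3D -> 2D collapse: only fold dims 1 and 2 together once we know
// the shape really has at least three entries.
bool collapseLast2Dims(std::vector<long long>& shape) {
  if (shape.size() < 3) {
    std::fprintf(stderr, "expected >= 3 dims, got %zu\n", shape.size());
    return false;
  }
  shape = {shape[0], shape[1] * shape[2]};
  return true;
}

int main() {
  std::vector<long long> ok{4096, 344, 32};  // e.g. [out, n_blocks, block_size]
  std::vector<long long> bad{4096, 11008};   // 2D weight: must be rejected
  return (collapseLast2Dims(ok) && !collapseLast2Dims(bad)) ? 0 : 1;
}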

Comment on lines 310 to 312
  case kInt16: {
-   ret = QNN_DATATYPE_INT_16;
+   ret = QNN_DATATYPE_UFIXED_POINT_16;
    break;

⚠️ Potential issue | 🔴 Critical

Critical: Incorrect type mapping for kInt16.

Line 311 maps kInt16 (signed 16-bit integer) to QNN_DATATYPE_UFIXED_POINT_16 (unsigned fixed-point 16-bit). This is a sign mismatch that will cause incorrect value interpretation and potential data corruption.

🔎 Proposed fix
     case kInt16: {
-      ret = QNN_DATATYPE_UFIXED_POINT_16;
+      ret = QNN_DATATYPE_SFIXED_POINT_16;
       break;
     }
🤖 Prompt for AI Agents
In @mllm/backends/qnn/QNNUtils.cpp around lines 310 - 312, The case handling for
kInt16 incorrectly maps the signed 16-bit integer to
QNN_DATATYPE_UFIXED_POINT_16 (unsigned); update the mapping in the switch (case
kInt16) to use the signed 16-bit fixed-point/type variant (e.g.,
QNN_DATATYPE_SFIXED_POINT_16 or the correct QNN signed-16 enum value used
elsewhere in the codebase) so that kInt16 maps to the signed QNN datatype
instead of QNN_DATATYPE_UFIXED_POINT_16.
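
To make the signedness rule concrete, a minimal sketch with local enum stand-ins; the real names live in QnnTypes.h and the mllm dtype headers, and this mapping table is illustrative rather than the project's actual one.

// Local stand-ins for the host and QNN datatype enums.
enum class HostDType { kInt8, kUInt8, kInt16, kUInt16 };
enum class QnnFixedPoint { SFIXED_8, UFIXED_8, SFIXED_16, UFIXED_16 };

// Signed host types must map to the signed fixed-point variants and
// unsigned to unsigned; crossing the two reinterprets the bit pattern
// of every quantized value (the bug flagged above).
QnnFixedPoint toQnnFixedPoint(HostDType t) {
  switch (t) {
    case HostDType::kInt8: return QnnFixedPoint::SFIXED_8;
    case HostDType::kUInt8: return QnnFixedPoint::UFIXED_8;
    case HostDType::kInt16: return QnnFixedPoint::SFIXED_16;  // not UFIXED_16
    case HostDType::kUInt16: return QnnFixedPoint::UFIXED_16;
  }
  return QnnFixedPoint::SFIXED_8;  // unreachable with valid input
}

int main() { return toQnnFixedPoint(HostDType::kInt16) == QnnFixedPoint::SFIXED_16 ? 0 : 1; }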

Comment on lines +453 to +454
// in AOT case, the tensor is all on CPU (TODO: handle this)
// if (type != QNN_TENSOR_TYPE_STATIC) { MLLM_RT_ASSERT(tensor.device() == kQNN); }

⚠️ Potential issue | 🟠 Major

Add proper device validation instead of commenting out the assertion.

The device type assertion has been commented out for AOT handling, but this removes an important safety check that could catch device placement errors. Instead of removing the check entirely, add conditional logic to handle both runtime and AOT cases properly.

🔎 Proposed fix
-  // in AOT case, the tensor is all on CPU (TODO: handle this)
-  // if (type != QNN_TENSOR_TYPE_STATIC) { MLLM_RT_ASSERT(tensor.device() == kQNN); }
+  // In AOT case, tensors may be on CPU; in runtime, non-static tensors must be on QNN
+  if (type != QNN_TENSOR_TYPE_STATIC) {
+    // TODO: Replace with proper AOT detection mechanism
+    bool isAOT = (tensor.device() == kCPU);
+    if (!isAOT) {
+      MLLM_RT_ASSERT(tensor.device() == kQNN);
+    }
+  }
🤖 Prompt for AI Agents
In @mllm/backends/qnn/QNNUtils.cpp around lines 453 - 454, Replace the
commented-out device assertion with proper conditional validation: keep the
existing runtime check for non-AOT tensors by reinstating
MLLM_RT_ASSERT(tensor.device() == kQNN) when type != QNN_TENSOR_TYPE_STATIC, and
add an explicit check for the AOT/static case (e.g., assert tensor.device() ==
kCPU or allow kCPU/kQNN as appropriate) to validate expected placement; locate
the assertion around the variables type, QNN_TENSOR_TYPE_STATIC and
tensor.device() in QNNUtils.cpp and implement the conditional MLLM_RT_ASSERT
branches to handle both runtime and AOT scenarios.

}
}

void __printQnnTensor(const Qnn_Tensor_t* tensor) {

⚠️ Potential issue | 🔴 Critical

Fix reserved identifier violation in function name.

The function name __printQnnTensor uses a leading double underscore, which is reserved for implementation use in C++. This violates the C++ standard and could conflict with compiler or standard library identifiers.

🔎 Proposed fix
-void __printQnnTensor(const Qnn_Tensor_t* tensor) {
+void printQnnTensor(const Qnn_Tensor_t* tensor) {

Don't forget to update the header declaration in mllm/backends/qnn/QNNUtils.hpp as well.

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Clang (14.0.6)

[error] 618-618: declaration uses identifier '__printQnnTensor', which is a reserved identifier

(bugprone-reserved-identifier,-warnings-as-errors)

🤖 Prompt for AI Agents
In @mllm/backends/qnn/QNNUtils.cpp at line 618, The function name uses a
reserved leading double underscore (__printQnnTensor); rename it to a
non-reserved identifier (e.g., printQnnTensor or Qnn_PrintTensor) and update its
declaration in QNNUtils.hpp and any call sites to the new name, ensuring
linkage/extern "C" attributes remain unchanged; also run a project-wide search
for __printQnnTensor to fix all references and rebuild to verify no symbol
mismatches.
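
For background, [lex.name] in the C++ standard reserves every identifier containing a double underscore, plus identifiers beginning with an underscore followed by an uppercase letter, for the implementation. A small sketch of a conforming rename, with printQnnTensorDebug as an example spelling only:

#include <cstdio>

// Reserved, do not declare in user code:
//   void __printQnnTensor(...);  // contains "__"
//   void _PrintTensor(...);      // leading '_' + uppercase letter
// A non-reserved spelling with the same intent:
void printQnnTensorDebug(const void* tensor) {
  std::printf("tensor @ %p\n", tensor);
}

int main() {
  int dummy = 0;
  printQnnTensorDebug(&dummy);
  return 0;
}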


bool freeQnnTensors(Qnn_Tensor_t*& tensors, uint32_t numTensors);

void __printQnnTensor(const Qnn_Tensor_t* tensor); // for debug use

⚠️ Potential issue | 🔴 Critical

Reserved identifier violation: rename the debug function.

The function name __printQnnTensor uses a double underscore prefix, which is reserved by the C++ standard. This can lead to undefined behavior or conflicts with compiler internals.

🔎 Proposed fix
-void __printQnnTensor(const Qnn_Tensor_t* tensor);  // for debug use
+void printQnnTensorDebug(const Qnn_Tensor_t* tensor);  // for debug use
📝 Committable suggestion

Suggested change
void __printQnnTensor(const Qnn_Tensor_t* tensor); // for debug use
void printQnnTensorDebug(const Qnn_Tensor_t* tensor); // for debug use
🧰 Tools
🪛 Clang (14.0.6)

[error] 91-91: declaration uses identifier '__printQnnTensor', which is a reserved identifier

(bugprone-reserved-identifier,-warnings-as-errors)

🤖 Prompt for AI Agents
In @mllm/backends/qnn/QNNUtils.hpp at line 91, The function name
__printQnnTensor uses a double-underscore prefix which is reserved; rename it to
a non-reserved identifier (e.g., printQnnTensor or debugPrintQnnTensor) in the
declaration and update the matching definition and all call sites to the new
name (ensure Qnn_Tensor_t reference stays unchanged and update any
header/implementation pairs and tests that reference __printQnnTensor).

@chenghuaWang chenghuaWang changed the title from Qnn aot to feat(qualcomm): Qnn AOT Lowering Passes on Jan 7, 2026
  qnnParam_.scalarParam.uint32Value = static_cast<uint32_t>(value);
} else if constexpr (std::is_same_v<T, int32_t>) {
  qnnParam_.scalarParam.dataType = QNN_DATATYPE_INT_32;
  qnnParam_.scalarParam.uint32Value = static_cast<int32_t>(value);
Collaborator:

qnnParam_.scalarParam.int32Value = static_cast<int32_t>(value);
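
The underlying hazard, shown with a stand-in for the scalar union (field names and tag values here are hypothetical): writing a signed value through the unsigned member reinterprets negative values, which the one-line correction above avoids.

#include <cstdint>
#include <type_traits>

// Stand-in for a tagged scalar union: the member written must match
// dataType, or negative values get reinterpreted as large unsigned ones.
struct Scalar {
  int dataType = 0;  // 0 = uint32, 1 = int32 (hypothetical tags)
  union {
    uint32_t uint32Value;
    int32_t int32Value;
  };
};

template <typename T>
Scalar makeScalar(T value) {
  Scalar s{};
  if constexpr (std::is_same_v<T, uint32_t>) {
    s.dataType = 0;
    s.uint32Value = value;
  } else if constexpr (std::is_same_v<T, int32_t>) {
    s.dataType = 1;
    s.int32Value = value;  // int32 member, matching the tag
  }
  return s;
}

int main() { return makeScalar<int32_t>(-1).int32Value == -1 ? 0 : 1; }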


LINALG_AOPS_DEFINE(EqualOp, EQUALOP);
LINALG_AOPS_DEFINE(WhereOp, WHEREOP);

Collaborator:

There is no corresponding LINALG_AOPS_DEFINE(GatherOp, GATHEROP);

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @mllm/backends/qnn/QNNUtils.hpp:
- Around line 91-92: The debug function name __printQnnTensor uses reserved
leading underscores; rename the function (e.g., printQnnTensor or
QnnPrintTensor) and update its declaration void __printQnnTensor(const
Qnn_Tensor_t* tensor), its definition, and every call site to the new identifier
so the prototype, implementation, and usages remain consistent and avoid
reserved-identifier UB.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 000e12b and c41efb9.

📒 Files selected for processing (1)
  • mllm/backends/qnn/QNNUtils.hpp
🧰 Additional context used
📓 Path-based instructions (3)
{mllm,mllm-cli,pymllm}/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*: Files must not contain C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x80–0x9F, or DEL 0x7F. Horizontal tab (0x09) and line feed (0x0A) are explicitly allowed.
All files must be encoded in UTF-8 without BOM.
Any violation of character set (Rule 1) or encoding (Rule 2) requirements must cause the review to fail.
No line may end with trailing whitespace.
Use Unix line endings (LF).
File and directory names must consist only of printable Unicode characters, excluding C0 control codes 0x00–0x08, 0x0B–0x0C, 0x0E–0x1F, C1 control codes 0x80–0x9F, and DEL 0x7F.
Only use acceptable file extensions: .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .py, .pyi, .sh, .txt, .md, .yml, .yaml, .json, .toml.
Optional license headers, if present, must comply with character set rules (no C0/C1 control codes except tab and line feed).

Files:

  • mllm/backends/qnn/QNNUtils.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi,sh}: TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text that adheres to character set rules.
Encourage consistent coding style and patterns with the existing codebase.
Ensure code is portable across supported platforms (e.g., Linux, Windows) unless explicitly platform-specific.

Files:

  • mllm/backends/qnn/QNNUtils.hpp
{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

{mllm,mllm-cli,pymllm}/**/*.{c,cc,cpp,cxx,h,hh,hpp,py,pyi}: Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors.
Adhere to language-specific best practices and idioms (e.g., PEP 8 for Python, Google C++ Style Guide for C++).

Files:

  • mllm/backends/qnn/QNNUtils.hpp
🪛 Clang (14.0.6)
mllm/backends/qnn/QNNUtils.hpp

[error] 91-91: declaration uses identifier '__printQnnTensor', which is a reserved identifier

(bugprone-reserved-identifier,-warnings-as-errors)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-android
  • GitHub Check: build-x86
  • GitHub Check: build-macos
🔇 Additional comments (1)
mllm/backends/qnn/QNNUtils.hpp (1)

281-286: LGTM! Proper separation of signed and unsigned integer handling.

The split into separate branches for uint32_t and int32_t correctly maps each type to its corresponding QNN datatype and struct field, ensuring type safety and proper handling of signed integers.

Comment on lines +91 to +92
void __printQnnTensor(const Qnn_Tensor_t* tensor); // for debug use


⚠️ Potential issue | 🔴 Critical

Reserved identifier violation: rename the debug function.

The function name __printQnnTensor uses a double underscore prefix, which is reserved by the C++ standard for compiler and standard library implementations. This can lead to undefined behavior or conflicts with compiler internals.

🔎 Proposed fix
-void __printQnnTensor(const Qnn_Tensor_t* tensor);  // for debug use
+void printQnnTensorDebug(const Qnn_Tensor_t* tensor);  // for debug use

Note: Ensure the function definition and all call sites are updated to match the new name.

📝 Committable suggestion

Suggested change
void __printQnnTensor(const Qnn_Tensor_t* tensor); // for debug use
void printQnnTensorDebug(const Qnn_Tensor_t* tensor); // for debug use
🧰 Tools
🪛 Clang (14.0.6)

[error] 91-91: declaration uses identifier '__printQnnTensor', which is a reserved identifier

(bugprone-reserved-identifier,-warnings-as-errors)

🤖 Prompt for AI Agents
In @mllm/backends/qnn/QNNUtils.hpp around lines 91 - 92, The debug function name
__printQnnTensor uses reserved leading underscores; rename the function (e.g.,
printQnnTensor or QnnPrintTensor) and update its declaration void
__printQnnTensor(const Qnn_Tensor_t* tensor), its definition, and every call
site to the new identifier so the prototype, implementation, and usages remain
consistent and avoid reserved-identifier UB.

@chenghuaWang chenghuaWang left a comment

LGTM

@chenghuaWang chenghuaWang merged commit 545677e into UbiquitousLearning:main Jan 7, 2026
4 checks passed