feat: LFM2.5 text-embedding & ColBERT (MLX/XNNPACK) with prompts and multi-vector output by NorbertKlockiewicz · Pull Request #1269 · software-mansion/react-native-executorch

NorbertKlockiewicz · 2026-06-19T09:43:09Z

Description

Adds two LFM2.5 retrieval models from Liquid AI and the API needed to use them, through the existing useTextEmbeddings hook — one native runner, one hook, no new public surface beyond optional model-config fields:

LFM2.5-Embedding-350M: dense bi-encoder (CLS pooling, dim 1024). Trained with asymmetric "query:" / "document:" prompts.
LFM2.5-ColBERT-350M: late-interaction retriever (Linear(1024→128) per token). Trained with "[Q] " / "[D] " prompts.

Both run on MLX on iOS (physical device) and XNNPACK on Android, quantized (MLX int4, XNNPACK 8da4w).

To support them without breaking the existing API, the model config grew three optional fields and forward became config-driven:

prompts — when present, forward requires a role ('query' | 'document') and auto-prepends the matching prompt.
multiVector — when true, forward returns a per-token EmbeddingResult (vectors, numTokens, embeddingDim, tokenIds); otherwise it returns a single pooled Float32Array as before.
skipListIds — punctuation token ids the consumer excludes from MaxSim scoring.
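Written out, the score the consumer computes is the usual ColBERT-style MaxSim. The notation is ours, not the PR's: $q_i$ are the $N_q$ query token vectors, $d_j$ the $N_d$ document token vectors, and $S$ the set of document positions whose token id appears in skipListIds. As in the example app's maxSim, only document tokens are masked, and a query token whose candidates are all skipped contributes nothing to the sum.

$$
\mathrm{MaxSim}(q, d) = \sum_{i=1}^{N_q} \max_{j \in \{1, \dots, N_d\} \setminus S} q_i^{\top} d_j
$$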

The library auto-applies the role prompts (the matching "query:" / "[Q] " prefix is prepended in forward), but late-interaction scoring (MaxSim) stays the consumer's concern: it runs wherever the vectors are stored. The example app demonstrates one way to score (its own local maxSim), and the ColBERT demo is folded into the unified text-embeddings screen, which picks the scorer from the model's config.
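A minimal sketch of the prompt step, assuming a hypothetical applyPrompt helper (not the library's actual function) and a prompts object shaped like the LFM_COLBERT_PROMPTS constant quoted in the review ({ query: '[Q] ', document: '[D] ' }). The library enforces the role rule at the type level; the runtime checks below are an illustrative addition.

```typescript
type Role = 'query' | 'document';

// Hypothetical prompt shape, matching the LFM_COLBERT_PROMPTS constant in the review.
interface RolePrompts {
  query: string;
  document: string;
}

/**
 * Hypothetical helper: prepends the role-specific prompt to the input.
 * Prompted models must receive a role; plain models must not.
 */
function applyPrompt(
  text: string,
  prompts: RolePrompts | undefined,
  role?: Role
): string {
  if (prompts === undefined) {
    if (role !== undefined) {
      throw new Error('This model has no prompts, so it takes no role');
    }
    return text; // plain model: input passes through unchanged
  }
  if (role === undefined) {
    throw new Error("Prompted models require a role: 'query' | 'document'");
  }
  return prompts[role] + text;
}
```

For example, applyPrompt('weather today', { query: '[Q] ', document: '[D] ' }, 'query') returns '[Q] weather today', while an unprompted model passes its input through unchanged.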

Native side: TextEmbeddings::generate returns the raw [numTokens, embeddingDim] matrix as an EmbeddingResult; the TS layer reduces it. The empty BaseEmbeddings base class was removed (TextEmbeddings now extends BaseModel directly), and output-shape validation was extracted into TextEmbeddings::buildResult.
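The TS-side reduction of the raw matrix can be sketched as follows. The helper names toVector and getTokenVectors come from the commit messages in this PR, but the signatures and bodies here are assumptions, as is the EmbeddingResult shape (tokenIds typed as number[]).

```typescript
// Assumed shape of the native result; field names are from the PR.
interface EmbeddingResult {
  vectors: Float32Array; // row-major [numTokens, embeddingDim]
  numTokens: number;
  embeddingDim: number;
  tokenIds: number[];
}

// Pooled models: the native side emits one row, so row 0 is the embedding.
// slice() copies, so the result does not depend on the source buffer.
function toVector(result: EmbeddingResult): Float32Array {
  if (result.numTokens < 1) {
    throw new Error('EmbeddingResult has no rows');
  }
  return result.vectors.slice(0, result.embeddingDim);
}

// Multi-vector models: one zero-copy view per token. The views share
// memory with result.vectors and are only valid while it is alive.
function getTokenVectors(result: EmbeddingResult): Float32Array[] {
  const { vectors, numTokens, embeddingDim } = result;
  const rows: Float32Array[] = [];
  for (let t = 0; t < numTokens; t++) {
    rows.push(vectors.subarray(t * embeddingDim, (t + 1) * embeddingDim));
  }
  return rows;
}
```

The split between a copy for the pooled vector and views for per-token rows avoids numTokens allocations on the multi-vector path, at the cost of the view-lifetime caveat the PR mentions documenting.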

Review order: start with the TS types (types/textEmbeddings.ts — ForwardFn/ForwardReturn discriminated on the model config), then the module/hook (TextEmbeddingsModule.ts, useTextEmbeddings.ts), then the native TextEmbeddings.cpp/Types.h, then the registry/URLs and the example screen.

Introduces a breaking change?

No

forward stays non-breaking: pooled models still return Float32Array. The new return type and role requirement only apply to models that opt in via config.

Type of change

Bug fix (latent: existing models now add CLS/SEP special tokens — see Additional notes)
New feature (change which adds functionality)
Documentation update (improves or adds clarity to existing documentation)
Other (chores, tests, code style improvements etc.)

Tested on

iOS
Android

Testing instructions

Open the text-embeddings example app.
Pick LFM2.5 Embedding (MLX on a physical iOS device, XNNPACK on Android/simulator) and run the example queries: "weather" should rank the "sunny" sentence first, and "match" should rank the home-team sentences at the top.
Pick LFM2.5 ColBERT (late-interaction): same corpus, scored with MaxSim; the ordering should match.
Existing pooled models (MiniLM, MPNet, …) keep working unchanged.

C++ unit tests: TextEmbeddingsTests (incl. new EmbeddingResult metadata / tokenIds assertions) compiles and links under the Android NDK toolchain. The suite is cross-compiled, so it is not executed on the host in this setup.

Related issues

Checklist

I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have updated the documentation accordingly
My changes generate no new warnings

Additional notes

MLX requires a physical iOS device — the MLX delegate does not run on the simulator (use XNNPACK there). The two models are hosted on the Software Mansion Hugging Face org; docs are updated for both next and the 0.9.x versioned set.

…xSim Add the LFM2.5-Embedding-350M and LFM2.5-ColBERT-350M models, served from Hugging Face (MLX on iOS, XNNPACK on Android / iOS simulator). Text embeddings are unified into one runner and one hook: the native TextEmbeddings model returns the raw [numTokens, embeddingDim] matrix (numTokens === 1 for pooled models, the full sequence for multi-vector / late-interaction models like ColBERT), plus the input token ids. The TS layer reduces it: toVector() for the single-vector case, getTokenVectors() and maxSim() for late interaction. Models trained with asymmetric query/document prompts (LFM uses "query:" / "document:", ColBERT uses "[Q] " / "[D] ") carry a "prompts" config; forward then requires a role argument ('query' | 'document') and auto-prepends the matching prompt. The role is type-enforced: required for prompted models, forbidden for plain ones. Also: the tokenizer post_processor is now applied for text embeddings so the BOS special token is added (CLS-pooled models depend on it), and the text-to-image Encoder reads the new EmbeddingResult. The example app gains a semantic-search screen and a ColBERT late-interaction search screen demonstrating MaxSim. Authored with Claude. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Migrate the segment-anything (SAM) screen to toVector(forward()): its CLIP-text path broke when forward started returning EmbeddingResult.
- Update the C++ TextEmbeddings integration test for the EmbeddingResult return type (it was still using the old OwningArrayBuffer pointer API).
- Guard the per-token invariant: throw InvalidModelOutput if the number of output rows != the input token count (pooled numTokens == 1 is exempt), so skiplist masking can't silently misalign if a graph pads or truncates.
- Dedup encode()/encodeWithSpecialTokens() into a shared encodeImpl.
- Drop the redundant Float32Array copy at the JSI boundary; document the getTokenVectors view lifetime; remove dead BaseEmbeddings::postprocess.

Authored with Claude. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
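The per-token guard described above (output rows must equal input tokens, with single-row pooled output exempt) can be illustrated in TypeScript. The real check lives in the native TextEmbeddings::buildResult (C++); validateOutputShape and InvalidModelOutputError below are hypothetical stand-ins, and the assumption of a rank-2 output with the batch dimension already squeezed is ours.

```typescript
// Hypothetical stand-in for the native InvalidModelOutput error.
class InvalidModelOutputError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'InvalidModelOutputError';
  }
}

interface OutputShape {
  numTokens: number;
  embeddingDim: number;
}

/**
 * Hypothetical check of a raw output shape against the tokenized input.
 * Assumes a rank-2 [numTokens, embeddingDim] tensor (batch dim squeezed).
 */
function validateOutputShape(
  shape: readonly number[],
  inputTokenCount: number
): OutputShape {
  if (shape.length !== 2) {
    throw new InvalidModelOutputError(
      `expected a rank-2 output, got rank ${shape.length}`
    );
  }
  const numTokens = shape[0]!;
  const embeddingDim = shape[1]!;
  // Pooled models emit exactly one row and are exempt from the per-token check.
  if (numTokens !== 1 && numTokens !== inputTokenCount) {
    throw new InvalidModelOutputError(
      `output has ${numTokens} rows but the input has ${inputTokenCount} tokens`
    );
  }
  return { numTokens, embeddingDim };
}
```

Failing loudly here matters because a padded or truncated output would otherwise pair vectors with the wrong tokenIds, and skip-list masking would silently drop the wrong rows.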

forward(text) returns a single pooled Float32Array again for standard models — restoring the original API, so MiniLM/MPNet/CLIP/SAM consumers need no migration. The reduction (row 0 of the native [numTokens, embeddingDim] matrix) happens in the TS module, not at the call site. Multi-vector (late-interaction) models opt in via a `multiVector: true` config flag; for those, forward returns the full per-token EmbeddingResult so MaxSim/skiplist work. Return type is discriminated by the flag, and the role argument by `prompts` (required when prompted, none when not). Authored with Claude. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
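The discrimination this commit describes can be expressed with conditional types. This is a sketch, not the contents of types/textEmbeddings.ts: TextEmbeddingsConfig is trimmed to the three new optional fields, the Promise return type is assumed from the docs' "resolve to" wording, and the example configs use `as const` so that `multiVector: true` stays a literal type.

```typescript
type Role = 'query' | 'document';

// Assumed shape of the per-token result (field names from the PR).
interface EmbeddingResult {
  vectors: Float32Array; // row-major [numTokens, embeddingDim]
  numTokens: number;
  embeddingDim: number;
  tokenIds: number[];
}

// Hypothetical, trimmed-down config: only the three new optional fields.
interface TextEmbeddingsConfig {
  prompts?: { query: string; document: string };
  multiVector?: boolean;
  skipListIds?: readonly number[];
}

// `multiVector: true` selects the per-token result; anything else is pooled.
type ForwardReturn<C extends TextEmbeddingsConfig> = C extends {
  multiVector: true;
}
  ? EmbeddingResult
  : Float32Array;

// A `prompts` field makes `role` required; without it there is no role parameter.
type ForwardFn<C extends TextEmbeddingsConfig> = C extends { prompts: object }
  ? (text: string, role: Role) => Promise<ForwardReturn<C>>
  : (text: string) => Promise<ForwardReturn<C>>;

// Example configs; without `as const`, multiVector would widen to boolean
// and the config would be treated as pooled.
const colbertLikeConfig = {
  prompts: { query: '[Q] ', document: '[D] ' },
  multiVector: true,
} as const;
const pooledConfig = {} as const;

type ColbertForward = ForwardFn<typeof colbertLikeConfig>;
type PooledForward = ForwardFn<typeof pooledConfig>;
```

With these definitions, ColbertForward is (text, role) => Promise<EmbeddingResult> and PooledForward is (text) => Promise<Float32Array>, so passing a role to a plain model is a compile-time error rather than a runtime surprise.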


msluszniak · 2026-06-23T11:30:03Z

+    fontSize: 14,
+    fontWeight: '600',
+    color: '#0F172A',
+    fontVariant: ['tabular-nums'],


add nums to cspell or exclude demo apps from cspell.

msluszniak · 2026-06-23T11:34:22Z

+const maxSim = (
+  query: EmbeddingResult,
+  doc: EmbeddingResult,
+  skip: number[] = []
+) => {
+  const dim = query.embeddingDim;
+  const skipped = new Set(skip);
+  let score = 0;
+  for (let qi = 0; qi < query.numTokens; qi++) {
+    const qOff = qi * dim;
+    let best = -Infinity;
+    for (let di = 0; di < doc.numTokens; di++) {
+      if (skipped.has(doc.tokenIds[di])) continue;
+      const dOff = di * dim;
+      let dot = 0;
+      for (let k = 0; k < dim; k++) {
+        dot += query.vectors[qOff + k] * doc.vectors[dOff + k];
+      }
+      if (dot > best) best = dot;
+    }
+    if (best !== -Infinity) score += best;
+  }
+  return score;


I saw you used exactly the same function in the demo app. Don't we want to expose it as a helper?

It probably won't hurt to expose it as a util. For now I did it the same way the dotProduct function is done, to be consistent, but we can expose both as helpers for text embeddings.
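If the helpers were moved into the library as suggested, they might look like the sketch below. dotProduct and rankByMaxSim are hypothetical names here; maxSim follows the snippet under review, except that it takes a prebuilt Set (so rankByMaxSim builds the skip set once per query instead of once per document pair) and rejects mismatched embedding dimensions.

```typescript
// Assumed shape of the per-token result (field names from the PR).
interface EmbeddingResult {
  vectors: Float32Array; // row-major [numTokens, embeddingDim]
  numTokens: number;
  embeddingDim: number;
  tokenIds: number[];
}

// Dot product; equals cosine similarity when both vectors are L2-normalized.
function dotProduct(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i]! * b[i]!;
  return sum;
}

// MaxSim: for each query token, take its best match among non-skipped doc tokens.
function maxSim(
  query: EmbeddingResult,
  doc: EmbeddingResult,
  skip: ReadonlySet<number> = new Set()
): number {
  const dim = query.embeddingDim;
  if (doc.embeddingDim !== dim) throw new Error('dimension mismatch');
  let score = 0;
  for (let qi = 0; qi < query.numTokens; qi++) {
    const qOff = qi * dim;
    let best = -Infinity;
    for (let di = 0; di < doc.numTokens; di++) {
      if (skip.has(doc.tokenIds[di]!)) continue;
      const dOff = di * dim;
      let dot = 0;
      for (let k = 0; k < dim; k++) {
        dot += query.vectors[qOff + k]! * doc.vectors[dOff + k]!;
      }
      if (dot > best) best = dot;
    }
    // A query token with every candidate skipped contributes nothing.
    if (best !== -Infinity) score += best;
  }
  return score;
}

// Hypothetical ranking helper: document indices ordered by descending MaxSim.
function rankByMaxSim(
  query: EmbeddingResult,
  docs: readonly EmbeddingResult[],
  skipListIds: readonly number[] = []
): number[] {
  const skip = new Set(skipListIds); // built once for the whole corpus
  return docs
    .map((doc, index) => ({ index, score: maxSim(query, doc, skip) }))
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.index);
}
```

For pooled results, dotProduct on the returned Float32Array is enough; for multi-vector results, a call like rankByMaxSim(query, docs, config.skipListIds) would return document indices from best to worst match.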

msluszniak · 2026-06-23T11:35:46Z

I know we need to be reactive and upload the newest models to our lib so they are available immediately, but I don't like that this goes through patches. In the future we should find a better mechanism that doesn't require editing released documentation, etc.

I agree, it's especially hard when docs need updating. Users on versions before the patch will look at the v0.9.0 docs, which will differ from what the library ships for them. I don't really have an idea for that, so here we are: we need to decide whether to ship it in the patch or defer it to 0.10.0.

Are docs the problem, or the entire flow? I don't think there is a way to avoid patches if we need to make code changes alongside. Waiting and batching models into a regular release makes it unnecessarily slow.

I would make them pre-releases of the next version, not patches of the already released version. Until v0.10 is released we can keep this flow, but after that we should publish pre-0.11 releases (or similar), leave the versioned docs untouched, and make the process smoother.

mkopcins · 2026-06-23T12:32:09Z

+export const maxSim = (
+  query: EmbeddingResult,
+  doc: EmbeddingResult,
+  skipListIds: number[] = []
+) => {
+  const dim = query.embeddingDim;
+  const skip = new Set(skipListIds);
+  let score = 0;
+  for (let qi = 0; qi < query.numTokens; qi++) {
+    const qOff = qi * dim;
+    let best = -Infinity;
+    for (let di = 0; di < doc.numTokens; di++) {
+      if (skip.has(doc.tokenIds[di]!)) continue;
+      const dOff = di * dim;
+      let dot = 0;
+      for (let k = 0; k < dim; k++) {
+        dot += (query.vectors[qOff + k] ?? 0) * (doc.vectors[dOff + k] ?? 0);
+      }
+      if (dot > best) best = dot;
+    }
+    if (best !== -Infinity) score += best;
+  }
+  return score;
+};


Wouldn't it make sense to have it in the lib?

mkopcins · 2026-06-23T12:33:51Z

+- **Pooled models** (the default, e.g. MiniLM, MPNet, LFM2.5-Embedding) resolve to a single `Float32Array` — one normalized vector for the whole input.
+- **Multi-vector models** (`multiVector: true`, e.g. LFM2.5-ColBERT) resolve to an [`EmbeddingResult`](../../06-api-reference/interfaces/EmbeddingResult.md) with the per-token vectors (`vectors`, `numTokens`, `embeddingDim`, `tokenIds`).


Maybe we could add a link to something that explains the difference, e.g. the Liquid AI blog?

mkopcins · 2026-06-23T12:34:17Z

+const maxSim = (
+  query: EmbeddingResult,
+  doc: EmbeddingResult,
+  skip: number[] = []
+) => {
+  const dim = query.embeddingDim;
+  const skipped = new Set(skip);
+  let score = 0;
+  for (let qi = 0; qi < query.numTokens; qi++) {
+    const qOff = qi * dim;
+    let best = -Infinity;
+    for (let di = 0; di < doc.numTokens; di++) {
+      if (skipped.has(doc.tokenIds[di])) continue;
+      const dOff = di * dim;
+      let dot = 0;
+      for (let k = 0; k < dim; k++) {
+        dot += query.vectors[qOff + k] * doc.vectors[dOff + k];
+      }
+      if (dot > best) best = dot;
+    }
+    if (best !== -Infinity) score += best;
+  }
+  return score;
+};


If we decide to move it into our lib, this should be removed.

mkopcins · 2026-06-23T12:37:21Z

Are docs the problem, or the entire flow? I don't think there is a way to avoid patches if we need to make code changes alongside. Waiting and batching models into a regular release makes it unnecessarily slow.

mkopcins · 2026-06-23T12:42:43Z

+const LFM_COLBERT_SKIP_LIST = [
+  510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524,
+  535, 536, 537, 538, 539, 540, 541, 568, 569, 570, 571, 572, 573, 600, 601,
+  602, 603,
+];
+
+const LFM_COLBERT_PROMPTS = { query: '[Q] ', document: '[D] ' };


I don't think model-specific things should be here; move them outside, like it's done for TTS or OCR.

NorbertKlockiewicz force-pushed the @nk/lfm-embedding-mlx-xnnpack branch from b1f5bdd to 50e80e1 June 22, 2026 10:46

NorbertKlockiewicz and others added 9 commits June 22, 2026 14:34

refactor: move skiplist to model config, MaxSim scoring to app

e12fb03

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

refactor(example): merge ColBERT search into text embeddings screen

d551b5f

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

refactor: drop empty BaseEmbeddings layer, rename skipList, trim comm…

b911530

…ents Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

refactor: extract TextEmbeddings::buildResult, validate output rank

9691184

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

refactor: dedup LFM model configs, drop deleted util export, trim com…

8e494c4

…ments Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: document text embeddings prompts, multi-vector & ColBERT MaxSim

c8e7769

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: align TextEmbeddingsModule JSDoc with LLMModule convention

a593082

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

NorbertKlockiewicz changed the title ~~@nk/lfm embedding mlx xnnpack~~ feat: LFM2.5 text-embedding & ColBERT (MLX/XNNPACK) with prompts and multi-vector output Jun 23, 2026

NorbertKlockiewicz and others added 2 commits June 23, 2026 10:42

test: assert EmbeddingResult metadata + tokenIds; clarify role JSDoc

f470d20

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge branch 'main' into @nk/lfm-embedding-mlx-xnnpack

ad94616

NorbertKlockiewicz self-assigned this Jun 23, 2026

NorbertKlockiewicz added the model Issues related to exporting, improving, fixing ML models label Jun 23, 2026

NorbertKlockiewicz and others added 2 commits June 23, 2026 10:54

fix: remove ===

93ee698

docs: list new model-config fields + correct forward return type

34e4c09

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

NorbertKlockiewicz marked this pull request as ready for review June 23, 2026 09:10

NorbertKlockiewicz requested review from benITo47, mkopcins and msluszniak June 23, 2026 09:13

msluszniak reviewed Jun 23, 2026


mkopcins requested changes Jun 23, 2026



NorbertKlockiewicz wants to merge 14 commits into main from @nk/lfm-embedding-mlx-xnnpack

3 participants
