Support for outputting embeddings #2

Open: joefutrelle wants to merge 2 commits into main from output-embeddings

@joefutrelle

This pull request adds penultimate-layer embedding output. A one-time ONNX graph surgery step exposes the embedding tensor as a second model output; the new --embeddings flag then captures that tensor alongside the class scores in the same forward pass and writes it per bin.

  • add_embedding_output.py (new): auto-detects the pre-head tensor (the input to the final Gemm/MatMul) and adds it to the model's graph outputs; --tensor-name overrides the auto-detection.
  • cli.py: new --embeddings / --embeddings-only / --embeddings-outfile flags; write_embeddings writes Parquet (pid + embedding as fixed_size_list).
  • sanstorch.py / withtorch.py: capture outputs[1] and accumulate it alongside scores.
  • pyproject.toml: new [embeddings] extra (pyarrow).
  • README: install row, options, and an Embeddings section.
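
The auto-detection step in the first bullet can be sketched roughly as follows. This is not the code in add_embedding_output.py: `Node`, `find_pre_head_tensor`, and `add_output` are hypothetical names, and `Node` is a simplified stand-in for the entries of `onnx.load(path).graph.node`, which expose the same `op_type`, `input`, and `output` fields. The sketch assumes the nodes are listed in topological order and that the activation is the first input of the final Gemm/MatMul, which may not hold for every exported model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Simplified stand-in for an ONNX node (hypothetical, for illustration)."""
    op_type: str
    input: list[str]
    output: list[str]


HEAD_OPS = ("Gemm", "MatMul")


def find_pre_head_tensor(nodes: Sequence[Node], tensor_name: Optional[str] = None) -> str:
    """Return the name of the tensor feeding the classification head."""
    if tensor_name is not None:
        # Explicit override: only check that the tensor exists in the graph.
        known = {name for node in nodes for name in (*node.input, *node.output)}
        if tensor_name not in known:
            raise ValueError(f"tensor {tensor_name!r} not found in graph")
        return tensor_name
    # Walk backwards so the first match is the final Gemm/MatMul.
    for node in reversed(nodes):
        if node.op_type in HEAD_OPS:
            # Assumption: input[0] is the activation, later inputs are weights/bias.
            return node.input[0]
    raise ValueError("no Gemm/MatMul node found; pass a tensor name explicitly")


def add_output(graph_outputs: list[str], name: str) -> list[str]:
    """Append the tensor to the output list unless it is already exposed."""
    return graph_outputs if name in graph_outputs else [*graph_outputs, name]


# A toy graph: the embedding is the input of the final Gemm.
nodes = [
    Node("Conv", ["image", "conv.weight"], ["feat"]),
    Node("GlobalAveragePool", ["feat"], ["pooled"]),
    Node("Flatten", ["pooled"], ["embedding"]),
    Node("Gemm", ["embedding", "fc.weight", "fc.bias"], ["logits"]),
    Node("Softmax", ["logits"], ["scores"]),
]
outputs = add_output(["scores"], find_pre_head_tensor(nodes))
```

Because the new tensor is appended after the existing output, the scores stay at index 0 and the embedding lands at index 1, which matches the `outputs[1]` capture described for sanstorch.py / withtorch.py.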

Key questions:

  • is Parquet the right output format for embeddings?
  • is float16 sufficient output precision or is it a premature space optimization?
  • are the README changes accurate and sufficient?
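
One rough way to put a number on the float16 question is to round-trip values through half precision and measure the relative error. The sketch below uses only the standard library (`struct` format `e` is IEEE 754 binary16); the sample values are made up and are not real embeddings, and `to_float16_and_back` and `max_relative_error` are hypothetical helper names. For values in float16's normal range (magnitudes between about 6.1e-5 and 65504), round-to-nearest keeps the relative error at or below 2^-11, roughly 0.05%.

```python
import struct


def to_float16_and_back(x: float) -> float:
    """Round x to the nearest float16 and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def max_relative_error(values) -> float:
    """Largest relative rounding error over the nonzero values."""
    return max(
        abs(to_float16_and_back(v) - v) / abs(v)
        for v in values
        if v != 0.0
    )


# Hypothetical sample values, all within float16's normal range.
sample = [0.1, -1.5, 3.14159, 123.456, 0.001]
worst = max_relative_error(sample)
print(f"worst relative error: {worst:.2e} (bound: {2 ** -11:.2e})")
```

Whether that error matters depends on the downstream use: running the same check on real embeddings, and comparing nearest-neighbor or clustering results at both precisions, would be a more direct test than the per-value bound.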

joefutrelle requested a review from sbatchelder on June 5, 2026 16:47
joefutrelle self-assigned this on Jun 5, 2026
