
Query regarding Embedding Normalization and Similarity Score Range for PE-AV model #108

Description

@Preet-Sojitra

Hi there,

I am using facebook/pe-av-large and following the example code in the model card, which scores audio-visual pairs with a plain dot product (audio_embeds @ visual_embeds.T).

The resulting similarity scores often exceed 1.0 (I am seeing values around 1.1). Since the dot product of two unit-length vectors can never exceed 1, this suggests the embeddings are not L2-normalized by default.
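One way to confirm this is to inspect the row norms directly. The sketch below assumes both embedding batches have already been converted to NumPy arrays of shape (batch, dim) (e.g. with tensor.detach().cpu().numpy()); the random arrays and the describe_norms helper are hypothetical stand-ins, not PE-AV outputs or API.

```python
import numpy as np

def describe_norms(embeds: np.ndarray) -> tuple[float, float]:
    """Return the (min, max) L2 norm across the rows of a (batch, dim) matrix."""
    norms = np.linalg.norm(embeds, axis=-1)
    return float(norms.min()), float(norms.max())

# Hypothetical stand-ins; replace with the real audio_embeds / visual_embeds.
rng = np.random.default_rng(0)
audio_embeds = rng.normal(size=(4, 8))
visual_embeds = rng.normal(size=(4, 8))

print("audio norms (min, max):", describe_norms(audio_embeds))
print("visual norms (min, max):", describe_norms(visual_embeds))

# If every row had norm 1, each entry here would lie in [-1, 1] (Cauchy-Schwarz),
# so any score above 1 means at least one side has norm greater than 1.
scores = audio_embeds @ visual_embeds.T
print("max raw score:", float(scores.max()))
```

If both norm ranges come out close to (1.0, 1.0), the embeddings are already unit-length and the scores above 1 would need another explanation.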

  1. Are the embeddings intended to be compared with an unnormalized dot product?
  2. Is there a known range for these scores? I am trying to set a threshold to separate "good" from "bad" pairs. Should I L2-normalize the embeddings manually so the scores can be read as cosine similarity (range -1 to 1)?
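For reference, manual normalization and thresholding could look like the minimal sketch below. It assumes NumPy arrays of shape (batch, dim), with pair i formed by audio row i and visual row i. The helper names and the 0.5 threshold are hypothetical placeholders: nothing in the model card says where matched and mismatched PE-AV pairs fall on the cosine scale, so a real threshold would need to be calibrated on pairs already known to be good or bad.

```python
import numpy as np

def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row of x to unit L2 norm; eps avoids division by zero."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

def cosine_scores(audio_embeds: np.ndarray, visual_embeds: np.ndarray) -> np.ndarray:
    """All-pairs cosine similarity: dot products of unit vectors, each in [-1, 1]."""
    return l2_normalize(audio_embeds) @ l2_normalize(visual_embeds).T

def filter_pairs(audio_embeds: np.ndarray, visual_embeds: np.ndarray, threshold: float):
    """Cosine score of each matched pair (row i with row i) plus a keep-mask."""
    a = l2_normalize(audio_embeds)
    v = l2_normalize(visual_embeds)
    paired = np.sum(a * v, axis=-1)  # row-wise dot product of unit vectors
    return paired, paired >= threshold

# Hypothetical demo data: visual rows are noisy copies of the audio rows.
rng = np.random.default_rng(0)
audio = rng.normal(size=(3, 16))
visual = audio + 0.1 * rng.normal(size=(3, 16))
pair_scores, keep = filter_pairs(audio, visual, threshold=0.5)  # 0.5 is a placeholder
print(pair_scores, keep)
```

Note that normalizing changes how candidates are ranked whenever the visual embeddings have different norms, so whether this is appropriate depends on how the model was trained, which is exactly what question 1 asks.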

Thanks!
