From 115ead9622dd8af980d88a6edd5d7f07b89a6eb5 Mon Sep 17 00:00:00 2001
From: Tomoko Uchida
Date: Sat, 2 Aug 2025 15:58:28 +0900
Subject: [PATCH 1/3] update embcli-gemini README

---
 packages/embcli-gemini/README.md | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/packages/embcli-gemini/README.md b/packages/embcli-gemini/README.md
index d547d25..02f6c74 100644
--- a/packages/embcli-gemini/README.md
+++ b/packages/embcli-gemini/README.md
@@ -37,20 +37,21 @@ emb models
 GeminiEmbeddingModel
     Vendor: gemini
     Models:
+    * gemini-embedding-001 (aliases: )
     * gemini-embedding-exp-03-07 (aliases: exp-03-07)
     * text-embedding-004 (aliases: text-004)
     * embedding-001 (aliases: )
     Model Options:
     * task_type (str) - The type of task for the embedding. Supported task types: 'semantic_similarity', 'classification', 'clustering', 'retrieval_document', 'retrieval_query', 'question_answering', 'fact_verification', 'code_retrieval_query'
 
-# get an embedding for an input text by text-embedding-004 model.
-emb embed -m text-004 "Embeddings are essential for semantic search and RAG apps."
+# get an embedding for an input text by gemini-embedding-001 model.
+emb embed -m gemini-embedding-001 "Embeddings are essential for semantic search and RAG apps."
 
-# get an embedding for an input text by text-embedding-004 model with task_type=retrieval_query.
-emb embed -m text-004 "Embeddings are essential for semantic search and RAG apps." -o task_type retrieval_query
+# get an embedding for an input text by gemini-embedding-001 model with task_type=retrieval_query.
+emb embed -m gemini-embedding-001 "Embeddings are essential for semantic search and RAG apps." -o task_type retrieval_query
 
-# calculate similarity score between two texts by text-embedding-004 model. the default metric is cosine similarity.
-emb simscore -m text-004 "The cat drifts toward sleep." "Sleep dances in the cat's eyes."
+# calculate similarity score between two texts by gemini-embedding-001 model. the default metric is cosine similarity.
+emb simscore -m gemini-embedding-001 "The cat drifts toward sleep." "Sleep dances in the cat's eyes."
 0.8025767622661093
 ```
 
@@ -60,14 +61,14 @@ You can use the `emb` command to index documents and perform search by an image.
 
 ```bash
 # index example documents in the current directory.
-emb ingest-sample -m text-004 -c catcafe --corpus cat-names-en
+emb ingest-sample -m gemini-embedding-001 -c catcafe --corpus cat-names-en
 
 # or, you can give the path to your documents.
 # the documents should be in a CSV file with two columns: id and text. the separator should be comma.
-emb ingest -m text-004 -c catcafe -f
+emb ingest -m gemini-embedding-001 -c catcafe -f
 
 # search for a query in the indexed documents.
-emb search -m text-004 -c catcafe -q "Who's the naughtiest one?"
+emb search -m gemini-embedding-001 -c catcafe -q "Who's the naughtiest one?"
 Found 5 results:
 Score: 0.5264116432711389, Document ID: 28, Text: Loki: Loki is a mischievous and clever cat, always finding new ways to entertain himself, sometimes at his humans' expense. He is a master of stealth and surprise attacks on toys. Despite his playful trickery, Loki is incredibly charming and affectionate, easily winning hearts with his roguish appeal.
 Score: 0.5167245254962557, Document ID: 46, Text: Bandit: Bandit is a mischievous cat, often with mask-like markings, always on the lookout for his next playful heist of a toy or treat. He is clever and energetic, loving to chase and pounce. Despite his roguish name, Bandit is a loving companion who enjoys a good cuddle after his adventures.
@@ -76,7 +77,7 @@ Score: 0.5047165435030156, Document ID: 97, Text: Alfie: Alfie is a cheerful and
 Score: 0.5034822716772406, Document ID: 71, Text: Archie: Archie is a friendly and slightly goofy ginger cat, always up for a bit of fun and a good meal. He is very sociable and loves attention from anyone willing to give it. Archie enjoys playful wrestling and will often follow his humans around, offering cheerful chirps and affectionate head-bumps.
 
 # multilingual search
-emb search -m text-004 -c catcafe -q "一番のいたずら者は誰?"
+emb search -m gemini-embedding-001 -c catcafe -q "一番のいたずら者は誰?"
 Found 5 results:
 Score: 0.45721307081132867, Document ID: 33, Text: Sophie: Sophie is a refined and intelligent cat, perhaps a Russian Blue, with a gentle demeanor. She is observant and thoughtful, often studying her surroundings before acting. Sophie enjoys quiet playtime and affectionate cuddles on her own terms, forming deep and meaningful bonds with her chosen humans with quiet grace.
 Score: 0.45709408404668733, Document ID: 11, Text: Shadow: Shadow is a mysterious black cat, often materializing silently beside you. He enjoys quiet observation from hidden spots, his golden eyes keenly watching everything. Though initially reserved, Shadow forms deep bonds, offering gentle head-bumps and soft purrs to those he trusts, an enigmatic yet loving companion.
From 53216005501a9e940e3db843b7273c82c5d80cb3 Mon Sep 17 00:00:00 2001
From: Tomoko Uchida
Date: Sat, 2 Aug 2025 16:00:32 +0900
Subject: [PATCH 2/3] update release note

---
 docs/docs/release_notes.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/docs/release_notes.md b/docs/docs/release_notes.md
index c442542..a94fdda 100644
--- a/docs/docs/release_notes.md
+++ b/docs/docs/release_notes.md
@@ -1,5 +1,7 @@
 # Release Notes
 
+- [embcli-gemini Version 0.1.1](https://github.com/mocobeta/embcli/releases/tag/gemini-0.1.1) (2025-08-02)
+- [embcli-jina Version 0.1.3](https://github.com/mocobeta/embcli/releases/tag/jina-0.1.3) (2025-08-02)
 - [embcli-chroma Version 0.1.0](https://github.com/mocobeta/embcli/releases/tag/chroma-0.1.0) (2025-07-27)
 - [embcli-core Version 0.1.2](https://github.com/mocobeta/embcli/releases/tag/core-0.1.2) (2025-07-27)
 - [embcli-voyage Version 0.1.2](https://github.com/mocobeta/embcli/releases/tag/voyage-0.1.2) (2025-06-07)

From 007d1437798b421cc90bfaa3180a75dd3e83ae23 Mon Sep 17 00:00:00 2001
From: Tomoko Uchida
Date: Sat, 2 Aug 2025 16:03:15 +0900
Subject: [PATCH 3/3] update docs of gemini and jina models

---
 docs/docs/model_plugins.md            |  1 +
 docs/docs/multimodal_model_plugins.md | 25 ++++++++++++++++---------
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/docs/docs/model_plugins.md b/docs/docs/model_plugins.md
index 164c637..5fa4785 100644
--- a/docs/docs/model_plugins.md
+++ b/docs/docs/model_plugins.md
@@ -93,6 +93,7 @@ GEMINI_API_KEY=
 GeminiEmbeddingModel
     Vendor: gemini
     Models:
+    * gemini-embedding-001 (aliases: )
     * gemini-embedding-exp-03-07 (aliases: exp-03-07)
     * text-embedding-004 (aliases: text-004)
     * embedding-001 (aliases: )
diff --git a/docs/docs/multimodal_model_plugins.md b/docs/docs/multimodal_model_plugins.md
index a54464c..cbcc6f1 100644
--- a/docs/docs/multimodal_model_plugins.md
+++ b/docs/docs/multimodal_model_plugins.md
@@ -56,32 +56,39 @@ JINA_API_KEY=
 
 ```bash
 emb models
-JinaEmbeddingModel
-    Vendor: jina
-... (snip)
-JinaClipModel
+JinaMultiModalModel
     Vendor: jina
     Models:
+    * jina-embeddings-v4 (aliases: jina-v4)
     * jina-clip-v2 (aliases: )
     Model Options:
-    * task (str) - Downstream task for which the embeddings are used. Supported tasks: 'retrieval.query', 'retrieval.passage'.
+    * task (str) - Downstream task for which the embeddings are used. Supported tasks: 'retrieval.query', 'retrieval.passage', 'text-matching', 'code.query', 'code.passage'.
+    * late_chunking (bool) - Whether if the late chunking is applied. Only supported in jina-embeddings-v4.
+    * truncate (bool) - When enabled, the model will automatically drop the tail that extends beyond the maximum context length allowed by the model instead of throwing an error. Only supported in jina-embeddings-v4.
     * dimensions (int) - The number of dimensions the resulting output embeddings should have.
     * embedding_type (str) - The type of embeddings to return. Options include 'float', 'binary', 'ubinary'. Default is 'float'.
+SentenceTransformerModel
+    Vendor: sbert
+    Models:
+    * sentence-transformers (aliases: sbert)
+    Default Local Model: all-MiniLM-L6-v2
+    See https://sbert.net/docs/sentence_transformer/pretrained_models.html for available local models.
+    Model Options:
 ```
 
-**Example usage:** get an embedding for an input text by jina-clip-v2 model model with an option dimensions=512 and embedding_type=binary.
+**Example usage:** get an embedding for an input text by jina-v4 model model with an option dimensions=512 and embedding_type=binary.
 
 ```bash
-emb embed -m jina-clip-v2 -o dimensions 512 \
+emb embed -m jina-v4 -o dimensions 512 \
 -o embedding_type binary \
 "Owls can rotate their necks 270 degrees without injury🦉"
 ```
 
-**Example usage:** get an embedding for an input image by jina-clip-v2 model.
+**Example usage:** get an embedding for an input image by jina-v4 model.
 
 ```bash
 # Assume you have an image file `gingercat.jpeg` in the current directory.
-emb embed -m jina-clip-v2 --image gingercat.jpeg
+emb embed -m jina-v4 --image gingercat.jpeg
 ```
 
 ## [embcli-voyage](https://pypi.org/project/embcli-voyage/) for Voyage Multimodal Embeddings