Gnievesponce prompt tune embedd chunking by nievespg1 · Pull Request #1826 · microsoft/graphrag

nievespg1 · 2025-03-19T18:22:44Z

Description

When running prompt tuning with the automatic selection method, the system attempts to embed all text chunks in a single request, regardless of the payload size.

By default, a batch should contain no more than 16 text chunks, and the total token count for the whole batch should stay below 8191.
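The two limits above can be enforced together with a greedy batcher. The sketch below is illustrative only: batch_chunks and the count_tokens parameter are hypothetical names, not graphrag's API, and the defaults simply mirror the 16-chunk / 8191-token limits stated above.

```python
from collections.abc import Callable, Iterator


def batch_chunks(
    chunks: list[str],
    count_tokens: Callable[[str], int],
    batch_size: int = 16,
    batch_max_tokens: int = 8191,
) -> Iterator[list[str]]:
    """Yield batches that respect both the chunk-count and token limits (hypothetical helper)."""
    batch: list[str] = []
    batch_tokens = 0
    for chunk in chunks:
        tokens = count_tokens(chunk)
        # Close the current batch if adding this chunk would break either limit.
        if batch and (len(batch) >= batch_size or batch_tokens + tokens > batch_max_tokens):
            yield batch
            batch, batch_tokens = [], 0
        # A single chunk larger than batch_max_tokens still ends up alone in its own batch.
        batch.append(chunk)
        batch_tokens += tokens
    if batch:
        yield batch


# Usage: whitespace splitting is only a stand-in for the embedding model's tokenizer.
batches = list(batch_chunks(["a b"] * 40, lambda s: len(s.split())))
print([len(b) for b in batches])
```

In practice, count_tokens would wrap the tokenizer that matches the embedding model (for example, a tiktoken encoding) so that the token budget reflects what the API actually counts.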

Related Issues

#1825

Proposed Changes

Modify graphrag/prompt_tune/loader/input.py so that large embedding jobs are split into batches, the same way the indexing workflow handles them. For reference, the indexing workflow's batching strategy is implemented in graphrag/index/operations/embed_text/strategies/openai.py.
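Once the chunks are batched, each batch can be sent as its own request and the results reassembled in order. This is a minimal sketch assuming an async embedding client; embed_in_batches, embed_batch, and max_concurrency are hypothetical names, and the code is not the implementation in the strategy file referenced above.

```python
import asyncio
from collections.abc import Awaitable, Callable, Sequence

# Hypothetical client signature: takes one batch of texts, returns one vector per text.
EmbedBatchFn = Callable[[list[str]], Awaitable[list[list[float]]]]


async def embed_in_batches(
    batches: Sequence[list[str]],
    embed_batch: EmbedBatchFn,
    max_concurrency: int = 4,
) -> list[list[float]]:
    """Embed each batch in its own request and flatten the results in input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(batch: list[str]) -> list[list[float]]:
        # Limit how many embedding requests are in flight at the same time.
        async with semaphore:
            return await embed_batch(batch)

    # gather() returns results in the same order the batches were passed in.
    results = await asyncio.gather(*(run(batch) for batch in batches))
    return [vector for batch_result in results for vector in batch_result]
```

Because asyncio.gather preserves argument order, the flattened list lines up one-to-one with the original chunks, which matters when the vectors are later matched back to their text.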

Checklist

I have tested these changes locally.
I have reviewed the code changes.
I have updated the documentation (if necessary).
I have added appropriate unit tests (if applicable).

Additional Notes

No additional notes


* Added support for embeddings chunking as defined by the config.
* Ran semvisor -t patch.
* Eliminated redundant code by using the embed_text strategy directly.
* Added a fix to support brackets within the corpus text; for example, inline LaTeX within a markdown file.

---------

Co-authored-by: Gabriel Nieves <gnievesponce@microsoft.com>
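The PR does not say how the bracket fix works. One common cause of this kind of failure is corpus text containing curly braces (as inline LaTeX does) being pasted into a prompt that is later filled in with Python's str.format(); without escaping, a fragment like {1} is parsed as a positional field and raises an IndexError. The sketch below assumes that scenario; escape_braces and the template are hypothetical.

```python
def escape_braces(text: str) -> str:
    """Double curly braces so str.format() treats them as literal characters."""
    return text.replace("{", "{{").replace("}", "}}")


# Hypothetical prompt: a corpus example is embedded in a template that is
# formatted again later with the text to process.
sample = r"The area is $\frac{1}{2} b h$."
prompt_template = "Example:\n" + escape_braces(sample) + "\n\nText: {input_text}"

# format() turns each doubled brace back into a single literal brace.
prompt = prompt_template.format(input_text="A new document.")
print(prompt)
```

Escaping at the point where corpus text enters the template keeps the real placeholders (such as {input_text}) working while leaving the LaTeX intact in the final prompt.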

Gabriel Nieves added 2 commits March 19, 2025 17:04

Added support for embeddings chunking as defined by the config.

70dae82

Merge branch 'main' into gnievesponce-prompt-tune-embedd-chunking

ddf02da

nievespg1 requested review from a team as code owners March 19, 2025 18:22

Gabriel Nieves added 2 commits March 19, 2025 18:27

ran semvisor -t patch

cc72f5f

Eliminated redundant code by using the embed_text strategy directly

3bcc028

AlonsoGuevara reviewed Mar 20, 2025


Comment thread graphrag/api/prompt_tune.py

Gabriel Nieves added 2 commits March 24, 2025 23:41

Added fix to support brackets within the corpus text; for example, inline LaTeX within a markdown file

0228379

Merge branch 'main' into gnievesponce-prompt-tune-embedd-chunking

dc579e1

AlonsoGuevara approved these changes Mar 27, 2025


nievespg1 merged commit ffd8db7 into main Mar 31, 2025
15 checks passed

nievespg1 deleted the gnievesponce-prompt-tune-embedd-chunking branch March 31, 2025 16:38

nievespg1 merged 6 commits into main from gnievesponce-prompt-tune-embedd-chunking


2 participants
