Merged

V3/main #2190

52 commits
429e1b1
Remove graph embedding and UMAP (#2048)
natoverse Sep 9, 2025
978e798
Remove file filtering (#2050)
natoverse Sep 9, 2025
97704ab
Remove text unit grouping (#2052)
natoverse Sep 9, 2025
04d9f58
Re-implement hierarchical Leiden (#2049)
natoverse Sep 9, 2025
8720acc
Merge branch 'main' into v3/main
natoverse Sep 15, 2025
84e6008
Merge branch 'main' into v3/main
natoverse Sep 17, 2025
b730530
Merge branch 'main' into v3/main
natoverse Sep 23, 2025
de767cc
Use 4.1 and text-embedding-3-large as defaults
natoverse Sep 24, 2025
d751682
Update comment
natoverse Sep 24, 2025
d7773bd
Clean vector store (#2077)
gaudyb Sep 26, 2025
ebe959a
Update v3/main missing config + functions (#2082)
andresmor-ms Sep 30, 2025
4364d67
Merge branch 'main' into v3/main
natoverse Oct 6, 2025
2b5284c
Merge branch 'main' into v3/main
natoverse Oct 7, 2025
79ad9b9
reduce schema fields (#2089)
gaudyb Oct 9, 2025
eb0dfe3
Remove strategy dicts (#2090)
natoverse Oct 10, 2025
6284cdd
Remove fnllm (#2095)
natoverse Oct 10, 2025
f7a8a08
Merge branch 'main' into v3/main
natoverse Oct 11, 2025
715be61
Sort deps alpha
natoverse Oct 11, 2025
b732445
Remove multi search (#2093)
natoverse Oct 11, 2025
5ec49fd
V3 docs and cleanup (#2100)
natoverse Oct 15, 2025
0436405
Remove document overwrite (#2101)
gaudyb Oct 16, 2025
1bb9fa8
Unified factory (#2105)
natoverse Oct 20, 2025
542d3db
Prefix vector store (#2106)
gaudyb Oct 21, 2025
c43a58c
fix for container name
Oct 23, 2025
6192692
Restructure project as monorepo. (#2111)
dworthen Nov 4, 2025
6b03af6
Fix formatting
natoverse Nov 4, 2025
6033e4f
Storage fixes and cleanup (#2118)
natoverse Nov 5, 2025
ae1f5e1
Nov 2025 housekeeping (#2120)
natoverse Nov 6, 2025
e0cce31
Graphrag config (#2119)
dworthen Nov 10, 2025
4512ce0
Empty graph guards (#2126)
natoverse Nov 11, 2025
a4ffc3d
Remove embeddings optional new (#2128)
gaudyb Nov 17, 2025
d6e6191
Format
natoverse Nov 17, 2025
7bf82b7
Add empty checks for NLP graphs (#2133)
natoverse Nov 17, 2025
20a96cb
Init command asks for models (#2137)
natoverse Nov 24, 2025
4404668
Add graphrag-storage. (#2127)
dworthen Dec 15, 2025
bffa400
Python update (3.13) (#2149)
natoverse Dec 15, 2025
3201f28
Add GraphRAG Cache package. (#2153)
dworthen Dec 16, 2025
c296f1a
Fix a bunch of module comments and function visibility (#2154)
natoverse Dec 17, 2025
c649d9f
Issue #2004 fix (#2159)
gaudyb Dec 31, 2025
fde14b6
Mismatch between header in community report generation prompt example…
gaudyb Dec 31, 2025
8fd7730
Chunker factory (#2156)
natoverse Jan 6, 2026
710fdad
Input factory (#2168)
natoverse Jan 12, 2026
22a4d29
DRIFT fixes (#2171)
natoverse Jan 13, 2026
b05709f
Vector package (#2172)
natoverse Jan 15, 2026
d7bd7c6
Fix smoke vector config
natoverse Jan 16, 2026
fc98a1d
Update index bug (#2173)
gaudyb Jan 17, 2026
c0a06ba
Add GraphRAG LLM package. (#2174)
dworthen Jan 22, 2026
d576857
Update documentation for v3 release (#2176)
gaudyb Jan 23, 2026
edfdbbe
Graphrag llm cleanup (#2181)
dworthen Jan 23, 2026
bb1d71f
Migration update (#2180)
natoverse Jan 27, 2026
9c66c96
Merge branch 'main' into v3/main
dworthen Jan 27, 2026
f06e57c
fix formatting.
dworthen Jan 27, 2026

4 changes: 2 additions & 2 deletions .github/workflows/gh-pages.yml
@@ -6,7 +6,7 @@ permissions:
contents: write

env:
PYTHON_VERSION: "3.11"
PYTHON_VERSION: "3.13"

jobs:
build:
@@ -31,7 +31,7 @@ jobs:

- name: Install dependencies
shell: bash
run: uv sync
run: uv sync --all-packages

- name: mkdocs build
shell: bash
78 changes: 78 additions & 0 deletions .github/workflows/python-checks.yml
@@ -0,0 +1,78 @@
name: Python Build and Type Check
on:
push:
branches:
- "**/main" # match branches like feature/main
- "main" # match the main branch
pull_request:
types:
- opened
- reopened
- synchronize
- ready_for_review
branches:
- "**/main"
- "main"
paths-ignore:
- "**/*.md"
- ".semversioner/**"

permissions:
contents: read
pull-requests: read

concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
# Only run for the latest commit
cancel-in-progress: true

jobs:
python-ci:
# skip draft PRs
if: github.event.pull_request.draft == false
strategy:
matrix:
python-version: ["3.11", "3.13"]
os: [ubuntu-latest, windows-latest]
fail-fast: false # Continue running all jobs even if one fails
env:
DEBUG: 1

runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v4

- uses: dorny/paths-filter@v3
id: changes
with:
filters: |
python:
- 'graphrag/**/*'
- 'uv.lock'
- 'pyproject.toml'
- '**/*.py'
- '**/*.toml'
- '**/*.ipynb'
- '.github/workflows/python*.yml'
- 'tests/**/*'

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}

- name: Install uv
uses: astral-sh/setup-uv@v6

- name: Install dependencies
shell: bash
run: |
uv sync --all-packages

- name: Check
run: |
uv run poe check

- name: Build
run: |
uv build --all-packages
7 changes: 3 additions & 4 deletions .github/workflows/python-integration-tests.yml
@@ -32,7 +32,7 @@ jobs:
if: github.event.pull_request.draft == false
strategy:
matrix:
python-version: ["3.10"]
python-version: ["3.13"]
os: [ubuntu-latest, windows-latest]
fail-fast: false # continue running all jobs even if one fails
env:
@@ -67,12 +67,11 @@ jobs:
- name: Install dependencies
shell: bash
run: |
uv sync
uv pip install gensim
uv sync --all-packages

- name: Build
run: |
uv build
uv build --all-packages

- name: Install and start Azurite
shell: bash
8 changes: 4 additions & 4 deletions .github/workflows/python-notebook-tests.yml
@@ -32,12 +32,13 @@ jobs:
if: github.event.pull_request.draft == false
strategy:
matrix:
python-version: ["3.10"]
python-version: ["3.13"]
os: [ubuntu-latest, windows-latest]
fail-fast: false # Continue running all jobs even if one fails
env:
DEBUG: 1
GRAPHRAG_API_KEY: ${{ secrets.OPENAI_NOTEBOOK_KEY }}
GRAPHRAG_API_KEY: ${{ secrets.OPENAI_API_KEY }}
GRAPHRAG_API_BASE: ${{ secrets.GRAPHRAG_API_BASE }}

runs-on: ${{ matrix.os }}
steps:
@@ -67,8 +68,7 @@ jobs:
- name: Install dependencies
shell: bash
run: |
uv sync
uv pip install gensim
uv sync --all-packages

- name: Notebook Test
run: |
14 changes: 4 additions & 10 deletions .github/workflows/python-publish.yml
@@ -7,7 +7,7 @@ on:
branches: [main]

env:
PYTHON_VERSION: "3.10"
PYTHON_VERSION: "3.13"

jobs:
publish:
@@ -17,8 +17,6 @@ jobs:

environment:
name: pypi
url: https://pypi.org/p/graphrag

permissions:
id-token: write

@@ -38,14 +36,14 @@ jobs:

- name: Install dependencies
shell: bash
run: uv sync
run: uv sync --all-packages

- name: Export Publication Version
run: echo "version=$(uv version --short)" >> $GITHUB_OUTPUT

- name: Build Distributable
shell: bash
run: uv build
run: uv run poe build

- name: Inspect all distribution members and metadata
shell: bash
@@ -99,8 +97,4 @@ jobs:
PY

- name: Publish package distributions to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
packages-dir: dist
skip-existing: true
verbose: true
run: uv publish
7 changes: 3 additions & 4 deletions .github/workflows/python-smoke-tests.yml
@@ -32,7 +32,7 @@ jobs:
if: github.event.pull_request.draft == false
strategy:
matrix:
python-version: ["3.10"]
python-version: ["3.13"]
os: [ubuntu-latest, windows-latest]
fail-fast: false # Continue running all jobs even if one fails
env:
@@ -72,12 +72,11 @@ jobs:
- name: Install dependencies
shell: bash
run: |
uv sync
uv pip install gensim
uv sync --all-packages

- name: Build
run: |
uv build
uv build --all-packages

- name: Install and start Azurite
shell: bash
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
name: Python CI
name: Python Unit Tests
on:
push:
branches:
@@ -32,7 +32,7 @@ jobs:
if: github.event.pull_request.draft == false
strategy:
matrix:
python-version: ["3.10", "3.11"] # add 3.12 once gensim supports it. TODO: watch this issue - https://github.com/piskvorky/gensim/issues/3510
python-version: ["3.13"]
os: [ubuntu-latest, windows-latest]
fail-fast: false # Continue running all jobs even if one fails
env:
@@ -67,16 +67,7 @@ jobs:
- name: Install dependencies
shell: bash
run: |
uv sync
uv pip install gensim

- name: Check
run: |
uv run poe check

- name: Build
run: |
uv build
uv sync --all-packages

- name: Unit Test
run: |
5 changes: 5 additions & 0 deletions .gitignore
@@ -22,6 +22,8 @@ output/lancedb
venv/
.conda
.tmp
packages/graphrag-llm/notebooks/metrics
packages/graphrag-llm/notebooks/cache

.env
build.zip
@@ -58,3 +60,6 @@ docsite/

# Jupyter notebook
.ipynb_checkpoints/

# Root build assets
packages/*/LICENSE
4 changes: 4 additions & 0 deletions .semversioner/next-release/major-20260123143225940955.json
@@ -0,0 +1,4 @@
{
"type": "major",
"description": "Monorepo restructure\n\n New Packages:\n - graphrag-cache\n - graphrag-chunking\n - graphrag-common\n - graphrag-input\n - graphrag-llm\n - graphrag-storage\n - graphrag-vectors\n\n Changes:\n - New config: run graphrag init --force to reinitialize config with new layout and options."
}
41 changes: 35 additions & 6 deletions .vscode/launch.json
@@ -10,7 +10,19 @@
"args": [
"index",
"--root",
"<path_to_index_folder>"
"${input:root_folder}"
],
"console": "integratedTerminal"
},
{
"name": "Update",
"type": "debugpy",
"request": "launch",
"module": "graphrag",
"args": [
"update",
"--root",
"${input:root_folder}"
],
"console": "integratedTerminal"
},
@@ -21,10 +33,10 @@
"module": "graphrag",
"args": [
"query",
"${input:query}",
"--root",
"<path_to_index_folder>",
"--method", "basic",
"--query", "What are the top themes in this story",
"${input:root_folder}",
"--method", "${input:query_method}"
]
},
{
@@ -35,7 +47,7 @@
"args": [
"poe", "prompt-tune",
"--config",
"<path_to_ragtest_root_demo>/settings.yaml",
"${input:root_folder}/settings.yaml",
]
},
{
@@ -74,5 +86,22 @@
"console": "integratedTerminal",
"justMyCode": false
},
]
],
"inputs": [
{
"id": "root_folder",
"type": "promptString",
"description": "Enter the root folder path"
},
{
"id": "query_method",
"type": "promptString",
"description": "Enter the query method (e.g., 'global', 'local')"
},
{
"id": "query",
"type": "promptString",
"description": "Enter the query text"
}
]
}
2 changes: 1 addition & 1 deletion .vsts-ci.yml
@@ -10,7 +10,7 @@ trigger:

variables:
isMain: $[eq(variables['Build.SourceBranch'], 'refs/heads/main')]
pythonVersion: "3.10"
pythonVersion: "3.13"
poetryVersion: "1.6.1"
nodeVersion: "18.x"
artifactsFullFeedName: "Resilience/resilience_python"
29 changes: 29 additions & 0 deletions breaking-changes.md
@@ -12,6 +12,35 @@ There are five surface areas that may be impacted on any given release. They are

> TL;DR: Always run `graphrag init --path [path] --force` between minor version bumps to ensure you have the latest config format. Run the provided migration notebook between major version bumps if you want to avoid re-indexing prior datasets. Note that this will overwrite your configuration and prompts, so backup if necessary.

# v3
Run the [migration notebook](./docs/examples_notebooks/index_migration_to_v3.ipynb) to convert older tables to the v3 format. Our main goal with v3 was to slim down the core library, minimizing long-term maintenance of features that are either largely unused or should have been out of scope all along.

## Data Model
We made only minimal data model changes in v3 that affect your index. The primary breaking change is the removal of a rarely used document-grouping capability, which caused the `text_units` table to have a `document_ids` column holding a list instead of a single value in a `document_id` column. v3 restores the single `document_id` column, and the migration notebook applies this change so you don't need to re-index.
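
The `text_units` change amounts to replacing a one-element list with a scalar. As an illustration only (the migration notebook is the supported path), here is a minimal sketch on plain Python records. It assumes each old row's `document_ids` list holds exactly one id, which is what you get when grouping was never enabled; `migrate_text_unit` is a hypothetical helper, and the `id`/`text` fields in the sample rows are made up.

```python
from typing import Any


def migrate_text_unit(row: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: reshape one pre-v3 text unit record for v3.

    Assumes the old record stores its source documents as a list under
    ``document_ids`` and that the list holds exactly one id.
    """
    doc_ids = row.get("document_ids")
    if not isinstance(doc_ids, list) or len(doc_ids) != 1:
        # Grouped text units have no single owning document; stop rather than guess.
        raise ValueError(f"expected exactly one document id, got {doc_ids!r}")
    # Copy every other column unchanged, then add the scalar column.
    migrated = {key: value for key, value in row.items() if key != "document_ids"}
    migrated["document_id"] = doc_ids[0]
    return migrated


old_rows = [
    {"id": "tu-1", "text": "First chunk", "document_ids": ["doc-a"]},
    {"id": "tu-2", "text": "Second chunk", "document_ids": ["doc-b"]},
]
new_rows = [migrate_text_unit(row) for row in old_rows]
print(new_rows[0])  # {'id': 'tu-1', 'text': 'First chunk', 'document_id': 'doc-a'}
```

Raising on multi-document rows is a deliberate choice in this sketch: picking one id silently would misattribute text that was grouped from several documents.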

Most of the other changes remove fields that are no longer used or are out of scope. For example, we removed the UMAP step that generated x/y coordinates for entities: new indexes will not produce these columns, but they do no harm if they remain in your existing tables.

## API
We have removed the multi-search variant from each search method in the API.

## Config

We made several changes to the configuration model. The best way forward is to re-run `init`, which we always recommend for minor and major version bumps.

Here is a summary of the changes:
- Removed fnllm as underlying model manager, so the model types "openai_chat", "azure_openai_chat", "openai_embedding", and "azure_openai_embedding" are all invalid. Use "chat" or "embedding".
- fnllm also had an experimental "auto" rate-limiting setting, which is no longer allowed. Use `null` in your config for the default, or set explicit tpm/rpm limits.
- LiteLLM requires a `model_provider`, so add one as appropriate. For example, if you previously used the "openai_chat" model type, the provider is "openai"; for "azure_openai_chat" it is "azure".
- Collapsed the `vector_store` dict into a single root-level object. We no longer support multi-search, which was the only use case that needed the dict and its considerable downstream complexity.
- Removed the `outputs` block that was also only used for multi-search.
- Most workflows had an undocumented `strategy` config dict that allowed fine-tuning of internal settings. These settings were never used and carried associated complexity, so we removed the dict.
- Vector store configuration now allows a custom schema per embedded field. This removes the need for the `container_name` prefix, which caused confusion anyway. The default container name is now simply the embedded field name; if you need something custom, add the `index_schema` block and populate it as needed.
- We previously supported embedding any text field in the data model. However, we only ever used `text_unit_text`, `entity_description`, and `community_full_content`, so all others have been removed.
- Removed the `umap` and `embed_graph` blocks which were only used to add x/y fields to the entities. This fixed a long-standing dependency issue with graspologic. If you need x/y positions, see the [visualization guide](https://microsoft.github.io/graphrag/visualization_guide/) for using gephi.
- Removed file filtering from input document loading. This was essentially unused.
- Removed the groupby ability for text chunking. It was intended to let short documents be grouped before chunking, but it was never used and added considerable complexity to the chunking process.
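
The model-type, rate-limit, and embedding items in the list above are mechanical enough to sketch. The helpers below are hypothetical and not part of graphrag: they operate on one model entry as a plain dict, and the key names `tokens_per_minute` and `requests_per_minute` are assumptions standing in for "tpm/rpm". Treat this as a checklist in code form and compare against the config that `graphrag init --force` generates rather than hand-editing from it.

```python
from typing import Any

# Old fnllm model types -> (v3 type, LiteLLM model_provider), per the notes above.
MODEL_TYPE_MAP: dict[str, tuple[str, str]] = {
    "openai_chat": ("chat", "openai"),
    "azure_openai_chat": ("chat", "azure"),
    "openai_embedding": ("embedding", "openai"),
    "azure_openai_embedding": ("embedding", "azure"),
}

# The only text fields v3 still embeds.
EMBEDDABLE_FIELDS = {"text_unit_text", "entity_description", "community_full_content"}

# Assumed rate-limit key names (hypothetical); check your generated settings.yaml.
RATE_LIMIT_KEYS = ("tokens_per_minute", "requests_per_minute")


def upgrade_model_config(model: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of one model entry with pre-v3 values rewritten."""
    upgraded = dict(model)
    old_type = upgraded.get("type")
    if old_type in MODEL_TYPE_MAP:
        new_type, provider = MODEL_TYPE_MAP[old_type]
        upgraded["type"] = new_type
        # LiteLLM needs a provider; keep one the user already set.
        upgraded.setdefault("model_provider", provider)
    for key in RATE_LIMIT_KEYS:
        # "auto" was fnllm-only; None (YAML null) means "use the default".
        if upgraded.get(key) == "auto":
            upgraded[key] = None
    return upgraded


def unsupported_embeddings(names: list[str]) -> list[str]:
    """List configured embedding fields that v3 no longer produces."""
    return [name for name in names if name not in EMBEDDABLE_FIELDS]


old = {"type": "azure_openai_chat", "model": "gpt-4.1", "tokens_per_minute": "auto"}
print(upgrade_model_config(old))
# {'type': 'chat', 'model': 'gpt-4.1', 'tokens_per_minute': None, 'model_provider': 'azure'}
print(unsupported_embeddings(["entity_description", "entity_title"]))  # ['entity_title']
```

Returning a copy instead of mutating the input keeps the original entry available for a side-by-side diff before you commit to the new layout.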


# v2

Run the [migration notebook](./docs/examples_notebooks/index_migration_to_v2.ipynb) to convert older tables to the v2 format.
1 change: 1 addition & 0 deletions dictionary.txt
@@ -26,6 +26,7 @@ noqa
dtypes
ints
genid
isinstance

# Azure
abfs