perf(notion_datasource): speed up get_authorized_pages for large work… #3171
Open
Kota-Maeda wants to merge 3 commits into
Code Review
This pull request refactors the Notion client to optimize workspace enumeration by combining page and database searches into a single pass and resolving parent IDs concurrently using a thread pool with memoization. Feedback highlights that error handling in the parent resolution logic should be broadened to prevent crashes from network errors, and the memoization implementation currently allows redundant I/O due to a race condition. Additionally, the shift to in-memory filtering for specific search methods may lead to performance regressions in large workspaces.
7fb2b2d to
48d9e1c
Compare
48d9e1c to
2536d6e
Compare
Kota-Maeda
added a commit
to Kota-Maeda/dify-official-plugins
that referenced
this pull request
May 25, 2026
PR langgenius#3192 already published 0.1.19 to the marketplace, so the "Check If Version Exists" CI step on PR langgenius#3171 fails because the plugin's manifest still claims 0.1.19. Bumping to 0.1.20 reserves a fresh version slot for the perf changes in this PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Kota-Maeda
added a commit
to Kota-Maeda/dify-official-plugins
that referenced
this pull request
May 25, 2026
PR langgenius#3192 already published 0.1.19 to the marketplace, so the "Check If Version Exists" CI step on PR langgenius#3171 fails because the plugin's manifest still claims 0.1.19. Bumping to 0.1.20 reserves a fresh version slot for the perf changes in this PR.
dec6b61 to
8861f7c
Compare
Kota-Maeda
added a commit
to Kota-Maeda/dify-official-plugins
that referenced
this pull request
May 26, 2026
PR langgenius#3192 already published 0.1.19 to the marketplace, so the "Check If Version Exists" CI step on PR langgenius#3171 fails because the plugin's manifest still claims 0.1.19. Bumping to 0.1.20 reserves a fresh version slot for the perf changes in this PR.
8861f7c to
380f5cb
Compare
Contributor
Author
@cazziwork Could you confirm this PR?
380f5cb to
2c60d54
Compare
dd34325 to
e3bf9d1
Compare
e3bf9d1 to
1c553f8
Compare
…spaces

- Parallelize parent block resolution with ThreadPoolExecutor (configurable via NOTION_PARENT_RESOLVE_WORKERS, default 8) and memoize lookups with a thread-safe cache so shared ancestors are not refetched.
- Replace the two filtered /v1/search loops (one for pages, one for databases) with a single un-filtered pass dispatched by object type, halving the search round-trips.
- Route the three direct requests.* call sites through _make_request so that 429 / transient 5xx are retried uniformly. _make_request now accepts allow_status to preserve the existing 404 -> root fallback for inaccessible parent blocks.

Public method signatures (notion_page_search, notion_database_search, notion_block_parent_page_id) are preserved as thin wrappers so existing callers keep working. Bump plugin version to 0.1.19.
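The single un-filtered pass described in this commit can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: `iter_search_results`, `split_by_object`, and `fake_fetch` are hypothetical names, and the HTTP call is replaced by a canned stand-in. The only thing taken from the Notion API is the paginated response shape (`results`, `has_more`, `next_cursor`).

```python
from typing import Callable, Iterator, Optional

# Hypothetical fetcher type: takes a cursor, returns one search-style response body.
SearchPage = Callable[[Optional[str]], dict]


def iter_search_results(fetch: SearchPage) -> Iterator[dict]:
    """Follow has_more / next_cursor until the listing is exhausted."""
    cursor: Optional[str] = None
    while True:
        body = fetch(cursor)
        yield from body.get("results", [])
        if not body.get("has_more"):
            return
        cursor = body.get("next_cursor")


def split_by_object(fetch: SearchPage) -> tuple[list[dict], list[dict]]:
    """One un-filtered pass; items are routed by their "object" field."""
    pages: list[dict] = []
    databases: list[dict] = []
    for item in iter_search_results(fetch):
        kind = item.get("object")
        if kind == "page":
            pages.append(item)
        elif kind == "database":
            databases.append(item)
        # Any other object type is ignored in this sketch.
    return pages, databases


def fake_fetch(cursor: Optional[str]) -> dict:
    """Stand-in for the HTTP call: two canned response pages."""
    if cursor is None:
        return {
            "results": [{"object": "page", "id": "p1"},
                        {"object": "database", "id": "d1"}],
            "has_more": True,
            "next_cursor": "c1",
        }
    return {"results": [{"object": "page", "id": "p2"}],
            "has_more": False, "next_cursor": None}


pages, databases = split_by_object(fake_fetch)
```

Each response page is fetched once and serves both result lists, which is where the halving of round-trips comes from.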
- Widen the except clause in _build_page_entry to also catch requests.exceptions.RequestException, so a single HTTP error during parent resolution skips that item instead of aborting the whole enumeration (preserves the behaviour established by PR langgenius#2891).
- Coalesce concurrent parent-block lookups via a thread-safe in-flight tracker (threading.Event per block_id), so multiple workers asking for the same ancestor share one HTTP request instead of racing past the cache miss and amplifying 429 backoff.
- Restore API-side filtering in notion_page_search and notion_database_search by routing them through a new _search_filtered helper. They previously fetched everything and filtered in memory, which doubled the data transferred when callers used them independently.
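The in-flight coalescing described in the second bullet could look roughly like the sketch below. It is an assumption-laden illustration rather than the code in this PR: `CoalescingCache` and `slow_fetch` are hypothetical names, and the fetch function is a local stub instead of a Notion request. The first caller for a key owns the lookup; later callers wait on a `threading.Event` and then read the cached value.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


class CoalescingCache:
    """Memoize fetch(key); concurrent callers for the same key share one call."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._lock = threading.Lock()
        self._done: Dict[str, str] = {}
        self._in_flight: Dict[str, threading.Event] = {}

    def get(self, key: str) -> str:
        while True:
            with self._lock:
                if key in self._done:
                    return self._done[key]
                event = self._in_flight.get(key)
                owner = event is None
                if owner:
                    # First caller registers the lookup and performs it.
                    event = threading.Event()
                    self._in_flight[key] = event
            if not owner:
                event.wait()  # woken when the owner finishes (or fails)
                continue      # re-check the cache; retry if the owner failed
            try:
                value = self._fetch(key)
                with self._lock:
                    self._done[key] = value
                return value
            finally:
                with self._lock:
                    self._in_flight.pop(key, None)
                event.set()


calls: List[str] = []
calls_lock = threading.Lock()


def slow_fetch(block_id: str) -> str:
    """Stub for the parent-block HTTP lookup; records each real call."""
    with calls_lock:
        calls.append(block_id)
    time.sleep(0.05)
    return f"parent-of-{block_id}"


cache = CoalescingCache(slow_fetch)
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(cache.get, ["a"] * 8 + ["b"] * 8))
```

If the owning call raises, waiters find neither a cached value nor an in-flight entry, so one of them becomes the new owner and retries. Whether failures should be cached instead is a design choice this sketch leaves open.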
PR langgenius#3207 published 0.1.20 to the marketplace, so the "Check If Version Exists" CI step rejects this PR again. Bumping to 0.1.21 reserves a fresh version slot for the perf changes in this PR.
1c553f8 to
54b564a
Compare
Summary
Fixes #3170.
langgenius/notion_datasource@0.1.18 cannot enumerate pages on Notion workspaces with more than ~1k shared items. get_authorized_pages() runs three phases serially (search pages, search databases, resolve every parent block, with recursion and no cache), which routinely exceeds the plugin-daemon SSE deadline (PLUGIN_MAX_EXECUTION_TIMEOUT, default 600s), and the request is killed by timeout.

This PR rewrites the hot path without changing the external behavior:
- Resolve block_id parents in a ThreadPoolExecutor (size configurable via the env var NOTION_PARENT_RESOLVE_WORKERS, default 8, clamped to [1, 32]). Results are memoized in a thread-safe dict so sibling pages that share an ancestor don't re-issue the same HTTP request.
- Single /v1/search loop. The previous two filtered loops (one for object="page", one for object="database") are replaced with a single un-filtered pass; items are dispatched by object type. This halves the number of search round-trips.
- Direct requests.post/get call sites are routed through _make_request, which already handles 429 / transient 5xx with backoff. _make_request gains an allow_status parameter so the existing "404 on a parent block → treat as root" behavior is preserved without re-introducing a direct request.

Backwards compatibility:
- notion_page_search, notion_database_search, notion_block_parent_page_id are kept as thin wrappers delegating to the new internals, so any external caller keeps working.
- OnlineDocumentPage field values are unchanged. The only observable difference is the ordering of the returned list (parallel completion order rather than insertion order), which downstream code does not rely on.

Change Type
Screenshots / Videos
This PR is an internal performance fix with no observable UI change.
Before/After behavior is shown below as plugin-daemon log excerpts.
Before — request killed by the 600s SSE deadline:
After — same workspace, same credential, completes well inside the deadline:
LLM Plugin Checklist
Not applicable — this is a datasource plugin.
Version
- version in manifest.yaml (0.1.18 → 0.1.19, not the one under meta)
- dify_plugin>=0.5.0 is declared in pyproject.toml and locked in uv.lock

A note on the template wording: the template suggests the literal range dify_plugin>=0.3.0,<0.6.0, but adding an upper bound of <0.6.0 makes the lock unsatisfiable in this plugin: dify_plugin 0.5.x pins werkzeug<3.1.dev0, while this plugin already required werkzeug>=3.1.7 before this PR. The current spec (>=0.5.0, no upper bound) matches the convention used by nearly all other datasource plugins in this repo (confluence, dropbox, github, gitlab, onedrive, sharepoint, etc.). uv.lock is left as-is on upstream/main.

Patch-level bump rationale:
- manifest.yaml capabilities or provider/notion_datasource.yaml fields

Testing
0.6.0-local)

Verified manually against a multi-thousand-item Notion workspace that was previously hitting the 600s SSE deadline. After this change the same enumeration completes in ~60 seconds end to end with the default worker count. No regressions observed on a small (< 100 items) workspace.
Configuration
NOTION_PARENT_RESOLVE_WORKERS: default 8, range 1–32 (out-of-range values fall back / are clamped). A value of 1 reproduces the old serial behavior. Increase cautiously: Notion's public rate limit is roughly 3 req/s, and exceeding it just amplifies 429 backoff.
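For illustration, reading and clamping this setting might look like the sketch below. The helper name `resolve_worker_count` is hypothetical, and the exact fallback rules are assumptions (the description only says out-of-range values "fall back / are clamped"): here an unset or unparsable value falls back to the default, and a parsable value is clamped to the range.

```python
import os
from typing import Mapping, Optional

DEFAULT_WORKERS = 8
MIN_WORKERS, MAX_WORKERS = 1, 32


def resolve_worker_count(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the thread-pool size for parent resolution."""
    env = os.environ if env is None else env
    raw = env.get("NOTION_PARENT_RESOLVE_WORKERS")
    if raw is None:
        return DEFAULT_WORKERS
    try:
        value = int(raw)
    except ValueError:
        # Assumption: a non-integer value falls back to the default.
        return DEFAULT_WORKERS
    # Assumption: integer values outside the range are clamped, not rejected.
    return max(MIN_WORKERS, min(MAX_WORKERS, value))
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise, for example `resolve_worker_count({"NOTION_PARENT_RESOLVE_WORKERS": "1"})` to reproduce serial behavior.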