add shared memory support to MemoryBucket so that it can be shared in multiprocess scenarios (tests) #164

amaximciuc · 2026-01-17T20:31:26Z

Title

introduce MemoryBucket.create_shared() so that a memorybucket can be used in tests involving multiprocessing.

Ticket / Task

What & Why

every now and then we were forced to rely on FSBucket for tests involving multiprocess. now, we introduce MemoryBucket.create_shared() so that a memorybucket can be used.

Scope

New feature
Bug fix
Refactor / tech-debt
Test / tooling only

Checklist (self-review)

See Section 5.

Risk / Impact

Latency critical path? ☐ Yes ☐ No
External API contract change? ☐ Yes ☐ No
Migration steps required? ☐ Yes ☐ No

Screenshots / Logs / Benchmarks

(only if relevant)

Author Self-Review Checklist

Resources:

PR guidelines

… multiprocess scenarios

coderabbitai · 2026-01-17T20:31:34Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Adds SharedMemoryBucket (subclass of MemoryBucket) that uses a multiprocessing.SyncManager to back storage and locking so buckets can be shared across processes. Constructor can accept an external manager or create one (tracking _manager and _owns_manager); when created, a weakref finalizer is registered to shut down the manager on GC. Shared instances replace in-memory _objects/_lock with manager.dict() and manager.RLock(). getstate was adjusted to avoid serializing the manager. Existing in-process MemoryBucket behavior is preserved when not using a manager. Tests added for pickling, manager-lifecycle, and multiprocess correctness.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant MainProc as "Main Process\n(creates bucket)"
    participant SyncManager as "SyncManager\n(manager process)"
    participant Worker as "Worker Process"

    Client->>MainProc: create SharedMemoryBucket(manager?)
    alt manager not provided
        MainProc->>SyncManager: start new SyncManager()
        MainProc->>SyncManager: manager.dict()/manager.RLock()
        MainProc->>MainProc: set _objects, _lock, _manager, _owns_manager=True
        MainProc->>weakref: register finalizer to shutdown manager
    else manager provided
        MainProc->>SyncManager: use provided manager proxies
        MainProc->>MainProc: set _objects, _lock, _manager, _owns_manager=False
    end

    Client->>Worker: spawn worker (bucket pickled and sent)
    Worker->>SyncManager: proxy operations (dict get/put, RLock acquire/release)
    SyncManager->>Worker: proxy responses (storage and lock results)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 9.62% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	Title accurately describes the core feature: adding shared memory support to MemoryBucket for multiprocess scenarios.
Description check	✅ Passed	Description covers key sections with mostly complete information, though some non-critical self-review items remain unchecked (Docs Updated, IDE Type Checker, Optional mypy).

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Copilot

Pull request overview

This PR adds shared memory support to MemoryBucket to enable safe usage across multiple processes. The implementation introduces a factory method create_shared() that uses Python's multiprocessing Manager to create process-safe shared dictionaries and locks, with automatic lifecycle management via weakref finalizers.

Changes:

Added create_shared() factory method to MemoryBucket with optional manager parameter
Modified __init__ to support both regular and shared storage backends via MutableMapping and generic lock types
Implemented __getstate__ for pickle support of shared buckets
Added comprehensive test suite covering shared memory functionality, pickling, and multiprocess scenarios

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 16 comments.

File	Description
python/bucketbase/memory_bucket.py	Added shared memory factory method, pickle support, and modified initialization to handle both regular and shared backends
python/tests/test_memory_bucket.py	Added SharedManagerMixin, TestSharedMemoryBucket (compatibility), TestMemoryBucketPickle, and TestSharedMemoryBucketMultiprocessing classes with comprehensive test coverage

python/bucketbase/memory_bucket.py

python/tests/test_memory_bucket.py

python/bucketbase/memory_bucket.py

python/tests/test_memory_bucket.py

python/bucketbase/memory_bucket.py

python/tests/test_memory_bucket.py

python/bucketbase/memory_bucket.py

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@python/bucketbase/memory_bucket.py`:
- Around line 61-69: The bucket currently drops the strong reference to the
provided Manager when should_shutdown is False which allows the Manager process
to be GC-ed and kills the proxy objects; always assign the provided manager to
bucket._manager (i.e., set bucket._manager = manager unconditionally) so the
bucket holds a strong reference, but keep the weakref.finalize(manager.shutdown)
registration only when should_shutdown is True; note that __getstate__ already
clears _manager for pickling so this change won't affect serialization.

🧹 Nitpick comments (2)

python/tests/test_memory_bucket.py (2)
78-95: Shared manager can linger if the multiprocess class doesn’t run.
Shutdown happens only in TestSharedMemoryBucketMultiprocessing. If that class is skipped, the Manager process sticks around. Consider registering an atexit cleanup when the shared manager is first created.
🧹 Suggested cleanup hook
+import atexit
 import multiprocessing
 import pickle
@@
     def get_shared_manager(cls):
         if cls._shared_manager is None:
             ctx = multiprocessing.get_context("spawn")
             cls._shared_manager = ctx.Manager()
+            atexit.register(cls.shutdown_shared_manager)
         return cls._shared_manager
190-234: Check worker exit codes to surface silent crashes.
A worker can throw after writing, and the assertions may still pass. Explicitly validating exitcode makes failures unmissable.
✅ Add exitcode checks
         for p in processes:
             p.join()
+            self.assertEqual(p.exitcode, 0, f"Worker {p.pid} failed with exit code {p.exitcode}")
         return bucket

python/bucketbase/memory_bucket.py

…e multiprocessing tests

…hread synchronization and reliability

try to reduce flakiness in play.min.io tests ("max retries exceeded")

try to reduce flakiness in play.min.io tests ("The write operation timed out")

…ion and streamline steps speedup

…oetry

…setup

…bility during parquet file writes targeting CI & play.min.io

amaximciuc · 2026-01-20T19:57:15Z

@asuiu please review? 🙏

there is still some flakiness in tests(WRT play.min.io and package installations); I've tried to reduce as much as possible but still not a sure-thing; empirically, rerunning the failed jobs does a good job.

also @asuiu - please advise on #164 (comment) 🙏

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In @.github/workflows/python-package.yml:
- Around line 32-43: Update the GitHub Action usage from actions/setup-python@v4
to the v6 release (e.g., actions/setup-python@v6 or `@v6.1.0`) to modernize the
runner (note v6 requires Node.js 24 on the host); then replace the unconditional
poetry lock step with a non-mutating check such as using poetry check --lock or
run poetry install --no-update instead of poetry lock to avoid regenerating the
lockfile in CI (refer to the existing uses: actions/setup-python@v4 and the
commands poetry lock and poetry install --with dev).

coderabbitai · 2026-01-20T19:59:22Z

.github/workflows/python-package.yml

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
+        env:
+          POETRY_VIRTUALENVS_PREFER_UV: "true"
        run: |
-          python -m pip install --upgrade pip
-          python -m pip install poetry
+          uv pip install --system poetry
          cd python
          poetry lock
          poetry install --with dev


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

actions/setup-python GitHub Action latest version 2025

💡 Result:

The latest release is actions/setup-python v6.1.0 — published Nov 25, 2025. [1][2]

Sources:

actions/setup-python repository. [1]

Releases page (v6.1.0). [2]

🌐 Web query:

actions/setup-python v4 deprecated v5 v6 release history

💡 Result:

Summary timeline for actions/setup-python (major points)

v4 — introduced python-version-file, PyPy fixes and other v4 features. (newreleases.io)

v5 — series of v5.x releases that updated dependencies and moved the action to newer Node runtimes (addressing Node 16 deprecation); some v5 users reported caching failures tied to an old actions/cache pin (see issue). (github.com)

v6 — v6.0.0 (breaking change) upgrades the action to Node.js 24 (runner requirement mentioned in the release notes); v6.1.0 is a later patch with enhancements and dependency updates. (github.com)

Sources:

actions/setup-python releases (v6, v5, v4 changelogs). [GitHub releases]. (github.com)

Issue reporting actions/cache deprecation failures when using setup-python@v5. (github.com)

If you want exact release dates and full changelogs for each minor release (v4.x, v5.x, v6.x), I can list them with direct links.

Citations:

1: https://newreleases.io/project/github/actions/setup-python/release/v4.0.0?utm_source=openai

2: https://github.com/actions/setup-python/releases?utm_source=openai

3: https://github.com/actions/setup-python/releases

4: https://github.com/actions/setup-python/releases

5: @actions/cache needs to be updated actions/setup-python#1037

Upgrade actions/setup-python to v6 — v4 works fine, but v6 is where the action is heading.

v4 isn't actually deprecated; it's just old. The latest is v6.1.0 (released November 2025), which requires Node.js 24 and includes meaningful improvements. If you're modernizing the workflow, v6 is the move. (v5 exists as a midpoint with some caching quirks, so skip it.)

One secondary thought: running poetry lock on every CI run will regenerate the lock file if there's any drift. If someone forgets to commit an updated lock file, CI silently fixes it—masking the mistake. Consider using poetry check --lock or poetry install --no-update instead to verify the lock file matches without regenerating it.

Proposed fix

- name: Set up Python ${{ matrix.python-version }} - uses: actions/setup-python@v4 + uses: actions/setup-python@v6 with: python-version: ${{ matrix.python-version }}

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- name: Set up Python ${{ matrix.python-version }}

uses: actions/setup-python@v4

with:

python-version: ${{ matrix.python-version }}

- name: Install dependencies

env:

POETRY_VIRTUALENVS_PREFER_UV: "true"

run: |

python -m pip install --upgrade pip

python -m pip install poetry

uv pip install --system poetry

cd python

poetry lock

poetry install --with dev

- name: Set up Python ${{ matrix.python-version }}

uses: actions/setup-python@v6

with:

python-version: ${{ matrix.python-version }}

- name: Install dependencies

env:

POETRY_VIRTUALENVS_PREFER_UV: "true"

run: |

uv pip install --system poetry

cd python

poetry lock

poetry install --with dev

🧰 Tools

🪛 actionlint (1.7.10)

33-33: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🤖 Prompt for AI Agents

In @.github/workflows/python-package.yml around lines 32 - 43, Update the GitHub Action usage from actions/setup-python@v4 to the v6 release (e.g., actions/setup-python@v6 or `@v6.1.0`) to modernize the runner (note v6 requires Node.js 24 on the host); then replace the unconditional poetry lock step with a non-mutating check such as using poetry check --lock or run poetry install --no-update instead of poetry lock to avoid regenerating the lockfile in CI (refer to the existing uses: actions/setup-python@v4 and the commands poetry lock and poetry install --with dev).

an upgrade to v6 could be done in the future, in a dedicated PR.

about the poetry lock file -- will ask the real maintainer(@asuiu ) on how to approach it.

@amaximciuc Fair enough. Ship when ready, iterate later. Makes sense to get @asuiu's input on the lock file strategy—better to have consistency across the repo than to optimize one workflow in isolation.

asuiu

The PR has a few design/structural concerns in the class implementation, but overall it looks acceptable. I’ll address the issues I found directly and proceed with merging.

asuiu · 2026-01-21T07:30:59Z

python/bucketbase/memory_bucket.py

        super().close()


+def _safe_manager_shutdown(manager: Any) -> None:


This would be more appropriate to be a static method of the class, since it's used just in one class

asuiu · 2026-01-21T07:32:50Z

python/bucketbase/memory_bucket.py

+        self._owns_manager: bool = False
+
+    @classmethod
+    def create_shared(cls, manager: Optional[SyncManager] = None) -> "MemoryBucket":


This looks like a hack/workaround to some problem, but I can't see any problem.
I can't find any issue with having a proper SharedMemoryBucket class, which is a standalone class, deriving from the simple MemoryBucket, so I'll refactor this to a standalone class

asuiu · 2026-01-21T07:33:39Z

python/bucketbase/memory_bucket.py

-        self._lock = RLock()
+        self._objects: dict[str, bytes] | DictProxy[str, bytes] = {}
+        self._lock: Any = RLock()
+        self._manager: SyncManager | None = None


These members are not used by the MemoryBucket methods, so they seem to be redundant here

add shared memory support to MemoryBucket so that it can be shared in…

aac36d5

… multiprocess scenarios

Copilot AI review requested due to automatic review settings January 17, 2026 20:31

Copilot started reviewing on behalf of amaximciuc January 17, 2026 20:31 View session

Copilot AI reviewed Jan 17, 2026

View reviewed changes

coderabbitai bot reviewed Jan 17, 2026

View reviewed changes

python/bucketbase/memory_bucket.py Outdated Show resolved Hide resolved

amaximciuc mentioned this pull request Jan 17, 2026

enhance file lock management in AppendOnlyFSBucket tests #165

Merged

17 tasks

amaximciuc added 12 commits January 19, 2026 12:49

Merge branch 'master' into picklable-shared-memorybucket

8a961f7

[copilot] Enhance MemoryBucket with shared manager support and improv…

7207b2e

…e multiprocessing tests

Increase MinIO timeout to 12 seconds in tests.

4465056

Refactor locking mechanism in AppendOnlyFSBucket tests for improved t…

d765c99

…hread synchronization and reliability

Add max-parallel configuration to build job

36ba9cf

try to reduce flakiness in play.min.io tests ("max retries exceeded")

Increase MinIO client timeout to 30 seconds in test setup

662511f

try to reduce flakiness in play.min.io tests ("The write operation timed out")

Refactor Python package workflow to use 'uv' for dependency installat…

b00245c

…ion and streamline steps speedup

Update dependency installation command to include --system flag for p…

df12452

…oetry

Add poetry lock step in workflow and fix directory name in container …

da0674f

…setup

Add poetry lock step in workflow and fix directory name in container …

4873d84

…setup

Increase MinIO client timeout to 30(from 12) seconds for improved sta…

8f4bdbb

…bility during parquet file writes targeting CI & play.min.io

[LLM review]

0bb2336

amaximciuc changed the title ~~[WIP] add shared memory support to MemoryBucket so that it can be shared in multiprocess scenarios~~ add shared memory support to MemoryBucket so that it can be shared in multiprocess scenarios (tests) Jan 20, 2026

amaximciuc requested a review from asuiu January 20, 2026 19:57

coderabbitai bot reviewed Jan 20, 2026

View reviewed changes

asuiu requested changes Jan 21, 2026

View reviewed changes

v1.5.4: Added SharedMemoryBucket for multiprocessing support

5b2b603

asuiu merged commit c96af4b into master Jan 21, 2026
22 checks passed

asuiu deleted the picklable-shared-memorybucket branch January 21, 2026 08:07

		super().close()


		def _safe_manager_shutdown(manager: Any) -> None:

add shared memory support to MemoryBucket so that it can be shared in multiprocess scenarios (tests) #164

add shared memory support to MemoryBucket so that it can be shared in multiprocess scenarios (tests) #164

Uh oh!

Conversation

amaximciuc commented Jan 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Title

Ticket / Task

What & Why

Scope

Checklist (self-review)

Risk / Impact

Screenshots / Logs / Benchmarks

Author Self-Review Checklist

Resources:

Uh oh!

coderabbitai bot commented Jan 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Other AI code review bot(s) detected

Walkthrough

Sequence Diagram(s)

Estimated code review effort

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

amaximciuc commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

amaximciuc Jan 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 20, 2026

Choose a reason for hiding this comment

Uh oh!

asuiu left a comment

Choose a reason for hiding this comment

Uh oh!

asuiu Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

asuiu Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

asuiu Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

amaximciuc commented Jan 17, 2026 •

edited

Loading

coderabbitai bot commented Jan 17, 2026 •

edited

Loading

amaximciuc commented Jan 20, 2026 •

edited

Loading

coderabbitai bot Jan 20, 2026 •

edited

Loading