Skip to content

Conversation

@amaximciuc
Copy link
Contributor

@amaximciuc amaximciuc commented Jan 17, 2026

Title

introduce MemoryBucket.create_shared() so that a memorybucket can be used in tests involving multiprocessing.

Ticket / Task

https://trello.com/c/pfDRr1Xl/

What & Why

every now and then we were forced to rely on FSBucket for tests involving multiprocess. now, we introduce MemoryBucket.create_shared() so that a memorybucket can be used.

Scope

  • New feature
  • Bug fix
  • Refactor / tech-debt
  • Test / tooling only

Checklist (self-review)

See Section 5.

Risk / Impact

  • Latency critical path? ☐ Yes ☐ No
  • External API contract change? ☐ Yes ☐ No
  • Migration steps required? ☐ Yes ☐ No

Screenshots / Logs / Benchmarks

(only if relevant)

Author Self-Review Checklist

  • Single Ticket — PR addresses only one business requirement / Trello card.
  • Minimal Diff — no unrelated refactors, commented-out code, or debug prints.
  • Compiles & Tests Passpytest -q / mvn test clean locally.
  • No Dead Code — every new unit is called or covered by tests.
  • Naming & Clarity — identifiers are self-explanatory; no overloaded meanings.
  • Docs Updated — README, wiki, or docstrings updated where behaviour changed.
  • Performance Tagged — if touching hot path, attach micro-benchmarks or profiler diff.
  • Config & Secrets — no plaintext credentials; configs externalised.
  • Rollback Ready — change can be reverted with git revert without dependency hell.
  • Checklist Acknowledged — I would merge this myself if I were the reviewer.
  • LLM Review — PR has been reviewed by the approved LLM tool and suggestions addressed.
    • IDE Type Checker — PyCharm/Pylance recommendations were addressed where applicable.
    • Optional mypy — (Recommended) Ran python/scripts/run_mypy.bat locally with no critical errors

Resources:

Copilot AI review requested due to automatic review settings January 17, 2026 20:31
@coderabbitai
Copy link

coderabbitai bot commented Jan 17, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Adds SharedMemoryBucket (subclass of MemoryBucket) that uses a multiprocessing.SyncManager to back storage and locking so buckets can be shared across processes. Constructor can accept an external manager or create one (tracking _manager and _owns_manager); when created, a weakref finalizer is registered to shut down the manager on GC. Shared instances replace in-memory _objects/_lock with manager.dict() and manager.RLock(). getstate was adjusted to avoid serializing the manager. Existing in-process MemoryBucket behavior is preserved when not using a manager. Tests added for pickling, manager-lifecycle, and multiprocess correctness.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant MainProc as "Main Process\n(creates bucket)"
    participant SyncManager as "SyncManager\n(manager process)"
    participant Worker as "Worker Process"

    Client->>MainProc: create SharedMemoryBucket(manager?)
    alt manager not provided
        MainProc->>SyncManager: start new SyncManager()
        MainProc->>SyncManager: manager.dict()/manager.RLock()
        MainProc->>MainProc: set _objects, _lock, _manager, _owns_manager=True
        MainProc->>weakref: register finalizer to shutdown manager
    else manager provided
        MainProc->>SyncManager: use provided manager proxies
        MainProc->>MainProc: set _objects, _lock, _manager, _owns_manager=False
    end

    Client->>Worker: spawn worker (bucket pickled and sent)
    Worker->>SyncManager: proxy operations (dict get/put, RLock acquire/release)
    SyncManager->>Worker: proxy responses (storage and lock results)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 9.62% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed Title accurately describes the core feature: adding shared memory support to MemoryBucket for multiprocess scenarios.
Description check ✅ Passed Description covers key sections with mostly complete information, though some non-critical self-review items remain unchecked (Docs Updated, IDE Type Checker, Optional mypy).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds shared memory support to MemoryBucket to enable safe usage across multiple processes. The implementation introduces a factory method create_shared() that uses Python's multiprocessing Manager to create process-safe shared dictionaries and locks, with automatic lifecycle management via weakref finalizers.

Changes:

  • Added create_shared() factory method to MemoryBucket with optional manager parameter
  • Modified __init__ to support both regular and shared storage backends via MutableMapping and generic lock types
  • Implemented __getstate__ for pickle support of shared buckets
  • Added comprehensive test suite covering shared memory functionality, pickling, and multiprocess scenarios

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 16 comments.

File Description
python/bucketbase/memory_bucket.py Added shared memory factory method, pickle support, and modified initialization to handle both regular and shared backends
python/tests/test_memory_bucket.py Added SharedManagerMixin, TestSharedMemoryBucket (compatibility), TestMemoryBucketPickle, and TestSharedMemoryBucketMultiprocessing classes with comprehensive test coverage

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@python/bucketbase/memory_bucket.py`:
- Around line 61-69: The bucket currently drops the strong reference to the
provided Manager when should_shutdown is False which allows the Manager process
to be GC-ed and kills the proxy objects; always assign the provided manager to
bucket._manager (i.e., set bucket._manager = manager unconditionally) so the
bucket holds a strong reference, but keep the weakref.finalize(manager.shutdown)
registration only when should_shutdown is True; note that __getstate__ already
clears _manager for pickling so this change won't affect serialization.
🧹 Nitpick comments (2)
python/tests/test_memory_bucket.py (2)

78-95: Shared manager can linger if the multiprocess class doesn’t run.
Shutdown happens only in TestSharedMemoryBucketMultiprocessing. If that class is skipped, the Manager process sticks around. Consider registering an atexit cleanup when the shared manager is first created.

🧹 Suggested cleanup hook
+import atexit
 import multiprocessing
 import pickle
@@
     def get_shared_manager(cls):
         if cls._shared_manager is None:
             ctx = multiprocessing.get_context("spawn")
             cls._shared_manager = ctx.Manager()
+            atexit.register(cls.shutdown_shared_manager)
         return cls._shared_manager

190-234: Check worker exit codes to surface silent crashes.
A worker can throw after writing, and the assertions may still pass. Explicitly validating exitcode makes failures unmissable.

✅ Add exitcode checks
         for p in processes:
             p.join()
+            self.assertEqual(p.exitcode, 0, f"Worker {p.pid} failed with exit code {p.exitcode}")
         return bucket

@amaximciuc amaximciuc changed the title [WIP] add shared memory support to MemoryBucket so that it can be shared in multiprocess scenarios add shared memory support to MemoryBucket so that it can be shared in multiprocess scenarios (tests) Jan 20, 2026
@amaximciuc
Copy link
Contributor Author

amaximciuc commented Jan 20, 2026

@asuiu please review? 🙏

there is still some flakiness in tests(WRT play.min.io and package installations); I've tried to reduce as much as possible but still not a sure-thing; empirically, rerunning the failed jobs does a good job.

also @asuiu - please advise on #164 (comment) 🙏

@amaximciuc amaximciuc requested a review from asuiu January 20, 2026 19:57
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/workflows/python-package.yml:
- Around line 32-43: Update the GitHub Action usage from actions/setup-python@v4
to the v6 release (e.g., actions/setup-python@v6 or `@v6.1.0`) to modernize the
runner (note v6 requires Node.js 24 on the host); then replace the unconditional
poetry lock step with a non-mutating check such as using poetry check --lock or
run poetry install --no-update instead of poetry lock to avoid regenerating the
lockfile in CI (refer to the existing uses: actions/setup-python@v4 and the
commands poetry lock and poetry install --with dev).

Comment on lines 32 to 43
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
env:
POETRY_VIRTUALENVS_PREFER_UV: "true"
run: |
python -m pip install --upgrade pip
python -m pip install poetry
uv pip install --system poetry
cd python
poetry lock
poetry install --with dev
Copy link

@coderabbitai coderabbitai bot Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

actions/setup-python GitHub Action latest version 2025

💡 Result:

The latest release is actions/setup-python v6.1.0 — published Nov 25, 2025. [1][2]

Sources:

  • actions/setup-python repository. [1]
  • Releases page (v6.1.0). [2]

🌐 Web query:

actions/setup-python v4 deprecated v5 v6 release history

💡 Result:

Summary timeline for actions/setup-python (major points)

  • v4 — introduced python-version-file, PyPy fixes and other v4 features. (newreleases.io)
  • v5 — series of v5.x releases that updated dependencies and moved the action to newer Node runtimes (addressing Node 16 deprecation); some v5 users reported caching failures tied to an old actions/cache pin (see issue). (github.com)
  • v6 — v6.0.0 (breaking change) upgrades the action to Node.js 24 (runner requirement mentioned in the release notes); v6.1.0 is a later patch with enhancements and dependency updates. (github.com)

Sources:

  1. actions/setup-python releases (v6, v5, v4 changelogs). [GitHub releases]. (github.com)
  2. Issue reporting actions/cache deprecation failures when using setup-python@v5. (github.com)

If you want exact release dates and full changelogs for each minor release (v4.x, v5.x, v6.x), I can list them with direct links.

Citations:


Upgrade actions/setup-python to v6 — v4 works fine, but v6 is where the action is heading.

v4 isn't actually deprecated; it's just old. The latest is v6.1.0 (released November 2025), which requires Node.js 24 and includes meaningful improvements. If you're modernizing the workflow, v6 is the move. (v5 exists as a midpoint with some caching quirks, so skip it.)

One secondary thought: running poetry lock on every CI run will regenerate the lock file if there's any drift. If someone forgets to commit an updated lock file, CI silently fixes it—masking the mistake. Consider using poetry check --lock or poetry install --no-update instead to verify the lock file matches without regenerating it.

Proposed fix
      - name: Set up Python ${{ matrix.python-version }}
-       uses: actions/setup-python@v4
+       uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.python-version }}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
env:
POETRY_VIRTUALENVS_PREFER_UV: "true"
run: |
python -m pip install --upgrade pip
python -m pip install poetry
uv pip install --system poetry
cd python
poetry lock
poetry install --with dev
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v6
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
env:
POETRY_VIRTUALENVS_PREFER_UV: "true"
run: |
uv pip install --system poetry
cd python
poetry lock
poetry install --with dev
🧰 Tools
🪛 actionlint (1.7.10)

33-33: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🤖 Prompt for AI Agents
In @.github/workflows/python-package.yml around lines 32 - 43, Update the GitHub
Action usage from actions/setup-python@v4 to the v6 release (e.g.,
actions/setup-python@v6 or `@v6.1.0`) to modernize the runner (note v6 requires
Node.js 24 on the host); then replace the unconditional poetry lock step with a
non-mutating check such as using poetry check --lock or run poetry install
--no-update instead of poetry lock to avoid regenerating the lockfile in CI
(refer to the existing uses: actions/setup-python@v4 and the commands poetry
lock and poetry install --with dev).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an upgrade to v6 could be done in the future, in a dedicated PR.

about the poetry lock file -- will ask the real maintainer(@asuiu ) on how to approach it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amaximciuc Fair enough. Ship when ready, iterate later. Makes sense to get @asuiu's input on the lock file strategy—better to have consistency across the repo than to optimize one workflow in isolation.

Copy link
Member

@asuiu asuiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR has a few design/structural concerns in the class implementation, but overall it looks acceptable. I’ll address the issues I found directly and proceed with merging.

super().close()


def _safe_manager_shutdown(manager: Any) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be more appropriate to be a static method of the class, since it's used just in one class

self._owns_manager: bool = False

@classmethod
def create_shared(cls, manager: Optional[SyncManager] = None) -> "MemoryBucket":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a hack/workaround to some problem, but I can't see any problem.
I can't find any issue with having a proper SharedMemoryBucket class, which is a standalone class, deriving from the simple MemoryBucket, so I'll refactor this to a standalone class

self._lock = RLock()
self._objects: dict[str, bytes] | DictProxy[str, bytes] = {}
self._lock: Any = RLock()
self._manager: SyncManager | None = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These members are not used by the MemoryBucket methods, so they seem to be redundant here

@asuiu asuiu merged commit c96af4b into master Jan 21, 2026
22 checks passed
@asuiu asuiu deleted the picklable-shared-memorybucket branch January 21, 2026 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants