Feature request: cloudpickle.patch_multiprocessing() utility for ForkingPickler replacement #589

## Summary

I propose adding a `cloudpickle.patch_multiprocessing()` helper that replaces `multiprocessing.reduction.ForkingPickler` with a cloudpickle-based pickler, so that `Pool.map(lambda x: x**2, range(10))` works out of the box.
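
For context, this is the failure the helper is meant to remove. A minimal sketch using only the standard library; the helper name `can_send` is made up for illustration and is not part of any existing API:

```python
import pickle
from multiprocessing.reduction import ForkingPickler


def can_send(obj):
    """Return True if multiprocessing's stock pickler can serialize obj."""
    try:
        ForkingPickler.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Functions without an importable name (lambdas, closures) end up here;
        # the exact exception type varies between Python versions.
        return False
    return True


# Assigned only for the demo: a lambda has no importable name, and the stock
# pickler serializes functions by reference (module + qualified name).
square = lambda x: x ** 2

print(can_send(len))     # built-in function, resolvable by name
print(can_send(square))  # lambda, not resolvable by name
```

The second call is the case `Pool.map` hits when given a lambda.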

## Motivation: ecosystem fragmentation

Every project that needs cloudpickle + `multiprocessing.Pool` independently reinvents this patching. At least 6 projects maintain their own version:

| Project | Approach |
|---------|----------|
| **loky/joblib** | Full custom `_LokyPickler` subsystem in `loky/backend/reduction.py` |
| **PySpark** | Own `CloudPickleSerializer` wrapping `cloudpickle.dumps/loads` |
| **Ray** | Bundled fork as `ray.cloudpickle` with custom object store |
| **Dask** | Custom serialization protocol in distributed scheduler |
| **multiprocess** | Complete fork of CPython's multiprocessing with dill substituted |
| **trading-strategy/exec-sandbox/pypeln/pyrocko** | Ad-hoc monkey patches of varying correctness |

Most ad-hoc implementations are **incomplete** because of a non-obvious CPython pitfall (see below).

## The `_ForkingPickler` double-binding pitfall

CPython has **two separate name bindings** for `ForkingPickler`:

```python
# multiprocessing/reduction.py
class ForkingPickler(pickle.Pickler):
    ...
```

```python
# multiprocessing/connection.py
from .context import reduction
_ForkingPickler = reduction.ForkingPickler   # captured at import time

class Connection:
    def send(self, obj):
        self._send_bytes(_ForkingPickler.dumps(obj))  # uses the captured reference
```

Patching `reduction.ForkingPickler` alone is **insufficient** — `Connection.send()` still uses the stale `_ForkingPickler` reference captured at import time. You must also patch `multiprocessing.connection._ForkingPickler`. Most ad-hoc implementations miss this.
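
The stale reference can be observed directly. The sketch below assumes the private name `multiprocessing.connection._ForkingPickler` exists on your interpreter (it is an implementation detail and could change); `MarkerPickler` and the function name are hypothetical stand-ins:

```python
import multiprocessing.connection
import multiprocessing.reduction


class MarkerPickler(multiprocessing.reduction.ForkingPickler):
    """Hypothetical stand-in for a cloudpickle-backed pickler."""


def connection_follows_reduction_patch():
    """Rebind reduction.ForkingPickler only, then report whether connection sees it."""
    original = multiprocessing.reduction.ForkingPickler
    multiprocessing.reduction.ForkingPickler = MarkerPickler
    try:
        return multiprocessing.connection._ForkingPickler is MarkerPickler
    finally:
        # Restore the stdlib class so the demo leaves no trace.
        multiprocessing.reduction.ForkingPickler = original


print(connection_follows_reduction_patch())
```

If the result is falsy, `Connection.send()` is still using the class that was captured when `multiprocessing.connection` was first imported.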

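Additionally, `reduction.dump()` is a module-level function that builds a `ForkingPickler` itself. It looks the class up in the module globals at call time, so patching `reduction.ForkingPickler` already reroutes it; replacing it explicitly is redundant but harmless, and keeps the patch self-contained.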
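
Rather than trusting a hand-written list of binding sites, they can be enumerated on the running interpreter. This is a rough sketch, not a vetted tool: `find_pickler_bindings` is a hypothetical name, and it only detects module-level attributes called `ForkingPickler` or `_ForkingPickler`:

```python
import importlib
import pkgutil

import multiprocessing


def find_pickler_bindings(names=("ForkingPickler", "_ForkingPickler")):
    """Map each importable multiprocessing submodule to the pickler names it binds."""
    sites = {}
    for info in pkgutil.iter_modules(multiprocessing.__path__, "multiprocessing."):
        try:
            module = importlib.import_module(info.name)
        except Exception:
            # Platform-specific submodules (e.g. Windows-only ones) may not import here.
            continue
        bound = [name for name in names if name in vars(module)]
        if bound:
            sites[info.name] = bound
    return sites


for module_name, bound in sorted(find_pickler_bindings().items()):
    print(module_name, bound)
```

On some CPython versions this may report modules beyond `reduction` and `connection` (for example `multiprocessing.queues`), which would matter for code that calls `Queue.put()` rather than going through `Pool`. Worth checking on the versions a PR would target.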

## Proposed API

```python
import cloudpickle

cloudpickle.patch_multiprocessing()
```

One call, idempotent, patches all three binding sites:
1. `multiprocessing.reduction.ForkingPickler` — the class
2. `multiprocessing.reduction.dump` — the module-level helper
3. `multiprocessing.connection._ForkingPickler` — the import-time captured reference
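
To make the three sites concrete, they can be written as a table of (module, attribute) pairs and patched in a loop. The sketch below is a scoped variant that restores the originals on exit; it is not part of the proposed API, the names `_SITES`, `patched_pickler` and `TracingPickler` are hypothetical, and a plain `ForkingPickler` subclass stands in for the cloudpickle-backed class so the example runs with the standard library alone:

```python
import contextlib
import multiprocessing.connection
import multiprocessing.reduction

# (module, attribute) pairs matching the three sites listed above.
_SITES = (
    (multiprocessing.reduction, "ForkingPickler"),
    (multiprocessing.reduction, "dump"),
    (multiprocessing.connection, "_ForkingPickler"),
)


@contextlib.contextmanager
def patched_pickler(pickler_cls):
    """Temporarily install pickler_cls at every site, restoring the originals on exit."""
    def dump(obj, file, protocol=None):
        pickler_cls(file, protocol).dump(obj)

    saved = [(mod, attr, getattr(mod, attr)) for mod, attr in _SITES]
    try:
        for mod, attr in _SITES:
            setattr(mod, attr, dump if attr == "dump" else pickler_cls)
        yield
    finally:
        for mod, attr, value in saved:
            setattr(mod, attr, value)


class TracingPickler(multiprocessing.reduction.ForkingPickler):
    """Stand-in for a cloudpickle-backed pickler (assumption for this demo)."""


with patched_pickler(TracingPickler):
    print(multiprocessing.connection._ForkingPickler is TracingPickler)
print(multiprocessing.connection._ForkingPickler is TracingPickler)
```

Keeping the sites in one table also makes it cheap to add further entries if more captured references turn up.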

## Reference implementation

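Here's a minimal sketch (written against Python 3.14):

```python
import io
import multiprocessing.connection
import multiprocessing.reduction
import pickle

import cloudpickle

# Keep a handle on the stdlib class so the reducers it has already
# registered (Connection, socket, ...) are not lost when it is replaced.
_StdlibForkingPickler = multiprocessing.reduction.ForkingPickler


class CloudForkingPickler(cloudpickle.Pickler):
    """ForkingPickler replacement backed by cloudpickle."""
    # Share the stdlib registry so earlier and later register() calls both apply.
    _extra_reducers = _StdlibForkingPickler._extra_reducers

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Start from cloudpickle's table (which already chains copyreg's),
        # then layer multiprocessing's reducers on top.
        table = dict(cloudpickle.Pickler.dispatch_table)
        table.update(self._extra_reducers)
        # cloudpickle.Pickler defines a class-level dispatch_table, so a plain
        # instance assignment is not seen by the C pickler. Write the C-level
        # member directly when it exists; fall back to a normal attribute.
        member = getattr(pickle.Pickler, "dispatch_table", None)
        if hasattr(member, "__set__"):
            member.__set__(self, table)
        else:
            self.dispatch_table = table

    @classmethod
    def register(cls, type, reduce):
        """Register a reduce function for a type."""
        cls._extra_reducers[type] = reduce

    @classmethod
    def dumps(cls, obj, protocol=None):
        buf = io.BytesIO()
        cls(buf, protocol).dump(obj)
        return buf.getbuffer()

    loads = staticmethod(cloudpickle.loads)


def _dump(obj, file, protocol=None):
    """Replacement for multiprocessing.reduction.dump()."""
    CloudForkingPickler(file, protocol).dump(obj)


def patch_multiprocessing():
    """Replace multiprocessing's ForkingPickler with a cloudpickle-based version."""
    # 1. The class itself
    multiprocessing.reduction.ForkingPickler = CloudForkingPickler
    # 2. The module-level dump() helper
    multiprocessing.reduction.dump = _dump
    # 3. The import-time captured reference in connection.py
    multiprocessing.connection._ForkingPickler = CloudForkingPickler
```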

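After `patch_multiprocessing()`:
```python
from multiprocessing import Pool

if __name__ == "__main__":  # needed under the spawn and forkserver start methods
    patch_multiprocessing()
    with Pool(4) as p:
        print(p.map(lambda x: x**2, range(10)))
    # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```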

## Why cloudpickle (not CPython)

There's an [open discussion on discuss.python.org](https://discuss.python.org/t/add-api-to-change-pickler-for-multiprocessing/30299) about adding a pluggable pickler API to multiprocessing, but no PEP has materialized. cloudpickle is the pragmatic place for this — it already provides `Pickler`/`dumps`/`loads`, and adding a one-shot integration helper is a small, natural extension.

## Alternatives considered

- **"Just use loky/joblib"** — Valid for many users, but loky replaces the entire process management layer. Many projects only need cloudpickle serialization with stdlib `multiprocessing.Pool`.
- **"Just use multiprocess (dill)"** — Requires replacing all `multiprocessing` imports. dill is heavier than cloudpickle and has different serialization semantics.
- **"Document the pattern instead"** — The `_ForkingPickler` double-binding makes documentation insufficient; people will keep getting it wrong.

Happy to submit a PR if there's interest.
