
feat(provide): +unique and +entities strategy modifiers#11245

Draft
lidel wants to merge 22 commits into master from feat/provide-entity-roots-with-dedup

lidel commented Mar 20, 2026

Warning

Not ready for review; this is a sandbox for running CI.

Summary

  • Experimental Provide.Strategy modifiers (+unique and +entities) for nodes with large, overlapping pin sets (e.g. https://collab.ipfscluster.io hosting https://github.com/ipfs/distributions)
  • Fast-provide extended to pin add and pin update; new --fast-provide-dag flag
  • Hardened strategy parsing, ipfs add --only-hash bug fix
  • Provider strategy test suite covering both legacy and sweep providers

Changes

+unique and +entities strategy modifiers

New opt-in modifiers for Provide.Strategy:

  • +unique: Bloom-filter deduplication across recursive pins. Shared subtrees are traversed once per reprovide cycle instead of once per pin, at a memory cost of roughly 4 bytes per CID. Logs providedCIDs and skippedBranches after each cycle.
  • +entities: announces only entity roots (files, directories, HAMT shards), skipping internal file chunks. Implies +unique.
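To make the +unique memory trade-off concrete, here is a minimal Bloom filter in Go that answers "was this CID probably seen already in this cycle?". It is an illustrative sketch, not boxo's BloomTracker: the type and function names, the FNV-1a double hashing, and the false-positive rate are all assumptions. Note that a false positive makes a walker skip a branch it has not actually visited.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
)

// bloom is a minimal Bloom filter (illustrative only, not boxo's BloomTracker).
type bloom struct {
	bits []uint64 // bit array packed into 64-bit words
	m    uint64   // total number of bits
	k    uint64   // number of probes per key
}

// newBloom sizes the filter for n expected keys at false-positive rate p
// (p must be in (0, 1)), using m = -n*ln(p)/(ln 2)^2 and k = (m/n)*ln 2.
func newBloom(n int, p float64) *bloom {
	if n < 1 {
		n = 1
	}
	m := uint64(math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2)))
	k := uint64(math.Max(1, math.Round(float64(m)/float64(n)*math.Ln2)))
	return &bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashPair derives two 64-bit FNV-1a hashes of key for double hashing.
func hashPair(key string) (uint64, uint64) {
	h := fnv.New64a()
	h.Write([]byte(key))
	h1 := h.Sum64()
	h.Write([]byte{0x9e}) // extend the input to obtain a second hash
	h2 := h.Sum64() | 1   // force a non-zero step between probes
	return h1, h2
}

// Visit records key and reports whether it was new. false means "probably
// seen before", so a walker would skip the whole branch under key.
func (b *bloom) Visit(key string) bool {
	h1, h2 := hashPair(key)
	isNew := false
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		word, mask := pos/64, uint64(1)<<(pos%64)
		if b.bits[word]&mask == 0 {
			isNew = true
			b.bits[word] |= mask
		}
	}
	return isNew
}

func main() {
	b := newBloom(1000, 1e-6)
	fmt.Println(b.Visit("bafy-example")) // true: first visit
	fmt.Println(b.Visit("bafy-example")) // false: already recorded
	fmt.Printf("%d bits, %d probes\n", b.m, b.k)
}
```

With the standard sizing formula, memory per CID is -ln(p)/(ln 2)^2 bits; for example, p = 1e-6 gives about 28.8 bits, or roughly 3.6 bytes per CID, which is in the same range as the ~4 bytes/CID quoted above. The false-positive rate boxo actually targets is not stated here.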

Example: Provide.Strategy = "pinned+mfs+entities"

Default Provide.Strategy=all is unchanged. See docs/config.md#providestrategy for details.

Fast-provide on pin add and pin update

Both commands now accept --fast-provide-root, --fast-provide-dag, and --fast-provide-wait, matching ipfs add and ipfs dag import. Root CID is announced immediately after pinning. See docs/config.md#import for defaults.

--fast-provide-dag flag

New flag on ipfs add, ipfs dag import, ipfs pin add, and ipfs pin update. Walks and provides the full DAG immediately, according to the active strategy. No effect with Provide.Strategy=all (the blockstore already provides every block on write). Configurable via Import.FastProvideDAG (default: false).

Hardened strategy parsing

Unknown tokens, empty tokens, and invalid combinations now produce clear errors at startup instead of being silently ignored.

ipfs routing reprovide deprecated

Marked as deprecated. Returns an error with the sweep provider (default). Use ipfs provide stat -a to monitor reprovide progress.

Bug fix: ipfs add --only-hash

--only-hash no longer triggers fast-provide or pinning.
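Taken together, the flag rules in this PR (--only-hash disables fast-provide; --fast-provide-dag has no effect with Provide.Strategy=all; a DAG walk supersedes root-only providing because it emits the root first) reduce to a small decision function. The Go sketch below is hypothetical: chooseFastProvide and the fastProvideMode values are illustrative names, not kubo's code, and the handling of the root setting under Provide.Strategy=all is an assumption.

```go
package main

import "fmt"

// fastProvideMode is a hypothetical enum for what to announce right after an import or pin.
type fastProvideMode int

const (
	provideNone fastProvideMode = iota // rely on the reprovide cycle only
	provideRoot                        // announce the root CID immediately
	provideDAG                         // walk the DAG per strategy (root is emitted first)
)

func (m fastProvideMode) String() string {
	return [...]string{"none", "root", "dag"}[m]
}

// chooseFastProvide is a hypothetical helper mirroring the rules described in this PR.
func chooseFastProvide(onlyHash, strategyAll, fastRoot, fastDAG bool) fastProvideMode {
	if onlyHash {
		return provideNone // nothing was stored, so nothing should be announced
	}
	if fastDAG && !strategyAll {
		return provideDAG // supersedes root-only: the walk emits the root first
	}
	// Assumption: with Provide.Strategy=all the DAG flag is a no-op (the blockstore
	// already provides every block on write) and the root setting still applies.
	if fastRoot {
		return provideRoot
	}
	return provideNone
}

func main() {
	fmt.Println(chooseFastProvide(true, false, true, true))  // none
	fmt.Println(chooseFastProvide(false, false, true, true)) // dag
	fmt.Println(chooseFastProvide(false, true, true, true))  // root
}
```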

Provider strategy test suite

Full test coverage for both legacy and sweep providers across all strategies (all, pinned, roots, mfs, pinned+mfs, pinned+mfs+unique, pinned+mfs+entities):

  • Provide-at-add-time and reprovide (two cycles) for each strategy
  • +unique dedup tests assert exact providedCIDs and skippedBranches counts
  • +entities tests use nested DAGs with chunked files to verify chunks are skipped
  • roots tests verify child blocks of a pin are excluded; mfs tests verify pinned content outside MFS is excluded
  • BootstrapWithStubDHT(nodes) creates ephemeral DHT peers on loopback for the sweep provider (needs >=20 peers to estimate network size)

Compatibility

  • Default behavior unchanged (Provide.Strategy=all)
  • +unique and +entities are opt-in
  • --fast-provide-dag defaults to false
  • Strategy parsing is stricter: previously ignored typos now produce an error at startup

Depends on

  • boxo#1124: dag/walker (BloomTracker, WalkEntityRoots, WalkDAG), pinning/dspinner (NewUniquePinnedProvider, NewPinnedEntityRootsProvider)

Context

- config: ParseProvideStrategy returns error, rejects "all" mixed with
  selective strategies, removes dead strategy==0 check
- config: add MustParseProvideStrategy for pre-validated call sites
- config: ValidateProvideConfig validates strategy at startup
- config: ShouldProvideForStrategy uses bitmask check for ProvideStrategyAll
- core/node: downstream callers use MustParseProvideStrategy
- core/node: fix Pinning() nil return that caused fx.Provide panic
lidel force-pushed the feat/provide-entity-roots-with-dedup branch from 420b111 to 4468527 (March 24, 2026 00:34)
lidel added 8 commits March 24, 2026 01:47
- ProvideStrategyUnique: bloom filter cross-DAG deduplication
- ProvideStrategyEntities: entity-aware traversal (implies Unique)
- parser: "unique" and "entities" tokens recognized
- validation: modifiers must combine with pinned/mfs, incompatible
  with all/roots
- go.mod: update boxo to feat/provide-entity-roots-with-dedup
  (VisitedTracker, WalkDAG, WalkEntityRoots, NewConcatProvider,
  NewUniquePinnedProvider, NewPinnedEntityRootsProvider)
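The parsing and validation rules listed in the commits above (unknown and empty tokens rejected, "all" not mixable with selective strategies, +entities implying +unique, modifiers requiring pinned and/or mfs and rejecting all/roots) can be sketched as a bitmask parser. This is a simplified, hypothetical version: parseStrategy, mustParseStrategy, and the constant values are illustrative and do not reproduce kubo's ParseProvideStrategy or MustParseProvideStrategy.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ProvideStrategy is a bitmask of strategy tokens (values are illustrative).
type ProvideStrategy uint8

const (
	StrategyAll ProvideStrategy = 1 << iota
	StrategyPinned
	StrategyRoots
	StrategyMFS
	StrategyUnique
	StrategyEntities
)

var strategyTokens = map[string]ProvideStrategy{
	"all": StrategyAll, "pinned": StrategyPinned, "roots": StrategyRoots,
	"mfs": StrategyMFS, "unique": StrategyUnique, "entities": StrategyEntities,
}

// parseStrategy splits on "+" and validates the combination.
func parseStrategy(s string) (ProvideStrategy, error) {
	var st ProvideStrategy
	for _, tok := range strings.Split(s, "+") {
		tok = strings.TrimSpace(tok)
		if tok == "" {
			return 0, fmt.Errorf("empty token in strategy %q", s)
		}
		bit, ok := strategyTokens[tok]
		if !ok {
			return 0, fmt.Errorf("unknown token %q in strategy %q", tok, s)
		}
		st |= bit
	}
	if st&StrategyEntities != 0 {
		st |= StrategyUnique // +entities implies +unique
	}
	if st&StrategyAll != 0 && st != StrategyAll {
		return 0, errors.New(`"all" cannot be combined with other strategies`)
	}
	if st&StrategyUnique != 0 {
		if st&StrategyRoots != 0 {
			return 0, errors.New("+unique/+entities are incompatible with roots")
		}
		if st&(StrategyPinned|StrategyMFS) == 0 {
			return 0, errors.New("+unique/+entities require pinned and/or mfs")
		}
	}
	return st, nil
}

// mustParseStrategy is for call sites whose input was already validated at startup.
func mustParseStrategy(s string) ProvideStrategy {
	st, err := parseStrategy(s)
	if err != nil {
		panic(err)
	}
	return st
}

func main() {
	inputs := []string{"pinned+mfs+entities", "all", "all+unique", "pinned++mfs", "roots+unique", "pined"}
	for _, s := range inputs {
		st, err := parseStrategy(s)
		fmt.Printf("%-20s bits=%06b err=%v\n", s, st, err)
	}
	fmt.Printf("%06b\n", mustParseStrategy("pinned+mfs"))
}
```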
pure rename, no behavior change. prepares for ExecuteFastProvideDAG
which will walk the DAG according to Provide.Strategy.
adds ExecuteFastProvideRoot calls to pin add and pin update,
matching the behavior of ipfs add and ipfs dag import. respects
Import.FastProvideRoot and Import.FastProvideWait config options.

previously, pin add/update did not trigger any immediate providing,
leaving pinned content invisible to the DHT until the next reprovide
cycle (up to 22h).
when Provide.Strategy includes +unique, the reprovide cycle uses a
shared BloomTracker across all sub-walks (MFS, recursive pins, direct
pins). duplicate sub-DAG branches across recursive pins are detected
and skipped, reducing traversal from O(pins * total_blocks) to
O(unique_blocks).

- readLastUniqueCount / persistUniqueCount: persist bloom sizing count
  between cycles at /reprovideLastUniqueCount
- uniqueMFSProvider: MFS walker with shared tracker + locality check
- createKeyProvider restructured: +unique bit checked first, non-unique
  strategies fall through to existing switch unchanged
- per-cycle fresh BloomTracker sized from previous cycle's count
- channel wrapper persists count on successful cycle completion
when Provide.Strategy includes +entities (which implies +unique), the
reprovide cycle uses WalkEntityRoots instead of WalkDAG, emitting only
entity roots (files, directories, HAMT shards) and skipping internal
file chunks.

- mfsEntityRootsProvider: MFS walk with entity root detection
- createKeyProvider: select walker based on +entities flag via function
  references (makePinProv / makeMFSProv) to avoid duplicating the
  stream wiring logic
- all combinations: pinned+entities, mfs+entities, pinned+mfs+entities
- config.md: document +unique, +entities modifiers with caveats
  (range request limitation, roots vs entities distinction)
- changelog v0.41: add entries for strategy modifiers, pin add/update
  fast-provide, and hardened strategy parsing
per-block providing during ipfs add is now opt-in via
--fast-provide-dag (or Import.FastProvideDAG config, default: false).

without it, only the root CID is fast-provided after add, and the
reprovide cycle handles the rest. this changes the default for
Provide.Strategy=pinned: previously every block was provided during
write, now only the root is immediate.

use --fast-provide-dag=true to restore the previous behavior.
Provide.Strategy=all is unaffected (blockstore hook provides on Put).
pin add and pin update now accept the same --fast-provide-root and
--fast-provide-wait CLI flags as ipfs add and ipfs dag import,
with the same config fallbacks (Import.FastProvideRoot,
Import.FastProvideWait).

previously these were config-only with no CLI override.
lidel changed the title from "fix(config): harden provide strategy parsing" to "feat(provide): +unique and +entities strategy modifiers" (Mar 24, 2026)
--fast-provide-dag now available on ipfs add, ipfs dag import,
ipfs pin add, and ipfs pin update (matching --fast-provide-root).

- ExecuteFastProvideDAG accepts []cid.Cid so multiple roots share
  one bloom tracker (cross-root dedup for dag import and pin add)
- --fast-provide-dag supersedes --fast-provide-root (the DAG walk
  emits the root CID first, via DFS pre-order)
- wait parameter: when true, blocks until the walk completes; when
  false, runs in a background goroutine
- Import.FastProvideDAG config option (default: false)
lidel force-pushed the feat/provide-entity-roots-with-dedup branch from 05f8870 to 07d7c66 (March 24, 2026 03:33)
lidel added 4 commits March 25, 2026 23:38
- strategy section: clearer trade-offs, suggested configurations,
  memory comparison with concrete numbers
- Import.FastProvideDAG: new config option documentation
- Import.FastProvideRoot/Wait: updated to mention pin commands
- all three Import.FastProvide* options: consistent "Applies to" lists
lidel force-pushed the feat/provide-entity-roots-with-dedup branch from 800a1ef to a858eb1 (March 26, 2026 23:31)
when TEST_DHT_STUB=1, the CLI test harness creates 20 in-process
libp2p hosts on loopback, each running a DHT server with a shared
in-memory ProviderStore. kubo daemons bootstrap to them over real
TCP, exercising the full DHT code path without public internet.

tests opt in via h.SetStubBootstrap(nodes) after Init().

on the daemon side, WAN DHT filters (AddressFilter, QueryFilter,
RoutingTableFilter, RoutingTablePeerDiversityFilter) are lifted
to accept loopback peers when TEST_DHT_STUB is set.

depends on: github.com/libp2p/go-libp2p-kad-dht#1241
lidel force-pushed the feat/provide-entity-roots-with-dedup branch from a858eb1 to 4a47439 (March 27, 2026 00:06)
lidel added 2 commits March 27, 2026 22:41
add sweep reprovide tests for all strategies (all, pinned, roots,
mfs, pinned+mfs). each test waits for two reprovide cycles to
confirm the schedule runs repeatedly. sweep uses short
Provide.DHT.Interval and polls provide stat --enc=json.

harden negative assertions:
- roots: test excludes child blocks of a recursive pin (not just
  unpinned content), using --only-hash to learn the child CID
- mfs: test that pinned content outside MFS is not provided

fix: ipfs add --only-hash no longer triggers fast-provide or
pinning (was providing CIDs for data that was never stored)

rename SetStubBootstrap to BootstrapWithStubDHT with lazy-init
(ephemeral peers created on first call, not on harness creation)
…-roots-with-dedup

# Conflicts:
#	docs/changelogs/v0.41.md
strategy tests for pinned+mfs+unique and pinned+mfs+entities,
covering both provide-at-add-time and reprovide (two cycles).
content uses a nested DAG (root/subdir/largefile with 1 MiB
chunks) to exercise the walker on multi-level structures.

BootstrapWithStubDHT is now self-contained: it always creates
20 ephemeral DHT peers on loopback and sets TEST_DHT_STUB=1 on
each node's environment so the daemon lifts WAN DHT filters.
no external env var needed. the sweep provider requires >=20
DHT peers to estimate network size (prefix length); without
enough peers it stays offline and never provides.

TEST_DHT_STUB on the daemon side lifts WAN DHT filters
(AddressFilter, QueryFilter, RoutingTableFilter,
RoutingTablePeerDiversityFilter) to accept loopback peers.
this is set automatically by BootstrapWithStubDHT.

other changes:
- Provide.DHT.Interval=30s in sweep reprovide tests (was 1m)
- uniq() helper for unique CIDs across parallel subtests
- ipfs add --only-hash disables fast-provide and pinning
lidel force-pushed the feat/provide-entity-roots-with-dedup branch from 8ae795c to 0243a1c (March 29, 2026 15:04)
lidel added 2 commits March 29, 2026 18:03
ipfs add --help: rewrite fast-provide section with clear structure
(content discoverability, flag defaults, strategy=all behavior)

ipfs routing reprovide: mark as deprecated, note it returns an error
with sweep provider, log error with actionable guidance

changelog: fix missing --fast-provide-dag flag on pin commands,
use "routing system" instead of "DHT" where applicable, link to
docs/config.md as source of truth for defaults

environment-variables.md: note that BootstrapWithStubDHT sets
TEST_DHT_STUB automatically, no external env var needed
lidel added 2 commits March 29, 2026 21:43
the fork (NoopMessageSender, MsgSenderBuilder) is no longer used.
the ephemeral peer pool in BootstrapWithStubDHT replaced the
NoopMessageSender approach.
log providedCIDs and skippedBranches after each unique reprovide
cycle and fast-provide-dag walk.

tests verify exact counts with two dir pins sharing a 10 KiB file
(5 KiB chunks): fast-provide-dag asserts 5 provided + 1 skipped
branch, reprovide asserts 6 provided + 1 skipped branch (includes
empty MFS root pin). both assert bloom tracker created and no
autoscale.

updates boxo to pick up Deduplicated() counter, bloom
creation/autoscale logging, and review feedback fixes.
Development

Successfully merging this pull request may close these issues.

Improved Reprovider.Strategy for entity DAGs (HAMT/UnixFS dirs, big files)
