feat(peer): provisional peer verification subsystem by adequatelimited · Pull Request #143 · mochimodev/mochimo

adequatelimited · 2026-04-06T04:31:19Z

Architectural Overview: Provisional Peer Verification

Background: What We Had Before

The Mochimo node maintains a Recent Peer List (Rplist, 64 entries) used for all network operations -- peer discovery, quorum formation, block propagation, and chain synchronization. Previously, when the node received a peer list from another node via OP_SEND_IPL, those IP addresses were added directly to Rplist via addrecent() -- no verification that the IPs were actually running Mochimo nodes.

This created two problems:

Stale peer propagation: Nodes that went offline months ago remain in peer lists indefinitely. Every node shares its Rplist with every peer that asks, so stale IPs propagate across the entire network. A significant portion of advertised peers on the current network are unreachable.
IP flooding attack surface: A malicious node could respond to OP_GET_IPL with fabricated IP addresses, filling the requesting node's Rplist with garbage. The node would then waste time trying to contact unreachable IPs during quorum formation and sync operations, and would propagate those garbage IPs to other nodes that ask for its peer list.

What Changed

Peer IPs received from network responses now go through a provisional verification pipeline before being added to Rplist. The pipeline has three stages:

Stage 1 -- Intake (addprovisional): IPs from OP_SEND_IPL responses are placed in a provisional list (4096 entries) instead of Rplist. Each entry records the candidate IP, the source IP that advertised it, and a status field. Before appending, the function deduplicates against existing provisional entries and Rplist, and checks the source's reputation.

Stage 2 -- Verification (background thread): A dedicated thread processes provisional entries in batches of 32 (PROVBATCHSIZE). For each pending entry whose retry time has passed, it attempts a callserver() handshake. If the handshake succeeds, the entry is marked VERIFIED. If it fails, the fail counter increments and the next retry is scheduled with an increasing backoff (PROVBACKOFF, 300 seconds, multiplied by the fail count). After 5 failures (PROVMAXFAILS), the entry is marked EXPIRED.

Stage 3 -- Harvest (harvest_provisional): Called periodically from the main server loop. Scans for VERIFIED entries, promotes them to Rplist via addrecent(), then compacts the list by removing all EXPIRED entries.
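The per-entry state handling in Stage 2 can be sketched as follows. This is a minimal illustration, not the code in peer.c: the field layout of PROVPEER_SKETCH, the PROV_* status names, and prov_record_result() are all hypothetical, chosen only to fit the 20-byte, 4-byte-aligned size given in the Files Changed table. The backoff follows the parameter table (PROVBACKOFF multiplied by the fail count).

```c
#include <stdint.h>

/* Hypothetical status names; the limits use the values from the parameter table. */
enum { PROV_PENDING = 0, PROV_VERIFIED = 1, PROV_EXPIRED = 2 };
#define PROVMAXFAILS 5
#define PROVBACKOFF  300

/* Assumed 20-byte layout; the real PROVPEER fields may differ. */
typedef struct {
   uint32_t ip;            /* candidate peer IP */
   uint32_t source_ip;     /* peer that advertised the candidate */
   uint32_t next_retry;    /* earliest time of the next attempt */
   uint32_t last_attempt;  /* time of the most recent attempt */
   uint8_t status;         /* PROV_PENDING, PROV_VERIFIED or PROV_EXPIRED */
   uint8_t fail_count;     /* failed handshakes so far */
   uint16_t reserved;      /* explicit padding, keeps 4-byte alignment */
} PROVPEER_SKETCH;

/* Apply the outcome of one handshake attempt to an entry. */
static void prov_record_result(PROVPEER_SKETCH *p, int handshake_ok, uint32_t now)
{
   p->last_attempt = now;
   if (handshake_ok) {
      p->status = PROV_VERIFIED;
      return;
   }
   p->fail_count++;
   if (p->fail_count >= PROVMAXFAILS) {
      p->status = PROV_EXPIRED;
   } else {
      /* the wait grows with each failure: 300s, 600s, 900s, ... */
      p->next_retry = now + (uint32_t) PROVBACKOFF * p->fail_count;
   }
}
```

An entry only ever moves from PENDING to VERIFIED or EXPIRED here, which is what lets the harvest step treat both end states as final.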

Race Condition Handling

The provisional list is protected by a RWLock (from the extended-c threading library):

Parent thread (main loop): Takes write lock for addprovisional() (append) and harvest_provisional() (promote + compact). These are fast operations -- no blocking I/O under the lock.
Verification thread: Takes read lock to scan for candidates, releases it, performs the blocking callserver() attempt (3-second timeout, no lock held), then takes write lock briefly to update the entry's status/fail_count. The lock is never held during network I/O.
OpenMP threads in scan_quorum(): Call addprovisional() which takes write lock. The RWLock handles concurrent writers correctly.
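The verification thread's lock discipline described above can be sketched with a POSIX read-write lock. This is an assumption-laden illustration: the PR uses the RWLock from the extended-c library, whereas the sketch substitutes pthread_rwlock_t, and ENTRY, PROVLIST, verify_one() and demo_probe() are hypothetical names. The probe callback stands in for the blocking callserver() attempt.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>

enum { PROV_PENDING, PROV_VERIFIED, PROV_EXPIRED };

typedef struct { uint32_t ip; uint8_t status; uint8_t fail_count; } ENTRY;

typedef struct {
   pthread_rwlock_t lock;
   ENTRY list[64];
   int count;
} PROVLIST;

/* Stand-in for the blocking handshake: treats even "IPs" as reachable. */
static int demo_probe(uint32_t ip) { return (ip % 2u) == 0; }

/* Verify the first pending entry. Returns 1 if one was probed, else 0. */
static int verify_one(PROVLIST *pl, int (*probe)(uint32_t ip))
{
   uint32_t ip = 0;
   int i, found = 0, ok;

   /* 1. Read lock: pick a candidate and copy out what the probe needs. */
   pthread_rwlock_rdlock(&pl->lock);
   for (i = 0; i < pl->count; i++) {
      if (pl->list[i].status == PROV_PENDING) {
         ip = pl->list[i].ip;
         found = 1;
         break;
      }
   }
   pthread_rwlock_unlock(&pl->lock);
   if (!found) return 0;

   /* 2. No lock is held while the (potentially slow) probe runs. */
   ok = probe(ip);

   /* 3. Write lock: look the entry up again by IP, because a harvest
    *    may have compacted the list and moved or removed it meanwhile. */
   pthread_rwlock_wrlock(&pl->lock);
   for (i = 0; i < pl->count; i++) {
      if (pl->list[i].ip == ip && pl->list[i].status == PROV_PENDING) {
         if (ok) pl->list[i].status = PROV_VERIFIED;
         else pl->list[i].fail_count++;
         break;
      }
   }
   pthread_rwlock_unlock(&pl->lock);
   return 1;
}
```

The lookup by IP in step 3 is a design choice of this sketch, not something the PR states: an index captured under the read lock is not guaranteed to stay valid once the lock is released.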

The verification thread checks the Running and Provrunning flags between operations and once per second while sleeping, ensuring clean shutdown without deadlock.

Blocking Situation Analysis

addprovisional(): Only holds write lock during in-memory array operations. No I/O, no network. Worst case is scanning 4096 entries for dedup + reputation -- microseconds.
harvest_provisional(): Same -- in-memory scan and compact under write lock. No I/O.
Verification thread: The callserver() call blocks for up to 3 seconds (INIT_TIMEOUT) per peer. With batches of 32, worst case is ~96 seconds per pass. This runs in a dedicated background thread -- never in the main server loop. Between batches, the thread sleeps for 30 seconds (checking Running every second).
Main server loop: Zero new blocking. harvest_provisional() is a fast in-memory operation.
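To illustrate why the harvest step stays cheap under the write lock, here is a minimal single-pass compaction sketch. ENTRY, harvest() and demo_promote() are hypothetical names; the promote callback stands in for addrecent(). The sketch also assumes that promoted entries are removed from the provisional list together with EXPIRED ones, which the description does not spell out.

```c
#include <stddef.h>
#include <stdint.h>

enum { PROV_PENDING, PROV_VERIFIED, PROV_EXPIRED };

typedef struct { uint32_t ip; uint8_t status; } ENTRY;

/* Demo stand-in for addrecent(): records which IPs were promoted. */
static uint32_t promoted[8];
static size_t npromoted;
static void demo_promote(uint32_t ip)
{
   if (npromoted < 8) promoted[npromoted++] = ip;
}

/* One in-place pass: promote VERIFIED entries, drop VERIFIED and EXPIRED
 * entries, keep PENDING entries in their original order.
 * Returns the new entry count. */
static size_t harvest(ENTRY *list, size_t count, void (*promote)(uint32_t ip))
{
   size_t r, w = 0;

   for (r = 0; r < count; r++) {
      if (list[r].status == PROV_VERIFIED) {
         promote(list[r].ip);
         continue;            /* assumed: promoted entries leave the list */
      }
      if (list[r].status == PROV_EXPIRED) continue;
      list[w++] = list[r];    /* survivor slides down to the write cursor */
   }
   return w;
}
```

The pass touches each entry once and performs no I/O, so its cost is bounded by the list capacity.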

Source Reputation Management

When addprovisional() evaluates whether to accept an IP from a given source, it tallies that source's track record from existing provisional entries:

Counts all entries from this source_ip that are EXPIRED (failed verification) and whose last attempt was within the last hour (3600 seconds)
Counts all PENDING entries from this source toward the total
If the source has >= 10 entries total and >= 80% are recent failures, the new IP is silently dropped
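The tally above can be sketched as a single scan over the provisional entries. This is a hedged illustration: ENTRY and source_is_bad() are hypothetical names, and the sketch assumes the total is the sum of recent EXPIRED entries and PENDING entries from the source. The thresholds use the values from the parameter table.

```c
#include <stddef.h>
#include <stdint.h>

#define PROVREPUTHR  10    /* minimum entries before judging a source */
#define PROVREPUFAIL 80    /* failure percentage that rejects a source */
#define PROVREPUTIME 3600  /* reputation window in seconds */

enum { PROV_PENDING, PROV_VERIFIED, PROV_EXPIRED };

typedef struct {
   uint32_t source_ip;     /* peer that advertised this entry */
   uint32_t last_attempt;  /* time of the last verification attempt */
   uint8_t status;
} ENTRY;

/* Returns 1 if new IPs from source_ip should be dropped, else 0.
 * Assumes now >= last_attempt for every entry. */
static int source_is_bad(const ENTRY *list, size_t count,
                         uint32_t source_ip, uint32_t now)
{
   unsigned total = 0, failed = 0;
   size_t i;

   for (i = 0; i < count; i++) {
      if (list[i].source_ip != source_ip) continue;
      if (list[i].status == PROV_EXPIRED) {
         /* only failures inside the window count; older ones have decayed */
         if (now - list[i].last_attempt <= PROVREPUTIME) {
            failed++;
            total++;
         }
      } else if (list[i].status == PROV_PENDING) {
         total++;
      }
   }
   if (total < PROVREPUTHR) return 0;   /* not enough history yet */
   /* integer form of failed / total >= 80% */
   return failed * 100 >= total * PROVREPUFAIL;
}
```

Because the function reads EXPIRED entries directly from the list, it only sees failures that have not yet been compacted away by a harvest.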

Time-windowed decay: The reputation check only considers failures from the last hour. This is critical because:

On a fresh node joining the network, many legitimate peers share stale IP lists accumulated over years. Without decay, every source would quickly hit the threshold and the node would stop accepting peer lists from anyone.
With the 1-hour window, old failures age out. A source that shared bad IPs an hour ago gets a fresh chance.
A truly malicious source that continuously floods garbage IPs will keep hitting the threshold every hour -- once it has at least 10 tracked entries with an 80% recent-failure rate, everything else it advertises in that window is dropped.

Tunable Parameters

All defined in types.h alongside existing peer configuration:

Parameter	Value	Purpose
PROVPEERSLEN	4096	Maximum provisional list entries
PROVBATCHSIZE	32	Peers verified per thread pass
PROVMAXFAILS	5	Failures before entry expires
PROVBACKOFF	300	Base backoff seconds (multiplied by fail count)
PROVREPUTHR	10	Minimum entries before evaluating source reputation
PROVREPUFAIL	80	Failure percentage threshold to reject a source
PROVREPUTIME	3600	Reputation window in seconds (1 hour)
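As a worked example of how PROVBACKOFF and PROVMAXFAILS interact, the retry schedule can be computed directly. This sketch assumes the delay after the n-th failure is PROVBACKOFF multiplied by n, as the table's Purpose column describes; prov_backoff() and prov_total_backoff() are hypothetical helper names.

```c
#define PROVMAXFAILS 5     /* failures before an entry expires */
#define PROVBACKOFF  300   /* base backoff in seconds */

/* Delay in seconds scheduled after the n-th consecutive failure. */
static unsigned prov_backoff(unsigned fail_count)
{
   return PROVBACKOFF * fail_count;
}

/* Sum of the delays scheduled after failures 1..fails. */
static unsigned prov_total_backoff(unsigned fails)
{
   unsigned n, total = 0;

   for (n = 1; n <= fails; n++) total += prov_backoff(n);
   return total;
}
```

With these values the delays are 300, 600, 900, 1200 and 1500 seconds, and prov_total_backoff(PROVMAXFAILS) is 4500 seconds, which matches the roughly 75 minutes quoted in the flooding walkthrough. If no wait follows the final failure, only the first four delays apply, giving 3000 seconds (50 minutes).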

Behavior Under Normal Conditions

Node starts, completes initial sync via resync()
Verification thread starts after init
During scan_quorum(), peer IPs from responses go to both netplist (immediate scanning) and addprovisional() (long-term verification)
During steady-state refresh_ipl(), peer IPs go only to addprovisional()
Verification thread confirms reachable peers
harvest_provisional() promotes verified peers to Rplist
Rplist gradually fills with confirmed-reachable peers
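The intake step in this flow can be sketched as a deduplicating append. This is a simplified illustration under stated assumptions: ENTRY, PROVLIST, contains_ip() and add_provisional() are hypothetical names, Rplist is modeled as a plain array of IPs, and the source reputation check and locking are omitted to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

#define PROVPEERSLEN 4096  /* maximum provisional list entries */

enum { PROV_PENDING, PROV_VERIFIED, PROV_EXPIRED };

typedef struct { uint32_t ip; uint32_t source_ip; uint8_t status; } ENTRY;
typedef struct { ENTRY list[PROVPEERSLEN]; size_t count; } PROVLIST;

/* Linear search of a plain IP array (stand-in for an Rplist lookup). */
static int contains_ip(const uint32_t *ips, size_t n, uint32_t ip)
{
   size_t i;

   for (i = 0; i < n; i++) {
      if (ips[i] == ip) return 1;
   }
   return 0;
}

/* Append ip as a PENDING entry unless it is already known or the list
 * is full. Returns 1 if the entry was added, 0 otherwise. */
static int add_provisional(PROVLIST *pl, const uint32_t *rplist, size_t rlen,
                           uint32_t ip, uint32_t source_ip)
{
   size_t i;

   if (contains_ip(rplist, rlen, ip)) return 0;   /* already a recent peer */
   for (i = 0; i < pl->count; i++) {
      if (pl->list[i].ip == ip) return 0;         /* already provisional */
   }
   if (pl->count >= PROVPEERSLEN) return 0;       /* capacity reached */

   pl->list[pl->count].ip = ip;
   pl->list[pl->count].source_ip = source_ip;
   pl->list[pl->count].status = PROV_PENDING;
   pl->count++;
   return 1;
}
```

Note that the first source to advertise an IP is the one recorded against it; a later advertisement of the same IP by another source is dropped by the dedup check.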

Behavior Under IP Flooding Attack

A malicious node responds to OP_GET_IPL with 64 fabricated IPs:

All 64 IPs enter the provisional list
Verification thread attempts handshakes -- all fail
After 5 failures each (~75 minutes of backoff), entries are marked EXPIRED
Next harvest compacts them out
Source reputation degrades: 64 expired entries, 100% failure rate
Next time this source sends peer IPs, addprovisional() silently drops them all
After 1 hour, old failures age out, source gets another chance
If source sends garbage again, the cycle repeats -- at most 64 entries per hour of overhead

Impact on node operation: Zero. Rplist is never polluted.

Behavior With Stale Network Peer Lists

Fresh node joins, receives peer lists with many stale IPs
Stale IPs go to provisional, most fail verification
Source reputation accumulates failures, temporarily throttles sources with high failure rates
The 1-hour decay window means sources are not permanently blacklisted
Over time, Rplist fills with only confirmed-reachable peers
Stale IPs never enter Rplist -- they fail verification and expire

Files Changed

File	Change
src/types.h	PROVPEER struct (20 bytes, 4-byte aligned), 7 config defines, 3 status constants
src/peer.h	5 function prototypes
src/peer.c	Full implementation (~250 lines): intake, harvest, reputation, verification thread, lifecycle
src/network.c	scan_quorum() and refresh_ipl() route received peer IPs through addprovisional()
src/bin/mochimo.c	Thread start after init, harvest on refresh timer, thread stop on shutdown
src/test/peer-provisional.c	Unit test conforming to _assert.h / make test / make coverage conventions

Testing

Unit test (make test-peer-provisional): Tests basic add, deduplication against provisional list and Rplist, capacity limit (4096 entries), source reputation with good sources, purge, multiple sources with cross-source dedup, harvest compaction, rapid add/harvest cycles (100 iterations), thread start/stop lifecycle, and concurrent add + harvest from separate threads. All assertions use the standard _assert.h framework. Passes via make test and is included in make coverage.

Build verification: Clean compile with -Wall -Werror -Wextra -Wpedantic on GCC 13 (Ubuntu x64). All existing tests unaffected.

What This Does NOT Change

Peers that complete a real protocol interaction with our node (incoming OP_FOUND, OP_GET_BLOCK, OP_TX, etc.) are still added directly to Rplist via addrecent() -- they have already proven they are real nodes by talking to us
The scan_quorum() working list (netplist) still receives IPs immediately for the current scan -- provisional verification is for long-term Rplist inclusion, not for blocking initial peer discovery
No changes to quorum formation, sync, or consensus paths
No changes to pink list handling
Provisional data is in-memory only -- lost on restart, no disk persistence needed

Introduces a provisional peer list that holds unverified IP addresses received from network peers. A background thread verifies candidates by attempting handshakes, and only verified peers are promoted to the active recent peer list (Rplist). Includes source reputation tracking with time-windowed decay to mitigate IP flooding attacks while tolerating the stale peer lists common on the existing network.

New in types.h: PROVPEER struct, configuration defines, status values
New in peer.h: function prototypes for provisional peer management
New in peer.c: addprovisional(), harvest_provisional(), source reputation logic, background verification thread, purge
Modified network.c: scan_quorum() and refresh_ipl() now route received peer IPs through addprovisional() instead of addrecent()
Modified mochimo.c: thread lifecycle (start/harvest/stop) integrated into server init, main loop, and shutdown
New test: src/test/peer-provisional.c (make test-peer-provisional)

adequatelimited · 2026-04-06T04:33:19Z

Here's the placeholder PR for this new feature. Will revisit it after the remaining audit-fixes are complete. @chrisdigity Would love your input on this.

adequatelimited · 2026-04-06T04:43:41Z

Note: Clearing EXPIRED entries from the provisional list may contradict the reputation threshold calculation. If they are cleared immediately, they won't be available when we calculate whether a node is a bad actor. Some rework is needed there to determine when a source has a "bad" reputation, but the bulk of the feature is here.

adequatelimited force-pushed the master branch from ba86e0b to 45ec896 · April 13, 2026 03:14

adequatelimited wants to merge 1 commit into master from feature/provisional-peers
