This repository implements a production-oriented Mesa agent-based model for autonomous minefield mapping under partial observability, battery limits, and strict safety semantics. The system now operates as a heterogeneous, dynamically reconfigurable swarm with checkpoint-based retreat, dead-end collapsing, airborne regrouping, and verified final-path extraction.
- Running the Example
- Testing
- Results
- Mission Overview
- World Model
- Shared Knowledge Model
- Agent Roles and Duties
- Drone State Machine
- Standard Protocols
- Emergency Protocols
- Leadership Policy
- Checkpoint and Dead-End Policy
- Path Verification and Final Route Policy
- Time and Battery Model
- File Layout
Create a virtual environment, install dependencies, and launch the server:
    python -m venv .venv
    .venv\Scripts\activate
    pip install -r requirements.txt
    python server.py
The server selects the first available localhost port starting at 8521.
Run the unit test suite with:
    python -m unittest discover -s tests -v
The current suite covers:
- mine buffer propagation
- battery depletion
- movement radius enforcement
- final-path marking
- blocked-goal handling
- model shutdown on exhausted drones
- reserved corridor safety
- dead-end immutability
- anti-reversal movement policy
- leadership handoff
- free-fly promotion
Placeholder for result images, screenshots, and experiment snapshots.
Add your output figures here, for example:
- final deep-blue verified path
- leadership handoff sequence
- checkpoint creation and checkpoint retirement
- gray dead-end collapse behavior
- airborne regroup / rally snapshots
The mission objective is to map a continuous safe corridor from the bottom edge of a 100 x 100 minefield to the top edge while respecting:
- local sensing only
- one-cell movement per action
- strict mine and buffer exclusion for ground traversal
- battery limitations
- partial observability
The swarm explores, shares discovered terrain knowledge, adapts leadership when a better-positioned drone emerges, retreats from failed branches using checkpoints, permanently closes bad branches as dead ends, and finally runs verified pathfinding to produce the mission corridor.
- Grid size: `100 x 100`
- Physical interpretation: `100m x 100m`
- Space type: `mesa.space.MultiGrid(100, 100, False)`
- Agents:
  - `MineAgent`
  - `DroneAgent`
  - visualization-only markers for knowledge, checkpoints, and dead ends
- Motion:
  - drones move exactly one cell per action
  - Moore-neighborhood movement
  - non-toroidal boundaries
Safety semantics:
- `MINE`: lethal cell
- `UNSAFE_BUFFER`: one-cell safety halo around a known mine
- `SAFE`: explicitly scanned and clear
- `DEAD_END`: permanently retired branch
- `FINAL_PATH`: verified winning route after mission success
Ground movement policy:
- cannot walk on `MINE`
- cannot walk on `UNSAFE_BUFFER`
- cannot walk on `DEAD_END`
- may walk beside `UNSAFE_BUFFER`
Airborne recovery policy:
- may fly over `MINE`
- may fly over `UNSAFE_BUFFER`
- may fly over `DEAD_END`
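The two movement policies above can be sketched as a single predicate. This is a minimal illustration, not the repository's code: the `CellState` enum, the `UNEXPLORED` member, and the `can_enter` helper are hypothetical names introduced here.

```python
from enum import Enum


class CellState(Enum):
    # Hypothetical enum; the state names mirror the safety semantics above.
    UNEXPLORED = "UNEXPLORED"
    SAFE = "SAFE"
    MINE = "MINE"
    UNSAFE_BUFFER = "UNSAFE_BUFFER"
    DEAD_END = "DEAD_END"
    FINAL_PATH = "FINAL_PATH"


# States a drone may never walk on during ground movement.
GROUND_BLOCKED = {CellState.MINE, CellState.UNSAFE_BUFFER, CellState.DEAD_END}


def can_enter(state: CellState, airborne: bool = False) -> bool:
    """Return True if a drone may enter a cell in the given state."""
    if airborne:
        # Airborne recovery may pass over every hazard class.
        return True
    return state not in GROUND_BLOCKED
```

Walking beside an `UNSAFE_BUFFER` cell needs no special case here, because the predicate only inspects the state of the destination cell.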
The model maintains a global knowledge_base that acts as a mesh-network state store shared by all drones.
The swarm begins blind. The map is not pre-populated with mine locations.
Cell states recorded in `knowledge_base`:
- `SAFE`
- `MINE`
- `UNSAFE_BUFFER`
- `FINAL_PATH`
- `DEAD_END`
Ground truth exists only in the actual Mesa grid through MineAgent placement.
Drones are not allowed to use hidden mine information for normal movement decisions. They discover terrain through local scans and write the result into knowledge_base.
Each drone scans twice per step:
- before movement
- after movement
Each scan inspects the inclusive Moore neighborhood of radius 1, producing a 3 x 3 local sensor footprint:
- current cell
- 8 surrounding cells
Per scanned cell:
- if a `MineAgent` exists in that ground-truth cell, the cell is registered as `MINE`
- otherwise the cell is registered as `SAFE`
After mine discovery:
- the model propagates `UNSAFE_BUFFER` to all adjacent cells around the mine
Additional scan effects:
- scanned cells are removed from `verification_queue`
- frontier neighbors of scanned safe cells are added to `verification_queue`
- leader-only scan results may produce checkpoints
- cells already marked `DEAD_END` are skipped and never turned back into blue/safe state
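The scan rules above can be sketched with plain dictionaries and sets. This is an illustration under stated assumptions rather than the actual implementation: `moore`, `scan`, the string state labels, and the use of a `set` for the verification queue are hypothetical, and the sketch assumes a buffer marking takes precedence over `SAFE`.

```python
def moore(pos, width=100, height=100, include_center=False):
    """In-bounds Moore neighborhood of radius 1 (non-toroidal)."""
    x, y = pos
    cells = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0 and not include_center:
                continue
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                cells.append((nx, ny))
    return cells


def scan(pos, mines, knowledge, queue, width=100, height=100):
    """Scan the 3 x 3 footprint around pos and update shared knowledge."""
    footprint = moore(pos, width, height, include_center=True)
    for cell in footprint:
        if knowledge.get(cell) == "DEAD_END":
            continue  # dead ends are immutable and never rescanned
        queue.discard(cell)  # the cell has now been observed directly
        if cell in mines:
            knowledge[cell] = "MINE"
            # Propagate the one-cell safety halo around the mine.
            for nb in moore(cell, width, height):
                if knowledge.get(nb) not in ("MINE", "DEAD_END"):
                    knowledge[nb] = "UNSAFE_BUFFER"
        elif knowledge.get(cell) != "UNSAFE_BUFFER":
            knowledge[cell] = "SAFE"
    # Unknown neighbors of scanned safe cells become verification targets.
    for cell in footprint:
        if knowledge.get(cell) == "SAFE":
            for nb in moore(cell, width, height):
                if nb not in knowledge:
                    queue.add(nb)
```

Here `mines` stands in for ground truth (the set of cells holding a `MineAgent`), which the drone only consults for cells inside its own sensor footprint.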
The swarm contains four drones:
- `1` leader
- `3` followers
The leader is responsible for:
- driving vertical exploration
- flanking around hazards
- creating safe checkpoints
- detecting soft deadlocks
- retreating from failed branches
- collapsing dead-end corridors
- acting as the main route-forging unit
Followers are responsible for:
- widening the scanned corridor around the leader
- covering flanks and rear
- maintaining formation when possible
- rallying to new leaders after handoff
- regrouping rapidly during retreat
- supporting verification density for final path extraction
Follower offsets relative to the leader:
- left wing: `(-2, -1)`
- right wing: `(2, -1)`
- rear guard: `(0, -2)`
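As a rough sketch, a follower's formation target is the leader position plus its offset. The `FOLLOWER_OFFSETS` mapping, the slot names, and `formation_target` are hypothetical, and clamping the target to the grid is an assumption about how edge cases might be handled.

```python
# Offsets are (dx, dy) relative to the leader, as listed above.
FOLLOWER_OFFSETS = {
    "left_wing": (-2, -1),
    "right_wing": (2, -1),
    "rear_guard": (0, -2),
}


def formation_target(leader_pos, slot, width=100, height=100):
    """Return the cell a follower in the given slot should steer toward."""
    dx, dy = FOLLOWER_OFFSETS[slot]
    # Assumption: targets that fall outside the grid are clamped to the edge.
    x = min(max(leader_pos[0] + dx, 0), width - 1)
    y = min(max(leader_pos[1] + dy, 0), height - 1)
    return (x, y)
```

With the leader at `(50, 10)`, the left wing steers toward `(48, 9)`, the right wing toward `(52, 9)`, and the rear guard toward `(50, 8)`.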
Each DroneAgent operates through explicit role-aware states.
Normal ground exploration state.
Leader behavior:
- prefers northward progress
- falls back to lateral flanking if blocked
- escalates to retreat when progress fails
Follower behavior:
- moves toward its leader-relative formation offset
- uses shared knowledge to avoid known hazards
Leader local wall-follow state.
Used when the northward lane is blocked but the branch is not yet abandoned.
Behavior:
- retry north
- try committed flank direction
- reverse flank if needed
- escalate to retreat if no productive move remains
Leader macro-recovery state.
Behavior:
- select nearest active checkpoint
- fly back toward it
- mark every abandoned branch cell as `DEAD_END`
- if checkpoint is exhausted, retire it and convert it to gray dead-end state
- if checkpoint reveals a fresh route, return to `SCANNING`
Follower regroup state.
Behavior:
- move toward leader or leader retreat target
- use safe movement first when possible
- use local safe memory for escape from pockets
- use airborne override when regrouping requires bypassing hazards
Short swarm-wide independent search state.
Behavior:
- temporarily releases all drones from formation
- used during leader-stuck recovery windows
- best-progress drone may later become leader
For each active drone:
- scan local neighborhood
- update local safe memory
- select next move according to state and role
- decrement battery
- move one cell if a move was chosen
- scan again
- register mission time for that action
- update leader progress metrics if applicable
During ground navigation:
- only `SAFE` and `UNEXPLORED` cells are considered traversable
- `MINE`, `UNSAFE_BUFFER`, and `DEAD_END` are blocked
- direct back-and-forth reversal is disallowed when an alternative move exists
Each drone tracks:
- `recent_path`
- `last_position`
- `local_safe_memory`
The current hard anti-bounce rule prevents an immediate reversal back to the previous cell during non-airborne movement if any other valid move exists.
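The anti-bounce rule amounts to a filter over the candidate moves. The sketch below is illustrative only; `filter_reversal` is a hypothetical helper, and `candidates` is assumed to already contain only legal destination cells.

```python
def filter_reversal(candidates, last_position, airborne=False):
    """Drop the immediate reversal unless it is the only legal move."""
    if airborne or last_position is None:
        # The rule applies only to non-airborne movement.
        return list(candidates)
    forward = [cell for cell in candidates if cell != last_position]
    # Fall back to the full list when reversing is the only option left.
    return forward if forward else list(candidates)
```

The fallback keeps a drone from freezing in a one-cell-wide corridor, where stepping back is the only valid move.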
Scanned safe cells push nearby unknown cells into verification_queue.
This queue exists so the swarm can later verify frontier boundaries and avoid building the final route through partially known terrain.
The leader tracks:
- `max_y_reached`
- `frustration_counter`
If the leader does not improve its best y for a sustained period, it is treated as trapped in a horizontal or downward loop even if some legal moves still exist.
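One way to picture this soft-deadlock detector is a counter that resets whenever the leader sets a new best `y`. The `LeaderProgress` class and the `frustration_limit` value are hypothetical; the source does not state the actual threshold.

```python
from dataclasses import dataclass


@dataclass
class LeaderProgress:
    max_y_reached: int = 0
    frustration_counter: int = 0
    frustration_limit: int = 15  # hypothetical threshold, not from the source

    def update(self, y: int) -> bool:
        """Record the leader's row after a step; return True when judged stuck."""
        if y > self.max_y_reached:
            self.max_y_reached = y
            self.frustration_counter = 0
        else:
            # Lateral or downward moves count against the leader.
            self.frustration_counter += 1
        return self.frustration_counter >= self.frustration_limit
```

Because the counter depends only on the best `y`, a leader circling a pocket is flagged even though every individual step is legal.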
When the leader is judged stuck:
- the leader enters `BACKTRACKING`
- selects the nearest active checkpoint
- flies over hazards to that checkpoint
- paints abandoned cells as `DEAD_END`
- if the checkpoint fails, that checkpoint is retired and turned gray
- retreat continues toward the next checkpoint
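Selecting the nearest active checkpoint could look like the sketch below. The helper name is hypothetical, and the Chebyshev distance is an assumption chosen because drones move one Moore-neighborhood cell per action; the source does not say which metric is used.

```python
def nearest_checkpoint(pos, active_checkpoints):
    """Return the active checkpoint closest to pos, or None if none remain."""
    if not active_checkpoints:
        return None

    def chebyshev(cell):
        # Number of Moore-neighborhood moves needed when flying over hazards.
        return max(abs(cell[0] - pos[0]), abs(cell[1] - pos[1]))

    return min(active_checkpoints, key=chebyshev)
```

A `None` result would correspond to the case where every checkpoint has been retired and no recovery anchor is left.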
When the leader retreats:
- followers enter `RALLYING`
- they stop trying to solve the failed branch as ground navigators
- they may use airborne override to reach the leader or retreat target
After scheduler execution, the model evaluates all active drones.
Promotion rule:
- highest `y` wins
- tie-breaker uses highest `x`
- no handoff occurs unless the candidate has strictly better `y` than the current leader
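The promotion rule can be expressed compactly. In this sketch `pick_leader` and the `positions` mapping (drone id to `(x, y)`) are hypothetical names; only the ordering logic follows the rule above.

```python
def pick_leader(positions, current_leader):
    """Return the drone id that should lead after this evaluation."""
    # Rank by y first, then by x as the tie-breaker.
    best = max(positions, key=lambda d: (positions[d][1], positions[d][0]))
    # Hand off only when the candidate is strictly farther north.
    if positions[best][1] > positions[current_leader][1]:
        return best
    return current_leader
```

The strict comparison means a drone that merely matches the leader's `y` never triggers a handoff, which avoids leadership flapping between drones on the same row.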
After handoff:
- old leader becomes follower
- new leader becomes `LEADER`
- leader pointer is updated globally
- follower offsets are reassigned
- all non-leader drones enter regrouping behavior
The model supports a short swarm-wide free-fly mode:
- drones are temporarily released from formation
- each tries to gain progress independently
- after the timer, the best-progress drone may be promoted
The architecture is dynamically heterogeneous.
At any given time:
- exactly one drone is `LEADER`
- all others are `FOLLOWER`
Leader responsibilities are operational rather than fixed to a specific agent identity. Leadership is transferable when another drone demonstrates better forward progress.
This prevents the swarm from permanently depending on one trapped drone.
The leader creates a checkpoint when:
- its current cell
- and all 8 local Moore-neighbor cells
are mine-free in the local scan result.
Checkpoints are stored:
- in `self.model.checkpoints`
- in `self.model.checkpoint_positions`
- in the leader's `checkpoint_stack`
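The checkpoint condition and bookkeeping can be sketched as follows. The function names are hypothetical, `scan_result` is assumed to map each cell of the 3 x 3 footprint to its scanned state, and a plain set and list stand in for `checkpoint_positions` and `checkpoint_stack`.

```python
def is_checkpoint_candidate(pos, scan_result):
    """True when the leader's cell and every scanned neighbor are mine-free."""
    return pos in scan_result and all(
        state != "MINE" for state in scan_result.values()
    )


def record_checkpoint(pos, scan_result, checkpoint_positions, checkpoint_stack):
    """Store pos as a checkpoint if it qualifies; return True when stored."""
    if not is_checkpoint_candidate(pos, scan_result):
        return False
    if pos in checkpoint_positions:
        return False  # avoid stacking the same anchor twice
    checkpoint_positions.add(pos)
    checkpoint_stack.append(pos)
    return True
```

Skipping duplicates is an assumption made here so that a leader pacing over the same clear cell does not inflate its stack.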
Checkpoints are macro recovery anchors used when a branch fails.
If a leader returns to a checkpoint and still cannot find a fresh forward route:
- the checkpoint is retired
- its blue marker is removed
- it is converted into `DEAD_END`
- it becomes gray
- it can never become blue again
DEAD_END means:
- permanently closed for ground navigation
- never rescanned into `SAFE`
- never rewritten as `UNSAFE_BUFFER`
- still fly-over capable during airborne recovery
The final route is generated only after a drone reaches the top edge.
The model then runs A* on the discovered safe subgraph.
A candidate path cell must:
- be explicitly `SAFE` or `FINAL_PATH`
- not be `MINE`
- not be `UNSAFE_BUFFER`
- not be `DEAD_END`
- have all of its Moore neighbors explicitly known
Important rule:
- adjacent neighbors may be `UNSAFE_BUFFER`
- the path cell itself cannot be `UNSAFE_BUFFER`
This reflects the intended semantics of the safety buffer: buffers represent forbidden cells, not forbidden adjacency.
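The candidate-cell rules can be sketched as a predicate over the shared knowledge. `is_verified_path_cell` is a hypothetical helper, `knowledge` is assumed to map `(x, y)` to a state string with unknown cells absent, and treating off-grid neighbors as irrelevant is an assumption.

```python
def is_verified_path_cell(cell, knowledge, width=100, height=100):
    """True if cell may appear on the final route."""
    # The cell itself must be explicitly clear; this excludes MINE,
    # UNSAFE_BUFFER, DEAD_END, and unknown cells in one check.
    if knowledge.get(cell) not in ("SAFE", "FINAL_PATH"):
        return False
    x, y = cell
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            if (dx, dy) == (0, 0) or not (0 <= nx < width and 0 <= ny < height):
                continue
            # Neighbors may hold any state, including UNSAFE_BUFFER,
            # but they must have been observed.
            if (nx, ny) not in knowledge:
                return False
    return True
```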
If a fully verified route exists:
- the route is written into `knowledge_base` as `FINAL_PATH`
- rendered deep blue in the UI
- the simulation halts
If no verified route exists:
- the model reports failure gracefully
- no fabricated route is produced through unknown terrain
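A minimal A* over the verified cells might look like the sketch below. It is not the repository's implementation: `astar_to_top` is a hypothetical name, `passable` is assumed to be the set of cells that already satisfy the candidate rules, every move costs 1, and the heuristic is the remaining row distance to the top edge, which never overestimates because a move changes `y` by at most one.

```python
import heapq


def astar_to_top(start, passable, height=100):
    """Return a list of cells from start to the top row, or None if unreachable."""
    top = height - 1
    if start not in passable:
        return None
    open_heap = [(top - start[1], 0, start)]  # (f = g + h, g, cell)
    came_from = {start: None}
    cost = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if g > cost[cell]:
            continue  # stale heap entry
        if cell[1] == top:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        x, y = cell
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nb = (x + dx, y + dy)
                if nb == cell or nb not in passable:
                    continue
                ng = g + 1
                if ng < cost.get(nb, float("inf")):
                    cost[nb] = ng
                    came_from[nb] = cell
                    heapq.heappush(open_heap, (ng + top - nb[1], ng, nb))
    return None  # no verified route; nothing is fabricated
```

Returning `None` rather than a partial path matches the graceful-failure behavior described above: the caller can report failure without writing any `FINAL_PATH` cells.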
Let:
- `V` = number of explored/verified candidate cells
- `E` = number of valid Moore-neighborhood edges among them
Then:
- A* time complexity: `O((V + E) log V)`
- A* space complexity: `O(V)`
Drone local sensing and step-level directional decisions remain bounded by local neighborhoods and are effectively O(1) per step with respect to grid size.
Each drone starts with:
battery = 600
Battery policy:
- battery decreases by `1` per drone action
- when battery reaches `0`, the drone becomes inactive
Mission time policy:
- normal scan/ground step: `1.0` second per cell
- airborne recovery / bypass / retreat step: `0.25` second per cell
Elapsed mission time is accumulated explicitly and displayed in the UI.
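The battery and time accounting can be illustrated with a small bookkeeping class. `DroneClock` and the two constants are hypothetical names, and tracking elapsed time per drone (rather than once on the model) is a simplification made for this sketch.

```python
from dataclasses import dataclass

GROUND_STEP_SECONDS = 1.0   # normal scan / ground step
AIR_STEP_SECONDS = 0.25     # airborne recovery, bypass, or retreat step


@dataclass
class DroneClock:
    battery: int = 600
    elapsed: float = 0.0
    active: bool = True

    def act(self, airborne: bool = False) -> None:
        """Charge one action against the battery and the mission clock."""
        if not self.active:
            return  # exhausted drones take no further actions
        self.battery -= 1
        self.elapsed += AIR_STEP_SECONDS if airborne else GROUND_STEP_SECONDS
        if self.battery <= 0:
            self.active = False
```

Under these numbers a drone that only walks exhausts its battery after 600 actions and 600 seconds of mission time, while the same 600 actions spent airborne would account for 150 seconds.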
- `agents.py`: drone state machine, local scan logic, role policies, anti-oscillation logic, rallying, retreat, airborne override
- `model.py`: grid construction, mine placement, shared knowledge maintenance, leadership handoff, checkpoint/dead-end management, final A* path generation
- `server.py`: Mesa `ModularServer`, `CanvasGrid`, portrayal logic, live swarm status panel
- `tests/test_model.py`: behavioral tests for hazards, pathfinding, leader handoff, free-fly promotion, dead-end immutability, anti-reversal policy