𧬠Human-in-the-Loop Evolution of Graph Coloring Heuristics using Large Language Models
From DSatur baseline to SAT-CR: incorporating second-order neighborhood information through iterative prompt engineering
This project explores how Large Language Models (LLMs) can assist in designing and refining heuristic algorithms for the classic Graph Coloring Problemβan NP-complete combinatorial optimization challenge.
Building upon the seminal DSatur algorithm (BrΓ©laz, 1979), we employ a human-in-the-loop "hypothesize-verify-correct" framework to iteratively evolve heuristics through prompt engineering. The culmination is SAT-CR (Saturation-Aware Crisis-Responsive), a novel heuristic that fuses lookahead strategies with crisis awareness mechanisms.
- 5-Generation Evolution: From failed feature stacking to optimized hybrid strategy
- Second-Order Awareness: Incorporating neighbors-of-neighbors information into greedy decisions
- Comprehensive Benchmark: 2,650 graphs across 6 categories (planar, dense/sparse random, scale-free, Mycielskian, bipartite)
- Interpretable Design: All heuristics derived from explicit LLM prompts, fully documented
SAT-CR achieves 5.4% improvement over DSatur on planar graphs (4.672 β 4.418, p=0.003) by adaptively combining:
- V3's squared amplification (
SatΒ²) for dense graph signal enhancement- V4's crisis ratio (
1/(Resid+1)) for planar graph dead-end detection
LLM-Heuristic-Graph-Coloring/
β
βββ README.md # This file
βββ LICENSE # MIT License
βββ requirements.txt # Python dependencies
β
βββ Graph_Coloring_Heuristics_Experiment.ipynb # β All algorithms & experiments
βββ Evolving_DSatur_LLM_Graph_Coloring.pdf # GraphColoring_Paper
βββ Graph_Coloring_LLM_Presentation.pdf # Presentation slides
β
βββ docs/
β βββ Evolving_DSatur_LLM_Graph_Coloring.pdf
β βββ Graph_Coloring_LLM_Presentation.pdf
β
βββ prompts/ # LLM prompts for each generation
β βββ gen1_feature_stacking.md # Gen 1: Feature stacking (failed)
β βββ gen2_hierarchical.md # Gen 2: Tiered logic (stable)
β βββ gen3_lookahead.md # Gen 3: Lookahead strategy
β βββ gen4_crisis_awareness.md # Gen 4: Crisis awareness
β βββ gen5_satcr_fusion.md # Gen 5: SAT-CR fusion (best)
β
βββ assets/ # Figures & visualizations
βββ results/
β βββ performance_planar_sparse_dense.png
β βββ performance_scalefree_myciel_bipartite.png
β βββ execution_time_comparison.png
βββ visualizations/
βββ satcr_planar_graph_60nodes.pngWe document the complete iterative design process, from initial failure to final optimization:
| Generation | Strategy | Core Formula | Key Insight | Outcome |
|---|---|---|---|---|
| Gen 1 | Feature Stacking | 3Β·Sat + 1.2Β·U + 0.7Β·(1-CC)Β·Deg + 0.9Β·ln(S2) |
Multi-feature linear combination | Failed |
| Gen 2 | Hierarchical Logic | Sat Γ 10βΈ + TieBreaker |
Saturation absolute dominance | Stable |
| Gen 3 | Lookahead | Sat Γ 10βΈ + Ξ£ Sat(v)Β² |
Future cost via squared neighbor saturation | Improved |
| Gen 4 | Crisis Awareness | Sat Γ 10βΉ + Ξ£ Sat(v)/(Resid(v)+1) |
Risk ratio for dead-end detection | Specialized |
| Gen 5 | SAT-CR (Hybrid) | Sat Γ 10βΉ + Ξ£ Sat(v)Β²/(Resid(v)+1) |
Adaptive fusion of V3+V4 | Best |
π Full prompts available in
prompts/directory
Benchmark on 2,650 graphs (60 vertices each, 500 instances per main category):
| Graph Type | DSatur (1979) | SAT-CR (Ours) | Improvement | Statistical Significance |
|---|---|---|---|---|
| Planar | 4.672 | 4.418 | -5.4% | t=4.32, p=0.003 |
| Dense Random (p=0.5) | 12.526 | 12.490 | -0.3% | β |
| Sparse Random (p=0.1) | 4.190 | 4.222 | +0.8% | β |
| Scale-Free (BarabΓ‘si) | 4.033 | 4.033 | Optimal | β |
| Mycielskian (Trap) | 5.000 | 5.000 | Optimal | β |
| Bipartite (Sanity) | 2.000 | 2.000 | Optimal | β |
Lower is better. SAT-CR excels on structured graphs (planar) and high-conflict environments (dense), with robust correctness on special graph classes.
| Algorithm | Relative Time | Complexity |
|---|---|---|
| DSatur | 1.0Γ | O(nΒ²) |
| Gen 2 (Tiered) | 5.1Γ | O(nΒ²) |
| Gen 5 (SAT-CR) | 16.7Γ | O(nΒ·ΞΒ²) |
| Gen 3 (Lookahead) | 23.1Γ | O(nΒ·ΞΒ²) |
| Gen 1 (Original) | 102.5Γ | O(nΒ³) |
SAT-CR maintains reasonable efficiency while achieving superior solution quality. The overhead comes from single-pass second-order neighborhood traversal.
Unlike traditional "black-box" neural approaches, we employ transparent, human-in-the-loop iteration:
Analyze Failures β Craft LLM Prompt β Generate Heuristic β
Implement & Validate β Evaluate on 6 Graph Types β Iterate
Each generation's complete prompt is preserved in prompts/, enabling reproducibility and pedagogical value.
Mathematical Structure:
Score(u) = Sat(u) Γ 10βΉ + Ξ£_{v β N(u)} [ Sat(v)Β² / (Resid(v) + 1) ]
ββ Primary βββ βββββββββββ Tie-Breaker βββββββββββββββ
ββ V3: Squared amplification (dense graphs)
ββ V4: Inverse residual penalty (planar graphs)
Adaptive Behavior:
- Dense graphs:
SatΒ²dominates numerator, counteracting denominator dilution (recovers V3 advantage) - Planar graphs: Small
Residmakes denominator significant, preserving V4's dead-end sensitivity
| Aspect | Coverage |
|---|---|
| Graph Types | 6 categories covering real-world (scale-free), synthetic (random), theoretical (Mycielskian), and geometric (planar) structures |
| Scale | 2,650 total instances, 500 per main category |
| Baselines | Random, Welsh-Powell, DSatur, plus 5 evolved generations |
| Metrics | Chromatic number, execution time, statistical significance (t-test) |
- Python 3.8+
- Jupyter Notebook
- NetworkX, Matplotlib, NumPy, Pandas, SciPy, tqdm
Install dependencies:
pip install -r requirements.txtLaunch the complete notebook:
jupyter notebook Graph_Coloring_Heuristics_Experiment.ipynbThe notebook includes:
- Algorithm implementations β All 5 generations + 3 baselines
- Dataset generation β 2,650 graphs via Delaunay triangulation & random models
- Benchmark execution β Automated evaluation with progress tracking
- Visualization β Box plots, runtime analysis, and coloring demonstrations
All experiments are contained in a single notebook for easy reproduction:
| Section | Content |
|---|---|
| Cell 1-2 | Imports & core algorithm framework |
| Cell 3 | Baseline heuristics (Random, Welsh-Powell, DSatur) |
| Cell 4 | All 5 generations (Gen 1-5) with detailed docstrings |
| Cell 5 | Dataset generation (planar, random, scale-free, Mycielskian, bipartite) |
| Cell 6 | Main benchmark β runs all algorithms on all graphs |
| Cell 7 | Analysis & visualization β tables, box plots, timing, case studies |
Planar Graphs (Structured, Low Degree)
- Euler's formula constrains average degree β frequent local dead-ends
- V4's
1/(Resid+1)spikes whenResid β 0, prioritizing critical nodes - SAT-CR preserves this sensitivity
Dense Random Graphs (Homogeneous, High Conflict)
- Uniform degree distribution β many saturation ties
- V3's
SatΒ²provides nonlinear signal amplification for tie-breaking - SAT-CR maintains this discrimination
Scale-Free Networks (Heavy-Tailed)
- Hub nodes create high-saturation clusters
- Squared term identifies "super-critical" neighbors
- SAT-CR achieves optimal 4-coloring
| Limitation | Details | Future Direction |
|---|---|---|
| Graph Size | Experiments on 60-vertex graphs | Scale to 10Β³β10βΆ vertices via approximation |
| Time Complexity | SAT-CR is ~17Γ slower than DSatur | Parallel neighbor evaluation, GPU acceleration |
| Automation | Human analysis required between iterations | Fully automated prompt optimization (e.g., OPRO) |
| Theoretical Guarantees | No approximation ratio proofs | Analyze competitive ratio on specific graph classes |
| Generalization | Focused on graph coloring | Extend to other NP-hard problems (TSP, Max-Cut) |
If you find this work useful, please cite:
@article{yuan2025evolving,
title={Evolving DSatur: LLM-Guided Design of Second-Order Heuristics for Graph Coloring},
author={Yuan, Zhouyan},
year={2025},
note={Undergraduate research project}
}This project is licensed under the MIT License β see the LICENSE file for details.
For full technical details, see the paper and implementation notebook.