Optimizing Traffic Signal Control Using LLM-Driven Reward Weight Adjustment in Reinforcement Learning

Fig. 1. D3QN-LLM Framework · Fig. 2. Flow of the Proposed Methodology

πŸ“ Research Overview

  • This study uses a Large Language Model (LLM) to address a long-standing difficulty in reinforcement learning: choosing the weights of a multi-objective reward function. By adjusting the reward weights dynamically at run time, the LLM enables the agent to learn a policy that maximizes reward (a minimal sketch of such a weighted reward follows this list).
  • The experimental environment consists of a single intersection, with the goal of minimizing Average Travel Time through traffic signal control.
  • Dueling Double DQN (D3QN) is used as the reinforcement learning algorithm, while GPT-4o-mini is employed as the LLM for adjusting reward function weights.
  • The proposed model is referred to as D3QN-LLM, and its framework is illustrated in Fig. 1. The detailed process flow of the proposed method is shown in Fig. 2.
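
The README does not spell out the individual reward terms, so the sketch below is an assumption: a weighted sum of negated traffic measures (queue length and waiting time are illustrative choices), with weights the LLM can rewrite between decision intervals. None of these names come from the repository.

```python
# Hypothetical weighted multi-objective reward; the actual terms and
# update cadence in D3QN-LLM may differ.
DEFAULT_WEIGHTS = {"queue_length": 0.5, "waiting_time": 0.5}

def reward(metrics: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Smaller queues and waiting times yield a larger (less negative) reward."""
    return -sum(w * metrics[k] for k, w in weights.items())

# The LLM periodically proposes new weights from a traffic summary;
# the agent then continues training against the re-weighted reward.
llm_weights = {"queue_length": 0.7, "waiting_time": 0.3}  # e.g. parsed from LLM output
r = reward({"queue_length": 12.0, "waiting_time": 45.0}, llm_weights)
```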

❗️ Contributions of the Research

  • It proposes a new research direction: using an LLM to resolve, at run time, the weight-setting problem in reward function design, one of the key difficulties in applying reinforcement learning to multi-objective tasks.

🔧 Technology Stack and Environment

🚀 Key Technologies

  • Python, PyTorch (GPU acceleration) - Deep learning and reinforcement learning implementation
  • SUMO (Simulation of Urban MObility) - Traffic simulation and vehicle data generation
  • GPT-4o-mini (LLM API) - Lightweight language model for traffic situation analysis
  • LangChain Framework - LLM prompt engineering and response control (see the prompt sketch after this list)
  • Docker - Containerized environment for deployment and reproducibility
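
As a hedged illustration of how LangChain and GPT-4o-mini could be wired together for the weight-adjustment step: the repository's actual prompts live under llm/ and are almost certainly different, and the message wording, variable names, and JSON schema below are assumptions. Running it requires OPENAI_API_KEY in the environment.

```python
# Hypothetical prompt flow for LLM-driven reward weight adjustment.
import json
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You tune reward weights for a traffic-signal RL agent. "
     'Reply with JSON only, e.g. {{"queue_length": 0.6, "waiting_time": 0.4}}.'),
    ("human",
     "Recent stats: mean queue = {queue} vehicles, mean wait = {wait} s. "
     "Propose weights that sum to 1."),
])

# Prompt -> model -> plain string, then parse the JSON weight dictionary.
chain = prompt | llm | StrOutputParser()
weights = json.loads(chain.invoke({"queue": 12.0, "wait": 45.0}))
```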

🛠 Development Environment

  • OS: Ubuntu 22.04.5 / Windows (WSL supported)
  • Docker: Latest version recommended (tested with Docker 27.5.1)
  • CUDA Support: PyTorch acceleration in an NVIDIA GPU environment (if available)
  • SUMO Version: 1.20.0 or later recommended
  • Python Environment: Python 3.10.12 or later (venv or Docker recommended); a short sanity check for this setup is sketched below
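
A quick way to confirm the pieces above are wired up, sketched under assumptions: the SUMO config path is a placeholder (the actual .sumocfg lives somewhere under sumo/), and traci ships with SUMO, so SUMO_HOME must point at the install.

```python
# Hypothetical environment sanity check; the config path is illustrative,
# not the repository's actual file name.
import torch
import traci  # SUMO's Python API; requires SUMO_HOME (and its tools/ on PYTHONPATH)

print("CUDA available:", torch.cuda.is_available())

# Launch a headless SUMO instance and list its traffic lights.
traci.start(["sumo", "-c", "sumo/single_intersection.sumocfg"])
print("Traffic lights:", traci.trafficlight.getIDList())
traci.close()
```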

📂 Project Structure

📦D3QN-LLM
 ┣ 📂asset
 ┣ 📂d3qn_imgs
 ┣ 📂d3qn_models
 ┣ 📂eval_d3qn_imgs
 ┣ 📂eval_d3qn_txts
 ┣ 📂llm
 ┣ 📂logs
 ┣ 📂sumo
 ┃ ┣ 📂add
 ┃ ┣ 📂detectors
 ┃ ┣ 📂net
 ┃ ┣ 📂rou
 ┃ ┗ 📂trip
 ┣ 📜d3qn_agent.py
 ┣ 📜d3qn_tsc_main.py
 ┗ 📜tsc_env.py
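
For orientation, here is a minimal sketch of the dueling architecture that d3qn_agent.py presumably builds on; the class name and layer sizes are illustrative, not the repository's code. The "double" part of D3QN pairs this with a target network: the online network picks the argmax action at the next state, and the target network evaluates it.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Illustrative dueling head: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state value V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantages A(s, .)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        h = self.trunk(s)
        a = self.advantage(h)
        return self.value(h) + a - a.mean(dim=1, keepdim=True)

q = DuelingQNet(state_dim=12, n_actions=4)
print(q(torch.randn(1, 12)).shape)  # torch.Size([1, 4])

# Double DQN target, given online/target copies of the network:
#   a_star = Q_online(s_next).argmax(dim=1)
#   y      = r + gamma * Q_target(s_next).gather(1, a_star.unsqueeze(1))
```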

📄 Citation

The following research papers and presentations are related to this project.
If this research has been helpful, please cite the papers below.

📖 Scopus Journal

  • S. Choi and Y. Lim, "Optimizing Traffic Signal Control Using LLM-Driven Reward Weight Adjustment in Reinforcement Learning," Journal of Information Processing Systems, vol. 21, no. 1, pp. 43-51, 2025.

🎤 Domestic Conference

  • [Oral Presentation]
    Sujeong Choi and Yujin Lim, "Reinforcement Learning-Based Traffic Signal Control Using Large Language Models" (in Korean), Annual Conference of KIPS, vol. 31, no. 2, pp. 672-675, 2024.

📚 References

  1. B. P. Gokulan and D. Srinivasan, "Distributed geometric fuzzy multiagent urban traffic signal control," IEEE Transactions on Intelligent Transportation Systems, vol. 11, no. 3, pp. 714-727, Sep. 2010.
  2. H. Ceylan and M. G. H. Bell, "Traffic signal timing optimisation based on genetic algorithm approach including drivers' routing," Transportation Research Part B: Methodological, vol. 38, no. 4, pp. 329-342, 2004.
  3. Sujeong Choi and Yujin Lim, "Reinforcement Learning-Based Traffic Signal Control Using Large Language Models," Annual Conference of KIPS, vol. 31, no. 2, pp. 672-675, 2024.
  4. G. Zheng, X. Zang, N. Xu, H. Wei, Z. Yu, V. Gayah, et al., "Diagnosing reinforcement learning for traffic signal control," arXiv preprint, 2019.
  5. H. Lee, Y. Han, Y. Kim, and Y. H. Kim, "Effects analysis of reward functions on reinforcement learning for traffic signal control," PLoS ONE, vol. 17, no. 11, 2022.
  6. S. Lai, Z. Xu, W. Zhang, H. Liu, and H. Xiong, "Large language models as traffic signal control agents: Capacity and opportunity," arXiv preprint, 2023.
  7. A. Pang, M. Wang, M. O. Pun, C. S. Chen, and X. Xiong, "iLLM-TSC: Integration reinforcement learning and large language model for traffic signal control policy improvement," arXiv preprint, 2024.
  8. P. A. Lopez, M. Behrisch, L. Bieker-Walz, J. Erdmann, Y.-P. Flötteröd, R. Hilbrich, L. Lücken, J. Rummel, P. Wagner, and E. Wießner, "Microscopic traffic simulation using SUMO," Proceedings of the 21st International Conference on Intelligent Transportation Systems (ITSC), pp. 2575-2582, 2018.
  9. L. Da, M. Gao, H. Mei, and H. Wei, "Prompt to transfer: Sim-to-real transfer for traffic signal control with prompt learning," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, pp. 82-90, 2024.

📜 Usage and Copyright Notice

This code was developed for personal research and experimentation and is published in a public repository.
It is not open source: it may not be used commercially, copied, or redistributed without permission.

Copyright (c) 2025 suzzang
