This repository contains implementations of classic multi-armed bandit algorithms and experiments from Sutton & Barto's "Reinforcement Learning: An Introduction".
- `bandit.py` - Multi-armed bandit environment with support for stationary and nonstationary reward distributions
- `greedy_agent.py` - Epsilon-greedy agent with configurable exploration rate and step sizes
- `ucb_agent.py` - Upper Confidence Bound (UCB) agent for optimistic action selection
- `gradient_agent.py` - Gradient bandit agent using softmax action selection with preference learning
- `experiment_utils.py` - Shared utilities for running experiments and plotting results across multiple agents and configurations
- `fig_2_2_epsilon_greedy.py` - Replicates Figure 2.2: 10-armed testbed comparing epsilon-greedy methods with different exploration rates (ε = 0, 0.01, 0.1)
- `fig_2_4_ucb.py` - Replicates Figure 2.4: Upper-Confidence-Bound action selection compared to epsilon-greedy
- `fig_2_5_gradient.py` - Replicates Figure 2.5: Gradient bandit algorithm comparing different step sizes and baseline effects
- `ex_2_5_nonstationary.py` - Implements Exercise 2.5: compares sample averaging with exponential recency-weighted averaging in nonstationary environments (see the sketch after this file list)
- `ex_2_11_parameter_study.py` - Comprehensive parameter study comparing all algorithms across different configurations, with parallel processing support
- `stationary_bandits.wls` - Wolfram Language implementation of the epsilon-greedy experiments
- `nonstationary_bandits.wls` - Wolfram Language implementation of the nonstationary bandit experiments
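Exercise 2.5 turns on a single difference in the incremental update rule. Below is a minimal, self-contained sketch of the two estimators it compares; the names and structure are illustrative assumptions, not the actual code in `ex_2_5_nonstationary.py`.

```python
# Illustrative comparison of the two update rules behind Exercise 2.5.
# This is a sketch, not the repository's implementation.
import numpy as np

rng = np.random.default_rng(42)
k, steps, alpha, epsilon = 10, 10_000, 0.1, 0.1

q_true = np.zeros(k)                             # true action values start equal
q_sample, n_sample = np.zeros(k), np.zeros(k)    # sample-average estimates and counts
q_const = np.zeros(k)                            # constant step-size estimates

for _ in range(steps):
    q_true += rng.normal(0.0, 0.01, size=k)      # independent random walks -> nonstationary

    for q, mode in ((q_sample, "sample"), (q_const, "constant")):
        a = int(rng.integers(k)) if rng.random() < epsilon else int(np.argmax(q))
        r = rng.normal(q_true[a], 1.0)
        if mode == "sample":
            n_sample[a] += 1
            q[a] += (r - q[a]) / n_sample[a]      # Q <- Q + (1/n)(R - Q)
        else:
            q[a] += alpha * (r - q[a])            # Q <- Q + alpha * (R - Q)
```

The constant step size keeps weighting recent rewards, so it tracks the drifting true values; the sample average converges toward a long-run mean and lags behind, which is the effect the exercise measures.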
- Epsilon-Greedy: Configurable exploration rate, optimistic initialization, sample averaging or constant step sizes
- Upper Confidence Bound (UCB): Optimism in the face of uncertainty with configurable confidence parameter
- Gradient Bandit: Preference-based learning with optional baseline subtraction and numerical stability (all three action-selection rules are sketched after this feature list)
- Stationary Bandits: Fixed reward distributions
- Nonstationary Bandits: Random walk reward distributions for studying adaptation
- Configurable Parameters: Number of arms, reward variance, baseline shifts, drift rates
- Parallel Processing: Multi-core experiment execution for faster parameter studies
- Standardized Interface: Consistent experiment running and result collection
- Visualization: Automated plotting of average rewards and optimal action percentages
- Reproducibility: Configurable random seeds and experiment parameters
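For orientation, here is a minimal sketch of the three action-selection rules listed above. It is illustrative only; the actual agent classes in `greedy_agent.py`, `ucb_agent.py`, and `gradient_agent.py` may expose different names and interfaces.

```python
# Sketches of the three action-selection rules; not the repository's API.
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy_action(q_estimates, epsilon):
    """Explore with probability epsilon, otherwise pick the greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_estimates)))
    return int(np.argmax(q_estimates))

def ucb_action(q_estimates, action_counts, t, c=2.0):
    """Pick argmax_a [Q_t(a) + c * sqrt(ln t / N_t(a))]; untried actions go first."""
    untried = np.flatnonzero(action_counts == 0)
    if untried.size > 0:
        return int(untried[0])
    bonus = c * np.sqrt(np.log(t) / action_counts)
    return int(np.argmax(q_estimates + bonus))

def gradient_action(preferences):
    """Sample an action from the softmax over the preferences H_t(a)."""
    h = preferences - preferences.max()           # shift for numerical stability
    probs = np.exp(h) / np.exp(h).sum()
    return int(rng.choice(len(preferences), p=probs))
```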
Each experiment file can be run independently:
```bash
# Compare epsilon-greedy exploration rates (Figure 2.2)
python fig_2_2_epsilon_greedy.py

# Compare UCB vs epsilon-greedy (Figure 2.4)
python fig_2_4_ucb.py

# Compare gradient bandit with/without baseline (Figure 2.5)
python fig_2_5_gradient.py

# Study nonstationary adaptation (Exercise 2.5)
python ex_2_5_nonstationary.py

# Comprehensive parameter study (Exercise 2.11)
python ex_2_11_parameter_study.py
```

- Modular Architecture: Separate environment, agent, and experiment concerns
- Academic Fidelity: Faithful implementation of textbook algorithms and experiments
- Performance: Optimized for large-scale parameter studies with multiprocessing
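Because every run of a bandit experiment is independent, the parameter study can fan runs out across cores. The pattern below uses Python's standard `multiprocessing.Pool`; the function and argument names are hypothetical and may not match the actual entry points in `experiment_utils.py` or `ex_2_11_parameter_study.py`.

```python
# Generic multi-core pattern for independent bandit runs (hypothetical names).
from multiprocessing import Pool

import numpy as np

def run_single_experiment(args):
    """One independent epsilon-greedy run; returns the mean reward over the horizon."""
    seed, epsilon, k, steps = args
    rng = np.random.default_rng(seed)            # per-run seed for reproducibility
    q_true = rng.normal(0.0, 1.0, size=k)        # stationary k-armed testbed
    q_est, counts = np.zeros(k), np.zeros(k)
    total = 0.0
    for _ in range(steps):
        a = int(rng.integers(k)) if rng.random() < epsilon else int(np.argmax(q_est))
        r = rng.normal(q_true[a], 1.0)
        counts[a] += 1
        q_est[a] += (r - q_est[a]) / counts[a]   # incremental sample-average update
        total += r
    return total / steps

if __name__ == "__main__":
    jobs = [(seed, 0.1, 10, 1000) for seed in range(200)]
    with Pool() as pool:                         # one worker per available core
        mean_rewards = pool.map(run_single_experiment, jobs)
    print(f"mean reward across {len(jobs)} runs: {np.mean(mean_rewards):.3f}")
```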