EmmaZou24/go-agent

Project Overview

This codebase contains search problem formulations, models, agents, etc. related to the game of Go. The main component is the final agent, which combines Monte Carlo tree search (MCTS) and iterative-deepening alpha-beta pruning with an array of heuristics. My agent placed #2 in the class rankings for all final agents.

The final agent code is in agents.py, along with code for the greedy, minimax, iterative-deepening alpha-beta, and MCTS agents. The agents can be run from the command line; see game_runner.py for more information.

Final Agent Formulation

Better Playout with Heuristics (MCTS)

I read the thesis by Petr Baudis to get ideas on heuristics and policies for better rollout simulations in MCTS. His agent, Pachi, uses a rule-based policy and a combination of different heuristics to select the next state at each node, rather than a uniform random rollout. I took a similar approach, defining several heuristics that evaluate a state on different criteria. However, instead of applying the heuristics in a fixed order and using only one at a time (as the thesis does), I compute a weighted sum of their returned values.

Additionally, each of these heuristics takes in TWO states rather than just one: the current state and the potential next state being evaluated. This lets the agent choose a move based on its properties relative to the previous state. For example, we could calculate captured stones as simply num_black_pieces - num_white_pieces (or vice versa). However, if this results in a positive number that is nevertheless smaller than the previous state's value (meaning black is still ahead overall BUT lost pieces through that move), we don't want to choose it.

The heuristics I implemented (which can be found in MCTSBetterAgent) are as follows; a sketch of the weighted combination appears after the list:

  • CaptureHeuristic: a variation on the SimpleHeuristic that accounts for the player to move. Rewards a state based on how many of the opponent's stones the current player has captured.
  • ProximityHeuristic: rewards a state based on how close together the current player's stones are.
  • TerritoryHeuristic: applies the get_scores() method of the Game class to an input state and rewards states based on their amount of territory.
  • LearnedHeuristic: the heuristic learned from the ValueNetwork. There is no standalone method for this; instead, the model and GoProblemLearnedHeuristic are initialized when the agent is first created, and the .heuristic() function is called directly on the GoProblemLearnedHeuristic.
  • BorderHeuristic: rewards a state based on the distance of stones from the borders; more distance (meaning more central) is better.
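
A minimal sketch of the two-state, weighted combination is below. The helper names (count_pieces, get_available_actions, transition) and the player encoding are illustrative assumptions, not the exact API in agents.py:

```python
import random

BLACK = 0  # assumed player encoding for illustration

def capture_heuristic(current_state, next_state, player):
    """Two-state capture score: reward the CHANGE in stone differential,
    so a move that loses stones scores negatively even if the player is
    still ahead overall."""
    def differential(state):
        # Assumed helper returning (num_black_pieces, num_white_pieces).
        black, white = state.count_pieces()
        return black - white if player == BLACK else white - black
    return differential(next_state) - differential(current_state)

def combined_heuristic(current_state, next_state, player, heuristics, weights):
    """Weighted sum of all heuristics, each comparing the two states."""
    return sum(w * h(current_state, next_state, player)
               for h, w in zip(heuristics, weights))

def pick_rollout_action(problem, state, player, heuristics, weights,
                        epsilon=0.3):
    """Pick the rollout action whose successor scores highest, with an
    epsilon chance of a uniformly random action (explained below)."""
    actions = problem.get_available_actions(state)
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: combined_heuristic(
        state, problem.transition(state, a), player, heuristics, weights))
```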

I also added two layers of randomization. When simulate() is called, there is a 20% chance that random_rollout is used instead of the heuristic-combination rollout. Even within the heuristic-combination rollout, there is a 30% chance at each step that a random action is chosen rather than the best action. This prevents the algorithm from becoming too deterministic and getting stuck in a loop.
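
Building on the sketch above, the two layers could look like this (the percentages come from the text; simulate's exact signature and the problem helpers are assumptions):

```python
def simulate(problem, state, player, heuristics, weights):
    # Layer 1: 20% of simulations use a plain uniform-random rollout
    # (random_rollout is the repo's default rollout, referenced above).
    if random.random() < 0.2:
        return random_rollout(problem, state, player)
    # Layer 2: the heuristic rollout itself takes a random action 30% of
    # the time at each step (the epsilon inside pick_rollout_action above).
    while not problem.is_terminal(state):
        action = pick_rollout_action(problem, state, player,
                                     heuristics, weights, epsilon=0.3)
        state = problem.transition(state, action)
    # Assumed helper returning the terminal reward from player's view.
    return problem.evaluate_terminal(state, player)
```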

RESULTS: This MCTS agent (known as MCTSBetter) wins consistently against the default MCTS implementation.

Information Saving / Transposition Tables

Since the same state may be passed to get_move several times, I implemented a dictionary that maps states to their previously calculated best moves. When a state repeats, my final agent's get_move has an 80% chance of reusing the cached best move and a 20% chance of redoing the full calculation (to avoid determinism). This dictionary acts as a transposition table that allows reuse of information for already-visited states.
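
A sketch of the caching scheme is below. The 80/20 split is from the text; the state_key helper and the wrapped search agent are illustrative, and I'm assuming the GoState board can be flattened into a hashable key:

```python
import random

class CachingAgent:
    """Illustrative wrapper showing the transposition-table idea."""

    def __init__(self, search_agent):
        self.search_agent = search_agent
        self.best_move_cache = {}  # state key -> previously computed move

    def state_key(self, state):
        # Assumption: get_board() returns a 2D grid; tuples are hashable.
        return tuple(map(tuple, state.get_board()))

    def get_move(self, state, time_limit):
        key = self.state_key(state)
        # Reuse the cached move 80% of the time; recompute 20% of the
        # time so repeated positions don't always get the same response.
        if key in self.best_move_cache and random.random() < 0.8:
            return self.best_move_cache[key]
        move = self.search_agent.get_move(state, time_limit)
        self.best_move_cache[key] = move
        return move
```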

I also implemented a dictionary that tracks the best leaf node calculated by select() for each state. Similarly to the above approach, I added some degree of randomness that allows for recalculation of select() even if the state has already been visited.

RESULTS: While this didn't have a huge effect on my agent's win rate, it did slightly decrease the average move time.

Stage-Aware Agent (Time Limit / Hybrid Agents)

I noticed that if my agent accounts for what "stage" of the game it is in, several new opportunities for improvement appear: I can make its time-limit usage smarter, and I can switch between hybrid agents depending on the stage of the game.

To get a clearer definition of the "stage" of a game, I ran a series of experiments pitting my final agent against itself and recorded the average number of moves. To be minimally intrusive in the stencil code for running games, tournaments, etc., I added a counter for the total number of moves per game, printed that data, and manually computed the average over 30 games: 84.24 moves per game, or about 42.12 moves per player.

For deciding between the iterative-deepening agent and the MCTS agent, I treat "late game" as any position with more than 20 stones on the board and "early game" as any position with fewer than 20. Iterative deepening performs better and faster than MCTS in the late game, so I added a check so that MCTS is used only in the early game.
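
A sketch of the stage check (the 20-stone threshold is from the text; count_stones and the two sub-agent attributes are assumptions about the code's structure):

```python
def count_stones(state):
    # Assumption: get_board() returns a 2D grid where nonzero cells
    # are stones and 0 marks an empty point.
    return sum(1 for row in state.get_board() for cell in row if cell != 0)

def get_move(self, state, time_limit):
    if count_stones(state) < 20:
        # Early game: MCTS handles the large opening move space better.
        return self.mcts_agent.get_move(state, time_limit)
    # Late game: iterative-deepening alpha-beta is faster and stronger here.
    return self.id_ab_agent.get_move(state, time_limit)
```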

For deciding the time cutoff, I give less time to game states with fewer than 8 or more than 18 stones on the board. Agents need less time to decide in the very early and late game stages, so this limit lets them conserve more time for the more crucial middle-game decisions.
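
The budgeting might look like the snippet below. The stone thresholds are from the text, but the reduction factor is a placeholder of my own, since the README doesn't state how much less time is given:

```python
def move_time_limit(num_stones, base_limit):
    # Very early (< 8 stones) and late (> 18 stones) positions get a
    # reduced budget; the 0.5 factor is an assumed value for illustration.
    if num_stones < 8 or num_stones > 18:
        return base_limit * 0.5
    return base_limit
```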

RESULTS: My agent performed faster than before.

Performance Figure

I decided to make a heatmap of the number of rollouts per action for a given GoState, where the 2D heatmap cells correspond to the cells of the actual game board. Please see mcts_figure.png for my figure, and state_figure.png for a visualization of the state used to generate it (ignore the number labels in state_figure.png).

To create the figure, I added a create_plot parameter (False by default) to my MCTS get_move method so that MCTS still functions normally but can feed rollout data into a plot upon request. My code for actually calling the function (on a state I get by iterating randomly through a few actions from the start state) is in mcts_figure.py. I also plot the state on a similar board (note that black corresponds to the black player, while red corresponds to the white player) for easier analysis.
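
A sketch of how such a heatmap could be rendered with matplotlib (the visit-count mapping and the 9x9 board size are assumptions, not the repo's actual plotting code):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_rollout_heatmap(visit_counts, board_size=9, path="mcts_figure.png"):
    """visit_counts: dict mapping (row, col) actions to the number of MCTS
    rollouts run through that child of the root."""
    grid = np.zeros((board_size, board_size))
    for (row, col), n in visit_counts.items():
        grid[row, col] = n
    plt.imshow(grid, cmap="viridis")
    plt.colorbar(label="# of rollouts")
    plt.title("MCTS rollouts per action")
    plt.savefig(path)
    plt.close()
```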

For the given state (and for other randomly generated ones I have made figures for), MCTS visits actions/cells near the current player's already-placed stones more frequently. It also frequently visits cells that could create territory or prevent the opponent from creating territory, which suggests it has some sense of the concept of territory! Interestingly, a figure I generated with my initial "one-child" version of expansion showed more variation in the cells visited, compared to the few very highly visited cells in the current figure. This suggests that when MCTS adds all children during expansion, it hones in on the best states more effectively than when it adds only one random child.
