
Random Selfplay#35

Closed
vwxyzjn wants to merge 4 commits into master from league-selfplay

Conversation


@vwxyzjn vwxyzjn commented Jan 12, 2022

This PR prototypes fictitious selfplay.

Known issue: the following variables still need to be changed so that data from agent2 is excluded from the training batch

        b_obs = obs.reshape((-1,) + envs.observation_space.shape)
        b_logprobs = logprobs.reshape(-1)
        b_actions = actions.reshape((-1,) + action_space_shape)
        b_advantages = advantages.reshape(-1)
        b_returns = returns.reshape(-1)
        b_values = values.reshape(-1)
        b_invalid_action_masks = invalid_action_masks.reshape((-1,) + invalid_action_shape)
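One way to address the known issue above is a boolean mask over the flattened rollout tensors. This is only a hedged sketch, not the PR's implementation: it assumes the learning agent controls the first half of the vectorized environments and the frozen opponent (agent2) controls the second half, so the learner's transitions can be selected after the same `reshape(-1)` flattening shown above. All names and shapes here are illustrative.

```python
import numpy as np

# Assumption (illustrative): learner occupies envs [0, num_envs // 2),
# agent2 occupies the rest.
num_steps = 4
num_envs = 8
learner_envs = num_envs // 2

# Toy per-step values shaped (num_steps, num_envs), standing in for the
# rollout tensors (obs, logprobs, values, ...) in the snippet above.
values = np.arange(num_steps * num_envs, dtype=np.float32).reshape(num_steps, num_envs)

# Mark the learner's environments and broadcast the mask over time steps.
env_is_learner = np.zeros(num_envs, dtype=bool)
env_is_learner[:learner_envs] = True
step_mask = np.broadcast_to(env_is_learner, (num_steps, num_envs))

# Flatten exactly like b_values = values.reshape(-1), then keep only the
# learner's entries; the same b_mask would apply to every batch tensor.
b_values = values.reshape(-1)
b_mask = step_mask.reshape(-1)
b_values_learner = b_values[b_mask]
```

Applying one shared `b_mask` to every flattened tensor keeps the batch tensors aligned with each other while dropping agent2's data.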


vwxyzjn commented Jan 12, 2022

Running an experiment here: https://wandb.ai/costa-huang/gym-microrts/runs/x9y055w6

# randomly load an opponent: fictitious self-play
list_of_agents = os.listdir(f"models/{experiment_name}")
list_of_agents.remove('agent.pt')
chosen_agent2pt = random.choice(list_of_agents)
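The snippet above draws an opponent uniformly from the checkpoint pool. A self-contained sketch of that idea, with illustrative file names and a temporary directory standing in for `models/{experiment_name}`:

```python
import os
import random
import tempfile

# Illustrative checkpoint pool: past versions of the agent saved as .pt
# files, plus the live checkpoint agent.pt which must not face itself.
with tempfile.TemporaryDirectory() as model_dir:
    for name in ["agent.pt", "agent-100000.pt", "agent-200000.pt"]:
        open(os.path.join(model_dir, name), "w").close()

    pool = os.listdir(model_dir)
    pool.remove("agent.pt")  # exclude the live checkpoint from the pool
    chosen_agent2pt = random.choice(pool)  # uniform random past opponent
```

Because the draw is uniform, every saved checkpoint is equally likely regardless of strength, which is exactly the point raised in the review comment below about pruning weak agents.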
Contributor

Should we prune agents whose chance of winning falls below a threshold? With the current implementation, it seems that the number of saved agents only grows over time, right? 🤔

Contributor

Okay, I have a much larger question here. I'll formulate it properly and ask on Discord, so we can discuss 😀

Collaborator Author

Haha yeah this first implementation is very crude. OpenAI Five does it by sampling opponents probabilistically according to their trueskill.

[image: excerpt on OpenAI Five's opponent-sampling scheme]

@vwxyzjn vwxyzjn changed the title from "Fictitious selfplay" to "Random Selfplay" on Jan 19, 2022

vwxyzjn commented Jan 19, 2022

Per discussion with @kachayev, it turns out the implementation in this repo is definitely not fictitious selfplay, which also has a supervised learning component. Instead, this PR implements what I call "Random Selfplay", where the agent plays against a randomly chosen past version of itself.


vwxyzjn commented Feb 5, 2022

Closed in favor of #57

@vwxyzjn vwxyzjn closed this Feb 5, 2022