
Add random selfplay#57

Open
vwxyzjn wants to merge 3 commits into master from new-rsp

Conversation

@vwxyzjn
Collaborator

@vwxyzjn vwxyzjn commented Feb 5, 2022

Continue from #35

@vwxyzjn vwxyzjn mentioned this pull request Feb 5, 2022
@vwxyzjn
Collaborator Author

vwxyzjn commented Feb 7, 2022

https://wandb.ai/gym-microrts/gym-microrts/runs/3k4i5p4y?workspace=user-costa-huang tracks the run.
Surprisingly, just playing against past selves is enough to produce a SOTA bot, as shown below.

image

In comparison, playing against the latest self performs much worse, as shown below:

image
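
For context, here is a minimal sketch of the past-selves idea, assuming a pool of frozen policy snapshots sampled uniformly at random (the `OpponentPool` class and its methods are hypothetical illustrations, not the code in this PR):

```python
import copy
import random


class OpponentPool:
    """Keeps frozen snapshots of past policies and samples one at random.

    Illustrates "random selfplay": each rollout faces a randomly chosen
    past self instead of always facing the latest policy.
    """

    def __init__(self, max_size=100):
        self.snapshots = []
        self.max_size = max_size

    def add(self, policy):
        # Store a deep copy so later gradient updates do not leak into the pool.
        self.snapshots.append(copy.deepcopy(policy))
        if len(self.snapshots) > self.max_size:
            self.snapshots.pop(0)

    def sample(self):
        return random.choice(self.snapshots)


# Usage sketch:
# pool.add(agent)            # snapshot the learner every N updates
# opponent = pool.sample()   # opponent for the next rollout
```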

@vwxyzjn vwxyzjn requested a review from kachayev February 7, 2022 01:47
Contributor

@kachayev kachayev left a comment


Overall this makes sense, and I can see why playing against only the latest version leads to some sort of overfitting (I guess). I'm not sure how well this scales with the number of historical versions, but as the simplest approach it sounds perfectly reasonable.

@vwxyzjn
Collaborator Author

vwxyzjn commented Feb 9, 2022

Thanks for reviewing, @kachayev!

I just discovered a problem with this implementation: we are only training the agent that starts from the top left of the map, so when we randomly sample a self from the past, that self was never trained to start from the bottom right. As a result, we are essentially training an agent to play against a random player... I will need to fix this by playing with p1_idx and p2_idx.

@kachayev
Contributor

kachayev commented Feb 9, 2022

Oh, that's a really good point! Should this be part of the environment settings, like a random placement of opponents? It should be simple to add; we just need to be careful with flipping player ids in the observations.
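
For illustration, a rough sketch of what flipping the player perspective could look like, assuming a grid observation of shape (H, W, C) where two channels mark unit ownership for player 1 and player 2 (the channel indices below are hypothetical, not gym-microrts' actual layout):

```python
import numpy as np

# Hypothetical channel indices for "owned by player 1" / "owned by player 2".
P1_OWNER_CHANNEL = 1
P2_OWNER_CHANNEL = 2


def flip_perspective(obs: np.ndarray) -> np.ndarray:
    """Present the board from the opposite player's point of view.

    Rotates the map 180 degrees and swaps the ownership channels, so a policy
    trained from one starting corner can be reused from the other.
    """
    flipped = np.rot90(obs, k=2, axes=(0, 1)).copy()
    flipped[..., [P1_OWNER_CHANNEL, P2_OWNER_CHANNEL]] = (
        flipped[..., [P2_OWNER_CHANNEL, P1_OWNER_CHANNEL]]
    )
    return flipped
```

Actions produced from the flipped view would of course need the inverse transform before being sent back to the environment.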

@vwxyzjn
Collaborator Author

vwxyzjn commented Feb 9, 2022

Oh, that's a really good point! Should this be part of the environment settings, like a random placement of opponents?

I considered something like this but abandoned the idea because it made training twice as slow. At first we thought it was a good kind of slow, because maybe the agent would learn something general, such as moving towards the enemy instead of "just going to the bottom right".

However, it turned out the agent just learned "going to the bottom right" and "going to the top left", which is not that exciting from a generalization standpoint and therefore kind of a waste of compute.

Ultimately this is something we should do (at least offer an option for randomized starting locations), but it's probably not a big priority right now.

@vwxyzjn
Collaborator Author

vwxyzjn commented Feb 9, 2022

I have done some index manipulation:

p1_idxs = [1, 3, 5, 7, 9, 11, 12, 14, 16, 18, 20, 22]
p2_idxs = [0, 2, 4, 6, 8, 10, 13, 15, 17, 19, 21, 23]

Now the agent issues actions for the player starting from the bottom right in the 1st, 3rd, 5th, 7th, 9th, and 11th environments, and for the player starting from the top left in the rest.
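
Roughly, the routing could look like the sketch below, where the learner controls the indices in p1_idxs and the sampled past self controls the indices in p2_idxs (the function and variable names other than the index lists are hypothetical; the actual wiring is in ppo_gridnet_rs.py):

```python
import numpy as np

p1_idxs = [1, 3, 5, 7, 9, 11, 12, 14, 16, 18, 20, 22]  # indices controlled by the learner
p2_idxs = [0, 2, 4, 6, 8, 10, 13, 15, 17, 19, 21, 23]  # indices controlled by a past self


def split_obs(obs):
    """Split the stacked per-player observations into learner / opponent views."""
    return obs[p1_idxs], obs[p2_idxs]


def merge_actions(learner_actions, opponent_actions, num_envs=24):
    """Interleave both players' actions back into environment order."""
    actions = np.empty((num_envs,) + learner_actions.shape[1:], dtype=learner_actions.dtype)
    actions[p1_idxs] = learner_actions
    actions[p2_idxs] = opponent_actions
    return actions
```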

@vwxyzjn
Collaborator Author

vwxyzjn commented Feb 9, 2022

Interestingly, the current runs suggest that playing against an almost-random player (red line) is still better than playing against the latest self (blue lines). The experiments were run with three random seeds each. I am going to run the experiments with the corrected ppo_gridnet_rs.py.

image

@vwxyzjn
Collaborator Author

vwxyzjn commented Feb 19, 2022

When using the corrected implementation, random selfplay performs no better than naive / latest selfplay:

image
