
Description
Hello, thank you for making this repo.
I think that when calculating the returns you should take the done flags into consideration, so that returns are not propagated across episode boundaries:
def calculate_returns(self, rewards, dones, normalize=True):
    returns = []
    R = 0
    # Iterate backwards so each step accumulates the discounted future reward;
    # reset R whenever done is set so returns do not leak across episodes.
    for r, d in zip(reversed(rewards), reversed(dones)):
        if d:
            R = 0
        R = r + R * self.gamma
        returns.insert(0, R)
    returns = torch.tensor(returns).to(device)
    if normalize:
        returns = (returns - returns.mean()) / returns.std()
    return returns
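
As a quick sanity check (hypothetical numbers, assuming the method lives on an agent object and normalize is turned off so the raw values are readable):

agent.gamma = 0.99
returns = agent.calculate_returns([1, 1, 1, 1], [0, 1, 0, 0], normalize=False)
# -> [1.99, 1.0, 1.99, 1.0]; without the "if d: R = 0" reset this would be
# [3.9404, 2.9701, 1.99, 1.0], i.e. the second episode's reward would leak
# into the first episode's returns.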
Also, could you please briefly describe how Generalized Advantage Estimation (GAE) is used when calculating the advantages?
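
In case it is useful, my current understanding of GAE: instead of using the full return minus a baseline, it builds the advantage from an exponentially weighted sum of one-step TD errors, delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), combined recursively as A_t = delta_t + gamma * lambda * A_{t+1}. The lambda parameter trades bias against variance: lambda = 0 gives the one-step TD advantage (low variance, more bias), while lambda = 1 recovers the Monte Carlo return minus V(s_t) (unbiased, high variance). Below is a minimal sketch of what I mean, with done flags cutting the recursion at episode boundaries (the function name, signature, and the extra bootstrap value V(s_T) at the end of values are my assumptions, not this repo's API):

def calculate_advantages(rewards, values, dones, gamma=0.99, lam=0.95):
    # values must hold one extra entry, V(s_T), to bootstrap the final step.
    advantages = []
    A = 0
    for t in reversed(range(len(rewards))):
        mask = 1.0 - float(dones[t])  # zero out bootstrapping at episode ends
        delta = rewards[t] + gamma * values[t + 1] * mask - values[t]  # TD error
        A = delta + gamma * lam * mask * A  # exponentially weighted sum of deltas
        advantages.insert(0, A)
    return advantages

Could you confirm whether this matches what the implementation here does?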