Conversation
Force-pushed from 8930ae0 to ad5f40e
    gradient dqda element-wise between ``[-dqda_clipping, dqda_clipping]``.
    Does not perform clipping if ``dqda_clipping == 0``.
    action_l2 (float): weight of squared action l2-norm on actor loss.
    use_batch_ensemble (bool): whether to use BatchEnsemble FC and Conv2D
Ideally, these batch-ensemble-related parameters should be transparent to ddpg_algorithm. In the ideal case, ddpg_algorithm would not reference any batch-ensemble parameters at all.
That's a good point. Currently ddpg needs use_batch_ensemble to do some post-processing when forwarding the critic networks during training. Let me think about whether there is an alternative way to work around this.
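For context, the kind of post-processing discussed above could look like the sketch below: with BatchEnsemble layers, a batch of B inputs is effectively replicated across E ensemble members, so the critic returns E * B Q-values that must be regrouped and reduced before computing the TD target. This is a hypothetical, dependency-free illustration; the function name, argument layout, and reduction modes are assumptions, not ALF's actual API.

```python
def reduce_ensemble_q(q_values, num_ensembles, mode="min"):
    """Reduce per-member Q-values to one value per batch element.

    Hypothetical sketch. Assumes ``q_values`` is a flat list of length
    ``num_ensembles * batch_size``, grouped so that member ``e``'s
    predictions occupy ``q_values[e * B : (e + 1) * B]``.
    """
    batch_size = len(q_values) // num_ensembles
    # Split the flat list into one row per ensemble member.
    members = [q_values[e * batch_size:(e + 1) * batch_size]
               for e in range(num_ensembles)]
    if mode == "min":
        # Pessimistic target (in the spirit of clipped double-Q).
        return [min(col) for col in zip(*members)]
    elif mode == "mean":
        # Average over ensemble members.
        return [sum(col) / num_ensembles for col in zip(*members)]
    raise ValueError("unknown mode: %s" % mode)
```

With this kind of reduction isolated in a helper, the batch-ensemble detail could in principle live inside the critic network wrapper rather than in ddpg_algorithm itself, which is the direction the comment above suggests.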
    pred_step.output)
    return pred_step

    if self.need_full_rollout_state():
We want the algorithm to use the same ensemble_id during an entire episode. This means it should store the ensemble_id in its state and use that same ensemble_id when calling actor_network.
Oh yes, good point. I think that is why I had to tweak ddpg_algorithm_test to pass the toy unit test. Updated.
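The fix discussed in this thread can be sketched as follows: sample an ensemble_id once at episode start, carry it in the algorithm state, and reuse it on every subsequent step so the same ensemble member acts for the whole episode. This is a minimal, hypothetical illustration; the function signature, the dict-based state, and the actor callback are assumptions and not ALF's actual interfaces.

```python
import random

def predict_step(observation, state, num_ensembles, actor_fn):
    """Act with a per-episode ensemble member.

    Hypothetical sketch. ``state`` is None at episode start; afterwards it
    carries the sampled ensemble_id so the same member is used until the
    episode ends. ``actor_fn(observation, ensemble_id)`` stands in for
    calling actor_network with the chosen ensemble member.
    """
    if state is None:
        # Episode start: pick one ensemble member and remember it in state.
        state = {"ensemble_id": random.randrange(num_ensembles)}
    action = actor_fn(observation, state["ensemble_id"])
    return action, state
```

The key point matching the review comment: because ensemble_id lives in the returned state, rollout code that threads state from step to step automatically keeps the same ensemble member for the entire episode, and need_full_rollout_state() must account for this extra state field.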
Update actor_network, critic_network, and ddpg_algorithm to work with batch_ensemble layers.