- Trust Region Policy Optimization
- Reinforcement Learning with Deep Energy-Based Policies
- Q-Prop: Sample-Efficient Policy Gradient with an Off-Policy Critic
- The Reactor: A Sample-Efficient Actor-Critic Architecture
- Sample Efficient Actor-Critic with Experience Replay
- Reinforcement Learning with Unsupervised Auxiliary Tasks
- Continuous Control with Deep Reinforcement Learning