Actor/Critic Papers
- Sample Efficient Actor-Critic with Experience Replay
- The Reactor: A Sample-Efficient Actor-Critic Architecture
  - Also compares time-stacked inputs versus LSTMs in Section 3.3.
- Asynchronous Methods for Deep Reinforcement Learning
  - The A3C paper.
- A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients
  - 2012
- On Actor-Critic Algorithms
  - 2003
- Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
  - This algorithm (ACKTR) is in OpenAI’s Baselines repo.
  - It’s a natural-gradient actor-critic method (Natural Gradients).
- Hindsight Experience Replay
- Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
- Mean Actor Critic
