Stable Baselines Algorithms
Intro
Stable Baselines (Docs) is a cleaned-up and easier-to-use version of OpenAI's Baselines reinforcement learning algorithms. It supports multiple RL algorithms (PPO, DQN, etc.), each of which supports some subset of features. The docs, however, don't include a single table where you can see what all the algorithms support in one place. The table below shows them all at a glance, making it easier to decide which algorithms you can or can't use based on recurrence, continuous actions, multi-processing, etc.
Algorithms
| Algorithm | Recurrent | Multi-Processing | Replay Buffer | Action: Discrete | Action: Box | Action: MultiDiscrete | Action: MultiBinary | Obs: Discrete | Obs: Box | Obs: MultiDiscrete | Obs: MultiBinary |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A2C | ✔️ | ✔️ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| ACER | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ |
| ACKTR | ✔️ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ |
| DDPG | ❌ | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ |
| DQN | ❌ | ❌ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ |
| GAIL | ✔️ | ✔️ (MPI) | ❌ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ |
| PPO1 | ✔️ | ✔️ (MPI) | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| PPO2 | ✔️ | ✔️ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| SAC | ❌ | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ |
| TRPO | ✔️ | ✔️ (MPI) | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
Notes
- DDPG does not support `stable_baselines.common.policies` because it uses Q-value estimation instead of value estimation
- DQN does not support `stable_baselines.common.policies` either; like DDPG, it ships its own policy classes (see the second sketch after this list)
- PPO2 is the implementation OpenAI made for GPU. For multi-processing it uses vectorized environments, whereas PPO1 uses MPI (see the first sketch after this list)
- SAC does not support `stable_baselines.common.policies` because it uses double Q-values and value estimation
- HER (Hindsight Experience Replay) has not been refactored yet.
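
Here is a minimal sketch of what PPO2's vectorized multi-processing looks like in practice. The environment (`CartPole-v1`), the number of worker processes (4), and the training length are arbitrary choices for illustration; the point is that the parallel copies run in subprocesses via `SubprocVecEnv`, with no MPI launcher needed.

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import SubprocVecEnv

# Each callable builds one copy of the environment in its own subprocess.
def make_env():
    return gym.make('CartPole-v1')

if __name__ == '__main__':
    # 4 parallel environments; PPO2 collects rollouts from all of them at once.
    env = SubprocVecEnv([make_env for _ in range(4)])

    # PPO2 supports the shared policies, so the 'MlpPolicy' string works here.
    model = PPO2('MlpPolicy', env, verbose=1)
    model.learn(total_timesteps=10000)
```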
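
And a second sketch of the policy-import difference from the notes: DQN (like DDPG and SAC) takes its policy classes from its own module rather than from `stable_baselines.common.policies`. Again, the environment and training length are arbitrary choices.

```python
import gym
from stable_baselines import DQN
# DQN's policies live in their own module, not stable_baselines.common.policies.
from stable_baselines.deepq.policies import MlpPolicy

env = gym.make('CartPole-v1')
model = DQN(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=10000)
```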
