Stable Baselines Algorithms
Intro
Stable Baselines (Docs) is a cleaned-up, easier-to-use fork of OpenAI's Baselines reinforcement learning algorithms. It supports multiple RL algorithms (PPO, DQN, etc.), each of which supports only a subset of features. The docs, however, don't include a single table where you can see what all the algorithms support in one place. The table below shows them all at a glance, making it easier to decide which algorithms you can or can't use based on recurrence, continuous actions, multi-processing, etc.
Algorithms
Algorithm | Recurrent | Multi-Processing | Replay Buffer | Action: Discrete | Action: Box | Action: MultiDiscrete | Action: MultiBinary | Observation: Discrete | Observation: Box | Observation: MultiDiscrete | Observation: MultiBinary
---|---|---|---|---|---|---|---|---|---|---|---
A2C | ✔️ | ✔️ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
ACER | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ |
ACKTR | ✔️ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ |
DDPG | ❌ | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ |
DQN | ❌ | ❌ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ |
GAIL | ✔️ | ✔️ (MPI) | ❌ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ |
PPO1 | ✔️ | ✔️ (MPI) | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
PPO2 | ✔️ | ✔️ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
SAC | ❌ | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ |
TRPO | ✔️ | ✔️ (MPI) | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
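
As a quick illustration of how the table translates into code, here is a minimal sketch that picks PPO2 because the table shows it handles Discrete actions and multi-processing via vectorized environments. The environment (CartPole-v1) and the hyperparameters are just placeholder choices, not recommendations:

```python
import gym

from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import SubprocVecEnv

if __name__ == "__main__":
    # PPO2 does multi-processing with vectorized environments,
    # so several copies of the environment run in parallel processes.
    n_envs = 4
    env = SubprocVecEnv([lambda: gym.make("CartPole-v1") for _ in range(n_envs)])

    # CartPole-v1 has a Discrete action space, which PPO2 supports.
    model = PPO2(MlpPolicy, env, verbose=1)
    model.learn(total_timesteps=25000)
```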
Notes
- DDPG does not support stable_baselines.common.policies because it uses Q-value estimation instead of value estimation
- DQN does not support stable_baselines.common.policies either; it has its own policy classes (see the sketch after these notes)
- PPO2 is the implementation OpenAI made for GPU. For multi-processing it uses vectorized environments, whereas PPO1 uses MPI
- SAC does not support stable_baselines.common.policies because it uses double Q-value and value estimation
- HER (Hindsight Experience Replay) has not been refactored yet.
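
To make the policy notes concrete, here is a minimal sketch of where the policy classes live for a few of the algorithms. The environments (CartPole-v1 for a discrete action space, Pendulum-v0 for a continuous one) are just placeholder choices:

```python
import gym

from stable_baselines import DQN, PPO2, SAC
from stable_baselines.common.policies import MlpPolicy as CommonMlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines.deepq.policies import MlpPolicy as DqnMlpPolicy
from stable_baselines.sac.policies import MlpPolicy as SacMlpPolicy

# PPO2 uses the shared actor-critic policies from stable_baselines.common.policies.
ppo_model = PPO2(CommonMlpPolicy, DummyVecEnv([lambda: gym.make("CartPole-v1")]))

# DQN ships its own policy classes (Q-value estimation instead of value estimation).
dqn_model = DQN(DqnMlpPolicy, gym.make("CartPole-v1"))

# SAC also ships its own policies and, per the table, only supports Box action spaces.
sac_model = SAC(SacMlpPolicy, gym.make("Pendulum-v0"))
```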