Stable Baselines Algorithms

Intro

Stable Baselines (Docs) is a cleaned-up and easier-to-use version of OpenAI's Baselines reinforcement learning algorithms. It supports multiple RL algorithms (PPO, DQN, etc.), each of which supports its own subset of features. The docs, however, don't include a single table showing what every algorithm supports in one place. The table below shows them all at a glance, making it easier to decide which algorithms you can or can't use based on recurrence, continuous actions, multi-processing, etc.
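
For context, the library's usual workflow is only a few lines. The sketch below is a minimal example, not canonical code: the algorithm (PPO2), the environment id (CartPole-v1), and the timestep counts are placeholders you would swap based on the table that follows.

```python
# Minimal sketch of the common Stable Baselines workflow.
# PPO2, CartPole-v1 and the timestep counts are placeholders --
# swap in whatever the table below says fits your problem.
import gym
from stable_baselines import PPO2

env = gym.make('CartPole-v1')

model = PPO2('MlpPolicy', env, verbose=1)  # policies can be passed by name
model.learn(total_timesteps=10000)         # train

obs = env.reset()
for _ in range(1000):                      # run the trained policy
    action, _states = model.predict(obs)
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
```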

Algorithms

| Algorithm | Recurrent | Multi-Processing | Replay Buffer | Action: Discrete | Action: Box | Action: MultiDiscrete | Action: MultiBinary | Obs: Discrete | Obs: Box | Obs: MultiDiscrete | Obs: MultiBinary |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A2C | ✔️ | ✔️ |  | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| ACER | ✔️ | ✔️ | ✔️ | ✔️ |  |  |  | ✔️ | ✔️ | ✔️ | ✔️ |
| ACKTR | ✔️ | ✔️ |  | ✔️ |  |  |  | ✔️ | ✔️ | ✔️ | ✔️ |
| DDPG |  |  | ✔️ |  | ✔️ |  |  | ✔️ | ✔️ | ✔️ | ✔️ |
| DQN |  |  | ✔️ | ✔️ |  |  |  | ✔️ | ✔️ | ✔️ | ✔️ |
| GAIL | ✔️ | ✔️ (MPI) |  | ✔️ | ✔️ |  |  | ✔️ | ✔️ | ✔️ |  |
| PPO1 | ✔️ | ✔️ (MPI) |  | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| PPO2 | ✔️ | ✔️ |  | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| SAC |  |  | ✔️ |  | ✔️ |  |  | ✔️ | ✔️ | ✔️ | ✔️ |
| TRPO | ✔️ | ✔️ (MPI) |  | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
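
In practice the action-space columns are usually the first filter. As a rough sketch (the environment ids are just examples): CartPole-v1 has a Discrete action space, so DQN applies but SAC does not, while Pendulum-v0 has a Box action space, so the reverse holds.

```python
# Sketch: letting the action-space columns above pick the algorithm.
# Environment ids are examples only.
import gym
from stable_baselines import DQN, SAC

cartpole = gym.make('CartPole-v1')  # Discrete action space
pendulum = gym.make('Pendulum-v0')  # Box (continuous) action space

print(cartpole.action_space)  # Discrete -> DQN, ACER, A2C, PPO2, ... work
print(pendulum.action_space)  # Box      -> SAC, DDPG, A2C, PPO2, ... work

DQN('MlpPolicy', cartpole).learn(total_timesteps=1000)  # OK: Discrete actions
SAC('MlpPolicy', pendulum).learn(total_timesteps=1000)  # OK: Box actions
# SAC('MlpPolicy', cartpole) would fail: SAC only supports Box action spaces.
```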

Notes

  1. DDPG does not support stable_baselines.common.policies because it uses Q-value estimation instead of value estimation.
  2. DQN does not support stable_baselines.common.policies either.
  3. PPO2 is the implementation OpenAI made for GPUs. For multiprocessing it uses vectorized environments, whereas PPO1 uses MPI (see the sketch after these notes).
  4. SAC does not support stable_baselines.common.policies because it uses double Q-values together with value estimation.
  5. HER (Hindsight Experience Replay) has not been refactored yet.
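
To illustrate note 3 (the environment id and worker count here are arbitrary), PPO2 gets its parallelism from a vectorized environment such as SubprocVecEnv rather than from MPI. Notes 1, 2, and 4 simply mean that DQN, DDPG, and SAC take their policies from their own submodules (e.g. stable_baselines.deepq.policies) instead of stable_baselines.common.policies.

```python
# Sketch of note 3: PPO2 multiprocessing via a vectorized environment.
# The environment id and the number of workers are arbitrary choices.
import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import SubprocVecEnv

def make_env():
    return gym.make('CartPole-v1')

if __name__ == '__main__':
    # Each callable spawns a worker process running its own copy of the env.
    env = SubprocVecEnv([make_env for _ in range(4)])
    model = PPO2('MlpPolicy', env, verbose=1)
    model.learn(total_timesteps=25000)

# Notes 1, 2 and 4 in code form: those algorithms import their policies from
# their own modules, e.g. stable_baselines.deepq.policies.MlpPolicy for DQN,
# not from stable_baselines.common.policies.
```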
Edit 1: added the Replay Buffer column.