Reinforcement Learning with Unsupervised Auxiliary Tasks

Published:

Notes

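The idea is to add auxiliary tasks that are not directly tied to the environment reward but are trained on the same network as the main agent, so the shared representation receives extra learning signal. Two examples from the paper are pixel control and reward prediction. In pixel control, auxiliary policies are trained to maximize the change in pixel intensity in different regions of the input image. In reward prediction, the network sees the three preceding frames and predicts the reward received on the next frame.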
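To make the two auxiliary signals concrete, here is a minimal NumPy sketch of how their training targets could be computed. It is not the paper's implementation: the function names, the `cell=4` region size, the three-way sign label for the reward, and the convention that `rewards[t]` is the reward that arrives with frame `t` are all assumptions chosen for illustration.

```python
import numpy as np

def pixel_control_rewards(frame, next_frame, cell=4):
    """Pseudo-reward per region: mean absolute intensity change
    inside each non-overlapping cell x cell block (cell size is an assumption)."""
    diff = np.abs(next_frame.astype(np.float32) - frame.astype(np.float32))
    if diff.ndim == 3:
        # Average over colour channels when the frame is H x W x C.
        diff = diff.mean(axis=-1)
    h, w = diff.shape
    # Drop edge pixels that do not fill a whole cell.
    h, w = h - h % cell, w - w % cell
    grid = diff[:h, :w].reshape(h // cell, cell, w // cell, cell)
    return grid.mean(axis=(1, 3))

def reward_prediction_sample(frames, rewards, t, history=3):
    """Build one (input, label) pair: the `history` frames before step t,
    and the sign class of the reward at step t (hypothetical labelling)."""
    stacked = np.stack(frames[t - history:t])
    label = int(np.sign(rewards[t])) + 1  # 0 = negative, 1 = zero, 2 = positive
    return stacked, label
```

Each entry of the grid returned by `pixel_control_rewards` would serve as the reward for one auxiliary policy, and the pairs from `reward_prediction_sample` would feed a small classifier head on the shared network. Both losses would then be added to the main reinforcement learning loss with some weighting.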