Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction

Published:

Download

Notes

Individual, small value functions (called ‘demons’ or ‘Generalized Value Functions’ in the paper) are learned, each with different functions for: policy, discount, reward and terminal reward. The functions are used to describe a predictive or control ‘question’ about the agent’s environment and through RL the demon learns to answer it. Policies consisted of linear function approximation over tiled representations of the state.

Because the demons can learn off-policy, they could train multiple demons, each with different function-questions while following a single behaivoral policy.

They presented results of several experiments using the Critterbot, which has a huge number of sensors.

The MultiModel paper has some similarities with this one. Each uses a number of small-ish, modular networks.