Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction
Using RL value functions to encode semantic knowledge, specifically by a robot.
Off-policy actor-critic with linear state features. Includes eligibility traces.
One of the first deep reinforcement learning papers.
Improved Q-value estimation by reducing overestimates of Deep Q-networks.
In-progress second edition of an RL textbook.
Overview (with references) of attention and several types of augmentation for RNNs.
Speeding up a reinforcement learning system with auxiliary tasks.
Five questions to ask about your deep learning project.
Relationships between objects.
A single ML model used for very different tasks.
Notes on using eligibility traces with neural networks.
A long review of the use of DL in robotics
Pre-train using supervised learning on human-provided demonstrations.
AlphaGo Zero, all RL self-play.
Tensorizing LSTMs to make them wider and deeper without adding parameters and with minimal extra compute costs.
Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations.
Unsupervised learning of image encoding and dynamics model.
Unsupervised training of a memory that is used for prediction of state and reward.
Unsupervised learning of image encoding, dynamics and reward models.
Learned dynamics model with a GAN for image generation and MCTS for planning.
Continual learning with a universal, off-policy agent.
Access consciousness and its relation to general intelligence.