Papers To Read

Sections

Papers read
Papers to Read, by Category
Reviews
Relevant to MaLPi
Not as relevant to MaLPi, but interesting
Autoencoders
Memory
Classes/Education
Simulators

Papers to Read, by Category

Reviews

Relevant to MaLPi

Curiosity-driven Exploration by Self-supervised Prediction
- Arxiv link
Learning Atari: An Exploration of the A3C Reinforcement Learning Methods.
- This paper is from a Berkeley class, but I don’t have a direct link for it. A Google search should work.
A Robust Adaptive Stochastic Gradient Method for Deep Learning
Bridging the Gap Between Value and Policy Based Reinforcement Learning
- PCL implementation
Learning from Demonstrations for Real World Reinforcement Learning
On Generalized Bellman Equations and Temporal-Difference Learning
Equivalence Between Policy Gradients and Soft Q-Learning
- Reddit discussion
Count-Based Exploration with Neural Density Models
- Replacing epsilon greedy exploration with a generalized count-based exploration strategy.
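To make the idea concrete for myself, here is a minimal sketch of a count-based exploration bonus. It is a simplification, not the paper's method: the paper derives pseudo-counts from a neural density model, while this sketch assumes states are hashable and counts them exactly. The class name `CountBonus` and the `beta` scale are made up for illustration.

```python
import math
from collections import Counter
from typing import Hashable


class CountBonus:
    """Tabular stand-in for a count-based exploration bonus (hypothetical helper)."""

    def __init__(self, beta: float = 0.1) -> None:
        self.beta = beta          # assumed bonus scale, not a value from the paper
        self.counts: Counter = Counter()

    def bonus(self, state: Hashable) -> float:
        """Record a visit to `state` and return beta / sqrt(visit count)."""
        self.counts[state] += 1
        return self.beta / math.sqrt(self.counts[state])
```

The bonus would be added to the environment reward, so rarely visited states look more attractive than frequently visited ones.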
One-Shot Imitation Learning
Efficient Parallel Methods for Deep Reinforcement Learning
Recurrent Additive Networks
- A simpler type of RNN. Not sure if/where it’s been published. Only tested on language tasks?
Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning
- Followup to the auxiliary tasks paper.
Non-Markovian Control with Gated End-to-End Memory Policy Networks
Experience Replay Using Transition Sequences
Self-Normalizing Neural Networks
- A replacement for the ReLU activation. Looks fairly simple to implement and try. A quote from the abstract, “…thus, vanishing and exploding gradients are impossible.”
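As a rough check on how simple it is to implement, here is a minimal scalar sketch of the SELU activation in plain Python. The two constants are the fixed values from the SELU definition; the function name `selu` is just a label chosen here. The paper also pairs the activation with a particular weight initialization and dropout variant, which this sketch does not cover.

```python
import math

# Fixed constants from the SELU definition.
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805


def selu(x: float) -> float:
    """Scaled exponential linear unit for a single scalar input."""
    if x > 0:
        return SCALE * x
    # math.expm1(x) computes exp(x) - 1 accurately for small x.
    return SCALE * ALPHA * math.expm1(x)
```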
Grounded Language Learning in a Simulated 3D World
- Rewards for completing tasks given in written instructions.
MEC: Memory-efficient Convolution for Deep Neural Network
Long-term Recurrent Convolutional Networks for Visual Recognition and Description
- CNN -> LSTM architecture
Neural SLAM
Expected Policy Gradients
Noisy Networks for Exploration
- Replace epsilon-greedy or entropy methods of exploration with noisy parameters
Trial without Error: Towards Safe Reinforcement Learning via Human Intervention
- Blog
- Very nice idea of having a layer between the agent and the environment for preventing disastrous behavior.
- Initially handled by a human but later by a learned system.
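A minimal sketch of what such a layer could look like, assuming the policy is a plain function from state to action. The names `make_safe_policy`, `is_catastrophic`, and `safe_action` are hypothetical and not taken from the paper; `is_catastrophic` stands in for the human overseer at first and for the learned blocker later.

```python
from typing import Callable, TypeVar

S = TypeVar("S")  # state type
A = TypeVar("A")  # action type


def make_safe_policy(
    policy: Callable[[S], A],
    is_catastrophic: Callable[[S, A], bool],
    safe_action: A,
) -> Callable[[S], A]:
    """Wrap `policy` so that flagged actions are replaced before reaching the environment."""

    def safe_policy(state: S) -> A:
        action = policy(state)
        if is_catastrophic(state, action):
            # Block the proposed action and substitute the fallback.
            return safe_action
        return action

    return safe_policy
```

The agent itself is unchanged; only the action that reaches the environment is filtered.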
Bayesian Neural Networks with Random Inputs for Model Based Reinforcement Learning
- I read through this once, but don’t understand most of it.
Proximal Policy Optimization Algorithms
- From OpenAI.org: “outperforms other online policy gradient methods”
Better Exploration with Parameter Noise
- Looks like I would need Layer Normalization first.
Guiding Reinforcement Learning Exploration Using Natural Language
Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards
DARLA: Improving Zero-Shot Transfer in Reinforcement Learning
Decoupled Learning of Environment Characteristics for Safe Exploration
Knowledge Sharing for Reinforcement Learning: Writing a BOOK
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
- Variational or Bayesian Dropout, for use with RNNs.
Training RNNs as Fast as CNNs
The Uncertainty Bellman Equation and Exploration
Lifelong Learning with Dynamically Expandable Networks
Overcoming Exploration in Reinforcement Learning with Demonstrations
- Mentions something called Hindsight Experience Replay.
Self-supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation
Embodied Question Answering
- Paper
Time Limits in Reinforcement Learning
- For value networks that will be used in a non-episodic way, don’t end bootstrapping at training episode boundaries.
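My reading of that note, as a minimal sketch of a one-step TD target: only a true environment termination should cut off the bootstrap term, while a time-limit cut-off should not. The function name `td_target` and its arguments are chosen here for illustration.

```python
def td_target(reward: float, next_value: float, gamma: float, terminated: bool) -> float:
    """One-step TD target.

    `terminated` should be True only when the environment truly ended,
    not when the training episode merely hit its step limit.
    """
    if terminated:
        # No future return after a real terminal state.
        return reward
    # Time-limit cut-offs land here too, so the value estimate still bootstraps.
    return reward + gamma * next_value
```

For a transition that ended only because the episode hit its step limit, pass `terminated=False` so the estimate of the next state still contributes.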
Time-Contrastive Networks: Self-Supervised Learning from Video
- Website
Reverse Curriculum Generation for Reinforcement Learning Agents
- This could be very useful when I try to train MaLPi to find its charging station.
- Paper: Reverse Curriculum Generation for Reinforcement Learning
- Code
Deep reinforcement learning from human preferences
Ray RLLib
Attention based neural networks
Expected Policy Gradients for Reinforcement Learning
Model-Based Action Exploration
Curiosity-driven reinforcement learning with homeostatic regulation
Regret Minimization for Partially Observable Deep Reinforcement Learning
One-shot Imitation from Humans via Domain-Adaptive Meta-Learning
Alleviating catastrophic forgetting using context-dependent gating and synaptic stabilization
Temporal Difference Models: Model-Free Deep RL for Model-Based Control, BAIR
Reinforcement and Imitation Learning for Diverse Visuomotor Skills
Kickstarting Deep Reinforcement Learning
Composable Deep Reinforcement Learning for Robotic Manipulation, BAIR
Recall Traces: Backtracking Models for Efficient Reinforcement Learning, BAIR
Universal Planning Networks
Latent Space Policies for Hierarchical Reinforcement Learning, BAIR
- ICML link
Averaging Weights Leads to Wider Optima and Better Generalization
- Running average of weights during training to create an effect similar to ensembling, but at training time instead of run/inference time.
- Blog post about implementing it
- PyTorch implementation
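The running average itself is small enough to sketch in plain Python. This assumes the weights are flattened into a list of floats and that a snapshot is taken at whatever interval the training loop chooses; the function name `update_running_average` is hypothetical and this is not the linked implementation.

```python
from typing import List


def update_running_average(avg: List[float], weights: List[float], n_averaged: int) -> List[float]:
    """Fold one new weight snapshot into the mean of `n_averaged` earlier snapshots."""
    return [(a * n_averaged + w) / (n_averaged + 1) for a, w in zip(avg, weights)]


# Average three snapshots taken during training.
snapshots = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
avg = [0.0, 0.0]
for n, snapshot in enumerate(snapshots):
    avg = update_running_average(avg, snapshot, n)
# avg is now [3.0, 4.0]
```

Only one extra copy of the weights is kept, which is what makes this cheaper than a real ensemble at inference time.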
Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review BAIR
Hierarchical Reinforcement Learning with Deep Nested Agents
Data-Efficient Hierarchical RL
Variational Inference for Data-Efficient Model Learning in POMDPs
- intro to structured inference networks
Fast Policy Learning through Imitation and Reinforcement
Relational Deep Reinforcement Learning
Backplay: “Man muss immer umkehren”
- Another curriculum learning paper where they start near the goal and work backwards.
Shared Multi-Task Imitation Learning for Indoor Self-Navigation
A Distributional Perspective on Reinforcement Learning
Learning End-to-end Autonomous Driving using Guided Auxiliary Supervision
Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space
CURIOUS: Intrinsically Motivated Multi-Task, Multi-Goal Reinforcement Learning
GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning
An Overview of Multi-Task Learning in Deep Neural Networks
Learning Hierarchical Information Flow with Recurrent Neural Modules
Venkatraman, et al. Predictive state decoders: Encoding the future into recurrent networks. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2017.
At Human Speed: Deep Reinforcement Learning with Action Delay
Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience
Safe Reinforcement Learning with Model Uncertainty Estimates
Towards Governing Agent’s Efficacy: Action-Conditional β-VAE for Deep Transparent Reinforcement Learning
Learned optimizers that outperform SGD on wall-clock and validation loss
Reversible Recurrent Neural Networks
Model-Based Active Exploration
Differentiable MPC for End-to-end Planning and Control
Toward an AI Physicist for Unsupervised Learning
Memory-based control with recurrent networks, Heess et al. Meta-learning
Gu, Holly, Lillicrap ‘16 parallel NAF. Continuous action space Q learning
Resilient Computing with Reinforcement Learning on a Dynamical System: Case Study in Sorting
Constrained Exploration and Recovery from Experience Shaping
Building a Winning Self-Driving Car in Six Months
QUOTA: The Quantile Option Architecture for Reinforcement Learning
Efficient Eligibility Traces for Deep Reinforcement Learning
Papers that cite World Models
Flatland: a Lightweight First-Person 2-D Environment for Reinforcement Learning
- Looks interesting, says code will be available at some point.
Guiding Policies with Language via Meta-Learning
Learning Actionable Representations with Goal-Conditioned Policies
Autoencoding beyond pixels using a learned similarity metric
Randomized Prior Functions for Deep Reinforcement Learning
An Introduction to Deep Reinforcement Learning
Retrieving from a large memory:
- The Kanerva Machine: A Generative Distributed Memory
- Followup
- Shaping Belief States with Generative Environment Models for RL
Adapting Auxiliary Losses Using Gradient Similarity
RUDDER: Return Decomposition for Delayed Rewards
Learning To Simulate
Adversarial Examples, Uncertainty, and Transfer Testing Robustness in Gaussian Process Hybrid Deep Networks
Self-supervised Learning of Image Embedding for Continuous Control
AlphaStar
- This blog about DeepMind’s StarCraft AI has a large list of potentially useful links.
- Original LSTM paper
- Pointer Networks
The Value Function Polytope in Reinforcement Learning
A Geometric Perspective on Optimal Representations for Reinforcement Learning
- Finding better representations. Follow on to previous paper.
Task2Vec: Task Embedding for Meta-Learning
Simultaneously Learning Vision and Feature-based Control Policies for Real-world Ball-in-a-Cup
Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
World Discovery Models
From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following
Continual Learning with Tiny Episodic Memories
Using Natural Language for Reward Shaping in Reinforcement Learning
Assessing Generalization in Deep Reinforcement Learning
Inductive transfer with context-sensitive neural networks
- David Silver, adding context to multi-task learning, 2008.
Reinforced Imitation in Heterogeneous Action Space
Reinforcement Learning with Attention that Works: A Self-Supervised Approach
Gershman, S.J. and Daw, N.D. (2017) Reinforcement learning and episodic memory in humans and animals: an integrative framework. Annu. Rev. Psychol. 68, 101–128
Meta-learning of Sequential Strategies
SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning
- Code
- Blog
- Based on the SVAE paper (in Autoencoders)
Robustness to Out-of-Distribution Inputs via Task-Aware Generative Uncertainty
Multi-Sample Dropout for Accelerated Training and Better Generalization
Learning Powerful Policies by Using Consistent Dynamics Model
- Add an auxiliary task to the learned model that penalizes errors in future predictions.
Soft Actor-Critic Algorithms and Applications
- Project Page
- Fewer hyperparameters, better sample efficiency.
- Learning to Walk via Deep Reinforcement Learning
  - SAC with a learnable temperature hyperparameter.
Learning Dynamics Model in Reinforcement Learning by Incorporating the Long Term Future
- Add an auxiliary task to predict the far future.
- Includes use in imitation learning
Unsupervised Learning of Object Keypoints for Perception and Control
Real-Time Freespace Segmentation on Autonomous Robots for Detection of Obstacles and Drop-Offs
Dynamics-aware Embeddings
A Survey on Reproducibility by Evaluating Deep Reinforcement Learning Algorithms on Real-World Robots
Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
- Blog
Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients
- Code
A Mobile Manipulation System for One-Shot Teaching of Complex Tasks in Homes
OpenAI’s Automatic Domain Randomization on the DonkeyCar simulator?
- Blog
- Paper
- Start with a single, easy environment in sim. When performance plateaus, increase the range of simulated features. E.g. increase range of friction, or weight of car, or size/color of lane markings.
- They used Embed + Sum on the inputs so they didn’t need to change the policy between sim and real.
- They used Policy cloning (and DAgger?) to train a new policy from an older one, e.g. if the policy architecture did change. Section 6.4.
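A minimal sketch of the widening loop as described in the notes above: start with a single value, and widen the randomized range once performance stops improving. This follows my summary rather than the paper's exact procedure, and every name, window size, and tolerance here is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RandomizedRange:
    """Range a simulated feature (e.g. friction) is sampled from."""

    low: float
    high: float
    step: float  # hypothetical amount to widen each side by

    def widen(self) -> None:
        self.low -= self.step
        self.high += self.step


def has_plateaued(scores: Sequence[float], window: int = 3, tol: float = 0.01) -> bool:
    """True when the mean of the last `window` scores beats the window before it by less than `tol`."""
    if len(scores) < 2 * window:
        return False
    recent = sum(scores[-window:]) / window
    previous = sum(scores[-2 * window:-window]) / window
    return recent - previous < tol
```

In a training loop, each randomized feature would call `widen()` whenever `has_plateaued(scores)` returns True. A real version would also clamp ranges to physically valid values, e.g. non-negative friction.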
Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck
Learning to Predict Without Looking Ahead: World Models Without Forward Prediction
- Repo, but no code yet.
Emergent Communication with World Models
Word2vec to behavior: morphology facilitates the grounding of language in machines
- Code
DeepRacer
CrossNorm: Normalization for Off-Policy TD Reinforcement Learning
- Eliminates the need for a target network?
Optimizing agent behavior over long time scales by transporting value
- Looking back over episodic memory
- Code
Reinforcement Learning Upside Down: Don’t Predict Rewards – Just Map Them to Actions
Training Agents using Upside-Down Reinforcement Learning
A Simple Randomization Technique for Generalization in Deep Reinforcement Learning
Prioritized Sequence Experience Replay
RTFM: Generalising to New Environment Dynamics via Reading
FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
Gradient Surgery for Multi-Task Learning
Q-Learning in enormous action spaces via amortized approximate maximization
Network Randomization: A Simple Technique for Generalization in Deep Reinforcement Learning
Reinforcement Learning with Convolutional Reservoir Computing
Neuroevolution of Self-Interpretable Agents
- David Ha, et al.

Not as relevant to MaLPi, but interesting

Explaining and Harnessing Adversarial Examples
Beating Atari with Natural Language Guided Reinforcement Learning
A Neural Representation of Sketch Drawings
Hybrid computing using a neural network with dynamic external memory
Bayesian Recurrent Neural Networks
- A Tensorflow implementation
- Another TF implementation
ML for analyzing Unix log files
Concrete Dropout
Bayesian Reinforcement Learning: A Survey
Beyond Monte Carlo Tree Search: Playing Go with Deep Alternative Neural Network and Long-Term Evaluation
Meta learning Framework for Automated Driving
Representation Learning for Grounded Spatial Reasoning
- Instruction text -> LSTM -> vectors 1 and 2
- V1 is used as a kernel in a convolution over the state space object embeddings (hand built?)
- V2 is used to make a global map representation of the input
- both outputs are concatenated and input to a CNN to predict the final map value
Early Stage Malware Prediction Using Recurrent Neural Networks
Hwang J, Jung M, Madapana N, et al. Achieving “synergy” in cognitive behavior of humanoids via deep learning of dynamic visuo-motor-attentional coordination. Humanoid Robots (Humanoids), 2015 IEEE-RAS 15th International Conference on; Seoul. 2015. p. 817-824.
- Combined human gesture recognition, attention, object detection, and grasping.
- Arxiv page
Deep Mixture Density Network (MDN)
- “MDNs combine the benefits of DNNs and GMMs (Gaussian mixture model) by using the DNN to model the complex relationship between input and output data, but providing probability distributions as output”
- C. Bishop. Mixture density networks, Tech. Rep. NCRG/94/004, Neural Computing Research Group. Aston University, 1994.
- H. Zen, A. Senior. Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis, ICASSP, 2014.
Anytime Neural Networks via Joint Optimization of Auxiliary Losses
Language Grounding for Robotics accepted papers
Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J Lim, Abhinav Gupta, Li Fei-Fei, and Ali Farhadi. Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning. In ICRA, 2017.
Augmenting End-to-End Dialog Systems with Commonsense Knowledge
Predictive representations can link model-based reinforcement learning to model-free mechanisms
Neural Task Programming: Learning to Generalize Across Hierarchical Tasks
Understanding Generalization and Stochastic Gradient Descent
- Includes how to choose the best batch size for test set accuracy.
A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs
Generalized Grounding Graphs: A Probabilistic Framework for Understanding Grounded Commands
Peephole: Predicting Network Performance Before Training
Peano-HASEL actuators: Muscle-mimetic, electrohydraulic transducers that linearly contract on activation
Hydraulically amplified self-healing electrostatic actuators with muscle-like performance
Unsupervised Low-Dimensional Vector Representations for Words, Phrases and Text that are Transparent, Scalable, and produce Similarity Metrics that are Complementary to Neural Embeddings
Emergent complexity via multi-agent competition
PRNN: Recurrent Neural Network with Persistent Memory
Convolutional Neural Networks for Sentence Classification
- Might be useful to classify task descriptions for a multi-task system.
Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods
Scalable Meta-Learning for Bayesian Optimization
Learning to Play with Intrinsically-Motivated Self-Aware Agents
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
Accelerated Methods for Deep Reinforcement Learning. Adam Stooke and Pieter Abbeel
Learning and Querying Fast Generative Models for Reinforcement Learning
Learning by Playing - Solving Sparse Reward Tasks from Scratch
Selective Experience Replay for Lifelong Learning
Semi-Parametric Topological Memory For Navigation
Shifting Mean Activation Towards Zero with Bipolar Activation Functions
- Alternative to Batch Norm for normalization
Strategic attentive writer for learning macro-actions
The Limits and Potentials of Deep Learning for Robotics
AutoAugment: Learning Augmentation Policies from Data
Progress & Compress: A scalable framework for continual learning
Deep Curiosity Search: Intra-Life Exploration Improves Performance on Challenging Deep Reinforcement Learning Problems
Unsupervised Meta-Learning for Reinforcement Learning
Unsupervised Learning by Competing Hidden Units
Adaptive Neural Trees
- Combining Decision Trees and neural nets
Reliable Uncertainty Estimates in Deep Neural Networks using Noise Contrastive Priors
Papers of the Year
Wider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence Learning
LASER Language-Agnostic SEntence Representations
- Pre-trained multi-lingual embeddings. Possibly useful for task description embedding.
Building Machines That Learn and Think Like People
Learning to Understand Goal Specifications by Modelling Reward
Investigating Generalisation in Continuous Deep Reinforcement Learning
Hyperbolic Discounting and Learning over Multiple Horizons
- Also useful as an auxiliary task.
Recurrent Experience Replay in Distributed Reinforcement Learning
Stiffness: A New Perspective on Generalization in Neural Networks
IndyLSTMs: Independently Recurrent LSTMs
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
Diagnosing Bottlenecks in Deep Q-learning Algorithms
Large-Scale Long-Tailed Recognition in an Open World
- Another use of memory, this time for situations that don’t occur enough to train on.
Human Visual Understanding for Cognition and Manipulation – A primer for the roboticist
Stand-Alone Self-Attention in Vision Models
- Replacing convolutions with attention in vision models.
Learning the Arrow of Time
Improving the robustness of ImageNet classifiers using elements of human visual cognition
- Episodic memory and shape based representations.
When to Trust Your Model: Model-Based Policy Optimization
Metalearned Neural Memory
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Hierarchical Decision Making by Generating and Following Natural Language Instructions
AC-Teach: A Bayesian Actor-Critic Method for Policy Learning with an Ensemble of Suboptimal Teachers
- Blog
Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras
Teacher algorithms for curriculum learning of Deep RL in continuously parameterized environments
A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms
Weakly Supervised Disentanglement with Guarantees
Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning
Regularization Matters in Policy Optimization
Meta-Learning without Memorization

Autoencoders

Memory

Neural Episodic Control
AMRL: Aggregated Memory For Reinforcement Learning
- One of several recent papers on memory.
- See also: REALM: Retrieval-Augmented Language Model Pre-Training
  - Finished reading this. The Future Work section includes ideas about where this could be generalized, e.g. structured knowledge.
Reservoir memory machines

Classes/Education

CS 294: Deep Reinforcement Learning
CMU 10703, Spring 2017 Deep Reinforcement Learning and Control
Stanford CS234: Reinforcement Learning
David Silver’s UCL Course on RL
Deep Learning (DLSS) and Reinforcement Learning (RLSS) Summer School, Montreal 2017 (videos)
Deep RL Bootcamp (Aug 2017, Berkeley)
Theories of Deep Learning (STATS 385), Stanford 2017
Bayesian Deep Learning
CS20: TensorFlow for Deep Learning Research
A list of fifteen more classes (Some overlap)
Hierarchical RL Workshop
- Includes lectures by David Silver
Google’s Machine Learning Crash Course (15 hours, lessons, videos, exercises)
CS294-158 Deep Unsupervised Learning Spring 2019
Testing and Debugging in Machine Learning (~4 hours)
Metacademy
- Lists of subjects and prerequisites for ML.
Mathematics for Machine Learning
- A free textbook.

Simulators