Sections
Papers to Read, by Category
Reviews
Relevant to MaLPi
- Curiosity-driven Exploration by Self-supervised Prediction
- Learning Atari: An Exploration of the A3C Reinforcement Learning Methods.
- This paper is from a Berkeley class, but I don’t have a direct link for it. A Google search should find it.
- A Robust Adaptive Stochastic Gradient Method for Deep Learning
- Bridging the Gap Between Value and Policy Based Reinforcement Learning
- Learning from Demonstrations for Real World Reinforcement Learning
- On Generalized Bellman Equations and Temporal-Difference Learning
- Equivalence Between Policy Gradients and Soft Q-Learning
- Count-Based Exploration with Neural Density Models
- Replaces epsilon-greedy exploration with a generalized count-based exploration strategy.
- One-Shot Imitation Learning
- Efficient Parallel Methods for Deep Reinforcement Learning
- Recurrent Additive Networks
- A simpler type of RNN. Not sure if/where it’s been published. Only tested on language tasks?
- Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning
- Follow-up to the auxiliary tasks paper.
- Non-Markovian Control with Gated End-to-End Memory Policy Networks
- Experience Replay Using Transition Sequences
- Self-Normalizing Neural Networks
- Introduces SELU, a replacement for the ReLU activation. Looks fairly simple to implement and try. A quote from the abstract: “…thus, vanishing and exploding gradients are impossible.”
- Grounded Language Learning in a Simulated 3D World
- Rewards for completing tasks given in written instructions.
- MEC: Memory-efficient Convolution for Deep Neural Network
- Long-term Recurrent Convolutional Networks for Visual Recognition and Description
- Neural SLAM
- Expected Policy Gradients
- Noisy Networks for Exploration
- Replaces ε-greedy or entropy-based exploration with noise added to the network parameters, where the noise scale is learned.
- Trial without Error: Towards Safe Reinforcement Learning via Human Intervention
- Blog
- Very nice idea of having a layer between the agent and the environment for preventing disastrous behavior.
- Initially handled by a human but later by a learned system.
- Bayesian Neural Networks with Random Inputs for Model Based Reinforcement Learning
- I read through this once, but don’t understand most of it.
- Proximal Policy Optimization Algorithms
- From the abstract: “outperforms other online policy gradient methods”
- Better Exploration with Parameter Noise
- Guiding Reinforcement Learning Exploration Using Natural Language
- Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards
- DARLA: Improving Zero-Shot Transfer in Reinforcement Learning
- Decoupled Learning of Environment Characteristics for Safe Exploration
- Knowledge Sharing for Reinforcement Learning: Writing a BOOK
- A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
- Variational or Bayesian dropout, for use with RNNs.
- Training RNNs as Fast as CNNs
- The Uncertainty Bellman Equation and Exploration
- Lifelong Learning with Dynamically Expandable Networks
- Overcoming Exploration in Reinforcement Learning with Demonstrations
- Mentions something called Hindsight Experience Replay.
- Self-supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation
- Embodied Question Answering
- Time Limits in Reinforcement Learning
- For value networks that will be used in a non-episodic way, don’t stop bootstrapping at episode boundaries that come only from training time limits.
- Time-Contrastive Networks: Self-Supervised Learning from Video
- Reverse Curriculum Generation for Reinforcement Learning Agents
- Deep reinforcement learning from human preferences
- Ray RLLib
- Attention based neural networks
- Expected Policy Gradients for Reinforcement Learning
- Model-Based Action Exploration
- Curiosity-driven reinforcement learning with homeostatic regulation
- Regret Minimization for Partially Observable Deep Reinforcement Learning
- One-shot Imitation from Humans via Domain-Adaptive Meta-Learning
- Alleviating catastrophic forgetting using context-dependent gating and synaptic stabilization
- Temporal Difference Models: Model-Free Deep RL for Model-Based Control, BAIR
- Reinforcement and Imitation Learning for Diverse Visuomotor Skills
- Kickstarting Deep Reinforcement Learning
- Composable Deep Reinforcement Learning for Robotic Manipulation, BAIR
- Recall Traces: Backtracking Models for Efficient Reinforcement Learning, BAIR
- Universal Planning Networks
- Latent Space Policies for Hierarchical Reinforcement Learning, BAIR
- Averaging Weights Leads to Wider Optima and Better Generalization
- Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review, BAIR
- Hierarchical Reinforcement Learning with Deep Nested Agents
- Data-Efficient Hierarchical RL
- Variational Inference for Data-Efficient Model Learning in POMDPs
- Fast Policy Learning through Imitation and Reinforcement
- Relational Deep Reinforcement Learning
- Backplay: “Man muss immer umkehren”
- Another curriculum learning paper where they start near the goal and work backwards.
- Shared Multi-Task Imitation Learning for Indoor Self-Navigation
- A Distributional Perspective on Reinforcement Learning
- Learning End-to-end Autonomous Driving using Guided Auxiliary Supervision
- Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space
- CURIOUS: Intrinsically Motivated Multi-Task, Multi-Goal Reinforcement Learning
- GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning
- An Overview of Multi-Task Learning in Deep Neural Networks
- Learning Hierarchical Information Flow with Recurrent Neural Modules
- Venkatraman, et al. Predictive state decoders: Encoding the future into recurrent networks. In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2017.
- At Human Speed: Deep Reinforcement Learning with Action Delay
- Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience
- Safe Reinforcement Learning with Model Uncertainty Estimates
- Towards Governing Agent’s Efficacy: Action-Conditional β-VAE for Deep Transparent Reinforcement Learning
- Learned optimizers that outperform SGD on wall-clock and validation loss
- Reversible Recurrent Neural Networks
- Model-Based Active Exploration
- Differentiable MPC for End-to-end Planning and Control
- Toward an AI Physicist for Unsupervised Learning
- Memory-based control with recurrent networks, Heess et al. Meta-learning
- Gu, Holly, Lillicrap ’16: parallel NAF. Continuous action space Q-learning.
- Resilient Computing with Reinforcement Learning on a Dynamical System: Case Study in Sorting
- Constrained Exploration and Recovery from Experience Shaping
- Building a Winning Self-Driving Car in Six Months
- QUOTA: The Quantile Option Architecture for Reinforcement Learning
- Efficient Eligibility Traces for Deep Reinforcement Learning
- Papers that cite World Models
- Flatland: a Lightweight First-Person 2-D Environment for Reinforcement Learning
- Looks interesting; the authors say code will be available at some point.
- Guiding Policies with Language via Meta-Learning
- Learning Actionable Representations with Goal-Conditioned Policies
- Autoencoding beyond pixels using a learned similarity metric
- Randomized Prior Functions for Deep Reinforcement Learning
- An Introduction to Deep Reinforcement Learning
- Retrieving from a large memory:
- Adapting Auxiliary Losses Using Gradient Similarity
- RUDDER: Return Decomposition for Delayed Rewards
- Learning To Simulate
- Adversarial Examples, Uncertainty, and Transfer Testing Robustness in Gaussian Process Hybrid Deep Networks
- Self-supervised Learning of Image Embedding for Continuous Control
- AlphaStar
- The Value Function Polytope in Reinforcement Learning
- A Geometric Perspective on Optimal Representations for Reinforcement Learning
- Finding better representations. Follow-on to the previous paper.
- Task2Vec: Task Embedding for Meta-Learning
- Simultaneously Learning Vision and Feature-based Control Policies for Real-world Ball-in-a-Cup
- Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
- World Discovery Models
- From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following
- Continual Learning with Tiny Episodic Memories
- Using Natural Language for Reward Shaping in Reinforcement Learning
- Assessing Generalization in Deep Reinforcement Learning
- Inductive transfer with context-sensitive neural networks
- David Silver, adding context to multi-task learning, 2008.
- Reinforced Imitation in Heterogeneous Action Space
- Reinforcement Learning with Attention that Works: A Self-Supervised Approach
- Gershman, S.J. and Daw, N.D. (2017) Reinforcement learning and episodic memory in humans and animals: an integrative framework. Annu. Rev. Psychol. 68, 101–128
- Meta-learning of Sequential Strategies
- SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning
- Robustness to Out-of-Distribution Inputs via Task-Aware Generative Uncertainty
- Multi-Sample Dropout for Accelerated Training and Better Generalization
- Learning Powerful Policies by Using Consistent Dynamics Model
- Add an auxiliary task to the learned model that penalizes errors in future predictions.
- Soft Actor-Critic Algorithms and Applications
- Learning Dynamics Model in Reinforcement Learning by Incorporating the Long Term Future
- Add an auxiliary task to predict the far future.
- Includes use in imitation learning
- Unsupervised Learning of Object Keypoints for Perception and Control
- Real-Time Freespace Segmentation on Autonomous Robots for Detection of Obstacles and Drop-Offs
- Dynamics-aware Embeddings
- A Survey on Reproducibility by Evaluating Deep Reinforcement Learning Algorithms on Real-World Robots
- Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
- Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients
- A Mobile Manipulation System for One-Shot Teaching of Complex Tasks in Homes
- OpenAI’s Automatic Domain Randomization on the DonkeyCar simulator?
- Blog
- Paper
- Start with a single, easy environment in sim. When performance plateaus, increase the range of simulated features, e.g., the range of friction, the weight of the car, or the size/color of lane markings.
- They used Embed + Sum on the inputs so they didn’t need to change the policy between sim and real.
- They used policy cloning (and DAgger?) to train a new policy from an older one, e.g., when the policy architecture changed. Section 6.4.
- Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck
- Learning to Predict Without Looking Ahead: World Models Without Forward Prediction
- Emergent Communication with World Models
- Word2vec to behavior: morphology facilitates the grounding of language in machines
- DeepRacer
- CrossNorm: Normalization for Off-Policy TD Reinforcement Learning
- Eliminates the need for a target network?
- Optimizing agent behavior over long time scales by transporting value
- Looking back over episodic memory
- Code
- Reinforcement Learning Upside Down: Don’t Predict Rewards – Just Map Them to Actions
- Training Agents using Upside-Down Reinforcement Learning
- A Simple Randomization Technique for Generalization in Deep Reinforcement Learning
- Prioritized Sequence Experience Replay
- RTFM: Generalising to New Environment Dynamics via Reading
- FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
- Gradient Surgery for Multi-Task Learning
- Q-Learning in enormous action spaces via amortized approximate maximization
- Network Randomization: A Simple Technique for Generalization in Deep Reinforcement Learning
- Reinforcement Learning with Convolutional Reservoir Computing
- Neuroevolution of Self-Interpretable Agents
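For Self-Normalizing Neural Networks, the SELU activation itself is only a couple of lines. This NumPy sketch uses the alpha and lambda constants given in the paper. The paper pairs SELU with weights drawn with zero mean and variance 1/fan_in, which this snippet does not cover. Frameworks ship it as well (e.g. torch.nn.SELU in PyTorch).

```python
import numpy as np

# Constants from "Self-Normalizing Neural Networks" (Klambauer et al., 2017).
SELU_ALPHA = 1.6732632423543772
SELU_LAMBDA = 1.0507009873554805


def selu(x):
    """Scaled exponential linear unit.

    lambda * x                     for x > 0
    lambda * alpha * (exp(x) - 1)  for x <= 0
    """
    x = np.asarray(x, dtype=float)
    # np.minimum keeps expm1 from overflowing on large positive inputs,
    # since np.where evaluates both branches.
    negative_branch = SELU_ALPHA * np.expm1(np.minimum(x, 0.0))
    return SELU_LAMBDA * np.where(x > 0, x, negative_branch)
```

Large negative inputs saturate at -lambda * alpha (about -1.758), which is what lets the activations settle toward zero mean and unit variance layer after layer.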
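For Noisy Networks for Exploration, here is a forward-pass-only NumPy sketch of a linear layer with factorized Gaussian weight noise. The initialization (mu uniform in ±1/sqrt(fan_in), sigma = 0.5/sqrt(fan_in)) and the noise shaping f(x) = sign(x) * sqrt(|x|) follow my reading of the paper. In real use mu and sigma are trained by backprop, which is left out here. The class name NoisyLinear is just illustrative.

```python
import numpy as np


class NoisyLinear:
    """Linear layer y = (mu_w + sigma_w * eps_w) x + (mu_b + sigma_b * eps_b).

    Forward pass only: mu and sigma would normally be learned parameters.
    """

    def __init__(self, in_features, out_features, sigma0=0.5, rng=None):
        self.rng = np.random.default_rng() if rng is None else rng
        bound = 1.0 / np.sqrt(in_features)
        self.mu_w = self.rng.uniform(-bound, bound, size=(out_features, in_features))
        self.mu_b = self.rng.uniform(-bound, bound, size=out_features)
        self.sigma_w = np.full((out_features, in_features), sigma0 * bound)
        self.sigma_b = np.full(out_features, sigma0 * bound)
        self.reset_noise()

    @staticmethod
    def _f(x):
        # Noise shaping used for factorized noise: sign(x) * sqrt(|x|).
        return np.sign(x) * np.sqrt(np.abs(x))

    def reset_noise(self):
        # Factorized noise: one vector per input and one per output,
        # combined with an outer product instead of a full noise matrix.
        eps_in = self._f(self.rng.standard_normal(self.mu_w.shape[1]))
        eps_out = self._f(self.rng.standard_normal(self.mu_w.shape[0]))
        self.eps_w = np.outer(eps_out, eps_in)
        self.eps_b = eps_out

    def __call__(self, x, noisy=True):
        if not noisy:
            # Mean weights only, e.g. for evaluation.
            return x @ self.mu_w.T + self.mu_b
        w = self.mu_w + self.sigma_w * self.eps_w
        b = self.mu_b + self.sigma_b * self.eps_b
        return x @ w.T + b
```

An agent built from these layers would act greedily on its Q-values and call reset_noise() to draw fresh noise (e.g. once per step), so exploration comes from the sampled weights instead of from ε-greedy.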
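The Time Limits in Reinforcement Learning note (keep bootstrapping when an episode ends only because of a training time limit) comes down to how the TD target is built. A minimal NumPy sketch: only a true environment termination removes the bootstrap term. The terminated/truncated split follows Gymnasium's step() convention; td_targets is a hypothetical helper name, not something from the paper.

```python
import numpy as np


def td_targets(rewards, next_values, terminated, gamma=0.99):
    """One-step TD targets r_t + gamma * V(s_{t+1}) for a batch of transitions.

    rewards, next_values: arrays of shape (T,); next_values[t] is V(s_{t+1}),
        computed on the real next observation even when the episode was cut off.
    terminated: True only where the environment itself ended the episode.
        Steps that ended because of a training time limit (Gymnasium's
        "truncated" flag) must be passed as False so they keep bootstrapping.
    """
    rewards = np.asarray(rewards, dtype=float)
    next_values = np.asarray(next_values, dtype=float)
    terminated = np.asarray(terminated, dtype=bool)
    # A real terminal state has no future value; a timeout does.
    bootstrap = np.where(terminated, 0.0, next_values)
    return rewards + gamma * bootstrap
```

For example, with gamma=0.5, rewards [1, 1], next values [10, 10] and terminated [False, True], the targets are [6, 1]. If the second step were only a timeout, its target would also be 6.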
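The Automatic Domain Randomization notes (start with one easy simulated environment and widen the parameter ranges as the agent improves) could be prototyped for the DonkeyCar simulator roughly as below. This is a simplified sketch, not OpenAI's ADR. It widens every range at once when the recent average score clears a threshold, while ADR evaluates each range boundary separately and can also shrink ranges. ParamRange, RangeCurriculum and the parameter names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class ParamRange:
    """One simulator parameter, sampled uniformly from [low, high] each episode."""
    low: float
    high: float
    step: float      # how far to widen each side per expansion
    min_low: float   # hard limits the range is never widened past
    max_high: float

    def widen(self) -> None:
        self.low = max(self.min_low, self.low - self.step)
        self.high = min(self.max_high, self.high + self.step)


class RangeCurriculum:
    """Start narrow (easy) and widen every range once the agent does well enough."""

    def __init__(self, ranges: dict, threshold: float, window: int = 20):
        self.ranges = ranges        # name -> ParamRange
        self.threshold = threshold  # mean score needed before widening
        self.window = window        # number of recent episodes to average
        self.scores = []

    def sample(self, rng: random.Random) -> dict:
        # Draw the simulator settings for the next episode.
        return {name: rng.uniform(r.low, r.high) for name, r in self.ranges.items()}

    def report(self, score: float) -> bool:
        """Record one episode's score; return True if the ranges were widened."""
        self.scores.append(score)
        if len(self.scores) < self.window:
            return False
        recent_mean = sum(self.scores[-self.window:]) / self.window
        if recent_mean < self.threshold:
            return False
        for r in self.ranges.values():
            r.widen()
        self.scores.clear()  # re-measure performance on the harder distribution
        return True
```

In a training loop: call sample() before each episode, apply the settings to the simulator (how to do that depends on the simulator and is not assumed here), run the episode, then pass its score to report().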
Not as relevant to MaLPi, but interesting
- Explaining and Harnessing Adversarial Examples
- Beating Atari with Natural Language Guided Reinforcement Learning
- A Neural Representation of Sketch Drawings
- Hybrid computing using a neural network with dynamic external memory
- Bayesian Recurrent Neural Networks
- ML for analyzing Unix log files
- Concrete Dropout
- Bayesian Reinforcement Learning: A Survey
- Beyond Monte Carlo Tree Search: Playing Go with Deep Alternative Neural Network and Long-Term Evaluation
- Meta learning Framework for Automated Driving
- Representation Learning for Grounded Spatial Reasoning
- Instruction text -> LSTM -> vectors V1 and V2
- V1 is used as a kernel in a convolution over the state-space object embeddings (hand built?)
- V2 is used to make a global map representation of the input
- Both outputs are concatenated and fed to a CNN that predicts the final map value
- Early Stage Malware Prediction Using Recurrent Neural Networks
- Hwang J, Jung M, Madapana N, et al. Achieving “synergy” in cognitive behavior of humanoids via deep learning of dynamic visuo-motor-attentional coordination. Humanoid Robots (Humanoids), 2015 IEEE-RAS 15th International Conference on; Seoul. 2015. p. 817-824.
- Combined human gesture recognition, attention, object detection, and grasping.
- arXiv page
- Deep Mixture Density Network (MDN)
- “MDNs combine the benefits of DNNs and GMMs (Gaussian mixture model) by using the DNN to model the complex relationship between input and output data, but providing probability distributions as output”
- C. Bishop. Mixture density networks, Tech. Rep. NCRG/94/004, Neural Computing Research Group. Aston University, 1994.
- H. Zen, A. Senior. Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis, ICASSP, 2014.
- Anytime Neural Networks via Joint Optimization of Auxiliary Losses
- Language Grounding for Robotics accepted papers
- Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J Lim, Abhinav Gupta, Li Fei-Fei, and Ali Farhadi. Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning. In ICRA, 2017.
- Augmenting End-to-End Dialog Systems with Commonsense Knowledge
- Predictive representations can link model-based reinforcement learning to model-free mechanisms
- Neural Task Programming: Learning to Generalize Across Hierarchical Tasks
- Understanding Generalization and Stochastic Gradient Descent
- Includes how to choose the best batch size for test set accuracy.
- A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs
- Generalized Grounding Graphs: A Probabilistic Framework for Understanding Grounded Commands
- Peephole: Predicting Network Performance Before Training
- Peano-HASEL actuators: Muscle-mimetic, electrohydraulic transducers that linearly contract on activation
- Hydraulically amplified self-healing electrostatic actuators with muscle-like performance
- Unsupervised Low-Dimensional Vector Representations for Words, Phrases and Text that are Transparent, Scalable, and produce Similarity Metrics that are Complementary to Neural Embeddings
- Emergent complexity via multi-agent competition
- PRNN: Recurrent Neural Network with Persistent Memory
- Convolutional Neural Networks for Sentence Classification
- Might be useful to classify task descriptions for a multi-task system.
- Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods
- Scalable Meta-Learning for Bayesian Optimization
- Learning to Play with Intrinsically-Motivated Self-Aware Agents
- An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
- Accelerated Methods for Deep Reinforcement Learning. Adam Stooke and Pieter Abbeel
- Learning and Querying Fast Generative Models for Reinforcement Learning
- Learning by Playing - Solving Sparse Reward Tasks from Scratch
- Selective Experience Replay for Lifelong Learning
- Semi-Parametric Topological Memory For Navigation
- Shifting Mean Activation Towards Zero with Bipolar Activation Functions
- Alternative to Batch Norm for normalization
- Strategic attentive writer for learning macro-actions
- The Limits and Potentials of Deep Learning for Robotics
- AutoAugment: Learning Augmentation Policies from Data
- Progress & Compress: A scalable framework for continual learning
- Deep Curiosity Search: Intra-Life Exploration Improves Performance on Challenging Deep Reinforcement Learning Problems
- Unsupervised Meta-Learning for Reinforcement Learning
- Unsupervised Learning by Competing Hidden Units
- Adaptive Neural Trees
- Combining Decision Trees and neural nets
- Reliable Uncertainty Estimates in Deep Neural Networks using Noise Contrastive Priors
- Papers of the Year
- Wider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence Learning
- LASER Language-Agnostic SEntence Representations
- Pre-trained multi-lingual embeddings. Possibly useful for task description embedding.
- Building Machines That Learn and Think Like People
- Learning to Understand Goal Specifications by Modelling Reward
- Investigating Generalisation in Continuous Deep Reinforcement Learning
- Hyperbolic Discounting and Learning over Multiple Horizons
- Also useful as an auxiliary task.
- Recurrent Experience Replay in Distributed Reinforcement Learning
- Stiffness: A New Perspective on Generalization in Neural Networks
- IndyLSTMs: Independently Recurrent LSTMs
- Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
- Diagnosing Bottlenecks in Deep Q-learning Algorithms
- Large-Scale Long-Tailed Recognition in an Open World
- Another use of memory, this time for situations that don’t occur enough to train on.
- Human Visual Understanding for Cognition and Manipulation – A primer for the roboticist
- Stand-Alone Self-Attention in Vision Models
- Replacing convolutions with attention in vision models.
- Learning the Arrow of Time
- Improving the robustness of ImageNet classifiers using elements of human visual cognition
- Episodic memory and shape based representations.
- When to Trust Your Model: Model-Based Policy Optimization
- Metalearned Neural Memory
- ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
- Hierarchical Decision Making by Generating and Following Natural Language Instructions
- AC-Teach: A Bayesian Actor-Critic Method for Policy Learning with an Ensemble of Suboptimal Teachers
- Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras
- Teacher algorithms for curriculum learning of Deep RL in continuously parameterized environments
- A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms
- Weakly Supervised Disentanglement with Guarantees
- Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning
- Regularization Matters in Policy Optimization
- Meta-Learning without Memorization
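To make the Deep Mixture Density Network entry concrete: the network outputs mixing weights, means and standard deviations for K Gaussians, and it is trained by minimizing the negative log-likelihood of the targets under that mixture. A minimal NumPy sketch for scalar targets, assuming softmax for the mixing weights and exp for the standard deviations (as in Bishop's formulation). The function names are illustrative.

```python
import numpy as np


def _logsumexp(a, axis=-1):
    # Numerically stable log(sum(exp(a))) along `axis`, keeping that axis with size 1.
    m = np.max(a, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))


def mdn_nll(logits, means, log_stds, y):
    """Average negative log-likelihood of scalar targets under a Gaussian mixture.

    logits, means, log_stds: network outputs of shape (N, K) for K components.
        Mixing weights are softmax(logits); standard deviations are exp(log_stds),
        which keeps them positive.
    y: targets of shape (N,).
    """
    logits = np.asarray(logits, dtype=float)
    means = np.asarray(means, dtype=float)
    log_stds = np.asarray(log_stds, dtype=float)
    y = np.asarray(y, dtype=float)[:, None]  # (N, 1) so it broadcasts over components
    log_pi = logits - _logsumexp(logits, axis=1)  # log softmax
    # log N(y | mean, std^2) for every sample and component
    log_gauss = (-0.5 * np.log(2.0 * np.pi) - log_stds
                 - 0.5 * ((y - means) / np.exp(log_stds)) ** 2)
    # log p(y) = logsumexp over k of [log pi_k + log N_k]
    log_p = _logsumexp(log_pi + log_gauss, axis=1)[:, 0]
    return -np.mean(log_p)
```

Working in log space throughout avoids underflow when a target lies far from every component, which a direct sum of Gaussian densities would hit quickly.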
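For Shifting Mean Activation Towards Zero with Bipolar Activation Functions (noted above as an alternative to Batch Norm), my understanding of the core idea is that half of a layer's units use f(x) and the other half use -f(-x). That way a ReLU-style layer's mean activation is pulled toward zero instead of being strictly positive. A NumPy sketch; alternating even/odd units is one simple way to choose the two halves.

```python
import numpy as np


def bipolar(f, x):
    """Apply the bipolar version of activation f along the last axis.

    Even-indexed units get f(x); odd-indexed units get -f(-x), so with ReLU
    half the units can only be >= 0 and the other half only <= 0.
    """
    x = np.asarray(x, dtype=float)
    out = np.array(f(x), dtype=float)       # copy, so f may return a view safely
    out[..., 1::2] = -f(-x[..., 1::2])      # flip the odd-indexed units
    return out


def relu(x):
    return np.maximum(x, 0.0)
```

For example, bipolar(relu, [1, 1, -1, -1]) gives [1, 0, 0, -1].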
Autoencoders
Memory
Classes/Education
Simulators