One or more layers are shared between tasks.
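A rough sketch of what hard sharing looks like in PyTorch (the layer sizes, task count, and class names here are made up for illustration):

```python
import torch
import torch.nn as nn

class HardSharingModel(nn.Module):
    """Shared trunk with one output head per task (hard parameter sharing)."""
    def __init__(self, in_dim=64, hidden_dim=128, task_out_dims=(10, 1)):
        super().__init__()
        # These layers are shared by every task.
        self.shared = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # One task-specific head per task.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, d) for d in task_out_dims]
        )

    def forward(self, x):
        h = self.shared(x)
        return [head(h) for head in self.heads]

x = torch.randn(8, 64)
task_a_logits, task_b_pred = HardSharingModel()(x)
```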
Soft sharing
One or more layers are constrained to be similar between tasks, e.g. with a distance measure between their parameters (see the sketch below).
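A rough sketch of soft sharing as an L2 penalty between corresponding weights of two separate task networks (PyTorch again; the architectures and penalty weight are made up):

```python
import torch
import torch.nn as nn

# Each task keeps its own parameters; a regulariser pulls them together.
net_a = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
net_b = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))

def soft_sharing_penalty(net_a, net_b, weight=1e-3):
    """L2 distance between corresponding parameters of the two networks,
    applied only where the shapes match (here: the first linear layer)."""
    penalty = 0.0
    for p_a, p_b in zip(net_a.parameters(), net_b.parameters()):
        if p_a.shape == p_b.shape:
            penalty = penalty + (p_a - p_b).pow(2).sum()
    return weight * penalty

# total_loss = task_a_loss + task_b_loss + soft_sharing_penalty(net_a, net_b)
```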
Newer ideas
- Deep Relationship Networks. In addition to the fully shared layers, the non-shared fully connected layers have priors on them that let them learn the relationships between tasks.
- Fully-adaptive feature sharing. Starts with a single network that widens during training by grouping similar tasks and giving each group separate branches of the network.
- Cross-stitch networks. Learned linear combinations of the outputs of previous layers from multiple task-specific networks (see the cross-stitch sketch after this list).
- Low supervision. I don’t understand this one.
- Joint multi-task model. Another NLP example I don’t get.
- Multi-task loss. With the loss for each task weighted by its uncertainty (see the loss-weighting sketch after this list).
- Tensor factorisation. Split the model parameters into shared and task-specific parts for each layer.
- Sluice networks. The authors’ generalisation of multiple MTL methods.
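A rough sketch of a cross-stitch unit, as I understand it: a small learned matrix that mixes the two task networks’ activations at a given layer (PyTorch; the initial mixing values are made up):

```python
import torch
import torch.nn as nn

class CrossStitchUnit(nn.Module):
    """Learned linear combination of two tasks' activations at one layer."""
    def __init__(self):
        super().__init__()
        # alpha[i][j] weights how much task j's activation feeds task i.
        self.alpha = nn.Parameter(torch.tensor([[0.9, 0.1],
                                                [0.1, 0.9]]))

    def forward(self, x_a, x_b):
        out_a = self.alpha[0, 0] * x_a + self.alpha[0, 1] * x_b
        out_b = self.alpha[1, 0] * x_a + self.alpha[1, 1] * x_b
        return out_a, out_b

# Mix the two networks' layer-1 activations before their layer-2 blocks.
stitch = CrossStitchUnit()
h_a, h_b = torch.randn(8, 128), torch.randn(8, 128)
h_a, h_b = stitch(h_a, h_b)
```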
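And a rough sketch of the uncertainty-weighted multi-task loss, using the common formulation where each task gets a learned log-variance s_i and the total is roughly sum_i exp(-s_i) * L_i + s_i (I’m omitting details like the 1/2 factors; the class name is made up):

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Weights each task's loss by a learned (homoscedastic) uncertainty."""
    def __init__(self, num_tasks=2):
        super().__init__()
        # s_i = log(sigma_i^2), one learnable scalar per task.
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = 0.0
        for loss, log_var in zip(task_losses, self.log_vars):
            # Low-uncertainty tasks get up-weighted; the +log_var term
            # penalises the trivial solution of inflating every uncertainty.
            total = total + torch.exp(-log_var) * loss + log_var
        return total

criterion = UncertaintyWeightedLoss(num_tasks=2)
total_loss = criterion([torch.tensor(0.7), torch.tensor(1.3)])  # dummy losses
```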
Auxiliary tasks
- Related task. Lots of examples, but hard to define ‘related’.
- Adversarial. Possibly it’s easier to define the opposite of the desired loss function?
- Hints. Example is “predict whether an input sentence contains a positive or negative sentiment word as auxiliary tasks for sentiment analysis.” Not sure I understand it.
- Focusing attention. Example is predicting lane markings when the network might otherwise ignore them as being too small a part of the image.
- Quantization smoothing. Use a continuous version of a discrete label as an auxiliary task.
- Predicting inputs. ?
- Using the future to predict the present.
- Representation learning. All auxiliary tasks do this implicitly, but it can be made more explicit. One example is an autoencoder objective (see the sketch after this list).
- What auxiliary tasks are helpful? Still little theoretical basis for ‘task similarity’.
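A rough sketch of making the representation-learning objective explicit with an auxiliary autoencoder (reconstruction) loss alongside a main classification task (PyTorch; the architecture and the 0.1 weight are made up):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MainTaskWithAutoencoder(nn.Module):
    """Main classifier plus an auxiliary decoder that reconstructs the input."""
    def __init__(self, in_dim=64, hidden_dim=128, num_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.classifier = nn.Linear(hidden_dim, num_classes)  # main task
        self.decoder = nn.Linear(hidden_dim, in_dim)           # auxiliary task

    def forward(self, x):
        h = self.encoder(x)
        return self.classifier(h), self.decoder(h)

model = MainTaskWithAutoencoder()
x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))
logits, recon = model(x)
# The reconstruction term nudges the shared representation to keep
# information about the input, not just what the main task needs.
loss = F.cross_entropy(logits, y) + 0.1 * F.mse_loss(recon, x)
```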