One or more layers are shared between tasks.
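A rough sketch of what hard sharing looks like in PyTorch (the layer sizes, task count, and class names here are made up for illustration):

```python
import torch
import torch.nn as nn

class HardSharingModel(nn.Module):
    """Shared trunk with one output head per task (hard parameter sharing)."""
    def __init__(self, in_dim=64, hidden_dim=128, task_out_dims=(10, 1)):
        super().__init__()
        # These layers are shared by every task.
        self.shared = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # One task-specific head per task.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, d) for d in task_out_dims]
        )

    def forward(self, x):
        h = self.shared(x)
        return [head(h) for head in self.heads]

x = torch.randn(8, 64)
task_a_logits, task_b_pred = HardSharingModel()(x)
```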
Soft sharing
One or more layers are constrained to be similar between tasks, e.g. with a distance measure between their parameters (see the sketch below).
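A rough sketch of soft sharing as an L2 penalty between corresponding weights of two separate task networks (PyTorch again; the architectures and penalty weight are made up):

```python
import torch
import torch.nn as nn

# Each task keeps its own parameters; a regulariser pulls them together.
net_a = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
net_b = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))

def soft_sharing_penalty(net_a, net_b, weight=1e-3):
    """L2 distance between corresponding parameters of the two networks,
    applied only where the shapes match (here: the first linear layer)."""
    penalty = 0.0
    for p_a, p_b in zip(net_a.parameters(), net_b.parameters()):
        if p_a.shape == p_b.shape:
            penalty = penalty + (p_a - p_b).pow(2).sum()
    return weight * penalty

# total_loss = task_a_loss + task_b_loss + soft_sharing_penalty(net_a, net_b)
```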
Newer ideas
- Deep Relationship Networks. In addition to the fully shared layers, the non-shared fully connected layers have priors on them that let them learn the relationships between tasks.
- Fully-adaptive feature sharing. Starts with a single network that widens during training by grouping similar tasks and giving each group separate branches of the network.
- Cross-stitch networks. Learned linear combinations of the outputs of previous layers from multiple task-specific networks (see the cross-stitch sketch after this list).
- Low supervision. I don’t understand this one.
- Joint multi-task model. Another NLP example I don’t get.
- Multi-task loss. With the loss for each task weighted by its uncertainty (see the loss-weighting sketch after this list).
- Tensor factorisation. Split the model parameters into shared and task-specific parts for each layer.
- Sluice networks. The authors’ generalisation of multiple MTL methods.
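A rough sketch of a cross-stitch unit, as I understand it: a small learned matrix that mixes the two task networks’ activations at a given layer (PyTorch; the initial mixing values are made up):

```python
import torch
import torch.nn as nn

class CrossStitchUnit(nn.Module):
    """Learned linear combination of two tasks' activations at one layer."""
    def __init__(self):
        super().__init__()
        # alpha[i][j] weights how much task j's activation feeds task i.
        self.alpha = nn.Parameter(torch.tensor([[0.9, 0.1],
                                                [0.1, 0.9]]))

    def forward(self, x_a, x_b):
        out_a = self.alpha[0, 0] * x_a + self.alpha[0, 1] * x_b
        out_b = self.alpha[1, 0] * x_a + self.alpha[1, 1] * x_b
        return out_a, out_b

# Mix the two networks' layer-1 activations before their layer-2 blocks.
stitch = CrossStitchUnit()
h_a, h_b = torch.randn(8, 128), torch.randn(8, 128)
h_a, h_b = stitch(h_a, h_b)
```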
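And a rough sketch of the uncertainty-weighted multi-task loss, using the common formulation where each task gets a learned log-variance s_i and the total is roughly sum_i exp(-s_i) * L_i + s_i (I’m omitting details like the 1/2 factors; the class name is made up):

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Weights each task's loss by a learned (homoscedastic) uncertainty."""
    def __init__(self, num_tasks=2):
        super().__init__()
        # s_i = log(sigma_i^2), one learnable scalar per task.
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = 0.0
        for loss, log_var in zip(task_losses, self.log_vars):
            # Low-uncertainty tasks get up-weighted; the +log_var term
            # penalises the trivial solution of inflating every uncertainty.
            total = total + torch.exp(-log_var) * loss + log_var
        return total

criterion = UncertaintyWeightedLoss(num_tasks=2)
total_loss = criterion([torch.tensor(0.7), torch.tensor(1.3)])  # dummy losses
```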
Auxiliary tasks
- Related task. Lots of examples, but hard to define ‘related’.
- Adversarial. Possibly it’s easier to define the opposite of the desired loss function?
- Hints. Example is “predict whether an input sentence contains a positive or negative sentiment word as auxiliary tasks for sentiment analysis.” Not sure I understand it.
- Focusing attention. Example is predicting lane markings when the network might otherwise ignore them as being too small a part of the image.
- Quantization smoothing. Use a continuous version of a discrete label as an auxiliary task.
- Predicting inputs. ?
- Using the future to predict the present.
- Representation learning. All auxiliary tasks do this implicitly, but it can be made more explicit. One example is an autoencoder objective (see the sketch after this list).
- What auxiliary tasks are helpful? Still little theoretical basis for ‘task similarity’.
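A rough sketch of making the representation-learning objective explicit with an auxiliary autoencoder (reconstruction) loss alongside a main classification task (PyTorch; the architecture and the 0.1 weight are made up):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MainTaskWithAutoencoder(nn.Module):
    """Main classifier plus an auxiliary decoder that reconstructs the input."""
    def __init__(self, in_dim=64, hidden_dim=128, num_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.classifier = nn.Linear(hidden_dim, num_classes)  # main task
        self.decoder = nn.Linear(hidden_dim, in_dim)           # auxiliary task

    def forward(self, x):
        h = self.encoder(x)
        return self.classifier(h), self.decoder(h)

model = MainTaskWithAutoencoder()
x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))
logits, recon = model(x)
# The reconstruction term nudges the shared representation to keep
# information about the input, not just what the main task needs.
loss = F.cross_entropy(logits, y) + 0.1 * F.mse_loss(recon, x)
```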