Tensorizing LSTMs
First paper
Wider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence Learning
> we introduce a way to both widen and deepen the LSTM whilst keeping the parameter number and runtime largely unchanged.
I wanted to quote this directly because it sums up why I’m interested in these papers.
This paper’s three novel contributions are:
- Tensorize hidden state vectors into higher-dimensional tensors (see the sketch after this list).
- Merge the RNN's deep computations into its temporal computations.
- Integrate a new memory cell convolution when extending the previous two ideas to LSTMs.
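To make the first contribution concrete for myself, here is a minimal numpy sketch of the idea as I understand it: the hidden state is stored as a P × M tensor and updated with a small convolution across the P dimension instead of a dense recurrent matrix. All of the names (`P`, `M`, `cross_layer_conv`) and shapes are my own, not the paper's, and this ignores the LSTM gates and the memory cell convolution entirely.

```python
import numpy as np

P, M = 4, 8   # hidden state becomes a P x M tensor rather than a length P*M vector
K = 3         # size of the convolution kernel along the "depth" dimension
rng = np.random.default_rng(0)

H = rng.standard_normal((P, M))        # tensorized hidden state
W = rng.standard_normal((K, M, M))     # one small kernel shared across all P positions

def cross_layer_conv(H, W):
    """Update the tensorized hidden state with a convolution along P.

    Each output row mixes only K neighbouring rows of H, so the update has
    K*M*M parameters instead of (P*M)^2 for a dense recurrent matrix.
    """
    P, M = H.shape
    K = W.shape[0]
    pad = K // 2
    Hp = np.pad(H, ((pad, pad), (0, 0)))   # zero-pad along the depth dimension
    out = np.zeros_like(H)
    for p in range(P):
        for k in range(K):
            out[p] += Hp[p + k] @ W[k]
    return np.tanh(out)

H_next = cross_layer_conv(H, W)
print(H_next.shape)   # (4, 8): same tensor shape as H, far fewer parameters
```

The point is only the parameter count: the kernel holds K·M² numbers versus (P·M)² for a full recurrent weight matrix, which is how the hidden state can be made wider without the usual quadratic blow-up.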
Two more related papers
- Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition. Block-term decomposition (BTD) is a combination of the following two decompositions (sketched after this list):
- CP decomposition: J. D. Carroll and J.-J. Chang. Analysis of individual differences in multidimensional scaling via an N-way generalization of Eckart-Young decomposition. Psychometrika, 35(3):283–319, 1970.
- Tucker decomposition: L. R. Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31(3):279–311, 1966.
- Tensorizing Neural Networks (referenced from Wider and Deeper…); a minimal illustration of the tensor-train idea also follows this list.
- MATLAB implementation of a “Tensor Train” library: TT-Toolbox
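To keep the terminology straight, here is a rough sketch of how CP and Tucker relate to BTD, written for a 3-way tensor; the notation is mine rather than copied from the papers.

```latex
% CP: a sum of R rank-1 terms (outer products of vectors)
\mathcal{X} \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r

% Tucker: one small core tensor multiplied by a factor matrix along each mode
\mathcal{X} \approx \mathcal{G} \times_1 A \times_2 B \times_3 C

% BTD: a sum of N Tucker terms; it reduces to Tucker when N = 1
% and to CP when every core G_n is 1 x 1 x 1
\mathcal{X} \approx \sum_{n=1}^{N} \mathcal{G}_n \times_1 A_n \times_2 B_n \times_3 C_n
```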
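And since Tensorizing Neural Networks and the TT-Toolbox are about the tensor-train (TT) format, here is a tiny numpy illustration of a TT-factorized weight matrix with just two cores. The shapes and function names are my own invention; real implementations use many cores and more careful reshaping.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, m1, m2, r = 4, 5, 3, 6, 2   # W is (n1*n2) x (m1*m2); r is the TT-rank

# Two TT-cores parameterize W with n1*m1*r + r*n2*m2 numbers
# instead of n1*n2*m1*m2 for the dense matrix.
G1 = rng.standard_normal((n1, m1, r))
G2 = rng.standard_normal((r, n2, m2))

# Dense matrix implied by the cores: W[(i1,i2),(j1,j2)] = G1[i1,j1,:] @ G2[:,i2,j2]
W = np.einsum('apr,rbq->abpq', G1, G2).reshape(n1 * n2, m1 * m2)

def tt_matvec(G1, G2, x):
    """Multiply by the TT-format matrix without ever forming W."""
    X = x.reshape(m1, m2)                    # fold the input to match the cores
    T = np.einsum('rbq,pq->rbp', G2, X)      # contract the second core with the input
    Y = np.einsum('apr,rbp->ab', G1, T)      # contract the first core
    return Y.reshape(n1 * n2)

x = rng.standard_normal(m1 * m2)
print(np.allclose(W @ x, tt_matvec(G1, G2, x)))   # True
```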
Conclusion
I don’t know nearly enough of the relevant math to implement this idea myself.
More papers
- Luca Bertinetto, João F. Henriques, Jack Valmadre, Philip Torr, and Andrea Vedaldi. Learning feed-forward one-shot learners. In NIPS, 2016.
- Misha Denil, Babak Shakibi, Laurent Dinh, Nando de Freitas, et al. Predicting parameters in deep learning. In NIPS, 2013.
- Timur Garipov, Dmitry Podoprikhin, Alexander Novikov, and Dmitry Vetrov. Ultimate tensorization: compressing convolutional and FC layers alike. In NIPS Workshop, 2016.
- Ozan Irsoy and Claire Cardie. Modeling compositionality with multiplicative recurrent neural networks. In ICLR, 2015.
- Ben Krause, Liang Lu, Iain Murray, and Steve Renals. Multiplicative LSTM for sequence modelling. In ICLR Workshop, 2017.
- Alexander Novikov, Dmitrii Podoprikhin, Anton Osokin, and Dmitry P Vetrov. Tensorizing neural networks. In NIPS, 2015. https://arxiv.org/abs/1509.06569
- Ilya Sutskever, James Martens, and Geoffrey E Hinton. Generating text with recurrent neural networks. In ICML, 2011.
- Graham W Taylor and Geoffrey E Hinton. Factored conditional restricted boltzmann machines for modeling motion style. In ICML, 2009.
- Yuhuai Wu, Saizheng Zhang, Ying Zhang, Yoshua Bengio, and Ruslan Salakhutdinov. On multiplicative integration with recurrent neural networks. In NIPS, 2016.
- Andros Tjandra, Sakriani Sakti, and Satoshi Nakamura. Compressing recurrent neural network with tensor train. arXiv preprint arXiv:1705.08052, 2017.
- Yinchong Yang, Denis Krompass, and Volker Tresp. Tensor-train recurrent neural networks for video classification. arXiv preprint arXiv:1707.01786, 2017.
- Rose Yu, Stephan Zheng, Anima Anandkumar, and Yisong Yue. Long-term forecasting using tensor-train RNNs. arXiv preprint arXiv:1711.00073, 2017.