Deep learning of semantics for natural language
Machine Learning, AI & No Free Lunch
Bypassing the curse of dimensionality
Progress in Deep Learning Theory
Exponential advantage of distributed representations
Exponential advantage of depth
A Myth is Being Debunked: Local Minima in Neural Nets
Saddle Points
Why N-grams have poor generalization
Neural Language Models: fighting one exponential by another one!
The Next Challenge: Rich Semantic Representations for Word Sequences
Attention Mechanism for Deep Learning
Applying an attention mechanism to End-to-End Machine Translation
2014: The Year of Neural Machine Translation Breakthrough
Encoder-Decoder Framework
Bidirectional RNN for Input Side
Attention: Many Recent Papers
Soft-Attention vs Stochastic Hard-Attention
Attention-Based Neural Machine Translation
Predicted Alignments
En-Fr & En-De Alignments
Improvements over Pure AE Model
End-to-End Machine Translation with Recurrent Nets and Attention Mechanism
IWSLT 2015 – Luong & Manning (2015) TED talk MT, English-German
Image-to-Text: Caption Generation with Attention
Paying Attention to Selected Parts of the Image While Uttering Words
Speaking about what one sees
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
The Good
And the Bad
Interesting extensions
Multi-Lingual Neural MT with Shared Attention Mechanism
Character-Based Models
Experiments on Character-Based NMT
Attention Mechanisms for Memory Access
Large Memory Networks: Sparse Access Memory for Long-Term Dependencies
Delays & Hierarchies to Reach Farther
Ongoing Project: Knowledge Extraction
The Next Big Challenge: Unsupervised Learning
Conclusions
• Theory for deep learning has progressed substantially on several fronts: why it generalizes better, why local minima are not the issue people thought, and the probabilistic interpretation of deep unsupervised learning.
• Attention mechanisms allow the learner to make a selection, soft or hard (a minimal sketch follows below)
• They have been extremely successful for machine translation and caption generation
• They could be interesting for speech recognition and video, especially if we used them to capture multiple time scales
• They could be used to help deal with long-term dependencies, allowing some states to persist for arbitrarily long
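The soft/hard selection mentioned in the bullets above can be summarized in a few lines of code. Below is a minimal numpy sketch of soft attention, assuming a dot-product scoring function and hypothetical names (`soft_attention`, `encoder_states`); the models discussed in the talk use a learned alignment network (Bahdanau et al., 2015), so this is an illustration of the idea rather than the exact method.

```python
# Minimal sketch of soft attention (illustrative; the dot-product score
# and the names used here are assumptions, not the talk's exact model).
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def soft_attention(decoder_state, encoder_states):
    """Return a context vector: a convex combination of encoder states.

    decoder_state:  (d,)   current decoder hidden state
    encoder_states: (T, d) one annotation per source position
    """
    scores = encoder_states @ decoder_state   # (T,) alignment scores
    weights = softmax(scores)                 # soft selection over positions
    context = weights @ encoder_states        # (d,) weighted average
    return context, weights

# Hard attention would instead sample a single position from `weights`,
# a stochastic, non-differentiable choice trained with e.g. REINFORCE.
rng = np.random.default_rng(0)
enc = rng.standard_normal((5, 8))   # 5 source positions, dimension 8
dec = rng.standard_normal(8)
ctx, w = soft_attention(dec, enc)
print(w.round(3), ctx.shape)
```

The practical distinction is that the soft version stays differentiable end to end, so it trains with plain backpropagation, while the hard (sampled) version requires variance-reduced gradient estimators.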