Word Vectors
[slides]
[notes]
Gensim word vectors example:
[code]
[preview]
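A minimal sketch of the kind of exploration the linked gensim example covers, assuming gensim's downloader API; the pretrained model name below is one of gensim's bundled vector sets, not necessarily the one the notebook uses:

```python
import gensim.downloader as api

# Load pretrained 100-d GloVe vectors (downloads on first use).
wv = api.load("glove-wiki-gigaword-100")

# Nearest neighbors in the vector space.
print(wv.most_similar("banana", topn=5))

# The classic analogy: king - man + woman ≈ queen.
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```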
Suggested Readings:
- Efficient Estimation of Word Representations in Vector Space (original word2vec paper)
- Distributed Representations of Words and Phrases and their Compositionality (negative sampling paper)
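For quick reference, the objective introduced in the negative sampling paper, in standard notation (v_c is the center word vector, u_o the observed outside word vector, and u_1, ..., u_K the vectors of K sampled negative words):

```latex
J_{\text{neg-sample}}(v_c, o, U)
  = -\log \sigma(u_o^\top v_c)
    - \sum_{k=1}^{K} \log \sigma(-u_k^\top v_c),
\qquad \sigma(x) = \frac{1}{1 + e^{-x}}
```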
Assignment 1 out
[code]
[preview]
Word Vectors 2 and Word Window Classification
[slides]
[notes]
Suggested Readings:
- GloVe: Global Vectors for Word Representation (original GloVe paper)
- Improving Distributional Similarity with Lessons Learned from Word Embeddings
- Evaluation methods for unsupervised word embeddings
Additional Readings:
- A Latent Variable Model Approach to PMI-based Word Embeddings
- Linear Algebraic Structure of Word Senses, with Applications to Polysemy
- On the Dimensionality of Word Embedding
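And for comparison, the objective from the GloVe paper above: X_ij is the co-occurrence count of words i and j, and f is a weighting function that caps the influence of very frequent pairs:

```latex
J = \sum_{i,j=1}^{V} f(X_{ij})\,\bigl(w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\bigr)^2
```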
Python Review Session
[code]
[preview]
10:00am - 11:20am
Backprop and Neural Networks
[slides]
[notes]
Suggested Readings:
- matrix calculus notes
- Review of differential calculus
- CS231n notes on network architectures
- CS231n notes on backprop
- Derivatives, Backpropagation, and Vectorization
- Learning Representations by Backpropagating Errors (seminal Rumelhart et al. backpropagation paper)
Additional Readings:
- Yes you should understand backprop
- Natural Language Processing (Almost) from Scratch
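One useful habit with these readings: check hand-derived gradients against PyTorch's autograd. A minimal sketch (shapes are illustrative), differentiating loss = sum(ReLU(Wx)) with respect to W:

```python
import torch

W = torch.randn(3, 4, requires_grad=True)
x = torch.randn(4)
loss = torch.relu(W @ x).sum()
loss.backward()

# Hand-derived gradient: dloss/dW[i, j] = 1[(Wx)_i > 0] * x_j.
manual = ((W @ x) > 0).float().unsqueeze(1) * x.unsqueeze(0)
print(torch.allclose(W.grad, manual))  # True
```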
Assignment 2 out
[code]
[handout]
Assignment 1 due
Dependency Parsing
[slides]
[notes]
[slides (annotated)]
Suggested Readings:
- Incrementality in Deterministic Dependency Parsing
- A Fast and Accurate Dependency Parser using Neural Networks
- Dependency Parsing
- Globally Normalized Transition-Based Neural Networks
- Universal Stanford Dependencies: A cross-linguistic typology
- Universal Dependencies website
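The Chen and Manning reading builds a classifier over the arc-standard transition system; a minimal sketch of its three transitions, with a hard-coded transition sequence standing in for the learned classifier:

```python
def apply_transition(stack, buffer, arcs, t):
    if t == "SHIFT":                   # move the next buffer word onto the stack
        stack.append(buffer.pop(0))
    elif t == "LEFT-ARC":              # top of stack is head of the second item
        dep = stack.pop(-2)
        arcs.append((stack[-1], dep))  # (head, dependent)
    elif t == "RIGHT-ARC":             # second item is head of the top of stack
        dep = stack.pop()
        arcs.append((stack[-1], dep))
    return stack, buffer, arcs

# "I ate fish": I <- ate, ate -> fish, ROOT -> ate
stack, buffer, arcs = ["ROOT"], ["I", "ate", "fish"], []
for t in ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]:
    stack, buffer, arcs = apply_transition(stack, buffer, arcs, t)
print(arcs)  # [('ate', 'I'), ('ate', 'fish'), ('ROOT', 'ate')]
```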
PyTorch Tutorial Session
[colab notebook]
[preview]
[jupyter notebook]
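A self-contained taste of what the session covers, assuming nothing beyond a stock PyTorch install (all sizes and names are illustrative): define a small model, compute a loss, and take one SGD step.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 10)          # a batch of 8 examples
y = torch.randint(0, 2, (8,))   # integer class labels
loss = loss_fn(model(x), y)
opt.zero_grad()
loss.backward()                 # autograd computes all gradients
opt.step()                      # one SGD update
print(loss.item())
```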
10:00am - 11:20am
Recurrent Neural Networks and Language Models
[slides]
[notes (lectures 5 and 6)]
Suggested Readings:
- N-gram Language Models (textbook chapter)
- The Unreasonable Effectiveness of Recurrent Neural Networks (blog post overview)
- Sequence Modeling: Recurrent and Recursive Neural Nets (Sections 10.1 and 10.2)
- On Chomsky and the Two Cultures of Statistical Learning
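A minimal PyTorch sketch of the RNN language model from the lecture: embed tokens, run an RNN across the sequence, and project each hidden state to logits over the next token (vocabulary and sizes are placeholders):

```python
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    def __init__(self, vocab_size=10_000, d_emb=128, d_hid=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_emb)
        self.rnn = nn.RNN(d_emb, d_hid, batch_first=True)
        self.out = nn.Linear(d_hid, vocab_size)

    def forward(self, tokens):               # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))  # h: (batch, seq_len, d_hid)
        return self.out(h)                   # logits over the next token

model = RNNLM()
tokens = torch.randint(0, 10_000, (4, 20))
print(model(tokens).shape)  # torch.Size([4, 20, 10000])
```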
Assignment 3 out
[code]
[handout]
Assignment 2 due
Vanishing Gradients, Fancy RNNs, Seq2Seq
[slides]
[notes (lectures 5 and 6)]
Suggested Readings:
- Sequence Modeling: Recurrent and Recursive Neural Nets (Sections 10.3, 10.5, 10.7-10.12)
- Learning long-term dependencies with gradient descent is difficult (one of the original vanishing gradient papers)
- On the difficulty of training Recurrent Neural Networks (proof of vanishing gradient problem)
- Vanishing Gradients Jupyter Notebook (demo for feedforward networks)
- Understanding LSTM Networks (blog post overview)
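In the spirit of the vanishing-gradients notebook above, a tiny feedforward demo (depth and width arbitrary): backprop through a deep tanh stack and watch the gradient norms typically decay toward the early layers.

```python
import torch
import torch.nn as nn

depth, width = 20, 64
layers = nn.ModuleList([nn.Linear(width, width) for _ in range(depth)])

h = torch.randn(1, width)
for layer in layers:
    h = torch.tanh(layer(h))
h.sum().backward()

# Gradient norms typically shrink sharply toward layer 0.
for i in [0, 5, 10, 15, 19]:
    print(i, layers[i].weight.grad.norm().item())
```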
Machine Translation, Attention, Subword Models
[slides]
[notes]
Suggested Readings:
- Statistical Machine Translation slides, CS224n 2015 (lectures 2/3/4)
- Statistical Machine Translation (book by Philipp Koehn)
- BLEU (original paper)
- Sequence to Sequence Learning with Neural Networks (original seq2seq NMT paper)
- Sequence Transduction with Recurrent Neural Networks (early seq2seq speech recognition paper)
- Neural Machine Translation by Jointly Learning to Align and Translate (original seq2seq+attention paper)
- Attention and Augmented Recurrent Neural Networks (blog post overview)
- Massive Exploration of Neural Machine Translation Architectures (practical advice for hyperparameter choices)
- Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models
- Revisiting Character-Based Neural Machine Translation with Capacity and Compression
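The core computation from the seq2seq+attention reading, shown in the simpler dot-product variant rather than that paper's additive scoring: score each encoder state against the current decoder state, softmax into a distribution, and take a weighted sum (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

enc = torch.randn(7, 256)   # encoder hidden states, one per source token
dec = torch.randn(256)      # current decoder hidden state

scores = enc @ dec                        # (7,) one score per source position
alpha = F.softmax(scores, dim=0)          # attention distribution over source
context = alpha @ enc                     # (256,) weighted sum of encoder states
print(alpha.sum().item(), context.shape)  # 1.0 torch.Size([256])
```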
Assignment 4 out
[code]
[handout]
[Azure Guide]
[Practical Guide to VMs]
Assignment 3 due
Final Projects: Custom and Default; Practical Tips
[slides]
[notes]
Suggested Readings:
- Practical Methodology (Deep Learning book chapter)
Project Proposal out
[instructions]
Default Final Project out
[handout (IID SQuAD track)]
[handout (Robust QA track)]
Transformers (lecture by John Hewitt)
[slides]
[notes]
Suggested Readings:
- Project Handout (IID SQuAD track)
- Project Handout (Robust QA track)
- Attention Is All You Need
- The Illustrated Transformer
- Transformer (Google AI blog post)
- Layer Normalization
- Image Transformer
- Music Transformer: Generating music with long-term structure
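For reference, scaled dot-product attention as defined in Attention Is All You Need, softmax(QK^T / sqrt(d_k)) V, in a single-head, unmasked sketch:

```python
import math
import torch
import torch.nn.functional as F

def attention(Q, K, V):
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    return F.softmax(scores, dim=-1) @ V

Q = torch.randn(10, 64)   # 10 query positions, d_k = 64
K = torch.randn(10, 64)
V = torch.randn(10, 64)
print(attention(Q, K, V).shape)  # torch.Size([10, 64])
```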
More about Transformers and Pretraining (lecture by John Hewitt)
[slides]
[notes]
Suggested Readings:
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Contextual Word Representations: A Contextual Introduction
- The Illustrated BERT, ELMo, and co.
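One way to poke at the contextual representations these readings describe, assuming the Hugging Face transformers package (a third-party library, not part of the course materials):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The bank raised interest rates.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# One 768-d contextual vector per wordpiece.
print(outputs.last_hidden_state.shape)  # (1, num_wordpieces, 768)
```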
Assignment 5 out
[code]
[handout]
Assignment 4 due
Question Answering (guest lecture by Danqi Chen)
[slides]
Suggested Readings:
- SQuAD: 100,000+ Questions for Machine Comprehension of Text
- Bidirectional Attention Flow for Machine Comprehension
- Reading Wikipedia to Answer Open-Domain Questions
- Latent Retrieval for Weakly Supervised Open Domain Question Answering
- Dense Passage Retrieval for Open-Domain Question Answering
- Learning Dense Representations of Phrases at Scale
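A sketch of the span-selection step these SQuAD-style readings share: the model scores every context position as a span start and as a span end, and decoding picks the best start ≤ end pair (real systems also cap the span length; sizes here are illustrative):

```python
import torch

n = 120                          # number of context tokens
start_logits = torch.randn(n)    # model's score for each position as a start
end_logits = torch.randn(n)      # ...and as an end

scores = start_logits[:, None] + end_logits[None, :]    # (n, n) span scores
valid = torch.triu(torch.ones(n, n, dtype=torch.bool))  # keep start <= end
scores[~valid] = float("-inf")
start, end = divmod(scores.argmax().item(), n)
print(start, end)                # best-scoring answer span
```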
Project Proposal due
Natural Language Generation (lecture by Antoine Bosselut)
[slides]
Suggested Readings:
- The Curious Case of Neural Text Degeneration (introduces nucleus sampling; sketched below)
- Get To The Point: Summarization with Pointer-Generator Networks
- Hierarchical Neural Story Generation
- How NOT To Evaluate Your Dialogue System
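As flagged above, a short sketch of nucleus (top-p) sampling, the decoding strategy proposed in The Curious Case of Neural Text Degeneration (an illustration, not the authors' code): sample only from the smallest set of tokens whose cumulative probability exceeds p.

```python
import torch

def nucleus_sample(logits, p=0.9):
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cum = torch.cumsum(sorted_probs, dim=-1)
    # Drop tokens whose preceding cumulative mass already reaches p
    # (the highest-probability token is always kept).
    outside_nucleus = (cum - sorted_probs) >= p
    sorted_probs[outside_nucleus] = 0.0
    sorted_probs /= sorted_probs.sum()      # renormalize within the nucleus
    choice = torch.multinomial(sorted_probs, 1)
    return sorted_idx[choice]

logits = torch.randn(50_000)                # e.g. logits over a large vocab
print(nucleus_sample(logits, p=0.9).item())
```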
Project Milestone out
[instructions]
Assignment 5 due
Reference in Language and Coreference Resolution
[slides]
Suggested Readings:
- Coreference Resolution chapter of Jurafsky and Martin
- End-to-end Neural Coreference Resolution
T5 and large language models: The good, the bad, and the ugly (guest lecture by Colin Raffel)
[slides]
Suggested Readings:
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Integrating knowledge in language models (lecture by Megan Leszczynski)
[slides]
Suggested Readings:
- ERNIE: Enhanced Language Representation with Informative Entities
- Barack’s Wife Hillary: Using Knowledge Graphs for Fact-Aware Language Modeling
- Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model
- Language Models as Knowledge Bases?
Project Milestone due
Social & Ethical Considerations in NLP Systems (guest lecture by Yulia Tsvetkov)
[slides]
Model Analysis and Explanation (lecture by John Hewitt)
[slides]
Future of NLP + Deep Learning (lecture by Shikhar Murty)
[slides]
Project Summary Image and Paragraph out
[instructions]
Ask Me Anything / Final Project Assistance
Project due
[instructions]
Final Project Emergency Assistance
Project Summary Image and Paragraph due