Dataset
Take a large corpus of text and convert it into (input, output) pairs using a rolling window of fixed size.
- Example dataset: "the quick brown fox jumped over the lazy dog". With window size m = 1 (one word on each side of the center word), the windows are "the quick brown", "quick brown fox", "brown fox jumped", etc. (a minimal sketch of this windowing follows).
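A minimal sketch of this windowing in Python (the function name make_pairs and the parameter m are illustrative, not from a particular library):

```python
def make_pairs(tokens, m=1):
    """Yield (context_words, center_word) for each position in the corpus."""
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - m):i] + tokens[i + 1:i + 1 + m]
        yield (context, center)

tokens = "the quick brown fox jumped over the lazy dog".split()
for context, center in make_pairs(tokens):
    print(context, "->", center)
# ['quick'] -> the
# ['the', 'brown'] -> quick
# ['quick', 'fox'] -> brown
# ...
```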
Goal
Train a distributed vector representation of words. Why? Unlike one-hot vectors, distributed representations place similar words close together in vector space. To learn these representations, we'll construct a toy prediction task and train on it in an unsupervised setting.

Word2Vec

Continuous Bag-of-Words (CBOW) Model
Using the outer (context) words, predict the center word.
- Outer word vectors are summed.
- Example: the dataset would be ([the, brown], quick),
([quick, fox], brown),
([brown, jumped], fox),
... (a minimal sketch of one CBOW step follows).
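A minimal sketch of one CBOW prediction step, assuming a toy vocabulary and small random U (outer) and V (center) matrices; all names and sizes here are illustrative:

```python
import numpy as np

vocab = ["the", "quick", "brown", "fox", "jumped", "over", "lazy", "dog"]
idx = {w: i for i, w in enumerate(vocab)}
d = 4                                     # embedding dimension (toy size)
rng = np.random.default_rng(0)
U = rng.normal(size=(len(vocab), d))      # outer-word vectors u_i (rows)
V = rng.normal(size=(len(vocab), d))      # center-word vectors v_i (rows)

def cbow_probs(context, U, V):
    """Sum the outer-word vectors, then softmax a score for every center word."""
    h = sum(U[idx[w]] for w in context)   # summed context representation
    scores = V @ h                        # dot product v_w . h for every word w
    e = np.exp(scores - scores.max())     # numerically stable softmax
    return e / e.sum()

p = cbow_probs(["the", "brown"], U, V)
print(vocab[int(p.argmax())])             # the model's guess for the center word
```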

Skip-Gram Model
Using the center word, predict the outer words.
- Example: the dataset would be
(quick, the),
(quick, brown),
(brown, quick),
(brown, fox),
... (a minimal sketch of pair generation follows).
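A minimal sketch of skip-gram pair generation (the name skipgram_pairs is illustrative):

```python
def skipgram_pairs(tokens, m=1):
    """Yield one (center, outer) pair for each context word in the window."""
    for i, center in enumerate(tokens):
        for j in range(max(0, i - m), min(len(tokens), i + m + 1)):
            if j != i:
                yield (center, tokens[j])

print(list(skipgram_pairs("the quick brown fox".split())))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'),
#  ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]
```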
U Matrix (Outer words)
Outer word lookup matrix: row ui is the outer-word vector for word i.
u0 may correspond to "aardvark", u1 to "a", un to "zebra", etc.
V Matrix (Center words)
Center word lookup matrix: row vi is the center-word vector for word i.
v0 may correspond to "aardvark", v1 to "a", vn to "zebra", etc.
Training U & V Matrices
We train the U and V matrices jointly, learning vectors for all of the words in our vocabulary.
- To compute the final word vector for each word wi, we simply sum: wi = ui + vi (see the two-line sketch below).
- Alternatively, we could concatenate ui and vi.
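Continuing the toy sketch above, both combination options are one line each:

```python
W_sum = U + V                             # w_i = u_i + v_i (summed)
W_cat = np.concatenate([U, V], axis=1)    # w_i = [u_i ; v_i] (concatenated, dim 2d)
```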
Objective Function - Softmax Cross-Entropy Loss
For each word t = 1 ... T, predict the surrounding words within a window of size m. We'll minimize the average negative log-likelihood:

$$J(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-m \le j \le m \\ j \neq 0}} \log p(w_{t+j} \mid w_t; \theta)$$
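A minimal sketch of this objective for the skip-gram case, reusing the toy U, V, idx, and skipgram_pairs from the sketches above (it averages over pairs rather than positions t, a small simplification):

```python
def naive_softmax_loss(tokens, U, V, m=1):
    """Average negative log-likelihood of outer words given center words."""
    total, n = 0.0, 0
    for center, outer in skipgram_pairs(tokens, m):
        scores = U @ V[idx[center]]               # u_w . v_c for every word w
        e = np.exp(scores - scores.max())         # stable softmax terms
        total -= np.log(e[idx[outer]] / e.sum())  # -log p(o | c)
        n += 1
    return total / n

print(naive_softmax_loss("the quick brown fox".split(), U, V))
```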
One-Hot Vectors
Vectors that contain a single 1 along one dimension and zeros everywhere else. Example: in a 5-word vocabulary, the third word would be [0, 0, 1, 0, 0].
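For instance, continuing with the toy vocab above (np.eye is a common way to build these):

```python
one_hot = np.eye(len(vocab))[idx["brown"]]
print(one_hot)   # [0. 0. 1. 0. 0. 0. 0. 0.]: a 1 only at the index of "brown"
```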
p(o | c)
Uses the softmax function, which converts the scores for all classes into a probability distribution:

$$p(o \mid c) = \frac{\exp(u_o^\top v_c)}{\sum_{w \in \text{Vocab}} \exp(u_w^\top v_c)}$$

Here o indexes the outer (context) word and c the center word.
Why Dot Product?
- Dot products are larger when the two vectors are more similar to each other (a quick numeric illustration follows this list).
- Thus, this model pushes the vectors uo and vc to be similar to each other for words that co-occur.
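A quick numeric illustration (the vectors are arbitrary):

```python
import numpy as np

a = np.array([1.0, 2.0, 0.0])
b = np.array([1.0, 2.1, 0.1])   # points in nearly the same direction as a
c = np.array([-1.0, 0.5, 3.0])  # points in a very different direction
print(a @ b, a @ c)             # 5.2 vs 0.0: the similar pair scores higher
```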
Objective Function - Negative Sampling Loss
- Problem with softmax cross-entropy: calculating p(o | c) is expensive. Specifically, calculating the denominator of the softmax requires a pass over the entire vocabulary!
- Fix: train a binary logistic classifier to separate the true (center, outer) pair from K randomly sampled "negative" words, so each update touches only K + 1 words instead of the whole vocabulary. For one pair, we minimize:

$$J_{\text{neg}}(u_o, v_c) = -\log \sigma(u_o^\top v_c) - \sum_{k=1}^{K} \log \sigma(-u_k^\top v_c)$$

where σ is the sigmoid function and the K negatives u_k are drawn from a noise distribution over the vocabulary (a minimal sketch follows).
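A minimal sketch of this loss for one (center, outer) pair, reusing the toy U, V, idx, vocab, and rng above. Uniform negative sampling is a simplification here; word2vec samples from the unigram distribution raised to the 3/4 power:

```python
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(center, outer, U, V, K=5):
    """-log sigma(u_o . v_c) minus sum over K negatives of log sigma(-u_k . v_c)."""
    v_c = V[idx[center]]
    loss = -np.log(sigmoid(U[idx[outer]] @ v_c))  # push the true pair together
    for k in rng.choice(len(vocab), size=K):      # simplification: uniform noise
        loss -= np.log(sigmoid(-U[k] @ v_c))      # push sampled negatives apart
    return loss

print(neg_sampling_loss("quick", "brown", U, V))
```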

