Classification
Methods
- Nearest Neighbours
- CNNs: 
   - later layers are increasingly abstract
   - neurons from all layers contribute fairly uniformly to the object representation
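As a rough sketch of the nearest-neighbour method listed above: label a query with the label of its closest training point under Euclidean distance. The function name, the toy 2-D points and the labels are made up for illustration, not from the lecture.

```python
import math

def nearest_neighbour_predict(train_points, train_labels, query):
    """Return the label of the training point closest to `query` (1-NN)."""
    best_label, best_dist = None, math.inf
    for point, label in zip(train_points, train_labels):
        dist = math.dist(point, query)  # Euclidean distance
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label

# Toy data (assumed): two small clusters in 2-D.
train_points = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
train_labels = ["cat", "cat", "dog", "dog"]
prediction = nearest_neighbour_predict(train_points, train_labels, (0.8, 0.9))
```

In practice the points would be image features (or raw pixels) rather than 2-D coordinates, but the rule stays the same.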
Representations
Representations transfer to other tasks! I.e. if you take a trained AlexNet and use it as a feature extractor, it works well on other image recognition tasks.
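A minimal sketch of the "feature extractor" pattern: the extractor stays frozen and only a small new classifier is fitted on top of its features. Everything here is an assumption for illustration: `frozen_features` is a hand-written stand-in for the trained network (a real setup would call the CNN there), and the nearest-class-mean head is just one simple choice of classifier.

```python
def frozen_features(image):
    """Stand-in for a pretrained network's feature output (hypothetical).
    `image` is a flat list of pixel intensities."""
    mean = sum(image) / len(image)
    spread = max(image) - min(image)
    return (mean, spread)

def fit_class_means(images, labels):
    """Fit only the new head: one mean feature vector per class."""
    sums, counts = {}, {}
    for image, label in zip(images, labels):
        feats = frozen_features(image)  # the extractor is never updated
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc)
            for label, acc in sums.items()}

def predict(class_means, image):
    """Assign the class whose mean feature vector is closest."""
    feats = frozen_features(image)
    def sq_dist(mean):
        return sum((a - b) ** 2 for a, b in zip(feats, mean))
    return min(class_means, key=lambda label: sq_dist(class_means[label]))

# Toy "new task" (assumed): dark vs bright images.
images = [[0.1, 0.2, 0.1, 0.2], [0.0, 0.1, 0.2, 0.1],
          [0.9, 0.8, 0.9, 1.0], [0.8, 0.9, 1.0, 0.9]]
labels = ["dark", "dark", "bright", "bright"]
class_means = fit_class_means(images, labels)
guess = predict(class_means, [0.2, 0.1, 0.2, 0.3])
```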
Receptive Field of a Neuron
What does each neuron "care" about? I.e. which region of the input image influences its activation.
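One way to make the receptive field concrete is to compute its width layer by layer. The sketch below uses the standard recurrence for stacked convolution/pooling layers (each layer widens the field by `(kernel - 1)` times the current spacing between neighbouring units). The function name and the example layer stack are assumptions, not from the lecture.

```python
def receptive_field_sizes(layers):
    """layers: list of (kernel_size, stride) pairs, in input-to-output order.
    Returns the receptive field width (in input pixels) after each layer."""
    rf = 1    # an input pixel "sees" only itself
    jump = 1  # spacing, in input pixels, between neighbouring units
    sizes = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        sizes.append(rf)
    return sizes

# Hypothetical stack: 3x3 conv, 2x2 pool with stride 2, 3x3 conv.
sizes = receptive_field_sizes([(3, 1), (2, 2), (3, 1)])
```

The widths grow with depth, which fits the earlier point that later layers respond to larger, more abstract patterns.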
Method
Siamese (two-stream) CNNs: one takes in a single frame (spatial stream), the other takes in multi-frame optical flow (temporal stream).
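The notes do not say how the two streams are combined. A minimal sketch, assuming the simplest option of averaging the class probabilities from each stream; the class names and raw scores below are made up.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities (shifted for numerical stability)."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_two_streams(spatial_scores, temporal_scores):
    """Average the class probabilities of the two streams (late fusion)."""
    p_spatial = softmax(spatial_scores)
    p_temporal = softmax(temporal_scores)
    return [(a + b) / 2 for a, b in zip(p_spatial, p_temporal)]

# Assumed raw scores over three actions.
classes = ["walk", "run", "jump"]
spatial_scores = [2.0, 1.5, 0.1]   # from the single-frame (spatial) stream
temporal_scores = [0.5, 3.0, 0.2]  # from the optical-flow (temporal) stream
fused = fuse_two_streams(spatial_scores, temporal_scores)
predicted = classes[fused.index(max(fused))]
```

Here the spatial stream alone slightly prefers "walk", but the motion evidence from the temporal stream tips the fused prediction to "run".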
Video Processing
Frame Fusion
Take in multiple frames and classify by fusing the outputs of their CNN streams. The frames can share a single CNN stream or each have a separate stream.

The model is static, since it has no recurrent structure.
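A rough sketch of two fusion choices, under stated assumptions: `frame_features` is a hand-written stand-in for a CNN stream, and concatenation and averaging are just two possible fusion rules (the notes do not name a specific one).

```python
def frame_features(frame):
    """Stand-in for one CNN stream (hypothetical): a 2-value feature vector."""
    return [sum(frame) / len(frame), max(frame)]

def fuse_by_concatenation(frames, streams):
    """One stream per frame; the fused vector grows with the frame count."""
    fused = []
    for frame, stream in zip(frames, streams):
        fused.extend(stream(frame))
    return fused

def fuse_by_averaging(frames, stream):
    """One shared stream applied to every frame; features are averaged."""
    feats = [stream(frame) for frame in frames]
    return [sum(column) / len(feats) for column in zip(*feats)]

# Three toy "frames" (assumed), each a flat list of pixel intensities.
frames = [[0.0, 0.2, 0.4], [0.2, 0.4, 0.6], [0.4, 0.6, 0.8]]
# Separate streams would each have their own weights; the same stand-in
# is reused here only to keep the sketch short.
concatenated = fuse_by_concatenation(frames, [frame_features] * len(frames))
averaged = fuse_by_averaging(frames, frame_features)
```

Concatenation fixes the number of input frames in advance, which is one way to see why this kind of model is static.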
Structural-RNN
Spatio-Temporal Problems
Use an RNN in place of a CNN for sequential data.
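A minimal sketch of why a recurrent model suits sequences: the hidden state is carried from step to step, so the output depends on the order of the inputs and the sequence can have any length. The scalar hidden state and the weight values are assumptions chosen to keep the example tiny.

```python
import math

def rnn_forward(inputs, w_in, w_rec, bias):
    """Scalar RNN: h_t = tanh(w_in * x_t + w_rec * h_{t-1} + bias)."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states

# Same values, different order (weights are arbitrary).
states_a = rnn_forward([1.0, 0.0, 0.0], w_in=1.0, w_rec=0.5, bias=0.0)
states_b = rnn_forward([0.0, 0.0, 1.0], w_in=1.0, w_rec=0.5, bias=0.0)
order_matters = states_a[-1] != states_b[-1]
```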

Structural RNN takes in a spatio-temporal graph (high-level mappings of where objects moved) as input and recursively combines its parts (Rec-NN).
Factor graphs
The factor graph consists of spatial factors and temporal factors.

The factors pick out specific object entities in the image (spatial) and map how their placement changes over time (temporal).
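A small sketch of the spatial/temporal split, assuming each frame is given as a list of detected object names. Connecting every pair of objects in a frame is an assumption made for simplicity; the notes do not say which pairs interact.

```python
from itertools import combinations

def build_st_graph(frames):
    """frames: list of per-frame lists of object names.
    Returns (nodes, spatial_edges, temporal_edges)."""
    nodes = sorted({name for frame in frames for name in frame})
    spatial_edges, temporal_edges = [], []
    for t, frame in enumerate(frames):
        # Spatial edges: pairs of objects present in the same frame.
        for a, b in combinations(sorted(frame), 2):
            spatial_edges.append((t, a, b))
        # Temporal edges: the same object in two consecutive frames.
        if t + 1 < len(frames):
            for name in sorted(set(frame) & set(frames[t + 1])):
                temporal_edges.append((t, t + 1, name))
    return nodes, spatial_edges, temporal_edges

# Toy clip (assumed): the cup leaves the scene in the last frame.
frames = [["human", "cup"], ["human", "cup"], ["human"]]
nodes, spatial_edges, temporal_edges = build_st_graph(frames)
```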
Factor Sharing
To feed the factor graph into the S-RNN, we share factors.

Factor sharing: take the relationships between factors (without double counting) and parametrize them as inputs to the S-RNN. In other words, the spatial and temporal edges of the factor graph become the inputs.
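A hedged sketch of the grouping step: edges with the same kind and the same pair of endpoint types are put in one group, so each group can be parametrized once. The type labels ("human", "object") and the grouping rule are assumptions for illustration.

```python
def shared_factor_key(edge):
    """Map an edge to the factor group it shares parameters with."""
    kind, a, b = edge
    # Sort the endpoint types so (human, object) and (object, human)
    # land in the same group and are not double counted.
    return (kind,) + tuple(sorted((a, b)))

def group_edges_by_factor(edges):
    """Collect edges into groups keyed by their shared factor."""
    groups = {}
    for edge in edges:
        groups.setdefault(shared_factor_key(edge), []).append(edge)
    return groups

# Toy edge list (assumed): (kind, endpoint type, endpoint type).
edges = [
    ("spatial", "human", "object"),
    ("spatial", "object", "human"),
    ("temporal", "human", "human"),
    ("temporal", "object", "object"),
    ("spatial", "human", "object"),
]
groups = group_edges_by_factor(edges)
```

Five edges collapse into three groups here, so only three sets of parameters would be needed rather than one per edge.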
Paper Link
Paper link
Key Takeaway
Generally, adding structure to the model helps!