Deep Learning Representations II

Classification

Methods

- Nearest Neighbours

- CNNs:

  - later layers are increasingly abstract

  - neurons from all layers activate, contributing relatively uniformly to the object representation
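
As a rough illustration of the nearest-neighbour method, the sketch below labels a query with the label of its closest stored example. The feature vectors and labels are made up for illustration; in practice the vectors could be raw pixels or features taken from a CNN layer.

```python
import math


def nearest_neighbour(query, examples):
    """Return the label of the stored example closest to `query`.

    `examples` is a list of (feature_vector, label) pairs.
    """
    best_label, best_dist = None, math.inf
    for features, label in examples:
        dist = math.dist(query, features)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Toy 2-D feature vectors (hypothetical data, not from the lecture).
train = [((0.0, 0.1), "cat"), ((0.9, 1.0), "dog"), ((0.1, 0.0), "cat")]
prediction = nearest_neighbour((0.8, 0.9), train)
```

There is no training step here: all the work happens at query time, which is why the quality of the feature representation matters so much for this method.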

Representations

Learned representations transfer to other tasks! I.e. if you take a trained AlexNet and use it as a feature extractor, it works well on other image recognition tasks.

Receptive Field of a Neuron

What does each neuron "care" about? I.e. which region of the input image can affect its activation.
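
The size of a receptive field can be worked out layer by layer. The sketch below uses the standard recurrence for stacked convolution/pooling layers (size grows by `(kernel - 1) * jump`, and `jump` is multiplied by the stride). The layer stack is a hypothetical example, and padding and dilation are ignored.

```python
def receptive_field(layers):
    """Receptive-field size (in input pixels, along one axis) of a neuron
    in the last layer. `layers` is a list of (kernel_size, stride) pairs.
    """
    size = 1  # an input pixel only "sees" itself
    jump = 1  # distance in input pixels between adjacent neurons
    for kernel, stride in layers:
        size += (kernel - 1) * jump
        jump *= stride
    return size


# Hypothetical stack: 3x3 conv (stride 1), 2x2 pool (stride 2), 3x3 conv (stride 1).
stack = [(3, 1), (2, 2), (3, 1)]
rf = receptive_field(stack)  # 8 input pixels along each axis
```

The receptive field grows with depth, which fits the observation that later layers respond to larger, more abstract patterns.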

Method

Siamese CNNs: one network takes in a single frame (spatial stream), the other takes in multi-frame optical flow (temporal stream).
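
One simple way to combine the two streams is to average their class probabilities. The sketch below assumes each stream has already produced a list of class scores; the scores, the function names, and the `temporal_weight` parameter are hypothetical, and averaging is only one possible fusion rule.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_streams(spatial_scores, temporal_scores, temporal_weight=0.5):
    """Weighted average of the two streams' class probabilities."""
    p_spatial = softmax(spatial_scores)
    p_temporal = softmax(temporal_scores)
    w = temporal_weight
    return [(1 - w) * a + w * b for a, b in zip(p_spatial, p_temporal)]


# Made-up scores for three classes from each stream.
spatial = [2.0, 1.0, 0.1]   # from the single-frame (appearance) stream
temporal = [0.5, 2.5, 0.2]  # from the optical-flow (motion) stream
fused = fuse_two_streams(spatial, temporal)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```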

Video Processing

Frame Fusion

Take in multiple frames, run each through a CNN stream, and classify based on the fused outputs of those streams. The frames can either share one CNN stream or each have a separate stream.

Static since no recurrent structure.
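
A minimal sketch of frame fusion, assuming each stream maps a frame to a feature vector. `toy_stream` is a hypothetical stand-in for a CNN, and averaging and concatenation are just two possible fusion choices.

```python
def fuse_frames(frames, streams, mode="average"):
    """Run each frame through a stream and fuse the per-frame features.

    `streams` is either one callable (shared by all frames) or a list
    with one callable per frame (separate streams).
    """
    if callable(streams):
        streams = [streams] * len(frames)
    if len(streams) != len(frames):
        raise ValueError("need exactly one stream per frame")
    features = [stream(frame) for stream, frame in zip(streams, frames)]
    if mode == "concat":
        return [value for feat in features for value in feat]
    if mode == "average":
        return [sum(values) / len(features) for values in zip(*features)]
    raise ValueError(f"unknown mode: {mode}")


def toy_stream(frame):
    """Hypothetical stand-in for a CNN: mean and max pixel intensity."""
    return [sum(frame) / len(frame), max(frame)]


frames = [[0.0, 1.0], [1.0, 3.0], [2.0, 2.0]]  # three tiny made-up "frames"
averaged = fuse_frames(frames, toy_stream, mode="average")
concatenated = fuse_frames(frames, toy_stream, mode="concat")
```

Note that averaging discards frame order entirely, while concatenation fixes the number of frames in advance. Neither carries state from one frame to the next, which is the sense in which the approach is static.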

Structural-RNN

Spatio-Temporal Problems

Use RNN in place of CNN for sequential data.
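
For reference, a vanilla RNN step can be sketched as below: the hidden state is updated from the current input and the previous hidden state, so information carries over between time steps. The weights and the input sequence are made up for illustration.

```python
import math


def rnn_step(x, h, w_xh, w_hh, b):
    """One step of a vanilla RNN: h_new = tanh(W_xh x + W_hh h + b)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_xh[i], x))
            + sum(w * hi for w, hi in zip(w_hh[i], h))
            + b[i]
        )
        for i in range(len(b))
    ]


def run_rnn(sequence, w_xh, w_hh, b):
    """Apply the same step (same weights) to every element of the sequence."""
    h = [0.0] * len(b)  # initial hidden state
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_xh, w_hh, b)
        states.append(h)
    return states


# Made-up weights: 1 input dimension, 2 hidden units.
w_xh = [[0.5], [-0.5]]
w_hh = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]
states = run_rnn([[1.0], [0.0], [1.0]], w_xh, w_hh, b)
```

The second input is zero, yet the second hidden state is not, because it still depends on the first step through `w_hh`.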

Structural RNN takes in spatio-temporal graphs (high-level mappings of where objects moved) as inputs and recursively combines them (Rec-NN).

Factor graphs

The factor graph consists of spatial factors and temporal factors.

They look at specific object entities in the image (factors) and map how they change placement over time (temporal).
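
To make the spatial/temporal split concrete, the sketch below unrolls a small graph over a few frames: spatial edges connect different entities within the same frame, and temporal edges connect the same entity across consecutive frames. The entity names and the `build_st_graph` helper are hypothetical, and this is only one possible encoding.

```python
def build_st_graph(nodes, spatial_pairs, num_frames):
    """Unroll a spatio-temporal graph over `num_frames` time steps.

    Each vertex is a (name, t) pair. Returns (spatial_edges, temporal_edges).
    """
    # Spatial edges: interactions between entities within one frame.
    spatial_edges = [
        ((u, t), (v, t)) for t in range(num_frames) for u, v in spatial_pairs
    ]
    # Temporal edges: the same entity in frame t and frame t + 1.
    temporal_edges = [
        ((n, t), (n, t + 1)) for t in range(num_frames - 1) for n in nodes
    ]
    return spatial_edges, temporal_edges


nodes = ["human", "cup"]
spatial, temporal = build_st_graph(nodes, [("human", "cup")], num_frames=3)
```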

Factor Sharing

To feed the factor graph into the S-RNN, we use factor sharing.

Factor sharing: take the relationships between factors (without double counting) and parametrize them as inputs to the S-RNN. Basically, take the spatial and temporal edges from the factor graph and turn them into inputs.
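
One way to picture factor sharing is to group edges by the types of the entities they connect, so that each group can be handled by a single shared set of parameters. The sketch below only does the grouping. The node names, types, and the `share_factors` helper are assumptions for illustration, not the paper's implementation.

```python
from collections import defaultdict


def share_factors(edges, node_type):
    """Group edges by the unordered pair of endpoint types.

    Sorting the type pair means (human, object) and (object, human)
    land in the same group, so a relationship is not counted twice.
    """
    groups = defaultdict(list)
    for u, v in edges:
        key = tuple(sorted((node_type[u], node_type[v])))
        groups[key].append((u, v))
    return dict(groups)


# Hypothetical scene: one person and two objects.
node_type = {"person": "human", "cup": "object", "bowl": "object"}
spatial_edges = [("person", "cup"), ("bowl", "person"), ("cup", "bowl")]
shared = share_factors(spatial_edges, node_type)
```

Here three edges collapse into two groups, human-object and object-object, so a model would need two sets of edge parameters rather than three.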

Paper Link

Key Takeaway

Generally, adding structure to the model HELPS!