- Difference from a regular CNN that takes the full RGB-D image as input: RGB and depth (D) are processed separately, and their feature representations are concatenated later.
- CNN filters are trained without supervision (no back-prop): patches are extracted, whitened and normalized, then clustered using k-means; the cluster centroids serve as the filters.
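
The filter-learning step can be sketched roughly as below. This is a minimal NumPy illustration, not the authors' implementation. Everything in it is an assumption made for the example: the function names, the patch size, number of patches and number of filters, the `eps` constants, the choice of ZCA as the whitening transform, the order (per-patch normalization first, then whitening), and the synthetic random images standing in for real RGB and depth data. Filters are learned independently for each modality, in keeping with the separate RGB and D processing noted above.

```python
import numpy as np


def extract_patches(images, patch_size, n_patches, rng):
    """Sample random square patches and flatten each into a row vector."""
    n, h, w = images.shape[:3]
    rows = []
    for _ in range(n_patches):
        i = rng.integers(n)
        y = rng.integers(h - patch_size + 1)
        x = rng.integers(w - patch_size + 1)
        rows.append(images[i, y:y + patch_size, x:x + patch_size].ravel())
    return np.asarray(rows, dtype=np.float64)


def normalize(patches, eps=10.0):
    """Per-patch brightness and contrast normalization (eps is an assumed value)."""
    centered = patches - patches.mean(axis=1, keepdims=True)
    return centered / np.sqrt(patches.var(axis=1, keepdims=True) + eps)


def zca_whiten(patches, eps=0.1):
    """ZCA whitening: decorrelate the patch dimensions (eps is an assumed value)."""
    mean = patches.mean(axis=0)
    cov = np.cov(patches - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    transform = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return (patches - mean) @ transform


def kmeans(data, k, n_iters, rng):
    """Plain Lloyd's k-means; no gradients or labels are involved."""
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(n_iters):
        # Squared Euclidean distance from every patch to every centroid.
        d2 = ((data ** 2).sum(axis=1)[:, None]
              - 2.0 * data @ centroids.T
              + (centroids ** 2).sum(axis=1)[None, :])
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


def learn_filters(images, patch_size=9, n_patches=2000, k=16, seed=0):
    """Extract patches, preprocess them, and return k-means centroids as filters."""
    rng = np.random.default_rng(seed)
    patches = extract_patches(images, patch_size, n_patches, rng)
    patches = zca_whiten(normalize(patches))
    return kmeans(patches, k, n_iters=10, rng=rng)


# Synthetic stand-ins for real data: 8 RGB images and 8 depth maps, values in [0, 255).
data_rng = np.random.default_rng(0)
rgb = data_rng.random((8, 32, 32, 3)) * 255.0
depth = data_rng.random((8, 32, 32, 1)) * 255.0

rgb_filters = learn_filters(rgb)      # one flattened 9x9x3 filter per row
depth_filters = learn_filters(depth)  # one flattened 9x9x1 filter per row
```

Each row of `rgb_filters` or `depth_filters` can be reshaped to `(9, 9, channels)` and used as a convolution kernel for its own modality; the per-modality feature maps would then be pooled and concatenated downstream.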