Generative Learning Algorithms
Parameters

For a specific feature x_i, we keep one class-conditional parameter for y = 1,

(and likewise one for y = 0), plus the class prior.
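In the usual notation (a reconstruction; the original equations were stripped here):

\phi_{i|y=1} = p(x_i \mid y = 1), \qquad \phi_{i|y=0} = p(x_i \mid y = 0), \qquad \phi_y = p(y = 1)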

Difference between discriminative and generative models
Discriminative models learn p(y|x) directly, whereas generative models learn p(x|y) (together with the class prior p(y)).
Why this is useful

We can still get our original p(y|x) by computing it with Bayes' theorem:
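p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}

where p(x) = p(x \mid y = 1)\, p(y = 1) + p(x \mid y = 0)\, p(y = 0).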

(and we don't even need to compute p(x): when maximizing p(y|x) over y, p(x) is a constant that doesn't affect the arg max)
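\arg\max_y p(y \mid x) = \arg\max_y \frac{p(x \mid y)\, p(y)}{p(x)} = \arg\max_y p(x \mid y)\, p(y)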

Gaussian Discriminant Analysis
RECAP: Multivariate Normal Distribution 
Why we use this: now we're looking for p(x|y=something). Since x is a vector (multidimensional), we use a multivariate normal distribution to model each class-conditional p(x|y).
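Its density, for a mean vector \mu \in \mathbb{R}^n and covariance matrix \Sigma \in \mathbb{R}^{n \times n}:

p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\!\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)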
When we use this (GDA): when we have a classification problem in which the input features x are continuous-valued random variables.
GDA Model
The model is: 
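y \sim \mathrm{Bernoulli}(\phi)
x \mid y = 0 \sim \mathcal{N}(\mu_0, \Sigma)
x \mid y = 1 \sim \mathcal{N}(\mu_1, \Sigma)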

It is parameterized by the class prior φ, two mean vectors μ0 and μ1, and a single shared covariance matrix Σ.
AKA, written out as densities:
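p(y) = \phi^y (1 - \phi)^{1 - y}

p(x \mid y = 0) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\!\left( -\frac{1}{2} (x - \mu_0)^T \Sigma^{-1} (x - \mu_0) \right)

p(x \mid y = 1) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\!\left( -\frac{1}{2} (x - \mu_1)^T \Sigma^{-1} (x - \mu_1) \right)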
Thus, we have the log-likelihood function (the joint likelihood, since a generative model fits x and y together):
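\ell(\phi, \mu_0, \mu_1, \Sigma) = \log \prod_{i=1}^m p(x^{(i)}, y^{(i)}) = \log \prod_{i=1}^m p(x^{(i)} \mid y^{(i)})\, p(y^{(i)})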
Maximizing ℓ with respect to the parameters, we get the maximum likelihood estimates:
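\phi = \frac{1}{m} \sum_{i=1}^m 1\{y^{(i)} = 1\}

\mu_0 = \frac{\sum_{i=1}^m 1\{y^{(i)} = 0\}\, x^{(i)}}{\sum_{i=1}^m 1\{y^{(i)} = 0\}}, \qquad \mu_1 = \frac{\sum_{i=1}^m 1\{y^{(i)} = 1\}\, x^{(i)}}{\sum_{i=1}^m 1\{y^{(i)} = 1\}}

\Sigma = \frac{1}{m} \sum_{i=1}^m (x^{(i)} - \mu_{y^{(i)}})(x^{(i)} - \mu_{y^{(i)}})^T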
Relationship to logistic regression
Logistic regression also takes continuous-valued features x and classifies y. So, when is one model better than the other?
GDA vs Logistic Regression
If p(x|y) is multivariate Gaussian (with shared Σ), then it follows that p(y|x) is a logistic function of x. HOWEVER, the converse does not hold. Thus GDA makes stronger modeling assumptions about the data, which pays off only when those assumptions are correct.
- Specifically, when p(x|y) really is Gaussian (with shared covariance matrix), GDA is asymptotically efficient and more data-efficient (requires less training data to do "well").
- Conversely, when p(x|y) is not Gaussian, logistic regression is the more robust choice: it makes weaker assumptions, so with enough data it will outperform a misspecified GDA.
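A minimal numpy sketch of GDA using the closed-form estimates above; fit_gda/predict_gda and the log-space decision rule are illustrative choices, not from the notes:

```python
import numpy as np

def fit_gda(X, y):
    """Closed-form maximum likelihood estimates for GDA with shared covariance.

    X: (m, n) array of continuous features; y: (m,) array of 0/1 labels.
    """
    m, _ = X.shape
    phi = np.mean(y == 1)                          # p(y = 1)
    mu0 = X[y == 0].mean(axis=0)                   # mean of class-0 examples
    mu1 = X[y == 1].mean(axis=0)                   # mean of class-1 examples
    centered = X - np.where((y == 1)[:, None], mu1, mu0)
    Sigma = centered.T @ centered / m              # single shared covariance
    return phi, mu0, mu1, Sigma

def predict_gda(X, phi, mu0, mu1, Sigma):
    """argmax_y p(x|y) p(y); p(x) and the shared |Sigma| factor cancel."""
    Sinv = np.linalg.inv(Sigma)
    def score(mu, prior):
        d = X - mu
        # log N(x; mu, Sigma) + log prior, dropping terms common to both classes
        return -0.5 * np.einsum('ij,jk,ik->i', d, Sinv, d) + np.log(prior)
    return (score(mu1, phi) > score(mu0, 1 - phi)).astype(int)
```

Because Σ is shared between the two classes, the resulting decision boundary is linear in x, which is exactly why p(y|x) comes out logistic.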
Multi-variate Bernoulli event model
(Naive Bayes)
Overview
Why we use this: if we have a very high-dimensional feature vector, GDA becomes unreasonable because there are far too many parameters to estimate (the covariance matrix alone has on the order of n^2 entries).
When we use this: When we have feature vectors that are discrete (or we can discretize the features) and we have a classification problem.  
How phis are modified (multi-variate)
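In the multi-variate Bernoulli model each feature is binary, x_i \in \{0, 1\}, so the phis become:

\phi_{i|y=1} = p(x_i = 1 \mid y = 1), \qquad \phi_{i|y=0} = p(x_i = 1 \mid y = 0)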
Naive Bayes Assumption
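The assumption: the x_i are conditionally independent given y, so (by the chain rule plus the assumption)

p(x_1, \ldots, x_n \mid y) = \prod_{i=1}^n p(x_i \mid y)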
Likelihood function of the data
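\mathcal{L}(\phi_y, \phi_{i|y=0}, \phi_{i|y=1}) = \prod_{i=1}^m p(x^{(i)}, y^{(i)}) = \prod_{i=1}^m \left( \prod_{j=1}^n p(x_j^{(i)} \mid y^{(i)}) \right) p(y^{(i)})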
Why?
If some feature value is never seen in training, then its estimates φ_{i|y=1} and φ_{i|y=0} both evaluate to 0, so p(y|x) evaluates to 0/0 (that zero factor appears in the product in both the numerator and the denominator).
Parameters evaluated
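Maximizing the likelihood gives:

\phi_{j|y=1} = \frac{\sum_{i=1}^m 1\{x_j^{(i)} = 1 \wedge y^{(i)} = 1\}}{\sum_{i=1}^m 1\{y^{(i)} = 1\}}, \qquad \phi_{j|y=0} = \frac{\sum_{i=1}^m 1\{x_j^{(i)} = 1 \wedge y^{(i)} = 0\}}{\sum_{i=1}^m 1\{y^{(i)} = 0\}}, \qquad \phi_y = \frac{1}{m} \sum_{i=1}^m 1\{y^{(i)} = 1\}

(These are just empirical fractions, e.g. φ_{j|y=1} is the fraction of y = 1 examples in which feature j is on.)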
To make a prediction
Calculate p(y=1|x) and p(y=0|x), and choose the class with the higher probability.
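p(y = 1 \mid x) = \frac{\left( \prod_{i=1}^n p(x_i \mid y = 1) \right) p(y = 1)}{\left( \prod_{i=1}^n p(x_i \mid y = 1) \right) p(y = 1) + \left( \prod_{i=1}^n p(x_i \mid y = 0) \right) p(y = 0)}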
Main Idea
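For a multinomial z taking values in {1, ..., k}, the Laplace-smoothed estimate adds 1 to every count:

\phi_j = \frac{\sum_{i=1}^m 1\{z^{(i)} = j\} + 1}{m + k}

so no value ever gets estimated probability exactly 0.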
With Laplace Smoothing
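Applied to the naive Bayes phis (k = 2, since each feature is binary):

\phi_{j|y=1} = \frac{\sum_{i=1}^m 1\{x_j^{(i)} = 1 \wedge y^{(i)} = 1\} + 1}{\sum_{i=1}^m 1\{y^{(i)} = 1\} + 2}

(and similarly for y = 0).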
Multinomial event model
(Naive Bayes)
A Naive Bayes algorithm
Like the multi-variate Bernoulli model, except each example is a variable-length sequence of words (e.g. an email), and word frequency is taken into account.
How phis are modified (multinomial)
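Now the phis are indexed by words in the vocabulary rather than by feature positions, and the distribution is shared across positions:

\phi_{k|y=1} = p(x_j = k \mid y = 1), \qquad \phi_{k|y=0} = p(x_j = k \mid y = 0), \qquad \phi_y = p(y = 1)

i.e. φ_{k|y=1} is the probability that any given word position holds word k of the vocabulary, given y = 1.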
Likelihood function of the data
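\mathcal{L}(\phi_y, \phi_{k|y}) = \prod_{i=1}^m \left( \prod_{j=1}^{n_i} p(x_j^{(i)} \mid y^{(i)}) \right) p(y^{(i)})

where n_i is the number of words in the i-th example.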
Parameters evaluated
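With Laplace smoothing (add 1 per word count in the numerator and |V|, the vocabulary size, in the denominator):

\phi_{k|y=1} = \frac{\sum_{i=1}^m \sum_{j=1}^{n_i} 1\{x_j^{(i)} = k \wedge y^{(i)} = 1\} + 1}{\sum_{i=1}^m 1\{y^{(i)} = 1\}\, n_i + |V|}

(and similarly for y = 0; φ_y is unchanged).

A minimal numpy sketch of this model; fit_multinomial_nb/predict_multinomial_nb and the list-of-word-index-arrays input format are illustrative assumptions:

```python
import numpy as np

def fit_multinomial_nb(docs, y, vocab_size):
    """Laplace-smoothed MLE for the multinomial event model.

    docs: list of 1-D int arrays, each holding one document's word indices.
    y: (m,) array of 0/1 labels.
    """
    counts = np.ones((2, vocab_size))        # the +1 from Laplace smoothing
    totals = np.full(2, float(vocab_size))   # the +|V| in the denominator
    for words, label in zip(docs, y):
        np.add.at(counts[label], words, 1)   # per-word counts for this class
        totals[label] += len(words)          # total words seen in this class
    phi_word = counts / totals[:, None]      # phi_{k|y=0} and phi_{k|y=1}
    phi_y = np.mean(np.asarray(y) == 1)      # p(y = 1)
    return phi_word, phi_y

def predict_multinomial_nb(doc, phi_word, phi_y):
    """Compare log p(x|y) p(y) for both classes; logs avoid underflow."""
    log_p1 = np.log(phi_word[1][doc]).sum() + np.log(phi_y)
    log_p0 = np.log(phi_word[0][doc]).sum() + np.log(1 - phi_y)
    return int(log_p1 > log_p0)
```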