Stanford CS 229 Machine Learning - Lecture 3

Probabilistic Interpretation of Linear Regression

Why minimize sum of squares of errors?

Actual value = Predicted value + Error term

The error term captures "unmodeled feature effects" and "random noise"; the errors are assumed to be independent of each other and identically distributed, modeled as Gaussian with mean zero.
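
A sketch of the derivation from the lecture, with x^{(i)}, y^{(i)} the i-th training example, theta the parameter vector, and m the number of examples:

    y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}, \qquad \epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2) \quad \text{i.i.d.}

    \ell(\theta) = \log \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma}
                   \exp\!\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right)
                 = m \log \frac{1}{\sqrt{2\pi}\,\sigma}
                   - \frac{1}{2\sigma^2} \sum_{i=1}^{m} \left( y^{(i)} - \theta^T x^{(i)} \right)^2

Maximizing the log-likelihood \ell(\theta) over \theta is therefore the same as minimizing the sum of squared errors, so least squares is the maximum-likelihood estimate under these assumptions.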

Locally Weighted Regression

More weight is given to the training examples near the point to be predicted. Since the model is fit around the query point, the training set must be available at prediction time.
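
A minimal sketch in Python, assuming the Gaussian weighting kernel w^{(i)} = exp(-(x^{(i)} - x)^2 / (2 tau^2)) from the lecture; the function name and default bandwidth are illustrative:

    import numpy as np

    def locally_weighted_regression(X, y, x_query, tau=0.5):
        """Predict the target at x_query by solving a weighted least-squares fit.

        X: (m, n) design matrix (include a column of ones for the intercept)
        y: (m,) targets
        x_query: (n,) query point
        tau: bandwidth controlling how fast the weights fall off with distance
        """
        # Gaussian weights: examples close to x_query get weight near 1,
        # examples far away get weight near 0.
        diffs = X - x_query
        w = np.exp(-np.sum(diffs ** 2, axis=1) / (2 * tau ** 2))

        # Weighted normal equations: (X^T W X) theta = X^T W y
        W = np.diag(w)
        theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
        return x_query @ theta

With a small tau only the nearest examples influence the fit; as tau grows the weights flatten out and the prediction approaches ordinary (unweighted) linear regression.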

Under and Over Fitting

Under-fitting: Fewer parameters than the model needs (e.g., a linear model when the underlying relationship is quadratic)

Over-fitting: More parameters than the model needs (e.g., a cubic model when the underlying relationship is quadratic)
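
A quick illustration in Python (the data here is hypothetical, generated from a quadratic relationship plus noise):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-3, 3, 20)
    y = 1.0 + 2.0 * x + 0.5 * x ** 2 + rng.normal(0, 0.5, size=x.shape)

    for degree in (1, 2, 3):
        coeffs = np.polyfit(x, y, degree)               # fit a polynomial of this degree
        mse = np.mean((y - np.polyval(coeffs, x)) ** 2)  # training error
        print(degree, mse)

The degree-1 fit under-fits (its training error stays large because a line cannot capture the curvature), while the degree-3 fit's training error ends up slightly below the degree-2 fit's only because the extra cubic parameter is fitting noise rather than structure.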

Model Types

Fixed Parameter Models: the number of parameters is fixed in advance and does not depend on the training set size (parametric; see Learning Algorithms below)

Variable Parameter Models: the number of parameters grows with the amount of training data (non-parametric; see Learning Algorithms below)

Classification: Value to be predicted is discrete (e.g., patient has a disease or not)

Regression: Value to be predicted is continuous (e.g., value of house)

Classification

Use the sigmoid (logistic) function g(z) = 1 / (1 + e^{-z}), so the hypothesis h_theta(x) = g(theta^T x) always lies between 0 and 1 and can be read as the probability that the label is 1.
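
A minimal sketch in Python of the resulting hypothesis (function names are illustrative):

    import numpy as np

    def sigmoid(z):
        # g(z) = 1 / (1 + e^{-z}); squashes any real number into (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    def predict_probability(theta, x):
        # Logistic-regression hypothesis h_theta(x) = g(theta^T x),
        # read as P(y = 1 | x; theta)
        return sigmoid(theta @ x)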

Learning Algorithms

Parametric Learning Algorithms use a fixed number of parameters -- e.g., linear regression.

Non-parametric Learning Algorithms use a variable number of parameters that grows with the size of the data, and they usually need to keep the training data around at prediction time -- e.g., locally weighted regression. An advantage is that we don't have to commit to a fixed set of features/parameters in advance.
