Input features - input variables
Target variable - output you are trying to predict
Training set - list of m training examples
Hypothesis - a function h that, given an input x, predicts the corresponding output y
Regression problem - when the target variable is continuous
Classification problem - when the target variable is discrete
Parameters (weights) - the weights θj applied to each input feature xj; they parameterize the space of linear functions mapping from X to Y.
Intercept term - the convention of letting x0 = 1, so that the θ0x0 term becomes simply the constant θ0 and the hypothesis can be written h(x) = Σj θjxj = θᵀx.
Parametric algorithm - an algorithm with a fixed number of parameters (the θj's); once trained, it does not need to keep the entire training set around, so the amount of storage required is unrelated to the size of the training set.
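As an illustration (not from the notes), here is a minimal sketch of the linear hypothesis h(x) = θᵀx using the x0 = 1 intercept convention; the parameter and feature values are made up:

```python
# Sketch of the linear hypothesis h_theta(x) = theta^T x.
# By the intercept convention, x[0] = 1, so theta[0] acts as theta0.

def hypothesis(theta, x):
    """Predict y as the dot product theta^T x."""
    return sum(t * xi for t, xi in zip(theta, x))

theta = [1.0, 0.5, 2.0]      # theta0 is the intercept weight
x = [1.0, 3.0, 4.0]          # x0 = 1 by convention
print(hypothesis(theta, x))  # 1.0*1 + 0.5*3 + 2.0*4 = 10.5
```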
Let us assume that the target variables and inputs are related via:

y(i) = θᵀx(i) + ε(i)

where ε(i) is an error term; the ε(i) are distributed IID according to a Normal distribution with mean 0 and some variance σ², i.e. ε(i) ~ N(0, σ²).
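A minimal sketch (with made-up θ and σ values) of generating targets according to this model, y(i) = θᵀx(i) + ε(i) with ε(i) ~ N(0, σ²):

```python
import random

random.seed(0)

def sample_target(theta, x, sigma):
    """Draw y = theta^T x + eps, with eps ~ N(0, sigma^2)."""
    noise = random.gauss(0.0, sigma)  # IID Normal error term, mean 0
    return sum(t * xi for t, xi in zip(theta, x)) + noise

theta = [2.0, -1.0]  # assumed "true" parameters for this sketch
dataset = [([1.0, float(i)], sample_target(theta, [1.0, float(i)], 0.1))
           for i in range(5)]  # x0 = 1 intercept convention
```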
∂J(θ)/∂θj - the partial derivative of the cost function J(θ) with respect to θj.
To minimize J(θ), pick an initial θ and repeatedly update each θj:

θj := θj − α · ∂J(θ)/∂θj

where α is the learning rate (gradient descent).
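The update rule above can be sketched as batch gradient descent on the least-squares cost J(θ) = (1/2)Σi (h(x(i)) − y(i))²; the learning rate, iteration count, and toy data below are illustrative choices, not from the notes:

```python
# Batch gradient descent sketch for linear regression.
# Gradient of J w.r.t. theta_j is sum_i (h(x_i) - y_i) * x_i[j];
# here it is averaged over the m examples for numerical stability.

def gradient_descent(xs, ys, alpha=0.1, iters=1000):
    m, n = len(xs), len(xs[0])
    theta = [0.0] * n  # initial guess
    for _ in range(iters):
        preds = [sum(t * xi for t, xi in zip(theta, x)) for x in xs]
        grads = [sum((p - y) * x[j] for p, y, x in zip(preds, ys, xs)) / m
                 for j in range(n)]
        # simultaneous update of all theta_j
        theta = [t - alpha * g for t, g in zip(theta, grads)]
    return theta

# Toy data generated from y = 3 + 2*x1 (x0 = 1 intercept convention)
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [3.0, 5.0, 7.0, 9.0]
theta = gradient_descent(xs, ys)
```

On this noiseless toy data, θ converges toward the generating parameters (3, 2).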