Why minimize sum of squares of errors?
Actual value = Predicted value + Error Terms
The Error Terms include "unmodeled feature effects" and "random noise"; they are assumed to be independent of each other and randomly distributed.
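A minimal sketch of this setup, assuming the errors are independent Gaussian noise (under that assumption, minimizing the sum of squared errors is maximum-likelihood estimation). The true parameters and noise level below are chosen just for the demo:

```python
import numpy as np

# Actual value = predicted value + error term, with independent Gaussian noise.
rng = np.random.default_rng(0)

n = 1000
x = rng.uniform(-1, 1, size=n)
true_theta = np.array([1.0, 2.0])           # intercept, slope (demo values)
noise = rng.normal(0.0, 0.1, size=n)        # independent, randomly distributed errors
y = true_theta[0] + true_theta[1] * x + noise   # actual = predicted + error

# Least-squares fit via the normal equations: theta = (X^T X)^{-1} X^T y
X = np.column_stack([np.ones(n), x])
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_hat)   # close to [1.0, 2.0]
```

Minimizing the squared errors recovers the true parameters up to noise, which is the practical payoff of the Gaussian-error assumption.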
Locally Weighted Regression: more weight is given to training cases near the case to be predicted. Since the model is fit around the case to be predicted, the training set must be available at prediction time.
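The idea above -- weighting nearby training cases more heavily -- can be sketched as locally weighted regression. The function name, Gaussian kernel choice, and bandwidth `tau` are illustrative, not prescribed by the notes:

```python
import numpy as np

# Locally weighted regression (sketch): a separate weighted least-squares
# fit is solved for each query point, so the full training set must be
# kept around at prediction time.
def lwr_predict(x_query, X_train, y_train, tau=0.3):
    # Gaussian kernel: training cases near the query get more weight.
    w = np.exp(-(X_train - x_query) ** 2 / (2 * tau ** 2))
    A = np.column_stack([np.ones_like(X_train), X_train])
    W = np.diag(w)
    # Weighted normal equations: theta = (A^T W A)^{-1} A^T W y
    theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y_train)
    return theta[0] + theta[1] * x_query

rng = np.random.default_rng(1)
X_train = np.linspace(0, 2 * np.pi, 100)
y_train = np.sin(X_train) + rng.normal(0, 0.05, size=100)
pred = lwr_predict(np.pi / 2, X_train, y_train)
print(pred)   # near sin(pi/2) = 1
```

Note that `lwr_predict` takes the entire training set as an argument: nothing is "learned" ahead of time, which is exactly why the training data must stay available.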
Under-fitting: fewer parameters than the model needs (e.g., a linear model when the needed model is quadratic)
Over-fitting: more parameters than the model needs (e.g., a cubic model when the needed model is quadratic)
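Both failure modes can be seen by fitting polynomials of different degrees to data generated from a quadratic (the coefficients, noise level, and degree-9 over-fit below are illustrative choices):

```python
import numpy as np

# Data from a "true" quadratic, fit with too few parameters (degree 1,
# under-fitting), the right number (degree 2), and too many (degree 9,
# over-fitting). Held-out test error exposes both failures.
rng = np.random.default_rng(2)

def true_fn(x):
    return 1.0 - 2.0 * x + 3.0 * x ** 2

x_train = rng.uniform(-1, 1, 20)
x_test = rng.uniform(-1, 1, 200)
y_train = true_fn(x_train) + rng.normal(0, 0.2, x_train.size)
y_test = true_fn(x_test) + rng.normal(0, 0.2, x_test.size)

errors = {}
for degree in (1, 2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    errors[degree] = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, round(errors[degree], 3))
```

The linear fit cannot represent the curvature at all, so its test error stays high no matter how much data it sees; the quadratic fit matches the data-generating model and does best on held-out data.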
Fixed Parameter Models: the number of parameters is fixed in advance (parametric).
Variable Parameter Models: the number of parameters grows with the data (non-parametric).
Classification: Value to be predicted is discrete (e.g., patient has a disease or not)
Regression: Value to be predicted is continuous (e.g., value of house)
Use the sigmoid (logistic) function to map predictions into (0, 1) for classification.
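A minimal sketch of the sigmoid function: it squashes any real-valued score into (0, 1), which can be read as the probability of the positive class (e.g., patient has the disease):

```python
import numpy as np

# Sigmoid (logistic) function: maps the real line into (0, 1).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))    # 0.5: maximally uncertain
print(sigmoid(5.0))    # close to 1
print(sigmoid(-5.0))   # close to 0
```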
Parametric Learning Algorithms use a fixed number of parameters -- e.g., linear regression.
Non-parametric Learning Algorithms use a variable number of parameters -- the number grows with the size of the data, and they usually need the training data at prediction time. E.g., locally weighted regression. Advantage: we don't have to commit to the features/parameters in advance.
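A small sketch of the contrast (demo parameters chosen for illustration): a parametric model like linear regression compresses the data into a fixed-size parameter vector, regardless of how many training cases it saw, whereas a non-parametric model like locally weighted regression would have to retain all n training rows for prediction:

```python
import numpy as np

# Parametric: after training, only theta (2 numbers here) is needed at
# prediction time, no matter how large n was. A non-parametric method
# like locally weighted regression would need all n rows of (X, y).
rng = np.random.default_rng(3)
for n in (100, 10_000):
    X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
    y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.1, n)
    theta = np.linalg.solve(X.T @ X, X.T @ y)
    print(n, theta.shape)   # parameter count does not grow with n
```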