Regularization and Model Selection
Cross Validation
Hold-out cross validation
Randomly split S into S_train and S_cv. Train each candidate hypothesis on S_train, test it on S_cv, and pick the one with the lowest estimated generalization error.
Sucks though: you waste a large chunk of the data just to test on it, which really hurts when data is scarce.
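A minimal sketch of hold-out cross validation, assuming scikit-learn; the dataset and the candidate models (logistic regression at a few regularization strengths) are illustrative, not from the notes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Made-up dataset standing in for S.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Randomly split S into S_train (70%) and S_cv (30%).
X_train, X_cv, y_train, y_cv = train_test_split(X, y, test_size=0.3, random_state=0)

# Candidate hypotheses: same model class, different regularization strengths.
candidates = [LogisticRegression(C=c, max_iter=1000) for c in (0.01, 0.1, 1.0, 10.0)]

# Train each on S_train, score on S_cv, keep the best.
best = max(candidates, key=lambda h: h.fit(X_train, y_train).score(X_cv, y_cv))
print(best)
```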
k-fold cross validation
1. Split S into k disjoint subsets of m/k training examples each. 2. For j = 1, ..., k: train the model on all subsets except the j-th, then test it on the j-th subset. Average the k test errors to get an estimated generalization error for each hypothesis. 3. Pick the hypothesis with the lowest estimated generalization error.
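A minimal sketch of k-fold cross validation using scikit-learn's cross_val_score; the data and candidate models are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

candidates = [LogisticRegression(C=c, max_iter=1000) for c in (0.01, 0.1, 1.0, 10.0)]

# cross_val_score trains on k-1 folds and tests on the held-out fold,
# once per fold; averaging the k scores estimates generalization error.
avg_scores = [cross_val_score(h, X, y, cv=10).mean() for h in candidates]
best = candidates[int(np.argmax(avg_scores))]
print(best, max(avg_scores))
```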
Feature Selection
Forward search
1. Start with no features. 2. For each feature not yet included, tentatively add it and estimate generalization error with cross validation; keep the single feature whose addition gives the lowest cross-validation error. 3. Repeat until all features have been added, then pick the best feature subset found along the way.
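A minimal sketch of forward search, assuming a scikit-learn-style model; the dataset and the choice of logistic regression are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=15, n_informative=5, random_state=0)

selected, remaining = [], set(range(X.shape[1]))
best_score, best_subset = -np.inf, []

while remaining:
    # Try adding each remaining feature; keep the one with the best CV score.
    trials = {j: cross_val_score(LogisticRegression(max_iter=1000),
                                 X[:, selected + [j]], y, cv=5).mean()
              for j in remaining}
    j_best = max(trials, key=trials.get)
    selected.append(j_best)
    remaining.remove(j_best)
    # Track the best subset seen across all sizes.
    if trials[j_best] > best_score:
        best_score, best_subset = trials[j_best], list(selected)

print(best_subset, best_score)
```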
Backward search
Similar to forward search, except we start with all the features and remove them one at a time, dropping whichever feature's removal gives the lowest cross-validation error.
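The mirror-image sketch for backward search, under the same illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=15, n_informative=5, random_state=0)

selected = list(range(X.shape[1]))
best_score, best_subset = -np.inf, list(selected)

while len(selected) > 1:
    # Try removing each feature; drop the one whose removal leaves the
    # best cross-validation score.
    trials = {j: cross_val_score(LogisticRegression(max_iter=1000),
                                 X[:, [k for k in selected if k != j]], y, cv=5).mean()
              for j in selected}
    j_drop = max(trials, key=trials.get)
    selected.remove(j_drop)
    if trials[j_drop] > best_score:
        best_score, best_subset = trials[j_drop], list(selected)

print(best_subset, best_score)
```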
Filter feature selection
Compute a heuristic score S(i) measuring how informative each feature x_i is about the label y (e.g., the correlation or mutual information between x_i and y), then keep the k highest-scoring features. Cheaper than forward/backward search, since no models are retrained per candidate subset.
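A minimal sketch of a filter method using mutual information scores via scikit-learn's mutual_info_classif; the data and the choice of k are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=300, n_features=15, n_informative=5, random_state=0)

# Score each feature by its estimated mutual information with y,
# then keep the k highest-scoring features.
scores = mutual_info_classif(X, y, random_state=0)
k = 5
top_k = np.argsort(scores)[-k:][::-1]
print(top_k, scores[top_k])
```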
Frequentist vs Bayesian
Frequentist view - θ is a constant (not a random variable) whose value is unknown.
Bayesian view - θ is a random variable whose value is unknown.
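A tiny worked example of the difference, my own illustration rather than from the notes: estimating a coin's heads probability θ from flips. The frequentist maximum likelihood estimate treats θ as a fixed unknown; the Bayesian treats θ as random, puts a Beta prior on it, and reasons with the posterior.

```python
# Illustration (not from the notes): estimate a coin's heads probability
# theta after observing h heads in n flips.
n, h = 10, 7

# Frequentist: theta is a fixed unknown constant; the maximum likelihood
# estimate is the empirical frequency of heads.
theta_mle = h / n  # 0.7

# Bayesian: theta is a random variable with a Beta(a, b) prior; the
# posterior is Beta(a + h, b + n - h), summarized here by its mean.
a, b = 2.0, 2.0  # prior pseudo-counts (an assumed choice)
theta_post_mean = (a + h) / (a + b + n)  # = 9/14, about 0.643

print(theta_mle, theta_post_mean)
```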