Evaluating what's been learned

Issues

  1. How predictive is the model we have learned?
  2. Error on the training data is not a good indicator of performance on future data: it can easily be driven to 0, but what we need is a model that generalizes beyond the training data.
  3. Solution: split the data into a training set and a test set (see the sketch after this list).
  4. However, building a good model requires a large training set, and evaluating its performance reliably requires a large test set as well; what we really need is plenty of preclassified data.
  5. We also need statistical reliability of estimated differences in performance (significance tests).
  6. Performance measures, e.g. error rate and misclassification cost (see "Counting the cost" below).
  7. To improve classifier accuracy we may combine multiple models.
  8. We can assess a classifier's predictiveness by applying the Minimum Description Length (MDL) principle.
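A minimal sketch of the train/test split mentioned in point 3: shuffle the instance indices and cut them in two. The array names X and y, the 1/3 test fraction, and the fixed seed are illustrative assumptions, not prescribed by these notes.

```python
import numpy as np

def train_test_split(X, y, test_fraction=1/3, seed=0):
    """Randomly split preclassified data (X, y) into training and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))                 # shuffle instance indices
    n_test = int(round(len(y) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]
```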

Training and testing

  1. Error rate
  2. Testing
  3. Predicting performance (the true success/error rate); see the sketch below.
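A sketch of how "predicting performance" can be made concrete: compute the observed error rate on the test set, then give a normal-approximation confidence interval for the true success rate treated as a Bernoulli proportion. The z value 1.96 (95% confidence) and the numbers in the usage comment are illustrative choices.

```python
import math

def error_rate(y_true, y_pred):
    """Fraction of test instances that are misclassified."""
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return errors / len(y_true)

def success_rate_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for the true success rate,
    given `successes` correct predictions out of `n` test instances."""
    f = successes / n                        # observed success rate
    se = math.sqrt(f * (1 - f) / n)          # standard error of the estimate
    return max(0.0, f - z * se), min(1.0, f + z * se)

# Example: 75 correct out of 100 test instances, 95% confidence
# success_rate_interval(75, 100)  ->  roughly (0.665, 0.835)
```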

Estimating classifier accuracy

  1. Holdout
  2. Repeated holdout. The success/error estimate can be made more reliable by repeating the process with different subsamples.
  3. Cross-validation (CV). Avoids overlapping test sets (see the sketch after this list).
  4. Leave-one-out cross-validation (LOO CV).
  5. Bootstrapping
  6. Counting the cost
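A sketch of k-fold cross-validation for estimating classifier error. The classifier is assumed to expose fit(X, y) and predict(X) methods; the 10-fold default and the scikit-learn decision tree in the usage comment are illustrative assumptions. Leave-one-out CV is the special case k = n, and a stratified split (preserving class proportions in each fold) is a common refinement.

```python
import numpy as np

def cross_val_error(make_model, X, y, k=10, seed=0):
    """Estimate error rate by k-fold cross-validation.

    make_model: zero-argument factory returning a fresh, unfitted classifier
                with fit(X, y) and predict(X) methods.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)    # k disjoint test folds
    fold_errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        model = make_model()
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        fold_errors.append(np.mean(pred != y[test_idx]))  # error rate on this fold
    return float(np.mean(fold_errors))

# Illustrative usage, assuming scikit-learn is installed:
# from sklearn.tree import DecisionTreeClassifier
# err = cross_val_error(lambda: DecisionTreeClassifier(random_state=0), X, y, k=10)
```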

Combining multiple models

  1. Basic idea: meta learning
  2. Bagging (see the sketch after this list)
  3. Boosting
  4. Stacking
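As an illustration of combining multiple models, a sketch of bagging: train each model on a bootstrap sample of the training data and combine predictions by majority vote. The base-learner factory, the ensemble size of 25, integer class labels, and the scikit-learn decision tree in the usage comment are illustrative assumptions. Boosting and stacking combine models differently (by reweighting hard instances, and by learning a meta-model over base-model outputs, respectively).

```python
import numpy as np

def bagging_fit(make_model, X, y, n_models=25, seed=0):
    """Train an ensemble of models on bootstrap samples (sampling with replacement)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y), size=len(y))    # bootstrap sample indices
        model = make_model()
        model.fit(X[idx], y[idx])
        models.append(model)
    return models

def bagging_predict(models, X):
    """Combine the ensemble's predictions by majority vote per instance."""
    all_preds = np.array([m.predict(X) for m in models])   # shape (n_models, n_instances)
    vote = lambda col: np.bincount(col).argmax()            # most frequent label
    return np.array([vote(all_preds[:, i].astype(int)) for i in range(X.shape[0])])

# Illustrative usage, assuming scikit-learn is installed and integer class labels:
# from sklearn.tree import DecisionTreeClassifier
# models = bagging_fit(lambda: DecisionTreeClassifier(random_state=0), X, y)
# y_pred = bagging_predict(models, X_test)
```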

Minimum Description Length Principle

A more detailed treatment is available as a separate PDF document (linked from this page); a brief sketch of the idea follows.
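In brief (a sketch of the standard formulation, not a substitute for the linked document): the MDL principle prefers the theory T that minimizes the combined description length of the theory itself and of the training data E encoded with the help of T,

```latex
\text{choose } T \text{ that minimizes } \; L(T) + L(E \mid T)
```

where L(T) is the number of bits needed to encode the theory and L(E | T) the number of bits needed to encode the data given the theory; a more complex model shortens the second term at the cost of lengthening the first.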