Data mining algorithms: Prediction

The prediction task

  1. Supervised learning task where the data are used directly (no explicit model is created) to predict the class value of a new instance.
  2. Basic approaches: statistical modeling, Bayesian networks, instance-based methods, linear models.

Statistical modeling

  1. Basic assumptions
  2. Probabilities of weather data

    ID  outlook   temp  humidity  windy  play
     1  sunny     hot   high      false  no
     2  sunny     hot   high      true   no
     3  overcast  hot   high      false  yes
     4  rainy     mild  high      false  yes
     5  rainy     cool  normal    false  yes
     6  rainy     cool  normal    true   no
     7  overcast  cool  normal    true   yes
     8  sunny     mild  high      false  no
     9  sunny     cool  normal    false  yes
    10  rainy     mild  normal    false  yes
    11  sunny     mild  normal    true   yes
    12  overcast  mild  high      true   yes
    13  overcast  hot   normal    false  yes
    14  rainy     mild  high      true   no

  4. Bayes theorem (Bayes rule)
  5. Bayes for classification
  6. The “zero-frequency problem”
  7. Missing values
  8. Numeric attributes
  9. Discussion
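
The Bayes-rule items above (Bayes for classification, the zero-frequency problem, missing values) can be made concrete on the weather data. The following is a minimal sketch, not a reference implementation: the function names `train` and `posterior`, the smoothing parameter `alpha`, and the new instance `x` are assumptions introduced here for illustration. It applies P(class | E) proportional to P(class) times the product of P(attribute value | class), skips attributes whose value is missing (`None`), and uses Laplace smoothing when `alpha > 0` so that a zero count does not force the whole product to zero. Numeric attributes are not covered.

```python
from collections import Counter, defaultdict

# The 14 weather instances: (outlook, temp, humidity, windy, play).
WEATHER = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def train(rows):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    domains = defaultdict(set)           # attribute index -> observed values
    for row in rows:
        label = row[-1]
        for i, value in enumerate(row[:-1]):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def posterior(model, instance, alpha=0.0):
    """Return normalized class probabilities for one instance.

    alpha is an assumed Laplace smoothing constant (0 means raw frequencies).
    Attribute values given as None are treated as missing and skipped.
    """
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior P(class)
        for i, value in enumerate(instance):
            if value is None:
                continue  # missing value: leave this attribute out of the product
            count = value_counts[(i, label)][value]
            score *= (count + alpha) / (n_label + alpha * len(domains[i]))
        scores[label] = score
    evidence = sum(scores.values())
    if evidence == 0:
        return scores  # every class hit a zero count; nothing to normalize
    return {label: s / evidence for label, s in scores.items()}


model = train(WEATHER)
x = ("sunny", "cool", "high", "true")  # hypothetical new day, not in the data
print(posterior(model, x))             # about {'no': 0.795, 'yes': 0.205}
print(posterior(model, ("overcast", "cool", "high", "true")))             # 'no' gets 0.0
print(posterior(model, ("overcast", "cool", "high", "true"), alpha=1.0))  # 'no' is now > 0
```

The second call shows the zero-frequency problem: outlook = overcast never occurs with play = no, so the unsmoothed estimate rules that class out entirely, while the smoothed estimate does not.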

Bayesian networks

  1. Basics of BN
  2. Example:
  3. Naive Bayes as a BN
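
For items 1 and 3, the standard factorizations can be written out as follows. The symbols are notation assumed for this note: X_1, ..., X_n are the network's variables, Parents(X_i) are the parents of X_i in the directed acyclic graph, C is the class, and A_1, ..., A_n are the attributes. The first line is the general Bayesian network factorization; the second is the special case in which the class is the only parent of every attribute, which is exactly the naive Bayes independence assumption.

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Parents}(X_i)\bigr)

P(C, A_1, \dots, A_n) = P(C) \prod_{i=1}^{n} P(A_i \mid C)
```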

Instance-based methods

  1. Distance function defines what's learned.
  2. Example: weather data
       ID         2    8    9    11
       D(X, ID)   1    2    2    2
       play       no   no   yes  yes
  3. Discussion
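
The distance table in item 2 can be reproduced with a short sketch. Two things here are assumptions rather than part of the outline: the distance is taken to be the number of attributes on which two instances disagree, and the query instance is taken to be X = (sunny, cool, high, true), which is the instance that yields the distances 1, 2, 2, 2 to IDs 2, 8, 9 and 11 under that distance. IDs are 1-based row positions in the weather data; the function names are hypothetical.

```python
from collections import Counter

# The 14 weather instances: (outlook, temp, humidity, windy, play).
WEATHER = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def distance(a, b):
    """Number of attributes on which two instances disagree."""
    return sum(1 for u, v in zip(a, b) if u != v)


def nearest_neighbors(rows, x, k=1):
    """Return the k closest rows as (distance, ID, class); ties go to the lower ID."""
    scored = [(distance(row[:-1], x), i, row[-1])
              for i, row in enumerate(rows, start=1)]
    return sorted(scored)[:k]


def classify(rows, x, k=1):
    """Majority class among the k nearest neighbors."""
    votes = Counter(label for _, _, label in nearest_neighbors(rows, x, k))
    return votes.most_common(1)[0][0]


X = ("sunny", "cool", "high", "true")  # assumed query instance
print([distance(WEATHER[i - 1][:-1], X) for i in (2, 8, 9, 11)])  # [1, 2, 2, 2]
print(classify(WEATHER, X, k=1))                                  # no
```

No model is built ahead of time: all the work happens at query time, and swapping in a different `distance` changes which instances count as neighbors and therefore what is predicted.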

Linear models

  1. Basic idea
  2. Classification by linear regression
  3. Discussion
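
One common way to carry out classification by linear regression is to fit one least-squares function per class, with target 1 for instances of that class and 0 otherwise, and to predict the class whose function gives the largest output. The sketch below does this on the weather data under several assumptions that are not in the outline: nominal attributes are turned into 0/1 indicator features (with one value per attribute left out as the baseline), the weights come from the normal equations solved by Gaussian elimination, and all function names are hypothetical.

```python
# The 14 weather instances: (outlook, temp, humidity, windy, play).
WEATHER = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def encode(instance):
    """Assumed encoding: bias term plus 0/1 indicators (rainy, cool are baselines)."""
    outlook, temp, humidity, windy = instance
    return [
        1.0,
        float(outlook == "sunny"),
        float(outlook == "overcast"),
        float(temp == "hot"),
        float(temp == "mild"),
        float(humidity == "high"),
        float(windy == "true"),
    ]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(features, targets):
    """Least-squares weights from the normal equations (X^T X) w = X^T y."""
    n = len(features[0])
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(n)] for i in range(n)]
    xty = [sum(f[i] * t for f, t in zip(features, targets)) for i in range(n)]
    return solve(xtx, xty)


def train(rows):
    """Fit one 0/1 membership regression per class."""
    features = [encode(row[:-1]) for row in rows]
    classes = sorted({row[-1] for row in rows})
    return {c: fit(features, [float(row[-1] == c) for row in rows]) for c in classes}


def output(weights, instance):
    """Linear output w . x for one class's weight vector."""
    return sum(w * v for w, v in zip(weights, encode(instance)))


def classify(model, instance):
    """Predict the class whose regression output is largest."""
    return max(model, key=lambda c: output(model[c], instance))


model = train(WEATHER)
print(classify(model, ("sunny", "cool", "high", "true")))
```

The regression outputs are not probabilities: they can fall outside the interval from 0 to 1, so only their ranking across classes is used for the prediction.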