Machine Learning - Project 1

Machine Learning - Summer 2003

Project 4 - Prediction

Posted: 6/17/2003
Due: 6/23/2003

1. Estimating probabilities. Consider the following test instance for the PlayTennis data:

test(15,_,[outlook = overcast, temp = mild, humidity = low, wind = weak]).

The instance cannot be classified, because the Naive Bayes algorithm (as implemented in bayes.pl) gives zero probabilities for both hypotheses (yes and no). The situation is illustrated below:

?- test(15,_,X),probs(X,P).

X=[outlook = overcast,temp = mild,humidity = low,wind = weak]
P=[no / 0.000,yes / 0.000]
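As a starting point for the investigation, recall the standard Naive Bayes decision rule as presented in Mitchell, Chapter 6. The notation below ($v_j$ for a class, $a_i$ for an attribute value, $n$ and $n_c$ for counts) follows the textbook; whether bayes.pl also normalizes the resulting scores is an assumption not confirmed here.

```latex
v_{NB} = \arg\max_{v_j \in \{\mathit{yes},\,\mathit{no}\}} \; P(v_j) \prod_{i} P(a_i \mid v_j),
\qquad
P(a_i \mid v_j) \approx \frac{n_c}{n}
```

Here $n$ is the number of training examples with class $v_j$, and $n_c$ is the number of those that also have attribute value $a_i$. Because the score is a product, any single factor with $n_c = 0$ forces the whole score for that class to zero.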

Investigate the reasons for this situation and resolve the problem by applying alternative approaches to estimating the conditional probabilities of attribute-value pairs. In particular:

Substitute a small nonzero value for each zero probability.
Use the Laplace estimate.
Use the m-estimate of probability (see Mitchell - Section 6.9.1.1).

Classify the new instance using each of the three approaches above and analyze the results.
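The three estimates can be compared side by side in a minimal sketch. It is written in Python for illustration only and is independent of bayes.pl; the function names, the default epsilon, and the example counts are all hypothetical. The Laplace estimate is taken in its usual add-one form with k distinct attribute values, and the m-estimate follows the formula (n_c + m*p) / (n + m) from Mitchell, Section 6.9.1.1.

```python
def small_value_estimate(n_c, n, epsilon=1e-3):
    """Relative frequency n_c / n, with zero replaced by a small constant.

    epsilon is an arbitrary illustrative choice, not a recommended value.
    """
    p = n_c / n
    return p if p > 0 else epsilon


def laplace_estimate(n_c, n, k):
    """Add-one (Laplace) estimate; k is the number of distinct attribute values."""
    return (n_c + 1) / (n + k)


def m_estimate(n_c, n, p, m):
    """m-estimate: p is the prior estimate, m the equivalent sample size."""
    return (n_c + m * p) / (n + m)


# Illustrative counts (assumed, not read from the course data files):
# an attribute value never seen among n = 5 examples of a class,
# for an attribute with k = 3 possible values.
n_c, n, k = 0, 5, 3
print(small_value_estimate(n_c, n))
print(laplace_estimate(n_c, n, k))
print(m_estimate(n_c, n, p=1 / k, m=k))
```

With a uniform prior p = 1/k and m = k, the m-estimate reduces to the Laplace estimate, which is a useful sanity check when comparing the approaches.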

2. LOO evaluation of Naive Bayes and K-Nearest Neighbor. Rate the algorithms bayes.pl, knn.pl (k=1), knn.pl (k=3) and knn.pl (distance weighted with k=total number of examples) according to their performance on the data sets animals.pl, loandata.pl and PlayTennis. Apply leave-one-out cross-validation to each data set and average the results. Analyze the results.
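The leave-one-out procedure can be sketched as follows. This is a generic Python illustration, not the interface of bayes.pl or knn.pl: the function names, the (features, label) representation, and the toy numeric data are assumptions, and the 1-nearest-neighbor classifier with squared Euclidean distance only stands in for whichever algorithm is being evaluated.

```python
def loo_accuracy(examples, classify):
    """Leave-one-out accuracy.

    examples: list of (features, label) pairs.
    classify: function (training_examples, features) -> predicted label.
    """
    correct = 0
    for i, (features, label) in enumerate(examples):
        # Hold out example i and train on all the others.
        training = examples[:i] + examples[i + 1:]
        if classify(training, features) == label:
            correct += 1
    return correct / len(examples)


def one_nearest_neighbor(training, features):
    """Label of the closest training example (squared Euclidean distance)."""
    def distance(example):
        return sum((a - b) ** 2 for a, b in zip(example[0], features))
    return min(training, key=distance)[1]


# Hypothetical toy data: one numeric feature, two classes.
toy = [((0.0,), "a"), ((0.1,), "a"), ((1.0,), "b"), ((1.1,), "b"), ((0.5,), "b")]
print(loo_accuracy(toy, one_nearest_neighbor))
```

Each example is classified exactly once by a model that never saw it, so the returned fraction is the average over all held-out examples of one data set.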

Extra Credit (max 10% of the project grade)

Bayesian Belief Networks. Create a belief network for the PlayTennis data (see Lab experiments 8, Section III) and classify test instance 15 (see Question 1) by using the bn.pl program or MSBNx. Hint: note that "low" is a new value for "humidity" that does not occur in the training set, so its probability has to be estimated and then defined in the belief network. Include in your report: the Prolog definitions for the belief network (if you use bn.pl), or the assessment windows (CPTs) for each node and the evaluation results with the evidence set according to test instance 15 (if you use MSBNx).