1. Estimating probabilities. Consider the following test instance for the PlayTennis data:
test(15,_,[outlook = overcast, temp = mild, humidity = low, wind = weak]).
The instance cannot be classified, because the Naive Bayes algorithm (as implemented in bayes.pl) gives zero probabilities for both hypotheses (yes and no). The situation is illustrated below:
?- test(15,_,X),probs(X,P).
X=[outlook = overcast,temp = mild,humidity = low,wind = weak]
P=[no / 0.000,yes / 0.000]
Investigate the reasons for this situation and resolve the problem by applying alternative approaches to estimating the conditional probabilities of attribute-value pairs. In particular:
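As an illustration of one such estimator (not necessarily the one intended here), the m-estimate replaces the raw frequency n_c / n with (n_c + m*p) / (n + m), where n_c is the number of examples of the class that have the attribute value, n is the number of examples of the class, p is a prior estimate of the probability, and m is the equivalent sample size. The Python sketch below is independent of bayes.pl. It assumes the standard 14-example PlayTennis table, a uniform prior p = 1/k, and that "low" counts as a third value of humidity; the names DATA, DOMAIN, m_estimate and scores are illustrative only.

```python
from collections import Counter

# Assumption: the standard 14-example PlayTennis table
# (outlook, temp, humidity, wind, class).
DATA = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]
ATTRS = ("outlook", "temp", "humidity", "wind")
# Assumption: number of possible values per attribute,
# counting "low" as a third humidity value.
DOMAIN = {"outlook": 3, "temp": 3, "humidity": 3, "wind": 2}


def m_estimate(n_c, n, p, m):
    """m-estimate of P(value | class); m = 0 gives the raw frequency."""
    return (n_c + m * p) / (n + m)


def scores(instance, m):
    """Unnormalized Naive Bayes score P(c) * prod P(a_i | c) per class."""
    class_counts = Counter(row[-1] for row in DATA)
    result = {}
    for c, n in class_counts.items():
        score = n / len(DATA)  # class prior
        for i, attr in enumerate(ATTRS):
            n_c = sum(1 for row in DATA
                      if row[-1] == c and row[i] == instance[i])
            score *= m_estimate(n_c, n, 1 / DOMAIN[attr], m)
        result[c] = score
    return result


instance = ("overcast", "mild", "low", "weak")
print(scores(instance, m=0))  # raw frequencies
print(scores(instance, m=3))  # smoothed estimates
```

Under the assumed table, m = 0 yields a zero product for both classes: "low" never occurs with either class, and "overcast" never occurs with class "no". Any m > 0 makes both products positive, so the instance can be classified.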
2. LOO evaluation of Naive Bayes and K-Nearest Neighbor. Rate the algorithms bayes.pl, knn.pl (k=1), knn.pl (k=3) and knn.pl (distance weighted, with k = the total number of examples) according to their performance on the data sets animals.pl, loandata.pl and PlayTennis. Apply leave-one-out (LOO) cross-validation to each data set and average the results. Analyze the results.
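Leave-one-out evaluation holds out each example in turn, trains on the remaining examples, and reports the fraction of held-out examples classified correctly. The following is a minimal Python sketch of that procedure, not the knn.pl implementation. It assumes nominal attributes compared by counting mismatches, and a 1/(1 + d^2) vote weight for the distance-weighted variant (knn.pl may weight differently). The toy data set and all function names are hypothetical.

```python
from collections import Counter


def overlap_distance(a, b):
    """Number of positions where two nominal attribute vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_classify(train, query, k):
    """Majority vote among the k nearest training examples."""
    nearest = sorted(train, key=lambda ex: overlap_distance(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def weighted_knn_classify(train, query):
    """Distance-weighted vote over all training examples (k = len(train))."""
    votes = Counter()
    for attrs, label in train:
        d = overlap_distance(attrs, query)
        # Assumed weighting; the +1 avoids division by zero on exact matches.
        votes[label] += 1.0 / (1.0 + d * d)
    return votes.most_common(1)[0][0]


def leave_one_out(data, classify):
    """Fraction of examples classified correctly when each is held out."""
    correct = 0
    for i, (attrs, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # everything except example i
        if classify(train, attrs) == label:
            correct += 1
    return correct / len(data)


# Hypothetical toy data: (attribute vector, class label).
TOY = [
    (("a", "x", "p"), "pos"),
    (("a", "x", "q"), "pos"),
    (("a", "y", "p"), "pos"),
    (("b", "y", "q"), "neg"),
    (("b", "y", "p"), "neg"),
    (("b", "x", "q"), "neg"),
]

classifiers = [
    ("knn k=1", lambda train, q: knn_classify(train, q, 1)),
    ("knn k=3", lambda train, q: knn_classify(train, q, 3)),
    ("knn weighted", weighted_knn_classify),
]
for name, clf in classifiers:
    print(name, leave_one_out(TOY, clf))
```

Passing the classifier as a function keeps the evaluation loop identical for every algorithm, so the per-data-set accuracies can be compared and averaged directly.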
Extra Credit (max 10% of the project grade)
Bayesian Belief Networks. Create a belief network for the PlayTennis data (see Lab experiments 8, Section III) and classify test instance 15 (see Question 1) using either the bn.pl program or MSBNx. Hint: "low" is a new value for "humidity" that does not occur in the training set, so its probability has to be estimated and then defined in the belief network. Include in your report: the Prolog definitions for the belief network (if you use bn.pl), or the assessment windows (CPTs) for each node and the evaluation results with the evidence set according to test instance 15 (if you use MSBNx).