Machine Learning - Spring 2004
==============================

Lab experiments 7
-----------------

Programs: id3.pl, lgg.pl, search.pl
Data: loandata.pl, loan23.pl, loan13.pl, animal23.pl, animal13.pl
-----------------------------------------------------------------

1. EXPERIMENTS WITH HOLDOUT (2/3 for training, 1/3 for testing) FOR A TWO-CLASS PREDICTION
------------------------------------------------------------------------------------------

The loandata set (12 examples) is split into two disjoint sets:
- a training set of 2/3 (8 examples, stored in loan23.pl);
- a test set of 1/3 (4 examples, stored in loan13.pl).

?- ['c:/prolog/id3.pl'].     % load program
?- ['c:/prolog/loan23.pl'].  % load training set

?- listing(example).         % these are 2/3 of the loandata examples

example(1, approve, [emp=yes, buy=comp, sex=f, married=no]).
example(11, reject, [emp=no, buy=comp, sex=m, married=yes]).
example(3, approve, [emp=yes, buy=comp, sex=m, married=no]).
example(4, approve, [emp=yes, buy=car, sex=f, married=yes]).
example(2, reject, [emp=no, buy=comp, sex=f, married=yes]).
example(6, approve, [emp=yes, buy=comp, sex=f, married=yes]).
example(7, approve, [emp=yes, buy=comp, sex=f, married=no]).
example(8, approve, [emp=yes, buy=comp, sex=m, married=no]).

Now, let's create a decision tree using the training set.

?- id3.

These are the rules corresponding to the tree:

?- listing(if).

if[emp=no]then reject.
if[emp=yes]then approve.

Let's now load the test set loan13.pl into the database. Note that this
actually replaces the previous 8 examples (from loan23.pl) with the 4
contained in loan13.pl.

?- ['c:/prolog/loan13.pl'].

Verifying the contents of the database:

?- listing(example).

example(9, approve, [emp=yes, buy=comp, sex=m, married=yes]).
example(10, approve, [emp=yes, buy=comp, sex=m, married=yes]).
example(5, reject, [emp=yes, buy=car, sex=f, married=no]).
example(12, reject, [emp=no, buy=car, sex=f, married=yes]).

Currently the Prolog database contains:
- a hypothesis (2 rules) created from the training set of 8 examples
  (2/3 of the original loandata set);
- a test set of 4 examples (1/3 of the original loandata set).

We now want to evaluate the classification error of the hypothesis on the
test data. This will be the observed (sample) error. We'll do this by
examining the coverage of the rules.

?- if H then C, model(H,M).

H = [emp=no]
C = reject
M = [12] ;

H = [emp=yes]
C = approve
M = [9, 10, 5]

The first rule ([emp=no]) predicts class "reject" for example 12. Looking
at the examples above, we see that the actual class of example 12 is
"reject", so this prediction is correct. The second rule ([emp=yes])
predicts "approve" for examples 9, 10 and 5. The actual class of 9 and 10
is "approve", but the class of 5 is "reject", so this rule makes one wrong
prediction. In total (for the whole hypothesis, consisting of two rules)
we have 1 error out of 4 predictions, that is, an error rate of 1/4 = 25%.

We can collect all this information with one single query:

?- if H then Predicted,model(H,M),member(E,M),example(E,Actual,_),write(Predicted-Actual),nl,fail.

reject-reject
approve-approve
approve-approve
approve-reject

The first column shows the predicted classes and the second one the actual
classes. In three rows the predicted and the actual values are the same;
in one they are different. So, we have a 1/4 error rate.
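All this bookkeeping can also be wrapped in a small predicate. The sketch
below is ours (sample_error/1 is not part of id3.pl); it assumes only the
if/then operators and the model/2 procedure defined by id3.pl, and
computes the sample error over whatever examples are currently loaded:

% Hypothetical helper: collect all predicted-actual pairs of the
% current hypothesis and count the disagreements.
sample_error(Wrong/Total) :-
    findall(P-A,
            (if H then P, model(H,M), member(E,M), example(E,A,_)),
            Pairs),
    length(Pairs, Total),
    findall(x, (member(Pr-Ac, Pairs), Pr \== Ac), Mistakes),
    length(Mistakes, Wrong).

With the hypothesis and test set above, ?- sample_error(E). should answer
E = 1/4.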
2. COUNTING THE COST
--------------------

We can look further into the error rate for each class. The rule
predicting "reject" is 100% correct, but the one predicting "approve" is
only 2/3 correct. Considering what these rules actually mean, we might
prefer the opposite situation, because approving a loan for a customer who
should have been rejected may be costly for the bank. That is why we call
this approach to evaluating the hypothesis "counting the cost" (see the
lecture notes, Section 10).

To make the cost of the hypothesis easier to estimate we put these counts
in a 2x2 matrix called a "confusion matrix". Thus we have four counts, as
follows:

2 True Positives (TP)  - actual "approve" and predicted "approve".
1 False Positive (FP)  - actual "reject" and predicted "approve".
0 False Negatives (FN) - actual "approve" and predicted "reject".
1 True Negative (TN)   - actual "reject" and predicted "reject".

Confusion matrix

Actual \ Predicted | approve | reject |
-------------------|---------|--------|
approve            | 2 (TP)  | 0 (FN) |
-------------------|---------|--------|
reject             | 1 (FP)  | 1 (TN) |
-------------------|---------|--------|

Then the sample error is:

(# of wrong predictions)/(total # of predictions) = (FP+FN)/(TP+FP+TN+FN) = 1/4

3. EXPERIMENTS WITH HOLDOUT FOR A MULTI-CLASS PREDICTION
--------------------------------------------------------

How can we evaluate the cost of a hypothesis that predicts more than two
classes? Let's do another example: repeat the experiments from item 1 with
the animals data (they have 5 different class labels).

?- ['c:/prolog/animal23.pl'].  % load the training set - 2/3 of the original animals data set
?- id3.
?- listing(if).

if[has_covering=none]then amphibian.
if[has_covering=feathers]then bird.
if[has_covering=scales, gills=f]then reptile.
if[has_covering=scales, gills=t]then fish.
if[has_covering=hair]then mammal.

?- ['c:/prolog/animal13.pl'].  % load the test set - 1/3 of the original animals data set

?- if H then Predicted,model(H,M),member(E,M),example(E,Actual,_),write(Predicted-Actual),nl,fail.

amphibian-mammal
bird-bird
reptile-reptile
mammal-mammal

NOTE an interesting fact: we have five rules above, but only four
"predicted-actual" pairs. The "fish" class is missing, because the rule
predicting "fish" covers no test examples. The important thing, however,
is that all four test examples are covered (we have 4 pairs).

With these results we may create a number of confusion matrices.

First, four 2x2 matrices. For each one we select one class and put the
other three in a single contrasting class. For example, the matrix for
"mammal" and "non-mammal" (including "bird", "amphibian" and "reptile") is
the following:

Confusion matrix for "mammal"

Actual \ Predicted | mammal  | non-mammal |
-------------------|---------|------------|
mammal             | 1 (TP)  | 1 (FN)     |
-------------------|---------|------------|
non-mammal         | 0 (FP)  | 2 (TN)     |
-------------------|---------|------------|

Sample Error = (FP+FN)/(TP+FP+TN+FN) = 1/4
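The four counts of such a one-against-the-rest matrix can also be computed
automatically. The predicate below is only a sketch (confusion/5 is our
name, not part of id3.pl), again assuming the if/then operators and
model/2 from id3.pl:

% Hypothetical helper: 2x2 confusion counts for one positive class
% Pos, treating all remaining classes as a single contrasting class.
confusion(Pos, TP, FP, FN, TN) :-
    % all predicted-actual pairs on the currently loaded examples
    findall(P-A,
            (if H then P, model(H,M), member(E,M), example(E,A,_)),
            Pairs),
    findall(x, member(Pos-Pos, Pairs), TPs),               % predicted Pos, actual Pos
    length(TPs, TP),
    findall(x, (member(Pos-A1, Pairs), A1 \== Pos), FPs),  % predicted Pos, actual other
    length(FPs, FP),
    findall(x, (member(P1-Pos, Pairs), P1 \== Pos), FNs),  % predicted other, actual Pos
    length(FNs, FN),
    findall(x, (member(P2-A2, Pairs), P2 \== Pos, A2 \== Pos), TNs),
    length(TNs, TN).

With animal13.pl loaded as above, ?- confusion(mammal, TP, FP, FN, TN).
should give TP = 1, FP = 0, FN = 1, TN = 2, matching the matrix above.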
Second, one 4x4 matrix. Here we don't have TPs, FPs, FNs and TNs - they
apply to two-class problems only. Each cell in the matrix holds the count
of the corresponding pair of predicted-actual classes.

Actual \ Predicted | amphibian | bird | reptile | mammal |
-------------------|-----------|------|---------|--------|
amphibian          | 0         | 0    | 0       | 0      |
-------------------|-----------|------|---------|--------|
bird               | 0         | 1    | 0       | 0      |
-------------------|-----------|------|---------|--------|
reptile            | 0         | 0    | 1       | 0      |
-------------------|-----------|------|---------|--------|
mammal             | 1         | 0    | 0       | 1      |
-------------------|-----------|------|---------|--------|

What we would like to see in this matrix is non-zero counts in the
diagonal cells only, because those are the correct classifications. All
other cells are misclassifications. Thus the sample error is

(total in the non-diagonal cells)/(total in all cells) = 1/4

Of course, this is the same value as we got from the 2x2 matrix.

4. USING THE CLOSED WORLD ASSUMPTION (CWA)
------------------------------------------

Let's repeat the experiments from item 3 with another learning algorithm.

?- ['c:/prolog/lgg.pl'].       % load program
?- ['c:/prolog/animal23.pl'].  % load training set
?- lrn.
?- listing(if).

if[has_covering=hair, milk=t, homeothermic=t, gills=f]then mammal.
if[has_covering=scales, milk=f, homeothermic=f, habitat=sea, eggs=t, gills=t]then fish.
if[has_covering=scales, milk=f, homeothermic=f, habitat=land, eggs=t, gills=f]then reptile.
if[has_covering=feathers, milk=f, homeothermic=t, habitat=air, eggs=t, gills=f]then bird.
if[has_covering=none, milk=f, homeothermic=f, habitat=land, eggs=t, gills=f]then amphibian.

?- ['c:/prolog/animal13.pl'].  % load test set
?- listing(example).

example(1, mammal, [has_covering=hair, milk=t, homeothermic=t, habitat=land, eggs=f, gills=f]).
example(2, mammal, [has_covering=none, milk=t, homeothermic=t, habitat=sea, eggs=f, gills=f]).
example(7, reptile, [has_covering=scales, milk=f, homeothermic=f, habitat=sea, eggs=t, gills=f]).
example(9, bird, [has_covering=feathers, milk=f, homeothermic=t, habitat=land, eggs=t, gills=f]).

?- if H then Predicted,model(H,M),member(E,M),example(E,Actual,_),write(Predicted-Actual),nl,fail.

mammal-mammal

Note that the query that produced the predictions of id3.pl for all test
examples in loan13.pl (item 1) does not work here - we have just one
covered example. Let's modify the query to see the example ID.

?- if H then Predicted,model(H,M),member(E,M),example(E,Actual,_),write(E-Predicted-Actual),nl,fail.

1-mammal-mammal

OK, this is example 1. So, it appears that the hypothesis created by
lgg.pl from the training set animal23.pl is too specific and covers only
one example from the test set (this just confirms lgg's bias towards
overspecialization). To see this we may simply print the models of all
rules:

?- if H then Predicted,model(H,M).

H = [has_covering=hair, milk=t, homeothermic=t, gills=f]
Predicted = mammal
M = [1] ;

H = [has_covering=scales, milk=f, homeothermic=f, habitat=sea, eggs=t, gills=t]
Predicted = fish
M = [] ;

H = [has_covering=scales, milk=f, homeothermic=f, habitat=land, eggs=t, gills=f]
Predicted = reptile
M = [] ;

H = [has_covering=feathers, milk=f, homeothermic=t, habitat=air, eggs=t, gills=f]
Predicted = bird
M = [] ;

H = [has_covering=none, milk=f, homeothermic=f, habitat=land, eggs=t, gills=f]
Predicted = amphibian
M = [] ;

How can we classify the remaining examples in the test set (2, 7 and 9)?
The solution is to use the Closed World Assumption (CWA). It states that
whatever is not covered by the hypothesis belongs to the contrasting
class.
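This kind of classification can be sketched as a small predicate. The
names cwa_classify/3 and non_pos are ours (not part of lgg.pl); the
predicate assumes the if/then operators and model/2 as used above:

% Hypothetical sketch of CWA classification: an example E gets the
% class Pos if some rule predicting Pos covers it; anything not
% covered falls into the contrasting class non_pos.
cwa_classify(E, Pos, Pos) :-
    if H then Pos,      % a rule predicting the positive class
    model(H, M),
    member(E, M), !.    % ... that covers example E
cwa_classify(_, _, non_pos).

For the test set above, ?- example(E,_,_), cwa_classify(E,mammal,C),
write(E-C), nl, fail. should print 1-mammal, 2-non_pos, 7-non_pos and
9-non_pos.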
Applying the CWA, examples 2, 7 and 9 thus belong to the class
"non-mammal", and we can record this in the following confusion matrix:

Confusion matrix

Actual \ Predicted | mammal  | non-mammal |
-------------------|---------|------------|
mammal             | 1 (TP)  | 1 (FN)     |
-------------------|---------|------------|
non-mammal         | 0 (FP)  | 2 (TN)     |
-------------------|---------|------------|

Sample Error = (FP+FN)/(TP+FP+TN+FN) = 1/4

5. EVALUATING HYPOTHESES WITH LEAVE-ONE-OUT CROSS-VALIDATION (LOO CV)
---------------------------------------------------------------------

For this experiment we are going to use the whole data set loandata.pl and
the ID3 algorithm.

?- ['c:/prolog/id3.pl'].       % load program
?- ['c:/prolog/loandata.pl'].  % load data set
?- id3.
?- listing(if).                % this is the hypothesis created by ID3

if[emp=no]then reject.
if[emp=yes, buy=car, married=no]then reject.
if[emp=yes, buy=car, married=yes]then approve.
if[emp=yes, buy=comp]then approve.

Let's see how another form of the "model" procedure works. It produces the
set of examples covered by the rules predicting a given class value.

?- model(approve,M).
M = [1, 3, 4, 6, 7, 8, 9, 10]

?- model(reject,M).
M = [2, 5, 11, 12]

This is obviously a good hypothesis, because the actual class of all
examples in each list is the same as the predicted one. It is also a
complete hypothesis, as the union of the two lists is the whole set of
examples.
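The same check can be expressed as a query. Here is a minimal sketch (the
name wrongly_covered/1 is ours), assuming the if/then operators and
model/2 from id3.pl:

% Hypothetical check: succeeds for every example covered by a rule
% whose predicted class differs from the example's actual class.
wrongly_covered(E) :-
    if H then Predicted,
    model(H, M),
    member(E, M),
    example(E, Actual, _),
    Predicted \== Actual.

For the hypothesis above, ?- wrongly_covered(E). should simply fail,
confirming that no example is misclassified on the training data.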
Now, the idea is to leave out one example, build a hypothesis without it,
then put the example back and verify whether it is covered by the
hypothesis. If so, it is correctly predicted. Doing this for each example
in the data set and totaling the wrong predictions gives us the LOO CV
error rate.

One way of doing this with the loandata would be to create 12 data files,
each with one example left out. These would be used for training and the
whole data set for testing. However, there is an easier way: Prolog
provides tools for modifying its database dynamically. To use these tools
we have to add a declaration:

:- dynamic(example/3).

It has to be included in loandata.pl before the first example. So, assume
we've done this and then reload loandata.pl.

?- ['c:/prolog/loandata.pl'].  % reload data set with :- dynamic(example/3)

The following query first deletes an example ("retract(example(E,C,L))"),
then id3 creates a hypothesis without it, "assert(example(E,C,L))" puts
the example back in the database, and finally "model(C,M)" computes the
model for the example's class. On backtracking (entering ; for alternative
solutions) the examples are picked in the order they are loaded.

?- retract(example(E,C,L)),id3,assert(example(E,C,L)),model(C,M).

E = 1
C = approve
L = [emp=yes, buy=comp, sex=f, married=no]
M = [1, 3, 4, 6, 7, 8, 9, 10] ;

E = 2
C = reject
L = [emp=no, buy=comp, sex=f, married=yes]
M = [2, 5, 11, 12] ;

E = 3
C = approve
L = [emp=yes, buy=comp, sex=m, married=no]
M = [1, 3, 4, 6, 7, 8, 9, 10] ;

E = 4
C = approve
L = [emp=yes, buy=car, sex=f, married=yes]
M = [1, 3, 6, 7, 8, 9, 10] ;

E = 5
C = reject
L = [emp=yes, buy=car, sex=f, married=no]
M = [2, 11, 12] ;

E = 6
C = approve
L = [emp=yes, buy=comp, sex=f, married=yes]
M = [1, 3, 4, 6, 7, 8, 9, 10] ;

E = 7
C = approve
L = [emp=yes, buy=comp, sex=f, married=no]
M = [1, 3, 4, 6, 7, 8, 9, 10] ;

E = 8
C = approve
L = [emp=yes, buy=comp, sex=m, married=no]
M = [1, 3, 4, 6, 7, 8, 9, 10] ;

E = 9
C = approve
L = [emp=yes, buy=comp, sex=m, married=yes]
M = [1, 3, 4, 6, 7, 8, 9, 10] ;

E = 10
C = approve
L = [emp=yes, buy=comp, sex=m, married=yes]
M = [1, 3, 4, 6, 7, 8, 9, 10] ;

E = 11
C = reject
L = [emp=no, buy=comp, sex=m, married=yes]
M = [2, 5, 11, 12] ;

E = 12
C = reject
L = [emp=no, buy=car, sex=f, married=yes]
M = [2, 5, 11, 12]

How do we interpret this output? If the example occurs in its list, it has
been predicted correctly; if not, we have an error. Checking each of the
examples above, we see that examples 4 and 5 do not occur in their
respective lists, which means that they are not predicted correctly. So,
we have 2 errors out of 12 predictions and the LOO CV error is
2/12 = 0.167.

Finally, here are some improvements of the last query, which make counting
the errors easier.

?- retract(example(E,C,L)),id3,assert(example(E,C,L)),model(C,M),\+ member(E,M),write(E),nl,fail.

4
5

This query prints the examples which are predicted incorrectly. The
following one prints all examples with an indication of whether they are
predicted correctly (ok) or incorrectly (notok).

?- retract(example(E,C,L)),id3,assert(example(E,C,L)),model(C,M),write(E),(member(E,M),write(ok);\+ member(E,M),write(notok)),nl,fail.

1ok
2ok
3ok
4notok
5notok
6ok
7ok
8ok
9ok
10ok
11ok
12ok

Now you may be curious how the LOO CV error of lgg.pl compares to that of
id3.pl. We need to replace the algorithm:

?- ['c:/prolog/lgg.pl'].

and the goal in the query that executes it (lrn instead of id3):

?- retract(example(E,C,L)),lrn,assert(example(E,C,L)),model(C,M),write(E),(member(E,M),write(ok);\+ member(E,M),write(notok)),nl,fail.

1ok
2ok
3ok
4notok
5notok
6ok
7ok
8ok
9ok
10ok
11notok
12notok

Thus the LOO CV error here is 4/12 = 0.333. Obviously, ID3 performs
better.

Note that there is a little trick here. If you run the above query with
the original lgg.pl you get a lot of output. To avoid it, comment out the
"write" line in the writel procedure at the end of lgg.pl, so that it
looks like this:

writel([]) :- !.
writel([example(ID,Class,List)|T]) :-
    % write(example(ID,Class,List)),nl,
    assertz(if List then Class),
    writel(T).

If you use search.pl for LOO CV you may want to prevent it from printing
while learning the hypothesis. This can be done by commenting out (putting
% at the beginning of the line) all lines that contain "write".
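The whole leave-one-out loop can also be packaged into a single predicate.
This is only a sketch (loo_wrong/1 is our name); it relies on the
:- dynamic(example/3) declaration and on the same retract/assert
backtracking behaviour as the queries above:

% Hypothetical helper: collect the IDs of all examples that are
% misclassified under leave-one-out cross-validation with id3.
loo_wrong(Wrong) :-
    findall(E,
            ( retract(example(E,C,L)),  % leave one example out
              id3,                      % learn a hypothesis without it
              assert(example(E,C,L)),   % put the example back
              model(C,M),               % examples covered with class C
              \+ member(E,M) ),         % E not covered => an error
            Wrong).

With loandata.pl loaded, ?- loo_wrong(W), length(W,N). should give
W = [4, 5] and N = 2, i.e. the 2/12 LOO CV error computed above; replacing
id3 with lrn should reproduce the 4/12 figure for lgg.pl.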