Machine Learning - Spring 2004
==============================

Lab experiments 6
-----------------

Programs: ilp.pl, foil.pl
Data: path.ilp, path.fol, kinship.ilp, kinship.fol
--------------------------------------------------

1. Lgg-based relational induction (RLGG)
----------------------------------------

?- ['c:/prolog/ilp'].
Warning: (c:/prolog/ilp:22):
        Singleton variables: [S1, S2]
...
(Many more warnings appear here, but they are only reminders that some
variables are not used. Usually in these cases programmers use anonymous
variables ("_"), but some use named variables just to make the code
clearer.)
...

?- ['c:/prolog/path.ilp'].

?- examples(E).

E = [+path(2, 5), +path(1, 4), +path(1, 3), +path(1, 5), +path(2, 4),
+path(1, 2), +path(3, 4), +path(..., ...), +...|...]

These are examples for the target predicate "path", describing all paths
in a directed graph. If you want to see them all, open the file "path.ilp"
with a text editor, or use "write" as shown below.

?- examples(E), write(E).
[+path(2, 5), +path(1, 4), +path(1, 3), +path(1, 5), +path(2, 4),
+path(1, 2), +path(3, 4), +path(2, 3), +path(3, 5), -path(1, 1),
-path(2, 1), -path(2, 2), -path(3, 1), -path(3, 2), -path(3, 3),
-path(4, 1), -path(4, 2), -path(4, 3), -path(4, 4), -path(4, 5),
-path(5, 1), -path(5, 2), -path(5, 3), -path(5, 4), -path(5, 5)]

E = [+path(2, 5), +path(1, 4), +path(1, 3), +path(1, 5), +path(2, 4),
+path(1, 2), +path(3, 4), +path(..., ...), +...|...]

We also have background knowledge (BK), given similarly as a list of
ground facts (facts without variables). These facts describe the links
(edges) of the directed graph.

?- bg_model(BK).

BK = [link(1, 2), link(2, 3), link(3, 4), link(3, 5)]

The "bg_model" fact should be included in the database if we need
background knowledge. The "examples" fact is used for convenience only.
We put the examples in a list, so that they can be easily passed to the
learning algorithm (induce_rlgg).
We do this as follows:

?- examples(E),induce_rlgg(E,Clauses).

RLGG of path(1, 2) and path(3, 4) is
path(A, B) :-
        link(A, B).
Covered example: path(1, 2)
Covered example: path(3, 4)
Covered example: path(2, 3)
Covered example: path(3, 5)

RLGG of path(1, 3) and path(2, 4) is
path(A, B) :-
        path(C, B),
        path(A, C).
Covered example: path(1, 3)
Covered example: path(2, 4)
Covered example: path(2, 5)
Covered example: path(1, 4)
Covered example: path(1, 5)

E = [+path(1, 2), +path(3, 4), +path(2, 3), +path(3, 5), +path(1, 3),
+path(2, 4), +path(2, 5), +path(..., ...), +...|...]
Clauses = [ (path(_G2657, _G2658):-[path(_G2730, _G2658), path(_G2657, _G2730)]),
(path(_G792, _G793):-[link(_G792, _G793)])]

To avoid the internal variable names (_G...) in the returned list of
clauses, use the "portray_clause" procedure. So, if you add
"portray_clause(Clauses)" to the query above, you'll see the list as
follows:

[ (path(A, B):-[path(C, B), path(A, C)]), (path(D, E):-[link(D, E)])].

(Of course, the standard Prolog answer will still be there.)

The "induce_rlgg" algorithm works as described in Lecture Notes Chapter 8,
Section 2. While working, it prints the clauses found so far and the
examples they cover. At the end it returns, in the variable given as the
second argument of "induce_rlgg", the list of clauses (rules) found for
the target predicate (path).

Note that the definition found by the algorithm differs from the one
described in Section 2. Both definitions are correct with respect to BK;
the algorithm simply stops after finding the first one and does not look
for all solutions.

To see other results that show the sensitivity of the algorithm to example
ordering and to negative examples, try reordering the positive examples or
removing negatives. (The positives and the negatives are processed
separately, so the way they are mixed does not matter.)

Let's now try another dataset - kinship.ilp.
This file includes a part of the family relationships shown in
kinship.pdf. The set of examples defines the relation "child". The goal is
to find three clauses that define three cases of being a child, involving
the relations "father", "mother", "husband" and "child".

?- ['c:/prolog/kinship.ilp'].

?- examples(E),induce_rlgg(E,Clauses).

RLGG of child(james, andrew) and child(arthur, christopher) is
child(A, B) :-
        father(B, A).
Covered example: child(james, andrew)
Covered example: child(arthur, christopher)

RLGG of child(jennifer, christine) and child(victoria, penelope) is
child(A, B) :-
        mother(B, A).
Covered example: child(jennifer, christine)
Covered example: child(victoria, penelope)

RLGG of child(james, christine) and child(arthur, penelope) is
child(A, B) :-
        husband(C, B),
        child(A, C).
Covered example: child(james, christine)
Covered example: child(arthur, penelope)

Let's now investigate how the negative examples affect the performance of
RLGG. Remove from the file (or comment out) the last negative example:

-child(arthur, margaret)

and run RLGG again. The algorithm will build the same first two clauses.
However, the third one will be different (note the new variable D):

child(A, B) :-
        husband(C, B),
        child(A, D).

The explanation is that this clause covers the removed negative example.
That is, when the head is instantiated with that example, the body
literals can be found in BK or in the positive examples (variables C and D
are independent of each other and can match any constant). So, we have:

child(arthur, margaret) :-
        husband(C, margaret),
        child(arthur, D).

The presence of the above negative example, however, forces the algorithm
(during the reduction step) to look for more specific body literals, and
this is how it produces the clause:

child(A, B) :-
        husband(C, B),
        child(A, C).

Now, instantiating this clause with the negative example in question, we
get

child(arthur, margaret) :-
        husband(C, margaret),
        child(arthur, C).
This clause is now false, because there is no value of C for which both
body literals can be found in BK or in the positive examples.

2. Heuristic search - FOIL
--------------------------

?- ['c:/prolog/foil'].

?- ['c:/prolog/path.fol'].

Here we load the same dataset used in the RLGG experiments, but formatted
differently. There are three basic differences:
- All examples and BK are given as facts and loaded directly into the
  Prolog database.
- FOIL requires some parameters, which are included (and explained) in
  the file.
- The negative examples are not included, because FOIL can generate them
  automatically by using the CWA (the parameter foil_cwa(true) specifies
  this). Alternatively, we can specify the negative examples explicitly
  (see kinship.fol).

To run FOIL we specify the target predicate (name/arity).

?- foil(path/2).
...

Here FOIL explains the steps in finding the clauses for the target
predicate and then prints:

Found definition:
path(A, B) :-
        link(A, C),
        path(C, B).
path(A, B) :-
        link(A, B).

Note that this definition is the most accurate one.

Let's run FOIL on the kinship data. The dataset, formatted accordingly, is
in the file "kinship.fol".

?- ['c:/prolog/kinship.fol'].

This is the same dataset as used in the RLGG experiments. So, in this case
we use explicit negative examples (foil_cwa(false)).

?- foil(child/2).
...
Found definition:
child(A, B) :-
        mother(B, C),
        father(D, A).
child(A, B) :-
        father(B, A).
child(A, B) :-
        mother(B, A).

Note that the definition found by FOIL is different from the one found by
RLGG. The first clause is wrong: it is too general. This happens because
the set of negative examples is incomplete. If you try to prove the given
negative examples using the first clause, you will find that none of them
can be proved. So, this clause is acceptable in the presence of the
current negative examples. To correct this result we may add more
negatives (try to find out which ones) or use all negatives through the
CWA mode. Just change foil_cwa(false) to foil_cwa(true).
Then we get the correct definition:

child(A, B) :-
        mother(B, C),
        father(D, A),
        husband(D, B).
child(A, B) :-
        father(B, A).
child(A, B) :-
        mother(B, A).

Note that this definition is more complete than the one produced by RLGG.
The reason for this is the restricted set of negative examples used in
kinship.ilp. You may want to figure out more negatives and add them to
kinship.ilp, so that RLGG gets the right definition of "child".
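
As a starting point for working out missing negatives, the closed-world
assumption (CWA) can be sketched in a few lines of Prolog. The sketch below
uses the path data from Section 1, because its full example list is shown
there. The predicate names pos/1, node/1, cwa_negative/1 and
cwa_negatives/1 are hypothetical and chosen only for this illustration;
this is not necessarily how foil.pl generates negatives internally.

```prolog
:- use_module(library(lists)).   % member/2

% Positive examples of path/2, copied from the path.ilp listing in Section 1.
% pos/1 is a hypothetical wrapper used only in this sketch.
pos(path(2, 5)).  pos(path(1, 4)).  pos(path(1, 3)).
pos(path(1, 5)).  pos(path(2, 4)).  pos(path(1, 2)).
pos(path(3, 4)).  pos(path(2, 3)).  pos(path(3, 5)).

% The constants that occur in the examples (the nodes of the graph).
node(N) :- member(N, [1, 2, 3, 4, 5]).

% Under the CWA, every ground atom path(X, Y) over the known constants
% that is not listed as a positive example is treated as a negative one.
cwa_negative(path(X, Y)) :-
    node(X),
    node(Y),
    \+ pos(path(X, Y)).

% Collect all CWA negatives in a list.
cwa_negatives(Negs) :-
    findall(N, cwa_negative(N), Negs).
```

The query "?- cwa_negatives(Ns), length(Ns, Count)." gives Count = 16,
which agrees with the 16 negative examples listed explicitly in path.ilp.
The same idea applies to "child" if pos/1 and node/1 are filled in from the
kinship data, although most of the negatives generated that way may not be
needed for RLGG to find the right clause.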