Machine Learning - Summer 2010
==============================

Lab experiments 1

Part I: Nominal Attributes
--------------------------
Programs needed: covering.pl
Data files: animals.pl
Prolog query: ?- ['c:/prolog/covering', 'c:/prolog/animals'].
(assuming that the Prolog files are stored in "c:/prolog")
-----------------------------------------------------------------------

1. ACCESSING INDIVIDUAL EXAMPLES IN THE DATABASE
------------------------------------------------
?- example(1,mammal,E).

E = [has_covering = hair,milk = t,homeothermic = t,habitat = land,eggs = f,gills = f]

2. ACCESSING EXAMPLES IN THE DATABASE BY USING ALTERNATIVE SOLUTIONS (;)
------------------------------------------------------------------------
?- example(N,mammal,E).

N = 1
E = [has_covering=hair, milk=t, homeothermic=t, habitat=land, eggs=f, gills=f] ;

N = 2
E = [has_covering=none, milk=t, homeothermic=t, habitat=sea, eggs=f, gills=f] ;

N = 3
E = [has_covering=hair, milk=t, homeothermic=t, habitat=sea, eggs=t, gills=f] ;

N = 4
E = [has_covering=hair, milk=t, homeothermic=t, habitat=air, eggs=f, gills=f]

3. PRINTING ALL SOLUTIONS BY ADDING write(...),nl,fail. TO THE END OF THE QUERY
-------------------------------------------------------------------------------
?- example(N,mammal,E),write(E),nl,fail.
[has_covering = hair,milk = t,homeothermic = t,habitat = land,eggs = f,gills = f]
[has_covering = none,milk = t,homeothermic = t,habitat = sea,eggs = f,gills = f]
[has_covering = hair,milk = t,homeothermic = t,habitat = sea,eggs = t,gills = f]
[has_covering = hair,milk = t,homeothermic = t,habitat = air,eggs = f,gills = f]

no

NOTE: The "no" here means that the query failed. This always happens when
"fail" is added at the end: once the goals before "fail" have generated all
their alternative solutions, the query fails. "fail" plays the role of a
barrier that bounces control back to the goals on its left and forces them
to generate alternative solutions.

4. COLLECTING SOLUTIONS IN A SET: GENERATING A SET OF EXAMPLES FROM THE DATABASE
--------------------------------------------------------------------------------
?- findall(N,example(N,mammal,_),L).

N = _G360
L = [1, 2, 3, 4]

?- findall(N,example(N,_,_),L).

N = _G333
L = [1, 2, 3, 4, 5, 6, 7, 8, 9|...]

NOTE: The list shown here looks incomplete, but it is not: Prolog prints
"..." to avoid long answers. To see the complete list, use "write(L)".

?- findall(N,example(N,_,_),L),write(L).
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

N = _G387
L = [1, 2, 3, 4, 5, 6, 7, 8, 9|...]

5. FINDING COVERAGE (MODEL) OF HYPOTHESES
-----------------------------------------
?- model([],L),write(L).
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

L = [1, 2, 3, 4, 5, 6, 7, 8, 9|...]

NOTE: Hereafter we use write(Variable) when we want to see the value of a
Variable that is a list with too many elements. Note also that Prolog always
shows the values of the variables in the query.

?- model([has_covering=hair],M).

M = [1, 3, 4]

?- model([milk=t],M).

M = [1, 2, 3, 4]

6. GENERALIZING EXAMPLES (GENERATING HYPOTHESES)
------------------------------------------------
?- example(1,mammal,E),generalize(E,H).

E = [has_covering=hair, milk=t, homeothermic=t, habitat=land, eggs=f, gills=f]
H = [has_covering=hair] ;

E = [has_covering=hair, milk=t, homeothermic=t, habitat=land, eggs=f, gills=f]
H = [milk=t] ;

E = [has_covering=hair, milk=t, homeothermic=t, habitat=land, eggs=f, gills=f]
H = [homeothermic=t] ;

E = [has_covering=hair, milk=t, homeothermic=t, habitat=land, eggs=f, gills=f]
H = [habitat=land] ;

E = [has_covering=hair, milk=t, homeothermic=t, habitat=land, eggs=f, gills=f]
H = [eggs=f] ;

E = [has_covering=hair, milk=t, homeothermic=t, habitat=land, eggs=f, gills=f]
H = [gills=f] ;

E = [has_covering=hair, milk=t, homeothermic=t, habitat=land, eggs=f, gills=f]
H = [eggs=f, gills=f] ;

E = [has_covering=hair, milk=t, homeothermic=t, habitat=land, eggs=f, gills=f]
H = [habitat=land, eggs=f] ;

E = [has_covering=hair, milk=t, homeothermic=t, habitat=land, eggs=f, gills=f]
H = [habitat=land, gills=f] ;

E = [has_covering=hair, milk=t, homeothermic=t, habitat=land, eggs=f, gills=f]
H = [homeothermic=t, habitat=land] ;

E = [has_covering=hair, milk=t, homeothermic=t, habitat=land, eggs=f, gills=f]
H = [homeothermic=t, eggs=f] ;
...

NOTE: All the hypotheses above are built by generalizing example 1. Try
another example, or try the above query with all examples, i.e.

?- example(_,_,E),generalize(E,H).

The underscores here ("_") mean any value, that is, all examples will be
used for generalization. To get a list of all hypotheses generated from all
examples, try this:

?- findall(H,(example(_,_,E),generalize(E,H)),L),length(L,N).

H = _G521
E = _G518
L = [[has_covering=hair], [milk=t], [homeothermic=t], [habitat=land], [eggs=f], [gills=f], ...]
N = 630 ;

In this query we added length(L,N) in order to get the number of hypotheses.
Note, however, that the 630 hypotheses contain many repetitions (why?).
If you want to see them all, use write(L) at the end.

7. LIMITING THE SIZE OF THE GENERALIZATIONS (HYPOTHESES)
--------------------------------------------------------
?- example(1,mammal,E),generalize(E,H),length(H,3),write(H),nl,fail.
[habitat=land, eggs=f, gills=f]
[homeothermic=t, eggs=f, gills=f]
[homeothermic=t, habitat=land, eggs=f]
[homeothermic=t, habitat=land, gills=f]
[milk=t, eggs=f, gills=f]
[milk=t, habitat=land, eggs=f]
[milk=t, habitat=land, gills=f]
[milk=t, homeothermic=t, habitat=land]
[milk=t, homeothermic=t, eggs=f]
[milk=t, homeothermic=t, gills=f]
[has_covering=hair, eggs=f, gills=f]
[has_covering=hair, habitat=land, eggs=f]
[has_covering=hair, habitat=land, gills=f]
[has_covering=hair, homeothermic=t, habitat=land]
[has_covering=hair, homeothermic=t, eggs=f]
[has_covering=hair, homeothermic=t, gills=f]
[has_covering=hair, milk=t, homeothermic=t]
[has_covering=hair, milk=t, habitat=land]
[has_covering=hair, milk=t, eggs=f]
[has_covering=hair, milk=t, gills=f]

NOTE: "length" is a goal that either returns the length of a list or, if a
number is given as its second argument, succeeds (i.e. allows control to
pass to the next goal) only when that number is the length of the list.
So, length(H,3) succeeds for all H's that have exactly 3 elements.

8. LIMITING THE SIZE OF THE MODELS (NUMBER OF EXAMPLES COVERED)
--------------------------------------------------------------
?- example(1,mammal,E),generalize(E,H),model(H,M),length(M,L),L>3,write(H-M),nl,fail.
[milk=t]-[1, 2, 3, 4]
[homeothermic=t]-[1, 2, 3, 4, 8, 9]
[habitat=land]-[1, 6, 9, 10]
[gills=f]-[1, 2, 3, 4, 6, 7, 8, 9, 10]
[habitat=land, gills=f]-[1, 6, 9, 10]
[homeothermic=t, gills=f]-[1, 2, 3, 4, 8, 9]
[milk=t, homeothermic=t]-[1, 2, 3, 4]
[milk=t, gills=f]-[1, 2, 3, 4]
[milk=t, homeothermic=t, gills=f]-[1, 2, 3, 4]

NOTE 1: Here length(M,L) just returns the length of M in L. Then the goal
"L>3" succeeds for all L's greater than 3, i.e. for lists M with more than
3 elements.

NOTE 2: write(H-M) is a common way to print two variables with one write goal.
If you want them separated by a comma, use write((H,M)). It also works for
more variables, e.g. write(A-B-C-...) or write((A,B,C,...)).

9. SELECTING CORRECT HYPOTHESES (HYPOTHESES THAT COVER EXAMPLES FROM A GIVEN CLASS ONLY)
----------------------------------------------------------------------------------------
?- example(1,mammal,E),generalize(E,H),model(H,M),length(M,L),L>3,
   \+ (member(X,M),example(X,C,_),C\=mammal),write(H-M),nl,fail.
[milk=t]-[1, 2, 3, 4]
[milk=t, homeothermic=t]-[1, 2, 3, 4]
[milk=t, gills=f]-[1, 2, 3, 4]
[milk=t, homeothermic=t, gills=f]-[1, 2, 3, 4]

NOTE 1: A long line may be broken after a comma.

NOTE 2: "\=" means not equal and works with variables, numbers and other
structures too.

NOTE 3: "\+" means logical negation. That is, \+ Goal is true if Goal is
false and vice versa. To use it with a conjunction of goals, the goals must
be put in parentheses. So, in the above query
\+ (member(X,M),example(X,C,_),C\=mammal) is the logical negation of
(member(X,M),example(X,C,_),C\=mammal).

NOTE 4: The above query implements the so-called "generate and test"
algorithm.
1. The first line
   (example(1,mammal,E),generalize(E,H),model(H,M),length(M,L),L>3)
   generates all hypotheses built by generalizing example 1 that cover more
   than 3 examples (the same as in item 8).
2. The second part, \+ (member(X,M),example(X,C,_),C\=mammal), filters out
   all hypotheses that cover an example (a member of the model) whose class
   (C) is not equal to mammal.
3. "write(H-M)" prints the hypothesis currently found and its model, "nl"
   moves to the next line, and "fail" forces all previous goals (in fact,
   the generate part) to find all alternative solutions.

NOTE 5: The following query prints all hypotheses generated from all
examples from class mammal (not just example 1). Note, however, that they
are just the same as above, repeated once for each mammal example.

?- example(_,mammal,E),generalize(E,H),model(H,M),length(M,L),L>3,
   \+ (member(X,M),example(X,C,_),C\=mammal),write(H-M),nl,fail.
[milk=t]-[1, 2, 3, 4]
[milk=t, homeothermic=t]-[1, 2, 3, 4]
[milk=t, gills=f]-[1, 2, 3, 4]
[milk=t, homeothermic=t, gills=f]-[1, 2, 3, 4]
[milk=t]-[1, 2, 3, 4]
[milk=t, homeothermic=t]-[1, 2, 3, 4]
[milk=t, gills=f]-[1, 2, 3, 4]
[milk=t, homeothermic=t, gills=f]-[1, 2, 3, 4]
[milk=t]-[1, 2, 3, 4]
[milk=t, homeothermic=t]-[1, 2, 3, 4]
[milk=t, gills=f]-[1, 2, 3, 4]
[milk=t, homeothermic=t, gills=f]-[1, 2, 3, 4]
[milk=t]-[1, 2, 3, 4]
[milk=t, homeothermic=t]-[1, 2, 3, 4]
[milk=t, gills=f]-[1, 2, 3, 4]
[milk=t, homeothermic=t, gills=f]-[1, 2, 3, 4]

10. USING LEAST GENERAL GENERALIZATION (lgg) TO GENERATE HYPOTHESES
-------------------------------------------------------------------
?- example(N1,mammal,E1),example(N2,mammal,E2),N1>N2,lgg(E1,E2,H),model(H,M),
   \+ (member(X,M),example(X,C,_),C\=mammal),write(H-M),nl,fail.
[milk=t, homeothermic=t, eggs=f, gills=f]-[1, 2, 4]
[has_covering=hair, milk=t, homeothermic=t, gills=f]-[1, 3, 4]
[milk=t, homeothermic=t, habitat=sea, gills=f]-[2, 3]
[has_covering=hair, milk=t, homeothermic=t, eggs=f, gills=f]-[1, 4]
[milk=t, homeothermic=t, eggs=f, gills=f]-[1, 2, 4]
[has_covering=hair, milk=t, homeothermic=t, gills=f]-[1, 3, 4]

NOTE: This query also implements the generate and test algorithm:
1. The first three goals (example(N1,mammal,E1),example(N2,mammal,E2),N1>N2)
   pick two examples that are different (ensured by N1>N2).
2. lgg(E1,E2,H) builds the least general hypothesis that covers both E1 and
   E2. This is simply the intersection of E1 and E2.
3. model(H,M) returns in M the model of H. So, M is the set of examples that
   H covers.
4. \+ (member(X,M),example(X,C,_),C\=mammal) filters out those hypotheses
   that cover examples from non-mammal classes.
5. "write(H-M),nl,fail" prints the currently generated hypothesis H and its
   model M and forces all previous goals to find all alternative solutions.

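NOTE: The source of covering.pl is not listed in this handout, so the
following is only a minimal sketch of how covering, model and lgg could be
written for nominal attributes. The predicate names sample/3, my_covers/2,
my_model/2 and my_lgg/3 are hypothetical and chosen so that they do not
clash with covering.pl or animals.pl. The four facts are the mammal examples
printed in item 2; the other six examples are not listed here, so the models
computed by this sketch range over examples 1-4 only.

```prolog
% Hypothetical re-implementation for illustration; NOT the code in covering.pl.
% sample/3 holds the four mammal examples shown in item 2.
sample(1, mammal, [has_covering=hair, milk=t, homeothermic=t, habitat=land, eggs=f, gills=f]).
sample(2, mammal, [has_covering=none, milk=t, homeothermic=t, habitat=sea,  eggs=f, gills=f]).
sample(3, mammal, [has_covering=hair, milk=t, homeothermic=t, habitat=sea,  eggs=t, gills=f]).
sample(4, mammal, [has_covering=hair, milk=t, homeothermic=t, habitat=air,  eggs=f, gills=f]).

% my_covers(+H, +E): every attribute=value pair of H also occurs in E.
my_covers([], _).
my_covers([P|Ps], E) :-
    memberchk(P, E),
    my_covers(Ps, E).

% my_model(+H, -M): M is the list of numbers of all samples that H covers.
my_model(H, M) :-
    findall(N, (sample(N, _, E), my_covers(H, E)), M).

% my_lgg(+E1, +E2, -H): H keeps the pairs of E1 that also occur in E2,
% in the order of E1 (the intersection described in item 10).
my_lgg([], _, []).
my_lgg([P|Ps], E2, H) :-
    (   memberchk(P, E2)
    ->  H = [P|T]
    ;   H = T
    ),
    my_lgg(Ps, E2, T).
```

With these definitions, the query
?- sample(1,_,E1),sample(2,_,E2),my_lgg(E1,E2,H),my_model(H,M).
gives H = [milk=t, homeothermic=t, eggs=f, gills=f] and M = [1, 2, 4], which
agrees with the first line printed in item 10.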
======================================================================================

Lab experiments 1

Part II: Structural Attributes
----------------------------------------
Programs needed: covering.pl
Data files: shapes.pl, taxonomy.pl
Prolog query: ?- ['c:/prolog/covering','c:/prolog/shapes','c:/prolog/taxonomy'].
(assuming that the Prolog files are stored in "c:/prolog")
-----------------------------------------------------------------------

11. ACCESSING INDIVIDUAL EXAMPLES IN THE DATABASE
-------------------------------------------------
?- example(N,C,E).

N = 1
C = pos
E = [red, square] ;

N = 2
C = pos
E = [blue, rectangle] ;

N = 3
C = neg
E = [pink, triangle] ;

N = 4
C = neg
E = [blue, ellipse] ;

N = 5
C = pos
E = [orange, square] ;

NOTE the difference in the representation. The attribute names are not used
explicitly. Instead, just the values are included in the list. The
attributes are determined by the order of their values. That is, the first
value encodes the color and the second one the shape.

12. FINDING COVERAGE (MODEL) OF HYPOTHESES
------------------------------------------
?- smodel([mono,polygon],L).

L = [1, 2]

?- smodel([color,shape],L).

L = [1, 2, 3, 4, 5]

NOTE 1: With structural attributes all hypotheses have the same number of
elements (the number of attributes).

NOTE 2: The most general hypothesis is the pair of the root values in the
taxonomies: [color,shape]. Obviously it covers all examples.

13. GENERALIZATION/SPECIALIZATION OF EXAMPLES/HYPOTHESES
--------------------------------------------------------
?- example(1,pos,E),scovers(H,E),write(H>E),nl,fail.
[red, square]>[red, square]
[red, 4-sided]>[red, square]
[red, polygon]>[red, square]
[red, shape]>[red, square]
[mono, square]>[red, square]
[mono, 4-sided]>[red, square]
[mono, polygon]>[red, square]
[mono, shape]>[red, square]
[color, square]>[red, square]
[color, 4-sided]>[red, square]
[color, polygon]>[red, square]
[color, shape]>[red, square]

NOTE that the goal scovers(H,E) may be used in three different ways:
1. To check for covering, e.g. ?- scovers([mono, square],[red, square]).
   will return yes.
2. To generate generalizations, as in the example above, where the second
   argument is the input (the example) and the first one is the output
   (the hypothesis).
3. To generate specializations, as in the following example, where the
   first argument is the input (the hypothesis) and the second one is the
   output (its specializations):

?- scovers([mono,4-sided],E),write(E),nl,fail.
[mono, 4-sided]
[mono, rectangle]
[mono, square]
[mono, trapezoid]
[red, 4-sided]
[red, rectangle]
[red, square]
[red, trapezoid]
[blue, 4-sided]
[blue, rectangle]
[blue, square]
[blue, trapezoid]
[white, 4-sided]
[white, rectangle]
[white, square]
[white, trapezoid]
[black, 4-sided]
[black, rectangle]
[black, square]
[black, trapezoid]

14. LEAST GENERAL GENERALIZATIONS
---------------------------------
?- example(N1,pos,E1),example(N2,pos,E2),N1>N2,slgg(E1,E2,H).

N1 = 2
E1 = [blue, rectangle]
N2 = 1
E2 = [red, square]
H = [mono, 4-sided] ;

N1 = 5
E1 = [orange, square]
N2 = 1
E2 = [red, square]
H = [color, square] ;

N1 = 5
E1 = [orange, square]
N2 = 2
E2 = [blue, rectangle]
H = [color, 4-sided]

NOTE 1: Take a look at the picture of the taxonomy in taxonomy.pl and see
how these hypotheses are generated.

NOTE 2: Test the hypotheses above for covering negative examples.
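
NOTE 3: As a rough illustration of how covering and least general
generalization over a taxonomy could work, here is a minimal sketch. It is
not the code in covering.pl or taxonomy.pl. The predicate names isa/2,
ancestor_or_self/2, my_scovers/2, value_lgg/3 and my_slgg/3 are
hypothetical, and the isa/2 facts are only a fragment of the taxonomy,
inferred from the outputs in items 13 and 14 under the assumption that the
links shown there are direct; the real taxonomy.pl contains more nodes.

```prolog
% Hypothetical sketch; NOT the code in covering.pl or taxonomy.pl.
% isa(Child, Parent): taxonomy fragment inferred from items 13 and 14
% (assumed direct links; the real taxonomy has more values).
isa(red, mono).
isa(blue, mono).
isa(mono, color).
isa(square, 4-sided).
isa(rectangle, 4-sided).
isa(4-sided, polygon).
isa(polygon, shape).

% ancestor_or_self(?G, +V): G is V itself or a node above V.
% With G unbound, solutions are enumerated bottom-up, starting at V.
ancestor_or_self(V, V).
ancestor_or_self(G, V) :-
    isa(V, P),
    ancestor_or_self(G, P).

% my_scovers(?H, +E): each value of H is the corresponding value of E
% or one of its ancestors in the taxonomy.
my_scovers([], []).
my_scovers([G|Gs], [V|Vs]) :-
    ancestor_or_self(G, V),
    my_scovers(Gs, Vs).

% value_lgg(+V1, +V2, -G): the lowest node at or above both V1 and V2.
% Candidates come bottom-up from V1, so the first one that is also at or
% above V2 is the least general; the cut commits to it.
value_lgg(V1, V2, G) :-
    ancestor_or_self(G, V1),
    ancestor_or_self(G, V2),
    !.

% my_slgg(+E1, +E2, -H): value_lgg applied position by position.
my_slgg([], [], []).
my_slgg([V1|Vs1], [V2|Vs2], [G|Gs]) :-
    value_lgg(V1, V2, G),
    my_slgg(Vs1, Vs2, Gs).
```

For example, ?- my_slgg([blue,rectangle],[red,square],H). gives
H = [mono, 4-sided], as in the first answer of item 14, and
?- my_scovers(H,[red,square]),write(H),nl,fail. prints the same twelve
generalizations as the query in item 13.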