Machine Learning - Spring 2004
==============================
Lab experiments 9
-----------------
Program: bn.pl
Data: alarm.pl, loandata.pl
---------------------------
The belief network representation is explained in the data
file alarm.pl. Read it.
The BN reasoning algorithm is used through p(Var,Obs,Dist), where (see the representation sketch after this list):
- Var is a variable (as defined in variables([...]));
- Obs is a list of observations (variable=value pairs);
- Dist is the resulting distribution P(Var|Obs);
- Var must not appear in the observations Obs.
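For reference, here is a minimal sketch of that representation, using a
hypothetical two-node network a -> b (the actual definitions are in
alarm.pl; Section II below builds such a network step by step):
variables([a,b]).          % the nodes of the network
values(a,[t,f]).           % the domain of each variable
values(b,[t,f]).
parents(a,[]).             % a is a root node (no parents)
parents(b,[a]).            % a directed edge from a to b
pr(a,[],[0.3,0.7]).        % prior for a root, ordered as in values(a,[t,f])
pr(b,[a=t],[0.9,0.1]).     % one CPT row per combination of parent values
pr(b,[a=f],[0.2,0.8]).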
I. Reasoning with the Alarm belief network
==========================================
?- ['c:/prolog/bn.pl']. % load program
?- ['c:/prolog/alarm.pl']. % load data set
1. Diagnostic reasoning: from effect to cause (the effect is given as evidence)
-------------------------------------------------------------------------------
What is the probability distribution of Burglary (b) given the evidence
that John calls (j=t)?
?- p(b,[j=t],P).
P = [0.0162837, 0.983716]
The probability distribution [0.0162837, 0.983716] corresponds to the
values of the variable b as defined in values(b,[t,f]). Thus, the
probability of b=t is 0.0162837 and the probability of b=f is 0.983716.
Without this evidence we have
?- p(b,[],P).
P = [0.001, 0.999]
This is the prior distribution of b as it has no parents.
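As a sanity check, the diagnostic posterior can be reproduced with
Bayes' rule, using only p/3 to obtain the prior and the likelihoods
(a sketch; the resulting P should match the 0.0162837 above):
?- p(b,[],[PB,_]),                    % prior P(b=t)
   p(j,[b=t],[JB,_]),                 % likelihood P(j=t|b=t)
   p(j,[b=f],[JF,_]),                 % likelihood P(j=t|b=f)
   P is JB*PB / (JB*PB + JF*(1-PB)).  % posterior P(b=t|j=t)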
Let's add more evidence: John calls and Mary calls too.
?- p(b,[j=t,m=t],P).
P = [0.284172, 0.715828]
The probability of burglary increases: if they both call, it becomes
more likely that there is a burglary. Still, the alarm itself has not
been observed, so the probability of b=t is not very high. Adding the
evidence that the alarm went off (a=t) increases this probability further.
?- p(b,[j=t,m=t,a=t],P).
P = [0.373551, 0.626449]
Because there is another possible cause for the alarm (earthquake),
the probability of b can be increased further by adding the evidence
that there is no earthquake (e=f).
?- p(b,[j=t,m=t,a=t,e=f],P).
P = [0.484786, 0.515214]
This last example, however, is not diagnostic reasoning, because
b (burglary) and e (earthquake) are not connected by a causal relation.
Inference between two causes of a common effect is known as intercausal
reasoning (explaining away): ruling out the earthquake makes the
burglary a more likely explanation of the alarm.
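The intercausal effect can be observed directly by querying one cause
given the other (numeric output omitted here, as it depends on the CPTs
in alarm.pl):
?- p(e,[a=t],P1).      % P(earthquake | alarm went off)
?- p(e,[a=t,b=t],P2).  % P(earthquake | alarm and a confirmed burglary)
Confirming the burglary "explains away" the alarm, so the probability
of e=t in P2 is lower than in P1.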
2. Predictive reasoning: from cause to effect
---------------------------------------------
What is the probability distribution of John calls (j), given that
there is a burglary (b=t)?
?- p(j,[b=t],P).
P = [0.849017, 0.150983]
So, it is very likely that John calls in this situation (the first
value in the distribution corresponds to j=t). Similarly, we can get
the probability that Mary calls given the same evidence. It is lower
than John's (why?).
?- p(m,[b=t],P).
P = [0.658614, 0.341386]
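A hint for the "why?": assuming alarm.pl stores its CPTs as pr/3 facts
in the same format used in Section II below, the relevant rows can be
inspected directly:
?- pr(j,[a=t],PJ), pr(m,[a=t],PM).  % compare P(j|a=t) with P(m|a=t)
Comparing these rows (and the corresponding a=f rows) shows how likely
each neighbor is to call when the alarm goes off.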
II. Creating a belief network for the loandata (adding additional data to loandata.pl)
======================================================================================
1. The variables are the class and the attributes:
variables([class, emp, buy, sex, married]).
2. The structure of the graph represents the causal relationship between
the attributes and the class. The class value determines (is a cause for) the
attribute values (the effects). So, we have the class node as a parent
of all attributes. In Prolog this is:
parents(emp,[class]).
parents(buy,[class]).
parents(sex,[class]).
parents(married,[class]).
parents(class,[]).
3. Attribute values -> values for variables
values(emp,[yes,no]).
values(buy,[comp,car]).
values(sex,[m,f]).
values(married,[yes,no]).
values(class,[approve,reject]).
4. Conditional probabilities: use bayes.pl to compute them.
Conditional independence assumption: Attr_i is conditionally independent
of Attr_j given Class.
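Under this assumption the attribute distribution factorizes given the
class, which is what justifies estimating a separate CPT per attribute:
P(emp,buy,sex,married|class) =
    P(emp|class) * P(buy|class) * P(sex|class) * P(married|class)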
CPT for emp:
------------
?- cond_prob([emp=yes],approve,P).
P=[(emp = yes) / 1.000]
?- cond_prob([emp=no],approve,P).
P=[(emp = no) / 0.000]
?- cond_prob([emp=yes],reject,P).
P=[(emp = yes) / 0.250]
?- cond_prob([emp=no],reject,P).
P=[(emp = no) / 0.750]
=> pr(emp,[class=approve],[1.000,0.000]).
pr(emp,[class=reject],[0.250,0.750]).
CPT for buy:
------------
?- cond_prob([buy=comp],approve,P).
P=[(buy = comp) / 0.875] (so [buy=car] is 0.125, since the two values sum to 1)
?- cond_prob([buy=comp],reject,P).
P=[(buy = comp) / 0.500] (so [buy=car] is also 0.500)
=> pr(buy,[class=approve],[0.875,0.125]).
pr(buy,[class=reject],[0.500,0.500]).
and so on ...
-------------
pr(sex,[class=approve],[0.500,0.500]).
pr(sex,[class=reject],[0.250,0.750]).
pr(married,[class=approve],[0.500,0.500]).
pr(married,[class=reject],[0.750,0.250]).
CPT for class:
--------------
?- class_prob(approve,P).
P=0.667
?- class_prob(reject,P).
P=0.333
=> pr(class,[],[0.667,0.333]).
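Putting steps 1-4 together, the complete set of facts to add to
loandata.pl is:
variables([class, emp, buy, sex, married]).
parents(class,[]).
parents(emp,[class]).
parents(buy,[class]).
parents(sex,[class]).
parents(married,[class]).
values(class,[approve,reject]).
values(emp,[yes,no]).
values(buy,[comp,car]).
values(sex,[m,f]).
values(married,[yes,no]).
pr(class,[],[0.667,0.333]).
pr(emp,[class=approve],[1.000,0.000]).
pr(emp,[class=reject],[0.250,0.750]).
pr(buy,[class=approve],[0.875,0.125]).
pr(buy,[class=reject],[0.500,0.500]).
pr(sex,[class=approve],[0.500,0.500]).
pr(sex,[class=reject],[0.250,0.750]).
pr(married,[class=approve],[0.500,0.500]).
pr(married,[class=reject],[0.750,0.250]).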
III. Classification of new examples
===================================
?- ['c:/prolog/loandata.pl']. % the BN is included in the data file
The example is supplied as evidence:
?- p(class,[emp=yes,buy=car,sex=m,married=no],P).
P = [0.889037, 0.110963]
Because the probability of class=approve is higher (0.889037), we
classify this example as approve.
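Note that the evidence list may be any subset of the attributes, so the
BN can also classify examples with missing attribute values, e.g.
(output omitted):
?- p(class,[emp=yes,married=no],P).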
Essentially the same results are obtained by naive Bayes:
?- ['c:/prolog/bayes.pl']. % load the program only, the examples are already loaded with loandata.pl
?- probs([emp=yes,buy=car,sex=m,married=no],[C1/L1,C2/L2]), P1 is L1/(L1+L2),P2 is L2/(L1+L2).
C1 = approve
L1 = 0.0208333
C2 = reject
L2 = 0.00260417
P1 = 0.888889
P2 = 0.111111
We need a longer query here because naive Bayes computes likelihoods
(L1 and L2) rather than probabilities. To get the probabilities (P1 and
P2) we apply normalization: Pi = Li/(L1+L2).
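These likelihoods can be reproduced by hand from the CPTs of Section II
(a sketch in plain arithmetic; the tiny discrepancies come from rounding
2/3 to 0.667 and 1/3 to 0.333):
?- L1 is 0.667 * 1.000 * 0.125 * 0.500 * 0.500,  % P(approve)*P(emp=yes|ap)*P(buy=car|ap)*P(sex=m|ap)*P(married=no|ap)
   L2 is 0.333 * 0.250 * 0.500 * 0.250 * 0.250,  % P(reject)*P(emp=yes|rej)*P(buy=car|rej)*P(sex=m|rej)*P(married=no|rej)
   P1 is L1/(L1+L2), P2 is L2/(L1+L2).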