Machine Learning - Spring 2004
==============================
Lab experiments 9
-----------------
Program: bn.pl
Data: alarm.pl, loandata.pl
---------------------------
The belief network representation is explained in the data
file alarm.pl. Read it.
The BN reasoning algorithm is used through p(Var,Obs,Dist), where (see the representation sketch after this list):
- Var is a variable (as defined in variables([...]));
- Obs is a list of observations (variable=value pairs);
- Dist is the resulting distribution P(Var|Obs);
- Var must not appear in the observations Obs.
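For reference, here is a minimal sketch of that representation, using a
hypothetical two-node network a -> b (the actual definitions are in
alarm.pl; Section II below builds such a network step by step):
variables([a,b]).          % the nodes of the network
values(a,[t,f]).           % the domain of each variable
values(b,[t,f]).
parents(a,[]).             % a is a root node (no parents)
parents(b,[a]).            % a directed edge from a to b
pr(a,[],[0.3,0.7]).        % prior for a root, ordered as in values(a,[t,f])
pr(b,[a=t],[0.9,0.1]).     % one CPT row per combination of parent values
pr(b,[a=f],[0.2,0.8]).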
I. Reasoning with the Alarm belief network
==========================================
?- ['c:/prolog/bn.pl']. % load program
?- ['c:/prolog/alarm.pl']. % load data set
1. Diagnostic reasoning: from effect to cause (the effect is given as evidence)
-------------------------------------------------------------------------------
What is the probability distribution of Burglary (b) given the evidence
that John calls (j=t)?
?- p(b,[j=t],P).
P = [0.0162837, 0.983716]
The probability distribution [0.0162837, 0.983716] corresponds to the
values of the variable b as defined in values(b,[t,f]). Thus, the
probability of b=t is 0.0162837 and the probability of b=f is 0.983716.
Without this evidence we have
?- p(b,[],P).
P = [0.001, 0.999]
This is the prior distribution of b as it has no parents.
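As a sanity check, the diagnostic posterior can be reproduced with
Bayes' rule, using only p/3 to obtain the prior and the likelihoods
(a sketch; the resulting P should match the 0.0162837 above):
?- p(b,[],[PB,_]),                    % prior P(b=t)
   p(j,[b=t],[JB,_]),                 % likelihood P(j=t|b=t)
   p(j,[b=f],[JF,_]),                 % likelihood P(j=t|b=f)
   P is JB*PB / (JB*PB + JF*(1-PB)).  % posterior P(b=t|j=t)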
Let's add more evidence: John calls and Mary calls too.
?- p(b,[j=t,m=t],P).
P = [0.284172, 0.715828]
The probability of burglary increases: if they both call, it becomes
more likely that there is a burglary. Still, the alarm itself has not
been observed, so the probability of b=t is not very high. Adding the
evidence that the alarm went off (a=t) increases this probability further.
?- p(b,[j=t,m=t,a=t],P).
P = [0.373551, 0.626449]
Because there is another possible cause for the alarm (earthquake),
the probability of b can be increased further by adding the evidence
that there is no earthquake (e=f).
?- p(b,[j=t,m=t,a=t,e=f],P).
P = [0.484786, 0.515214]
This last example, however, is not diagnostic reasoning, because
b (burglary) and e (earthquake) are not connected by a causal relation.
Inference between two causes of a common effect is known as intercausal
reasoning (explaining away): ruling out the earthquake makes the
burglary a more likely explanation of the alarm.
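The intercausal effect can be observed directly by querying one cause
given the other (numeric output omitted here, as it depends on the CPTs
in alarm.pl):
?- p(e,[a=t],P1).      % P(earthquake | alarm went off)
?- p(e,[a=t,b=t],P2).  % P(earthquake | alarm and a confirmed burglary)
Confirming the burglary "explains away" the alarm, so the probability
of e=t in P2 is lower than in P1.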
2. Predictive reasoning: from cause to effect
---------------------------------------------
What is the probability distribution of John calls (j), given that
there is a burglary (b=t)?
?- p(j,[b=t],P).
P = [0.849017, 0.150983]
So, it is very likely that John calls in this situation (the first
value in the distribution corresponds to j=t). Similarly, we can get
the probability that Mary calls given the same evidence. It is lower
than John's (why?).
?- p(m,[b=t],P).
P = [0.658614, 0.341386]
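A hint for the "why?": assuming alarm.pl stores its CPTs as pr/3 facts
in the same format used in Section II below, the relevant rows can be
inspected directly:
?- pr(j,[a=t],PJ), pr(m,[a=t],PM).  % compare P(j|a=t) with P(m|a=t)
Comparing these rows (and the corresponding a=f rows) shows how likely
each neighbor is to call when the alarm goes off.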
II. Creating a belief network for the loandata (adding additional data to loandata.pl)
======================================================================================
1. The variables are the class and the attributes:
variables([class, emp, buy, sex, married]).
2. The structure of the graph represents the causal relationship between
the attributes and the class. The class value determines (is a cause for) the
attribute values (the effects). So, we have the class node as a parent
of all attributes. In Prolog this is:
parents(emp,[class]).
parents(buy,[class]).
parents(sex,[class]).
parents(married,[class]).
parents(class,[]).
3. Attribute values -> values for variables
values(emp,[yes,no]).
values(buy,[comp,car]).
values(sex,[m,f]).
values(married,[yes,no]).
values(class,[approve,reject]).
4. Conditional probabilities: use bayes.pl to compute them.
Conditional independence assumption: Attr_i is conditionally independent
of Attr_j given Class.
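Under this assumption the attribute distribution factorizes given the
class, which is what justifies estimating a separate CPT per attribute:
P(emp,buy,sex,married|class) =
    P(emp|class) * P(buy|class) * P(sex|class) * P(married|class)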
CPT for emp:
------------
?- cond_prob([emp=yes],approve,P).
P=[(emp = yes) / 1.000]
?- cond_prob([emp=no],approve,P).
P=[(emp = no) / 0.000]
?- cond_prob([emp=yes],reject,P).
P=[(emp = yes) / 0.250]
?- cond_prob([emp=no],reject,P).
P=[(emp = no) / 0.750]
=> pr(emp,[class=approve],[1.000,0.000]).
pr(emp,[class=reject],[0.250,0.750]).
CPT for buy:
------------
?- cond_prob([buy=comp],approve,P).
P=[(buy = comp) / 0.875] (so [buy=car] is 0.125, since the two values sum to 1)
?- cond_prob([buy=comp],reject,P).
P=[(buy = comp) / 0.500] (so [buy=car] is also 0.500)
=> pr(buy,[class=approve],[0.875,0.125]).
pr(buy,[class=reject],[0.500,0.500]).
and so on ...
-------------
pr(sex,[class=approve],[0.500,0.500]).
pr(sex,[class=reject],[0.250,0.750]).
pr(married,[class=approve],[0.500,0.500]).
pr(married,[class=reject],[0.750,0.250]).
CPT for class:
--------------
?- class_prob(approve,P).
P=0.667
?- class_prob(reject,P).
P=0.333
=> pr(class,[],[0.667,0.333]).
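Putting steps 1-4 together, the complete set of facts to add to
loandata.pl is:
variables([class, emp, buy, sex, married]).
parents(class,[]).
parents(emp,[class]).
parents(buy,[class]).
parents(sex,[class]).
parents(married,[class]).
values(class,[approve,reject]).
values(emp,[yes,no]).
values(buy,[comp,car]).
values(sex,[m,f]).
values(married,[yes,no]).
pr(class,[],[0.667,0.333]).
pr(emp,[class=approve],[1.000,0.000]).
pr(emp,[class=reject],[0.250,0.750]).
pr(buy,[class=approve],[0.875,0.125]).
pr(buy,[class=reject],[0.500,0.500]).
pr(sex,[class=approve],[0.500,0.500]).
pr(sex,[class=reject],[0.250,0.750]).
pr(married,[class=approve],[0.500,0.500]).
pr(married,[class=reject],[0.750,0.250]).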
III. Classification of new examples
===================================
?- ['c:/prolog/loandata.pl']. % the BN is included in the data file
The example is supplied as evidence:
?- p(class,[emp=yes,buy=car,sex=m,married=no],P).
P = [0.889037, 0.110963]
Because the probability of class=approve is higher (0.889037), we
classify this example as approve.
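Note that the evidence list may be any subset of the attributes, so the
BN can also classify examples with missing attribute values, e.g.
(output omitted):
?- p(class,[emp=yes,married=no],P).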
Essentially the same results are obtained by naive Bayes:
?- ['c:/prolog/bayes.pl']. % load the program only, the examples are already loaded with loandata.pl
?- probs([emp=yes,buy=car,sex=m,married=no],[C1/L1,C2/L2]), P1 is L1/(L1+L2),P2 is L2/(L1+L2).
C1 = approve
L1 = 0.0208333
C2 = reject
L2 = 0.00260417
P1 = 0.888889
P2 = 0.111111
We need a longer query here because naive Bayes computes likelihoods
(L1 and L2) rather than probabilities. To get the probabilities (P1 and
P2) we apply normalization: Pi = Li/(L1+L2).
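These likelihoods can be reproduced by hand from the CPTs of Section II
(a sketch in plain arithmetic; the tiny discrepancies come from rounding
2/3 to 0.667 and 1/3 to 0.333):
?- L1 is 0.667 * 1.000 * 0.125 * 0.500 * 0.500,  % P(approve)*P(emp=yes|ap)*P(buy=car|ap)*P(sex=m|ap)*P(married=no|ap)
   L2 is 0.333 * 0.250 * 0.500 * 0.250 * 0.250,  % P(reject)*P(emp=yes|rej)*P(buy=car|rej)*P(sex=m|rej)*P(married=no|rej)
   P1 is L1/(L1+L2), P2 is L2/(L1+L2).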