### Machine Learning - Summer 2003

## Project 3 - Evaluating hypotheses/algorithms

**Posted: 6/10/2003**

**Due: 6/16/2003**

**1. Holdout procedure.** Find **two stratified splits** of the
**PlayTennis** data into **training (2/3, or 9 examples)** and
**test (1/3, or 5 examples)** sets, such that the hypothesis that **id3**
generates from the first split has a **higher error** than the hypothesis
it generates from the second split.
__Include in your report:__ the training and test sets, as well as a
description of the procedure you used to calculate the error (e.g. the
outputs of the Zprolog queries you used).
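
For orientation only, the idea of a stratified holdout split and the error rate can be sketched in Python. This is not the Prolog procedure expected in the report. The function names `stratified_split` and `error_rate` are hypothetical, and the 9 "yes" / 5 "no" class distribution is an assumption based on the commonly used 14-example PlayTennis data; check it against the course data file.

```python
import random
from collections import defaultdict


def stratified_split(labels, n_test, seed):
    """Return (train_idx, test_idx) with class proportions roughly preserved.

    Each class contributes round(n_test * class_size / total) examples to the
    test set. For other data sets the rounding may leave the test set one
    example off from n_test.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    test_idx = []
    for indices in by_class.values():
        rng.shuffle(indices)
        k = round(n_test * len(indices) / len(labels))
        test_idx.extend(indices[:k])

    test_set = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in test_set]
    return train_idx, sorted(test_idx)


def error_rate(predicted, actual):
    """Fraction of test examples whose predicted class differs from the actual one."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


# Assumed class labels only (9 "yes", 5 "no"); attribute values are omitted.
labels = ["yes"] * 9 + ["no"] * 5
train_idx, test_idx = stratified_split(labels, n_test=5, seed=1)
print(len(train_idx), len(test_idx))
```

Different values of `seed` give different stratified splits, which is one way to look for two splits whose hypotheses differ in test error.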

**2. Leave-one-out cross-validation.** Rate the algorithms
id3.pl,
lgg.pl
and search.pl according to the **hypothesis
accuracy** they achieve on the data sets
animals.pl,
loandata.pl
and __PlayTennis__. Evaluate the hypothesis accuracy by using **LOO
cross-validation**.
__Include in your report:__ a description
of the procedure you used to calculate the error (e.g. the outputs of the
Zprolog queries you used).
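
As an illustration of the LOO procedure itself (not of the Prolog tools), the loop can be sketched in Python as follows. The names `loo_accuracy` and `majority_learner` are hypothetical, and the majority-class learner is only a stand-in for id3.pl, lgg.pl or search.pl.

```python
from collections import Counter


def loo_accuracy(examples, learn):
    """Leave-one-out accuracy.

    examples: list of (features, label) pairs.
    learn: function taking a training list and returning a predict(features) function.
    """
    correct = 0
    for i, (features, label) in enumerate(examples):
        # Train on all examples except the i-th, then test on the i-th.
        training = examples[:i] + examples[i + 1:]
        predict = learn(training)
        if predict(features) == label:
            correct += 1
    return correct / len(examples)


def majority_learner(training):
    """Stand-in learner: always predicts the most common label in the training data."""
    most_common = Counter(label for _, label in training).most_common(1)[0][0]
    return lambda features: most_common


# Toy data with made-up feature values, used only to exercise the loop.
toy = [((n,), "yes") for n in range(9)] + [((n,), "no") for n in range(5)]
print(loo_accuracy(toy, majority_learner))
```

Each example is held out exactly once, so a data set with n examples requires n training runs per algorithm.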

**3. Evaluating error-free hypotheses with MDL.** Use the three algorithms
id3.pl,
lgg.pl
and search.pl to build three hypotheses
(sets of rules) for the **complete PlayTennis** data set. Then
**compute the information compression** for each hypothesis, that is,
**compr(Hi) = L(E) - L(Hi)**,
where Hi (i = 1, 2, 3) is the corresponding hypothesis and E is the data
set. (The code length of the exceptions, L(E|Hi), cannot be used here
because the hypotheses are error free, i.e. there are no exceptions.)
__Include in your report:__ all the work involved in computing the
information compression, including Zprolog queries and output.
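
One way to read the formula in Problem 3 is as a special case of the compression with exceptions used in Problem 4. The step below assumes that an error-free hypothesis contributes an exception code length of zero; this is an interpretation, not something the assignment states explicitly.

```latex
\begin{align*}
\mathrm{compr}(H_i) &= L(E) - L(H_i) - L(E \mid H_i) \\
                    &= L(E) - L(H_i) \qquad \text{when } L(E \mid H_i) = 0,\quad i = 1, 2, 3.
\end{align*}
```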

**4. Evaluating hypotheses with MDL by encoding exceptions.** Take
the **PlayTennis** data split with the **lower error rate** that you found
**in Problem 1**. Then, using the three algorithms
id3.pl,
lgg.pl
and search.pl, build three hypotheses
H1, H2, and H3 **from the larger (2/3) subset** of this split and compute
the MDL compression using exceptions for each one, that is,
**compr(Hi) = L(E) - L(Hi) - L(E|Hi)** for i = 1, 2, 3.
**Note** that E here is the whole PlayTennis data set.
__Include in your report:__ all the work involved in computing the
information compression, including Zprolog queries and output.
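
Once the code lengths are known, comparing the hypotheses is simple arithmetic, sketched below in Python. The functions `compression` and `rank_by_compression` are hypothetical names, and all numbers are placeholders rather than real results; the actual values of L(E), L(Hi) and L(E|Hi) have to come from the encoding scheme used in the course.

```python
def compression(l_e, l_h, l_exceptions=0.0):
    """compr(H) = L(E) - L(H) - L(E|H); with no exceptions the last term is 0."""
    return l_e - l_h - l_exceptions


def rank_by_compression(l_e, hypotheses):
    """Rank hypotheses by compression, best first.

    hypotheses: dict mapping a name to the pair (L(H), L(E|H)).
    Returns a list of (name, compression) pairs.
    """
    scores = {
        name: compression(l_e, l_h, l_exc)
        for name, (l_h, l_exc) in hypotheses.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder code lengths, NOT computed from PlayTennis.
l_e = 100.0
placeholder = {"H1": (40.0, 10.0), "H2": (30.0, 25.0), "H3": (60.0, 0.0)}
for name, score in rank_by_compression(l_e, placeholder):
    print(name, score)
```

A hypothesis with a larger compression is preferred. A negative value would mean that the hypothesis plus its exceptions takes more bits than the data set itself.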