- Introducing students to the basic **concepts and techniques of Artificial Intelligence**.
- **Learning AI by doing it**, i.e. developing skills in using AI algorithms for solving practical problems.
- Gaining experience in doing **independent study and research**.

- Short definitions:
- Russell Beale, University of Birmingham, UK: AI can be defined as the attempt to get real machines to behave like the ones in the movies.
- John McCarthy, Stanford University: It is the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable.
- Ron Brachman, AAAI: There's a lot to try to understand about the kind of reasoning and learning that people do that's very different than sort of classical, von Neumann, step-by-step computer algorithms.
- Long (Aaron Sloman, University of Birmingham, UK): Physics and chemistry study matter, energy, forces, and the various ways they can be combined and transformed. Biology, geology, medicine, and many other sciences and engineering disciplines build on this by studying more and more complex systems built from physical components. All this research requires an understanding of naturally occurring and artificial machines which operate on forces, energy of various kinds, and transform or rearrange matter. But some of the machines, both natural and artificial, also manipulate knowledge. It is now clear that a new kind of science is required for the study of the principles by which
- knowledge is acquired and used,
- goals are generated and achieved,
- information is communicated,
- collaboration is achieved,
- concepts are formed,
- languages are developed.

- Stuart Russell, Peter Norvig. Artificial Intelligence: A Modern Approach, Second Edition, Prentice Hall. 2003, ISBN: 0-13-790395-2. (hereafter referred to as AIMA)

- SWI-Prolog. Use the stable versions and the self-installing executable for Windows 95/98/ME/NT/2000/XP. For this course you need only the basic components, so you may uncheck all optional components.
- Prolog Tutorials

- AI Topics (AAAI)
- AI Overview
- The World Wide Web Virtual Library: Artificial Intelligence
- AI on the Web
- The World Wide Web Virtual Library: Logic Programming (Prolog)

- Semester project (30%)
- Midterm and final exams (20%)
- Programming assignments (40%)
- Paper (10%)

The letter grades will be calculated according to the following
table:

| A | A- | B+ | B | B- | C+ | C | C- | D+ | D | D- | F |
|--------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|------|
| 95-100 | 90-94 | 87-89 | 84-86 | 80-83 | 77-79 | 74-76 | 70-73 | 67-69 | 64-66 | 60-63 | 0-59 |

Late assignments will be marked one letter grade down for each two classes they are late. It is expected that all students will conduct themselves in an honest manner and NEVER claim work which is not their own. Violating this policy will result in a substantial grade penalty or a final grade of F.

- Introduction - Intelligent Agents
- Lecture notes/slides: Introduction to AI, Intelligent Agents
- Reading: AIMA - Chapters 1, 2
- Additional reading
- Problem Solving by Search, Introduction to Prolog
- Lecture notes/slides: Problem Solving by Search
- Reading: AIMA - Chapter 3
- Additional reading: Quick Introduction to Prolog
- Programs: path.pl, kinship.pl, farmer.pl, search1.pl, graph.pl, map.pl, 8-puzzle.pl
- Lab experiments:
- Search in Prolog: use path.pl and try all versions of path with the modified graph too (add/remove arc(4,1) in two different places). Explain the results.
- Prolog as Search: use kinship.pl and do the exercises described in it.
- Solve the farmer, wolf, cabbage and goat problem by using Prolog search: farmer.pl
- Use search1.pl to find paths from node 1 to node 9 in graph.pl (explain the results from the following queries):
- ?- depth_first([[1]],9,P,N).
- ?- breadth_first([[1]],9,P,N).
- ?- iterative_deepening([[1]],9,P).
- ?- uni_cost([[1]],9,P,N).
- ?- uni_cost([[1]],9,P,N), reverse_path_cost(P,C). or, equivalently, ?- uni_cost([[1]],9,P,N), reverse(P,P1), path_cost(P1,C).
- Add writeln(NewQueue) after append (in depth_first and breadth_first) and after sort_queue (in uni_cost) to see all steps of the search process.
- Evaluate search algorithms:
- Complexity: time (number of explored nodes N) and space (max length of the queue).
- Completeness: compare ?- depth_bound(3,[[1]],9,P). and ?- depth_bound(4,[[1]],9,P).
- Optimality: lowest path cost
- Search the Web: use the WebSPHINX: A Personal, Customizable Web Crawler to search the web in depth-first and breadth-first fashion (run the Web Crawler here)
- Solving the 8-puzzle: board(2,3,5,0,1,4,6,7,8) => board(0,1,2,3,4,5,6,7,8)
- Compare breadth_first, iterative_deepening and depth_first.
- Explain why depth_first fails.
- How to find a board state that can be solved.
- Exercises:
- Find the shortest path (by number of towns passed and by distance) from Arad to Bucharest (use map.pl)
- Solve the farmer, wolf, cabbage and goat problem by using search1.pl
- Implement bubble sort by state space search
- Heuristic (Informed) Search
- Lecture notes/slides: Heuristic Search
- Reading: AIMA - Chapter 4
- Programs: search2.pl, map.pl, 8-puzzle.pl
- Lab experiments:
- Solving the 8-puzzle: board(2,3,5,0,1,4,6,7,8) => board(0,1,2,3,4,5,6,7,8)
- Use best_first, a_star and beam search. For beam search try n=100, 10, 1. What changes? Explain why it fails with n=1.
- Compare results with depth_first, breadth_first and iterative deepening.
- Print the queue and see how memory complexity changes.
- Finding paths from Arad to Bucharest. Compare cost and size of paths with informed and uninformed search.
- Exercises:
- Define a non-admissible heuristic function for graph.pl and compare the results from best_first, a_star and uni_cost algorithms.
- Find admissible heuristic functions and implement bubble sort by heuristic search.
- Constraint Satisfaction
- Lecture notes/slides: Constraint Satisfaction
- Reading: AIMA - Chapter 5
- Programs: backtrack.pl, csp.pl, 4-queens.pl
- Lab experiments: solving the map coloring problem by backtracking and constraint propagation (using heuristics)
- Exercises:
- Solve the coloring problem using csp.pl
- Represent the list sorting problem in CSP terms and solve it by csp.pl
- Assignment #1: Problem Solving by Searching. **Due on February 24.**
- Games
- Lecture notes/slides: Searching Game Trees
- Reading: AIMA - Chapter 6
- Additional reading:
- The first game-playing program, for checkers
- Deep-Blue
- David G. Stork: The end of an era, the beginning of another? HAL, Deep Blue and Kasparov.
- Programs: minimax.pl, gametree.pl
- Lab experiments and exercises:
- The game of tic-tac-toe (xo.pl) using minimax.pl
- Find an evaluation function for the game of tic-tac-toe and implement a depth-bounded search.
- Knowledge-Based Agents - Propositional and First-Order Logic
- Lecture slides: logic.pdf
- Lecture notes: lnai-logic.pdf
- Reading: AIMA - Chapters 7, 8
- Additional reading: Clue Deduction: an introduction to satisfiability reasoning
- Programs: clausify.pl, logic.pl, agents.pl
- Lab experiments/Exercises: logic.pl (read the comment inside)
- Describing the wumpus world in Propositional and First-Order Logic: logic.pl, agents.pl
- Representing English sentences as Propositional and FOL formulas: logic.pl
- Translating formulas into clausal (CNF) form: clausify.pl
- Unification in Prolog
- Models and Satisfiability of Propositional sentences: sat.pl
- Inference in First-Order Logic, Logic Programming and Prolog
- Lecture notes/slides: inference.pdf
- Reading: AIMA - Chapter 9.
- Programs: clausify.pl, resolve.pl, logic.pl, wumpus.pl, agents.pl
- Lab experiments/exercises: logic.pl (read the comment inside)
- Finite/infinite models: ?- resolve([cl([p(f(X))],[p(X)]),cl([p(a)],[])]).
- Clause subsumption
- Clausal resolution and SLD (Prolog) resolution.
- Incompleteness of Prolog: ?- p(a,c).
- Completeness of SLD resolution: draw the refutation tree of P.
- Knowledge Representation
- Reading: AIMA - Chapter 10
- Programs: es.pl, cars.pl
- Lab experiments/exercises: car diagnostic with a MYCIN-like expert system shell.
- Knowledge representation for the Web: Semantic Web
- Assignment #2: Reasoning with Propositional and First-Order Logic. **Due on April 5.**
- Planning
- Lecture notes/slides: planning.pdf
- Reading: AIMA - Chapter 11
- Additional reading: Shakey the Robot
- Programs: planner.pl, scalc.pl
- Lab experiments/exercises:
- Comparing deductive (scalc.pl) and STRIPS planning (planner.pl): efficiency and optimality.
- Adding constraints: e.g. put a smaller block on a bigger one.
- Solving Hanoi towers by planning.
- Uncertainty and Probabilistic Reasoning
- Lecture notes/slides: probreasoning.pdf
- Reading: AIMA - Chapters 13, 14
- Additional reading/software: Microsoft Bayesian Network Editor (MSBNx)
- Programs: bn.pl, alarm.pl, loandata.pl, tennis.pl
- Lab experiments:
- Use the tennis.pl data and create a joint probability distribution of outlook and humidity.
- Use the tennis.pl data and compute: P(outlook=sunny|play=yes) = ?, P(humidity=high|temp=hot) = ?
- Using bn.pl with loandata.pl to compute the distribution of "class" with the evidence [emp=yes, buy=car, sex=m, married=no].
- Exercises: Implement the Naive Bayes approach to predict playing tennis using bn.pl and the tennis.pl data.
- Machine Learning - Basic Concepts, Version Space, Decision Trees
- Illustrative example/lab project: Web/text document classification (files: textmine.pl, webdata.zip, artsci.pl)
- Lecture notes/slides:
- Decision tree learning: Lecture notes.
- Languages for learning: Slides, Lecture notes.
- Version space learning: Slides, Lecture notes.
- Reading: AIMA - Chapters 18, 19
- Additional Reading: Chris Thornton, Truth from Trash:How Learning Makes Sense, MIT Press, 2000
- Programs: covering.pl, vs.pl, id3.pl
- Data: taxonomy.pl, shapes.pl, tennis.pl, animals.pl, loandata.pl
- Lab experiments:
- Machine Learning - Numeric Approaches, Clustering, Evaluation
- Lecture notes/slides:
- Naive Bayes
- Nearest Neighbor
- Evaluating hypotheses (pages 1-10)
- Clustering (Examples: cells, mixture)

- Reading: AIMA - Chapter 20
- Programs: bayes.pl, knn.pl, cluster.pl, cells.pl
- Lab experiments:
- Naive Bayes (parts I, II and IV)
- Nearest Neighbor
- Clustering (part II)

- Learning with Background Knowledge - Explanation-Based Learning, Inductive Logic Programming
- Lecture notes/slides:
- Reading: AIMA - Chapter 19
- Programs: ebl.pl, ilp.pl, foil.pl
- Lab experiments:
- Exercises:
- Assignment #3: Probabilistic Reasoning and Learning
- Natural Language Processing
- Lecture notes/slides:
- Reading: AIMA - Chapter 22
- Programs: grammar.pl, qa.pl
- Lab experiments:
- Exercises:
- Other Topics and Philosophical Foundations
- Lecture notes/slides:
- Reading: AIMA - Chapter 26

p(a,b).

p(c,b).

p(X,Y) :- p(X,Z),p(Z,Y).

p(X,Y) :- p(Y,X).

To do the semester projects students have to form teams of 3 people (2-people teams should consult the instructor first). Each team chooses one project to work on. The projects to choose from are the following:

- Web Document Classification Project
- Intelligent Web Browser Project
- Character Recognition and Learning with Neural Networks
- Clue Deduction: an introduction to satisfiability reasoning
- Solving the N-Puzzle Problem

To complete the project students are required to:

- Choose a project and form the project team. Do this **no later than February 15**: submit the project title and the names of the students on the project team. Note that there is a restriction that **no more than two teams can work on the same project**. Projects will be assigned on a **first come, first served** basis.
- Write an initial project description based on the general description provided in the WebCT course. This must include the following (may be discussed with the instructor before submission):
- A brief introduction to the area of the project
- Specific goals (must include all deliverables for projects 1-3 and for project 4 - a few of the logic exercises of Section 7 and the ClueReasoner described in Section 8)
- Approaches and algorithms to be used
- Resources to be used (data, programming tools or applications).
- A plan how to achieve the goals and evaluate the project results.
- Work distribution among the students on the team and a timetable.
- Submit reports on:
- the initial project description (item 2) by February 29.
- the progress they made by midterm (due during the midterm week)
- the results they achieved upon project completion (due during the final week). See the requirements for this report below.
- Make a presentation of the final report (during the final week).

- General introduction to the area
- Description of the problems addressed (experimented with or solved)
- Descriptions of the approaches and algorithms used to solve the problems
- Descriptions of the software applications used or the programs implemented
- Description of the experiments done for each problem attempted or solved
- Comment on the relation of the approaches used in the project to the areas of machine learning (ML), search and knowledge representation and reasoning (KR&R). In particular, the following questions should be answered:
- Which ML techniques have been used in the project?
- If no ML has been used explicitly, what is the relation of the approaches used in the project to ML? (any ML components used or project approaches applicable in the area of ML).
- Which search techniques have been used in the project?
- If no search has been used explicitly, what is the relation of the approaches used in the project to the area of search? (any search components used or project approaches applicable in the area of search).
- Which KR&R techniques have been used in the project?
- If no KR&R has been used explicitly, what is the relation of the approaches used in the project to the area of KR&R? (any KR&R components used or project approaches applicable in the area of KR&R).

- Creating intelligent (thinking) systems (machines, robots etc.) in order to:
- Model and study the natural (human) intelligence. Philosophy and psychology also study the natural (human) intelligence, but their goals do not include creating intelligent systems.
- Solve problems that require intelligence.
- Questions: What problems require intelligence? What is intelligence?

- Making computers "think", creating machines with "brains" (Haugeland, 85)
- Studying psychology by computational models (Charniak & McDermott, 85)
- Studying computational models that can reason and act rationally (Winston, 90)
- Building machines to perform functions which require intelligence when performed by humans (Kurzweil, 90)
- How to make computers do things which we (humans) still do better (Rich & Knight, 91)
- A branch of Computer Science, which deals with modeling intelligent behavior (Luger & Stubblefield, 93)
- Building intelligent agents (Russell & Norvig, 95)

- Models of human reasoning - cognitive modeling (cognitive science), GPS (Newell & Simon, 1972).
- Models of human behavior - The Turing Test (1950), Loebner Prize. Natural language processing, knowledge representation, reasoning and learning.
- Models of rational thought (logical approach). Aristotle's syllogistic logic: "Socrates is a human, humans are mortal, thus Socrates is mortal". Requires 100% accurate knowledge.
- Models of rational behavior - rational agents. Includes both (2) and (3), but is more general.

- Philosophy: AI history starts with Aristotle's invention of syllogistic logic, the first formal deductive reasoning system.
- Mathematics: Logic, algorithms, satisfiability, resolution.
- Psychology: experimental psychology, cognitive science.
- Computer Science: using Von Neumann architectures to model non-Von Neumann computation.
- Linguistics: Syntactic structures, formal grammars (Chomsky, 1957), natural language translation ("The spirit is willing but the flesh is weak" - "The vodka is good, but the meat is rotten").
- Brain Science: Neural networks
- Biology, Artificial Life, Evolutionary computation

- Agent = Architecture + Program
- Types of agents
- Reflex agent (stateless, condition-action rules)
- Model-based reflex agent (memory, knowledge representation)
- Goal-based agents (planning)
- Utility-based agents (decision making, uncertainty)
- Learning agents (machine learning)
- The role of the environment
- Web agents

- Haugeland, J. (editor). Artificial Intelligence: The Very Idea, MIT Press, Cambridge, Massachusetts, 1985.
- Charniak, E. and D. McDermott. Introduction to Artificial Intelligence, Addison-Wesley, Reading, Massachusetts, 1985.
- Winston, P.H. Artificial Intelligence, Addison-Wesley, Reading, Massachusetts, third edition, 1992.
- Kurzweil, R. The Age of Intelligent Machines, MIT Press, Cambridge, Massachusetts, 1990.
- Rich, E. and K. Knight. Artificial Intelligence, McGraw-Hill, New York, Second edition, 1991.
- Luger, G.F and W. A. Stubblefield. Artificial Intelligence: Structures and Strategies for Complex Problem Solving, Benjamin/Cummings, Redwood City, California, 2/E, 1993.
- Russell, S and P. Norvig. Artificial Intelligence: A Modern Approach, Prentice Hall, Upper Saddle River, New Jersey, 1995.

- A set of pairs <state, action> exhaustively describing all possible states and the right action leading to the goal.
- Knowledge to model the states of the world and how the agent actions would change them (state transitions).
- Simulation of how various agent actions change the world, so that the right sequence can be found. This is achieved by *searching the state space*.

- States (legal, initial)
- Operators (state transitions)
- Goal state (or test for the goal state)

- Given: Initial and Goal states
- Find: a sequence of operators (state transitions) which transforms the initial state into the goal state.
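As an illustration of this formulation (a Python sketch; the course programs themselves are in Prolog), list sorting can be cast as a state space problem: states are permutations of the list, operators swap neighboring elements, and the goal test compares against the sorted list:

```python
from collections import deque

def successors(state):
    """Operators: swap any pair of neighboring elements."""
    result = []
    for i in range(len(state) - 1):
        s = list(state)
        s[i], s[i + 1] = s[i + 1], s[i]
        result.append(tuple(s))
    return result

def solve(initial, goal):
    """Breadth-first search over the state space;
    returns the sequence of states from initial to goal."""
    queue = deque([[initial]])
    seen = {initial}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for s in successors(path[-1]):
            if s not in seen:
                seen.add(s)
                queue.append(path + [s])
    return None

plan = solve((2, 1, 3), (1, 2, 3))   # one swap suffices here
```
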

- Finding a path between two towns (path in a directed graph)
- Solving the farmer, wolf, cabbage and goat problem
- Solving the 8-puzzle problem
- Solving the 8-queens problem
- Searching the web

- Search
- Extending a state (node)
- Selecting a successor node
- Search tree
- Algorithms: uninformed (exhaustive, blind), informed (heuristic)
- Evaluating performance:
- Optimality: path cost
- Search complexity: time complexity - number of visited nodes, space complexity - maximal length of the queue
- Completeness
- Constraint satisfaction (Lecture #4)

- General approach - using a queue
- Depth-first search - adding the new node in the beginning of the queue
- Depth-bound search
- Iterative deepening
- Breadth-first search - adding the new node at the end of the queue
- Uniform cost search (sorting the queue by the cost of the path)
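The queue-based scheme above can be sketched as follows (illustrative Python, not the course's search1.pl; the example graph is made up). The only difference between depth-first and breadth-first search is where new paths enter the queue:

```python
from collections import deque

def search(graph, start, goal, strategy):
    """Generic queue-based search. Depth-first prepends new paths,
    breadth-first appends them; uniform-cost search would instead
    keep the queue sorted by path cost (not shown here)."""
    queue = deque([[start]])
    explored = 0
    while queue:
        path = queue.popleft()
        explored += 1
        node = path[-1]
        if node == goal:
            return path, explored
        successors = [path + [n] for n in graph.get(node, []) if n not in path]
        if strategy == "depth_first":
            queue.extendleft(reversed(successors))   # new paths at the front
        else:                                        # breadth_first
            queue.extend(successors)                 # new paths at the back
    return None, explored

# A small made-up graph (not the course's graph.pl):
graph = {1: [2, 3], 2: [4], 3: [4], 4: [5]}
path, explored = search(graph, 1, 5, "breadth_first")
```

The number of explored nodes and the maximal queue length give exactly the time and space complexity measures used in the lab experiments.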

- Many problems have exponential complexity
- We know the path cost from the initial state to the current one. Knowing the path cost to the goal would help us make the right decision when selecting the successor node.
- Heuristic function: *h(n)* = an evaluation (approximation) of the path cost from node *n* to the goal.

- Best-first search: sorting the queue according to *h(n)*. Efficient, complete, but not optimal.
- Beam search: using a parameter *n*, limiting the size of the queue by keeping the *n* best nodes. Memory efficient, but incomplete and not optimal. If *n* = 1: hill-climbing search.
- A* search: minimizing the total path cost. The queue is sorted by *f(n) = g(n) + h(n)*, where *g(n)* is the path cost from the initial state to *n*. A combination of uniform cost search (based on *g(n)*) and best-first search (based on *h(n)*).
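A* can be sketched as follows (illustrative Python, not the course's search2.pl; the weighted graph and heuristic values are made up, with h chosen to be admissible):

```python
import heapq

def a_star(graph, h, start, goal):
    """A* search: always expand the path minimizing f(n) = g(n) + h(n),
    where g(n) is the cost so far and h(n) the heuristic estimate."""
    frontier = [(h[start], 0, [start])]              # (f, g, path)
    while frontier:
        f, g, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return path, g
        for succ, cost in graph.get(node, []):
            if succ not in path:                     # avoid cycles on a path
                g2 = g + cost
                heapq.heappush(frontier, (g2 + h[succ], g2, path + [succ]))
    return None, None

# Made-up weighted graph and admissible heuristic (never overestimates):
graph = {'a': [('b', 1), ('c', 4)], 'b': [('c', 2), ('d', 5)], 'c': [('d', 1)]}
h = {'a': 3, 'b': 2, 'c': 1, 'd': 0}
path, cost = a_star(graph, h, 'a', 'd')
```

Here A* prefers the a-b-c-d route (cost 4) over the shorter-looking direct edges, because f(n) accounts for both the cost so far and the estimated cost to the goal.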

- Completeness: if the branches are finite and the cost of each transition is positive.
- Optimality: when an *admissible* heuristic is used. *h(n)* is admissible if *h(n)* <= the path cost from *n* to the goal.
- Worst-case complexity: *O(b^d)*.

- Example: the 8-puzzle problem. Average branching factor = 3, average path to the goal = 20 transitions. Number of states = 9! = 362,880. Exhaustive search explores 3^20 states. Heuristics: *h1(n)* = number of misplaced tiles, *h2(n)* = city-block distance.
- Comparing heuristics: *h2* dominates *h1* if *h2(n)* >= *h1(n)* for all nodes *n*.
- Finding heuristics:
- Relaxing the problem restrictions (simplifying the problem), e.g. the rules for moving tiles.
- Learning heuristics from experience: learning a function as weighted sum of features (quantitative properties of the states).
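The two 8-puzzle heuristics above can be sketched directly (illustrative Python; the row-major tuple encoding with 0 for the blank is an assumption mirroring the board(...) terms in 8-puzzle.pl):

```python
def misplaced(board, goal):
    """h1: number of misplaced tiles (the blank, 0, is not counted)."""
    return sum(1 for b, g in zip(board, goal) if b != 0 and b != g)

def manhattan(board, goal):
    """h2: sum of the city-block distances of each tile from its goal
    cell, for a 3x3 board stored row-major."""
    total = 0
    for i, tile in enumerate(board):
        if tile == 0:
            continue
        j = goal.index(tile)
        total += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return total

goal = (0, 1, 2, 3, 4, 5, 6, 7, 8)
start = (2, 3, 5, 0, 1, 4, 6, 7, 8)
# h2 dominates h1: manhattan(b, goal) >= misplaced(b, goal) for any board b
```

Both are admissible: every misplaced tile needs at least one move, and every tile needs at least its city-block distance in moves.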

- *Distance or similarity function* defines what is learned. *Euclidean distance* (for *numeric* attributes): D(X,Y) = sqrt[(x1-y1)^2 + (x2-y2)^2 + ... + (xn-yn)^2], where X = {x1, x2, ..., xn} and Y = {y1, y2, ..., yn}.
- Cosine similarity (the dot product, when the vectors are normalized to unit length): Sim(X,Y) = x1*y1 + x2*y2 + ... + xn*yn.
- Another popular metric: the *city-block* (Manhattan) distance: D(X,Y) = |x1-y1| + |x2-y2| + ... + |xn-yn|.
- As different attributes use different scales, *normalization* is required: Vnorm = (V - Vmin) / (Vmax - Vmin). Thus Vnorm is within [0,1].
- Nominal attributes: count the number of differences, i.e. city-block distance where |xi-yi| = 0 if xi = yi and 1 if xi <> yi.
- Missing attributes: assumed to be maximally distant (given normalized attributes).
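The distance functions above can be written out directly (a Python sketch for illustration):

```python
from math import sqrt

def euclidean(x, y):
    """Euclidean distance for numeric attribute vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y):
    """City-block (Manhattan) distance."""
    return sum(abs(a - b) for a, b in zip(x, y))

def cosine_sim(x, y):
    """Dot product; equals the cosine similarity when x and y
    are already normalized to unit length."""
    return sum(a * b for a, b in zip(x, y))

def normalize(v, v_min, v_max):
    """Rescale a raw attribute value into [0, 1]."""
    return (v - v_min) / (v_max - v_min)

def nominal_distance(x, y):
    """City-block distance for nominal attributes: count differences."""
    return sum(0 if a == b else 1 for a, b in zip(x, y))
```
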

- Example: weather data

| ID | outlook | temp | humidity | windy | play |
|----|----------|------|----------|-------|------|
| 1 | sunny | hot | high | false | no |
| 2 | sunny | hot | high | true | no |
| 3 | overcast | hot | high | false | yes |
| 4 | rainy | mild | high | false | yes |
| 5 | rainy | cool | normal | false | yes |
| 6 | rainy | cool | normal | true | no |
| 7 | overcast | cool | normal | true | yes |
| 8 | sunny | mild | high | false | no |
| 9 | sunny | cool | normal | false | yes |
| 10 | rainy | mild | normal | false | yes |
| 11 | sunny | mild | normal | true | yes |
| 12 | overcast | mild | high | true | yes |
| 13 | overcast | hot | normal | false | yes |
| 14 | rainy | mild | high | true | no |
| X | sunny | cool | high | true | ? |

- Discussion
- Instance space: Voronoi diagram
- 1-NN is very accurate but also slow: scans entire training data to derive a prediction (possible improvements: use a sample)
- Assumes all attributes are equally important. Remedy: attribute selection or weights (see attribute relevance).
- Dealing with noise (wrong values of some attributes)
- Taking a majority vote over the k nearest neighbors (k-NN).
- Removing noisy instances from dataset (difficult!)

- Numeric class attribute: take the mean of the class values of the k nearest neighbors.
- k-NN has been used by statisticians since the early 1950s. Question: how to choose k?
- Distance weighted k-NN:
- Weight each vote or class value (for numeric) with the distance.
- For example: instead of summing up votes, sum up 1/D(X,Y) or 1/D(X,Y)^2.
- Then it makes sense to use all instances (k = n).

| ID | 2 | 8 | 9 | 11 |
|----------|----|----|-----|-----|
| D(X, ID) | 1 | 2 | 2 | 2 |
| play | no | no | yes | yes |
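The distances in the table can be reproduced with a short sketch (illustrative Python, not the course's knn.pl; ties among equally distant neighbors are broken by data order here, which a real implementation would handle more carefully):

```python
from collections import Counter

# Weather data: (outlook, temp, humidity, windy) -> play
train = {
    1:  (('sunny',    'hot',  'high',   'false'), 'no'),
    2:  (('sunny',    'hot',  'high',   'true'),  'no'),
    3:  (('overcast', 'hot',  'high',   'false'), 'yes'),
    4:  (('rainy',    'mild', 'high',   'false'), 'yes'),
    5:  (('rainy',    'cool', 'normal', 'false'), 'yes'),
    6:  (('rainy',    'cool', 'normal', 'true'),  'no'),
    7:  (('overcast', 'cool', 'normal', 'true'),  'yes'),
    8:  (('sunny',    'mild', 'high',   'false'), 'no'),
    9:  (('sunny',    'cool', 'normal', 'false'), 'yes'),
    10: (('rainy',    'mild', 'normal', 'false'), 'yes'),
    11: (('sunny',    'mild', 'normal', 'true'),  'yes'),
    12: (('overcast', 'mild', 'high',   'true'),  'yes'),
    13: (('overcast', 'hot',  'normal', 'false'), 'yes'),
    14: (('rainy',    'mild', 'high',   'true'),  'no'),
}

def dist(x, y):
    """Nominal city-block distance: count differing attributes."""
    return sum(0 if a == b else 1 for a, b in zip(x, y))

def knn(train, query, k):
    """Majority vote over the k nearest neighbors."""
    ranked = sorted(train.items(), key=lambda item: dist(item[1][0], query))
    votes = Counter(cls for _, (_, cls) in ranked[:k])
    return votes.most_common(1)[0][0]

x = ('sunny', 'cool', 'high', 'true')
```

With k = 1 the nearest neighbor is instance 2 (distance 1), so the prediction for X is "no".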

- Basic assumptions
- Opposite of KNN: use all examples
- Attributes are assumed to be:
- equally important: all attributes have the same relevance to the classification task.
- statistically independent (given the class value): knowledge about the value of a particular attribute doesn't tell us anything about the value of another attribute (if the class is known).

- Although based on assumptions that are almost never correct, this scheme works well in practice!

- Probabilities of weather data

- outlook = sunny [yes (2/9); no (3/5)];
- temperature = cool [yes (3/9); no (1/5)];
- humidity = high [yes (3/9); no (4/5)];
- windy = true [yes (3/9); no (3/5)];
- play = yes [(9/14)]
- play = no [(5/14)]
- New instance: [outlook=sunny, temp=cool, humidity=high, windy=true, play=?]
- Likelihood of the two classes (play=yes; play=no):
- yes = (2/9)*(3/9)*(3/9)*(3/9)*(9/14) = 0.0053;
- no = (3/5)*(1/5)*(4/5)*(3/5)*(5/14) = 0.0206;

- Conversion into probabilities by normalization:
- P(yes) = 0.0053 / (0.0053 + 0.0206) = 0.205
- P(no) = 0.0206 / (0.0053 + 0.0206) = 0.795
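The arithmetic above can be verified with a short computation (plain Python; the fractions are the counts read off the weather data):

```python
# Conditional probabilities from the weather data counts:
# P(sunny|yes) = 2/9, P(cool|yes) = 3/9, P(high|yes) = 3/9,
# P(true|yes) = 3/9, P(yes) = 9/14 (and similarly for the "no" class)
likelihood_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)
likelihood_no = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)

# Normalization turns the likelihoods into probabilities summing to 1:
p_yes = likelihood_yes / (likelihood_yes + likelihood_no)
p_no = likelihood_no / (likelihood_yes + likelihood_no)
```
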

- Bayes theorem (Bayes rule)
- Probability of event H, given evidence E: P(H|E) = P(E|H) * P(H) / P(E);
- P(H): *a priori* probability of H (probability of the event *before* the evidence has been seen);
- P(H|E): *a posteriori* (conditional) probability of H (probability of the event *after* the evidence has been seen);

- Bayes for classification
- What is the probability of the class given an instance?
- Evidence E = instance
- Event H = class value for instance
- Naïve Bayes assumption: the evidence can be split into independent parts (the attributes of the instance).
- E = [A1, A2, ..., An]
- P(E|H) = P(A1|H) * P(A2|H) * ... * P(An|H)
- Bayes: P(H|E) = P(A1|H) * P(A2|H) * ... * P(An|H) * P(H) / P(E)

- Weather data:
- E = [outlook=sunny, temp=cool, humidity=high, windy=true]
- P(yes|E) = P(outlook=sunny|yes) * P(temp=cool|yes) * P(humidity=high|yes) * P(windy=true|yes) * P(yes) / P(E) = (2/9)*(3/9)*(3/9)*(3/9)*(9/14) / P(E)

- The “zero-frequency problem”
- What if an attribute value doesn't occur with every class value (e.g. outlook = overcast, which never occurs with class no)?
- Probability will be zero, for example P(humidity=high|yes) = 0;
- A posteriori probability will also be zero: P(yes|E) = 0 (no matter how likely the other values are!)

- Remedy: add 1 to the count for every attribute value-class combination (i.e. use the Laplace estimator: (p+1) / (n+1) ).
- Result: probabilities will never be zero! (also stabilizes probability estimates)
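A sketch of the remedy (illustrative Python; this uses the common (count+1)/(total+k) form of the Laplace estimator, where k is the number of distinct values of the attribute - the notes abbreviate it as (p+1)/(n+1)):

```python
def laplace(count, total, n_values):
    """Laplace estimator: add 1 to every attribute-value/class count.
    The denominator grows by the number of distinct attribute values,
    so the smoothed estimates still sum to 1 over all values."""
    return (count + 1) / (total + n_values)

# In the weather data, outlook=overcast never occurs with play=no
# (0 of the 5 "no" examples); outlook has 3 values:
raw = 0 / 5                     # = 0, which would zero out P(no|E)
smoothed = laplace(0, 5, 3)     # = 1/8, never zero
```
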

- Missing values
- Calculating probabilities: instance is not included in frequency count for attribute value-class combination.
- Classification: attribute will be omitted from calculation
- Example: [outlook=?, temp=cool, humidity=high, windy=true, play=?]
- Likelihood of yes = (3/9)*(3/9)*(3/9)*(9/14) = 0.0238;
- Likelihood of no = (1/5)*(4/5)*(3/5)*(5/14) = 0.0343;

- P(yes) = 0.0238 / (0.0238 + 0.0343) = 0.41
- P(no) = 0.0343 / (0.0238 + 0.0343) = 0.59

- Numeric attributes
- Assumption: attributes have a *normal* or *Gaussian* probability distribution (given the class)
- Parameters involved: mean, standard deviation, probability density function
- Discussion
- Naïve Bayes works surprisingly well (even if independence assumption is clearly violated).
- Why? Because classification doesn't require accurate probability estimates as long as the maximum probability is assigned to the correct class.
- Adding too many redundant attributes will cause problems (e.g. identical attributes).
- Numeric attributes are often not normally distributed.
- Yet another problem: estimating prior probability is difficult.
- Advanced approaches: Bayesian networks.

- Represent the problem of sorting a four-element list as a state space search problem:
- Use state transitions that swap two neighboring elements
- Use state transitions that swap two neighboring elements, only if they are not in the correct order (like in bubble sort).
- Show how the problem is solved in both cases by simple 3-4 step hand solved examples.
- Write two Prolog programs that represent both types of state spaces.
- Implement an admissible heuristic function based on the number of pairs of neighboring elements that are not in the right order. Add the function to the representation (define a rule for h(Node, Value) and include it in the files with the state transitions). Explain what makes the function admissible and how it can be made inadmissible.
- Sort the list (4,3,2,1) (goal state: (1,2,3,4)) by using all uninformed (search1.pl) and heuristic (search2.pl) search algorithms: depth_first, breadth_first, iterative_deepening, uni_cost, best_first, a_star and beam (with n=10 and n=1) and collect statistics about their performance. Do this for both types of state spaces.
- Compare the performance of all those algorithms by the following criteria: time complexity (number of explored nodes), space complexity (max length of the queue) and optimality (put the results in a table and explain the reasons for getting each best and worst result for each performance criterion). Again do this for both types of state spaces (use two separate tables).
- Now compare the two types of state representations. Which space is bigger? Why? How does this affect the search algorithms' performance?

**Documentation and submission:** Write a report describing the experiments and the results.

- Use the wumpus world shown in Figure1 of logic.pdf
- Represent the upper left corner of the wumpus world (rooms (3,1), (3,2), (4,1), (4,2)). For each room encode the knowledge about perceiving stench and breeze in the rooms neighboring the beast and the pit. **Restriction:** put the perceptions in the "if" part of the implications (left of "->").
- Create two versions of the representation - one in PL and one in FOL. Then for each language prove the presence of the beast in room (3,1) given the agent perception in rooms (3,2), (4,1) and (4,2). Do this by:
- Refutation (using the deduction theorem) in two ways:
- through a satisfiability test using sat.pl (use this for the PL version only).
- through resolution (inferring the empty clause) using resolve.pl
- Resolution using resolve.pl (inferring the goal clause directly as a resolvent)
- Explain the outputs of the Prolog queries in terms of PL or FOL semantics.
- Define the complete wumpus world (4x4 board) in FOL and translate it into clausal form (use wumpus_fol from logic.pl):
- Is the result a set of Horn clauses? If so, try to represent it as a Prolog program.
- Is it possible to represent all clauses in Prolog?
- What happens with the negative unit clauses?
- Explain the problems and represent as much of the wumpus world as possible in Prolog.
- What can be inferred by using this program? Show the queries and explain the results.

**Documentation and submission:** Write a report describing the experiments and the results.

Use the weather (tennis) data in tennis.pl.

- Create a Bayesian network for the weather data using the approach taken in loandata.pl. Then use this network with bn.pl to:
- Decide on playing tennis on a day described as [outlook=sunny, temp=mild, humidity=normal, wind=weak].
- Find an attribute (if possible) that may be used to decide whether or not to play tennis (the class prediction based on this attribute value is the same no matter what the values of the other attributes are).
- Add taxonomies for the attributes so that they can be used as structural (see how this is done in loandata.pl). Then use version space learning (vs.pl) to create the largest possible concepts for each class ("yes" and "no") by putting an example from it in the beginning of the set of examples. In other words, find a good order of the examples (one starting with an example from class "yes" and one - from class "no"), so that the program converges after reading as many as possible examples. Use the following approach (see also Version space learning):
- If VS stops before reaching the end of the examples, because of inconsistency (empty G and S), reorder the examples so that the concept is learned before reaching the inconsistency.
- If VS stops before reaching the end of the examples, because of convergence (it finds a consistent hypothesis), add more examples so that you reach a concept that covers as many examples as possible.
- Use decision tree learning (id3.pl) with the weather data and:
- Create all possible decision trees (by varying the threshold) and compute the total error (the proportion of misclassified training examples) for each.

- Decide on playing tennis on a day described as [outlook=sunny, temp=mild, humidity=normal, wind=weak] using each of the trees. Compare the decisions.
- Compare the trees with the concepts learned with VS with respect to their coverage and whether or not they are disjunctive.
- Use Naive Bayes (bayes.pl) and Nearest Neighbor (knn.pl with k=1,3,5 and knnw with k=14) and:
- Compute the error of each algorithm (and each parameter for knn) on the training data (tennis.pl).
- Compute the holdout error of each algorithm (and each parameter for knn) by splitting the weather data into 8-example training set and 6-example test set. Use Naive Bayes Lab, part IV as a guideline.
- Classify [outlook=sunny, temp=mild, humidity=normal, wind=weak] with each algorithm (and each parameter for knn).
- Compare the results (use a table to summarize them) and find out which algorithm performs better.

- Compute the total error using cluster to classes evaluation on the weather data (tennis.pl). For how to compute the error see Clustering - part II.
- Add the example [outlook=sunny, temp=mild, humidity=normal, wind=weak] to the data and classify it by the majority class in the cluster it falls into.

- Describe the approaches you use and explain the output from the Prolog queries.

**Documentation and submission:** Write a report describing the experiments and the results.

The test includes the following topics:

- Uninformed and informed search algorithms
- Searching game trees
- Constraint satisfaction
- Propositional and First-order logic (languages, clausal form)
- Inference in PL and FOL (clause subsumption and resolution).

The test includes the following topics:

- Planning

- Bayesian Reasoning
- Bayesian Networks
- Decision Tree Learning
- Representing hypotheses, Generalization/Specialization with taxonomies

- Version Space Learning
- Naive Bayes
- Nearest Neighbor
- Hypothesis Evaluation
- Clustering

Last updated: 5-10-2005