The mid-term exam is scheduled in the Testing Center for Thursday, Friday, and Saturday (see the course schedule on Learning Suite). You will have a three-hour time limit. The exam is closed book, and you may have no notes. No calculators or digital assistants are allowed (you won't need one). I think a well-prepared student will be able to complete the exam in two hours. You're going to do well!

The format consists of short-answer questions, worked mathematical solutions, and possibly some true/false questions.

You will be expected to show your work. You will be graded not only on the correctness of your answer but also on the clarity with which you express your rationale, so plan to be neat. It is your job to make your understanding clear to us; untidy work is likely to earn a lower grade. If using a pencil (rather than a pen) helps you be neat, please plan accordingly.

I recommend the following activities to study the topics covered by the exam:

- Review the lecture notes and identify the topics we emphasized in class. Focus on those listed below.
- Compare the homework solution keys to your homework assignments, and make sure that you understand the major principles covered in the homework problems.

The exam will cover a subset of the following topics:

- Probability theory: sample spaces, sigma algebras, probability functions
- The three axioms of probability
- NO proofs involving set theory
- Definition of conditional probability
- Marginalization, Law of Total Probability
- Product rule, chain rule
- Independence and conditional independence of events
- Random variables
- Independence and conditional independence of random variables
- Bayes' rule
- Basic discrete distributions: Bernoulli, binomial, categorical, multinomial
- Parametric distributions; parameters of distributions
- Expected value of a random variable
- Querying joint distributions
- Efficiency of storage in joint distributions as tables
- Rationale for directed graphical models
- Directed graphical models as joint distributions
- Visual language of directed graphical models
- Reading independence and conditional independence in a directed graphical model
- Reading influence / information flow in a directed graphical model
- VERY IMPORTANT: Answering questions on directed graphical models: joint queries, marginal queries, conditional queries
- Efficiency of answering conditional queries
- Text classification
- Other kinds of classification problems
- “Bag-of-words” assumption
- VERY IMPORTANT: Naive Bayes as a directed graphical model, classifying with Naive Bayes, shortcomings of Naive Bayes models
- Various event models for Naive Bayes: multivariate Bernoulli, multivariate categorical, multinomial (especially multivariate categorical)
- Class-conditional language models as classifiers
- Evaluating classifiers
- Maximum likelihood estimation for the categorical distribution
- NO Lagrange Multipliers
- The purpose, shapes, and parametrization of the Beta distribution
- The purpose, shapes, and parametrization of the Dirichlet distribution
- NO analytical forms of the Beta and Dirichlet distribution
- Beta-Binomial conjugacy
- Dirichlet-Multinomial conjugacy
- NO Completing the integral
- Point estimates to summarize the posterior distribution
- Maximum a Posteriori (MAP) parameter estimation for the categorical distribution
- Relationship between MAP estimation and add-one smoothing
- Reading generative stories from a directed graphical model
- Plate notation
- High-level steps of the Expectation Maximization algorithm
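As a study aid, several of the topics flagged VERY IMPORTANT above (Naive Bayes classification, MLE class priors, MAP word probabilities via add-one smoothing) can be sketched in a few lines of Python. The toy corpus, labels, and function names below are purely illustrative, not from the course materials:

```python
import math
from collections import Counter, defaultdict

# Toy training corpus (illustrative only): (document tokens, class label).
train = [
    (["free", "money", "free"], "spam"),
    (["meeting", "tomorrow"], "ham"),
    (["free", "offer"], "spam"),
    (["lunch", "tomorrow", "meeting"], "ham"),
]

# Parameter estimation: class priors by MLE; per-class word probabilities
# by MAP with add-one smoothing (a symmetric Dirichlet prior).
vocab = {w for doc, _ in train for w in doc}
class_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)
for doc, label in train:
    word_counts[label].update(doc)

def log_posterior(doc, label):
    # log P(label) + sum_i log P(word_i | label): the Naive Bayes
    # (bag-of-words) factorization of the joint distribution.
    lp = math.log(class_counts[label] / len(train))
    total = sum(word_counts[label].values())
    for w in doc:
        # Add-one smoothing: (count + 1) / (total + |V|)
        lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
    return lp

def classify(doc):
    # Pick the class with the highest (unnormalized) log posterior.
    return max(class_counts, key=lambda c: log_posterior(doc, c))

print(classify(["free", "money"]))     # spam
print(classify(["meeting", "lunch"]))  # ham
```

Working in log space avoids underflow when multiplying many small word probabilities, and the smoothing term keeps unseen words from zeroing out a class's posterior.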