# Final Exam Study Guide

## Plan

The final exam will be held in the classroom on the date set by the University (see the course schedule on Learning Suite). You will have a 3-hour time limit. The exam is closed book and closed notes. No calculators or digital assistants are allowed (you won't need them). I think a well-prepared student will be able to complete the exam in two hours. You're going to do well!

The exam is comprehensive. The format is short answers, worked mathematical solutions, and possibly some true/false questions.

The difficulty level is comparable to that of the mid-term exam.

## Study

I recommend the following activities to study the topics covered by the exam:

• Review the lecture notes and identify the topics we emphasized in class. Focus on those topics listed below.
• Compare the homework solution keys to your homework assignments, and make sure that you understand the major principles covered in the homework problems.
• While you are reviewing the lecture notes, the homework solutions, and the topics in the mid-term study guide and this final study guide, I strongly encourage you to build the following lists:
  • Problems (e.g., classification, clustering)
  • Models (e.g., Naive Bayes, Gaussian Mixture Model)
  • Algorithms (e.g., the Viterbi algorithm, the Expectation Maximization (EM) algorithm)
  • Theories (e.g., probability theory)
  • Methodologies (e.g., feature engineering, unsupervised learning)
• Identify common themes and ideas within each of the lists. This will aid you in organizing your thoughts and making comparisons and contrasts.

## Topics

The final exam is comprehensive and will cover a subset of the following topics as well as topics from the mid-term exam study guide:

1. Steps of the Expectation Maximization algorithm
2. Mixture models
3. Mixture of multinomials model
4. NO deriving new EM algorithms for new models
5. Initialization for Expectation Maximization
6. Computing the likelihood of the data according to a model
7. Converting likelihood expressions into log-space
8. Interpreting Hierarchical Bayesian models
9. (Multivariate) Gaussian distributions
10. Gaussian Mixture Models (GMMs)
11. Sequence labeling
12. Part-of-speech tagging
13. Hidden Markov Models (HMMs)
14. Independence assumptions in HMMs
15. The Viterbi algorithm
16. Components of a speech recognition system
17. Application of HMMs in speech recognition
18. Application of GMMs in speech recognition
19. Formulating recognition problems in the source/channel (aka “noisy channel”) paradigm
20. Language models as Markov chains
21. Decoding as search
22. Beam search as an approximation to the Viterbi algorithm
23. The Monte Carlo principle
24. Gibbs Sampling
25. Justifying steps in the derivation of complete conditional distributions for Gibbs sampling
26. NO novel derivations of complete conditional distributions for Gibbs sampling
27. Document clustering with Gibbs sampling on a mixture of multinomials
28. Metrics for clustering
29. Topic modeling and topic discovery
30. Latent Dirichlet Allocation (LDA): the generative story and model
31. Inference in LDA using Gibbs sampling
32. Strengths and limitations of joint models
33. Strengths and limitations of conditional models
34. Answering conditional queries using a joint model versus using a conditional model directly
35. Maximum entropy classifiers / logistic regression
36. NO derivations of gradients of the likelihood (using differential calculus) for gradient descent / ascent learning of maximum entropy model parameters
37. The feature engineering cycle
38. Pros and cons of Naive Bayes versus maximum entropy as classifiers