Naive Bayes Classifier
- IAML5.1: Overview
- IAML5.2: Bayesian classification
- IAML5.3: Class model and the prior
- IAML5.4: Role of denominator in Naive Bayes
- IAML5.5: Probabilistic classifiers: generative vs discriminative
- IAML5.6: Independence assumption in Naive Bayes
- IAML5.7: Mutual independence vs conditional independence
- IAML5.8: Naive Bayes for real-valued data
- IAML5.9: Gaussian Naive Bayes classifier
- IAML5.10: Naive Bayes decision boundary
- IAML5.11: Example where Naive Bayes fails
- IAML5.12: Naive Bayes for spam detection
- IAML5.13: The zero-frequency problem
- IAML5.14: Missing values in Naive Bayes
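
The Gaussian model covered in IAML5.8–5.9 fits in a short sketch. Below is a minimal Gaussian Naive Bayes in plain Python (function names and the variance-smoothing constant are my own): fit a per-class prior plus per-feature mean and variance, then classify by the highest log-posterior score. The denominator discussed in IAML5.4 is dropped because it is identical for every class.

```python
import math

def fit_gaussian_nb(X, y):
    """Estimate a prior and per-feature Gaussian parameters for each class."""
    params = {}
    for c in sorted(set(y)):
        Xc = [x for x, label in zip(X, y) if label == c]
        n = len(Xc)
        means = [sum(col) / n for col in zip(*Xc)]
        # small constant avoids zero variance on tiny toy datasets
        vars_ = [sum((v - m) ** 2 for v in col) / n + 1e-9
                 for col, m in zip(zip(*Xc), means)]
        params[c] = (n / len(X), means, vars_)
    return params

def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict(params, x):
    """Pick the class with the highest log prior + log likelihood."""
    best, best_score = None, float("-inf")
    for c, (prior, means, vars_) in params.items():
        score = math.log(prior) + sum(
            log_gaussian(v, m, s) for v, m, s in zip(x, means, vars_))
        if score > best_score:
            best, best_score = c, score
    return best
```

Working in log space sidesteps the numerical underflow that comes from multiplying many small per-feature likelihoods.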
 
Decision Tree Learning
- IAML7.1 Decision Trees: an introduction
- IAML7.2 Decision tree example
- IAML7.3 Quinlan’s ID3 algorithm
- IAML7.4 Decision tree: split purity
- IAML7.5 Decision tree entropy
- IAML7.6 Information gain
- IAML7.7 Overfitting in decision trees
- IAML7.8 Decision tree pruning
- IAML7.9 Information gain ratio
- IAML7.10 Decision trees are DNF formulas
- IAML7.11 Decision trees and real-valued data
- IAML7.12 Decision tree regression
- IAML7.13 Pros and cons of decision trees
- IAML7.14 Random forest algorithm
- IAML7.15 Summary
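
The split criterion at the heart of ID3 (IAML7.5–7.6) is easy to state in code. A minimal sketch, with my own function names: entropy measures the impurity of a label set, and information gain is the entropy reduction achieved by a candidate split.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, splits):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(labels)
    return entropy(labels) - sum(len(s) / n * entropy(s) for s in splits)
```

ID3 greedily picks, at each node, the attribute whose split maximizes this gain; a perfectly pure split of a balanced binary parent yields the maximum gain of 1 bit.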
 
Generalization and Evaluation
- IAML8.1 Generalization in machine learning
- IAML8.2 Overfitting and underfitting
- IAML8.3 Examples of overfitting and underfitting
- IAML8.4 How to control overfitting
- IAML8.5 Generalization error
- IAML8.6 Estimating the generalization error
- IAML8.7 Confidence interval for generalization
- IAML8.8 Why we need validation sets
- IAML8.9 Cross-validation
- IAML8.10 Leave-one-out cross-validation
- IAML8.11 Stratified sampling
- IAML8.12 Evaluating classification and regression
- IAML8.13 False positives and false negatives
- IAML8.14 Classification error and accuracy
- IAML8.15 When classification error is wrong
- IAML8.16 Recall, precision, miss and false alarm
- IAML8.17 Classification cost and utility
- IAML8.18 Receiver Operating Characteristic (ROC) curve
- IAML8.19 Evaluating regression: MSE, MAE, CC
- IAML8.20 Mean squared error and outliers
- IAML8.21 Mean absolute error (MAE)
- IAML8.22 Correlation coefficient
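
The classification metrics in IAML8.13–8.16 all derive from the four confusion-matrix counts. A minimal sketch (helper names are my own), assuming binary labels with a designated positive class:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Fraction of true positives that were found (1 minus the miss rate)."""
    return tp / (tp + fn) if (tp + fn) else 0.0
```

On a heavily imbalanced dataset, accuracy can look excellent while recall on the rare class is near zero, which is the pitfall IAML8.15 warns about.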
 
k-Nearest Neighbor Algorithm
- kNN.1 Overview
- kNN.2 Intuition for the nearest-neighbor method
- kNN.3 Voronoi cells and decision boundary
- kNN.4 Sensitivity to outliers
- kNN.5 Nearest-neighbor classification algorithm
- kNN.6 MNIST digit recognition
- kNN.7 Nearest-neighbor regression algorithm
- kNN.8 Nearest-neighbor regression example
- kNN.9 Number of nearest neighbors to use
- kNN.10 Similarity / distance measures
- kNN.11 Breaking ties between nearest neighbors
- kNN.12 Parzen windows, kernels and SVM
- kNN.13 Pros and cons of nearest-neighbor methods
- kNN.14 Computational complexity of finding nearest-neighbors
- kNN.15 K-d tree algorithm
- kNN.16 Locality sensitive hashing (LSH)
- kNN.17 Inverted index
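
The classification algorithm of kNN.5 is short enough to write out in full. A minimal sketch (names are my own), using the Euclidean distance from kNN.10 and a majority vote over the k nearest training points from kNN.9:

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    neighbors = sorted(zip(X_train, y_train),
                       key=lambda pair: euclidean(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

This brute-force version scans every training point per query, which is the O(n) cost that motivates the k-d trees and LSH of kNN.14–16; an odd k avoids most voting ties (kNN.11).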
 
K-means Clustering
- Clustering 1: monothetic vs. polythetic
- Clustering 2: soft vs. hard clustering
- Clustering 3: overview of methods
- Clustering 4: K-means clustering: how it works
- Clustering 5: K-means objective and convergence
- Clustering 6: how many clusters?
- Clustering 7: intrinsic vs. extrinsic evaluation
- Clustering 8: alignment and pair-based evaluation
- Clustering 9: image representation
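
The "how it works" of Clustering 4 is an alternation of two steps. A minimal sketch of Lloyd's K-means (names are my own; initial centroids are passed in explicitly rather than chosen at random, to keep the example deterministic):

```python
def kmeans(points, centroids, iters=10):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    clusters = []
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its cluster
        centroids = [[sum(col) / len(c) for col in zip(*c)] if c else cent
                     for c, cent in zip(clusters, centroids)]
    return centroids, clusters
```

Both steps can only decrease the sum of squared distances to the assigned centroid, which is why the algorithm converges (the objective argument of Clustering 5), though only to a local optimum that depends on the initial centroids.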
 
IR15 Web Search and PageRank
- Web search 1: more data = higher precision
- Web search 2: big data beats clever algorithms
- Web search 3: introduction to PageRank
- Web search 4: PageRank algorithm: how it works
- Web search 5: PageRank at convergence
- Web search 6: PageRank using MapReduce
- Web search 7: sink nodes in PageRank
- Web search 8: hubs and authorities
- Web search 9: link spam
- Web search 10: anchor text
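
The iteration behind Web search 4 can be sketched in a few lines. A minimal power-iteration PageRank (names and the damping factor default are my own; 0.85 is the commonly quoted value), with the sink nodes of Web search 7 handled by spreading a sink's rank uniformly over all pages:

```python
def pagerank(links, d=0.85, iters=50):
    """Power-iteration PageRank. links maps each node to its outgoing targets."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}        # start from a uniform distribution
    for _ in range(iters):
        new = {v: (1 - d) / n for v in nodes}  # random-jump component
        for v in nodes:
            out = links[v]
            if out:
                share = d * rank[v] / len(out)
                for w in out:                  # follow-a-link component
                    new[w] += share
            else:                              # sink node: no out-links,
                for w in nodes:                # redistribute rank uniformly
                    new[w] += d * rank[v] / n
        rank = new
    return rank
```

Because every unit of rank is fully redistributed each round, the scores keep summing to 1, and the iteration approaches the fixed point described in Web search 5.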
 
IR7 Inverted Indexing
- Indexing 1: what makes Google fast
- Indexing 2: inverted index
- Indexing 3: sparseness and linear merge
- Indexing 4: phrases and proximity
- Indexing 5: XML, structure and metadata
- Indexing 6: delta encoding (compression)
- Indexing 7: v-byte encoding (compression)
- Indexing 8: doc-at-a-time query execution
- Indexing 9: doc-at-a-time worst case
- Indexing 10: term-at-a-time query execution
- Indexing 11: query execution tradeoffs
- Indexing 12: expected cost of execution
- Indexing 13: heuristics for faster search
- Indexing 14: structured query execution
- Indexing 15: index construction
- Indexing 16: MapReduce
- Indexing 17: distributed search
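
The core structure of Indexing 2 and the linear merge of Indexing 3 fit in a short sketch (function names and the whitespace tokenizer are my own simplifications):

```python
def build_index(docs):
    """Map each term to the sorted list of document ids containing it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for term in set(text.split()):          # set(): one posting per doc
            index.setdefault(term, []).append(doc_id)
    return index  # postings are sorted because doc_ids are assigned in order

def intersect(p1, p2):
    """Linear merge of two sorted postings lists (AND of two query terms)."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i]); i += 1; j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out
```

Because both lists are sorted, the merge runs in time linear in their combined length, never touching the vast majority of documents that contain neither term; that sparseness is what makes the approach fast at web scale.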
 
IR13 Evaluating Search Engines
- Evaluation 1: overview
- Evaluation 2: research hypotheses
- Evaluation 3: effectiveness vs. efficiency
- Evaluation 4: Cranfield paradigm
- Evaluation 5: relevance judgments
- Evaluation 6: precision and recall
- Evaluation 7: why we can’t use accuracy
- Evaluation 8: F-measure
- Evaluation 9: when recall/precision is misleading
- Evaluation 10: recall and precision over ranks
- Evaluation 11: interpolated recall-precision plot
- Evaluation 12: mean average precision
- Evaluation 13: MAP vs NDCG
- Evaluation 14: query logs and click deviation
- Evaluation 15: binary preference and Kendall tau
- Evaluation 16: hypothesis testing
- Evaluation 17: statistical significance test
- Evaluation 18: the sign test
- Evaluation 19: training / testing splits
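
The mean average precision of Evaluation 12 is a short computation once precision-at-rank is in hand. A minimal sketch (names are my own), assuming binary relevance judgments:

```python
def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k where a relevant doc appears."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank       # precision at this rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP: average of per-query average precision over (ranking, judgments) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Relevant documents never retrieved contribute zero to the sum while still counting in the denominator, so AP rewards both ranking relevant documents early and finding all of them.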
 
IR10 Crawling the Web
- Web crawling 1: sources of data
- Web crawling 2: blogs, tweets, news feeds
- Web crawling 3: the algorithm
- Web crawling 4: inside an HTTP request
- Web crawling 5: robots.txt
- Web crawling 6: keeping index fresh
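
The robots.txt check of Web crawling 5 is available in the Python standard library. A minimal sketch (the wrapper name and the example rules are my own) that parses an already-fetched robots.txt body and tests whether a polite crawler may request a given URL:

```python
import urllib.robotparser

def parse_robots(text):
    """Parse a robots.txt body (already fetched) into a RobotFileParser."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(text.splitlines())
    return rp

# hypothetical rules: block every crawler from the /private/ subtree
rules = parse_robots("User-agent: *\nDisallow: /private/")
```

A polite crawler calls `can_fetch(agent, url)` before every request it queues, so disallowed subtrees are never downloaded in the first place.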