Naive Bayes Classifier
- IAML5.1: Overview
- IAML5.2: Bayesian classification
- IAML5.3: Class model and the prior
- IAML5.4: Role of denominator in Naive Bayes
- IAML5.5: Probabilistic classifiers: generative vs discriminative
- IAML5.6: Independence assumption in Naive Bayes
- IAML5.7: Mutual independence vs conditional independence
- IAML5.8: Naive Bayes for real-valued data
- IAML5.9: Gaussian Naive Bayes classifier
- IAML5.10: Naive Bayes decision boundary
- IAML5.11: Example where Naive Bayes fails
- IAML5.12: Naive Bayes for spam detection
- IAML5.13: The zero-frequency problem
- IAML5.14: Missing values in Naive Bayes
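The lectures above build up to the Gaussian variant (IAML5.8–5.10). As a rough sketch of that idea, not the lectures' exact code (function names and the small variance floor are my own):

```python
import math

def fit_gaussian_nb(X, y):
    """Estimate a class prior and per-feature Gaussian (mean, variance)
    for each class -- the 'naive' conditional-independence model."""
    params = {}
    n = len(y)
    for c in set(y):
        Xc = [x for x, label in zip(X, y) if label == c]
        means = [sum(col) / len(col) for col in zip(*Xc)]
        # variance floor (1e-9) avoids division by zero on constant features
        vars_ = [max(sum((v - m) ** 2 for v in col) / len(col), 1e-9)
                 for col, m in zip(zip(*Xc), means)]
        params[c] = (len(Xc) / n, means, vars_)
    return params

def log_gauss(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def predict(params, x):
    """Pick the class maximizing log prior + sum of per-feature
    log likelihoods (the denominator P(x) is the same for every class,
    so it can be ignored -- cf. IAML5.4)."""
    best_class, best_score = None, float("-inf")
    for c, (prior, means, vars_) in params.items():
        score = math.log(prior) + sum(
            log_gauss(v, m, s2) for v, m, s2 in zip(x, means, vars_))
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```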
Decision Tree Learning
- IAML7.1 Decision Trees: an introduction
- IAML7.2 Decision tree example
- IAML7.3 Quinlan’s ID3 algorithm
- IAML7.4 Decision tree: split purity
- IAML7.5 Decision tree entropy
- IAML7.6 Information gain
- IAML7.7 Overfitting in decision trees
- IAML7.8 Decision tree pruning
- IAML7.9 Information gain ratio
- IAML7.10 Decision trees are DNF formulas
- IAML7.11 Decision trees and real-valued data
- IAML7.12 Decision tree regression
- IAML7.13 Pros and cons of decision trees
- IAML7.14 Random forest algorithm
- IAML7.15 Summary
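The entropy and information-gain quantities from IAML7.5–7.6, which ID3 uses to choose splits, can be sketched as follows (a minimal illustration for categorical features, not the lectures' code):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Parent entropy minus the size-weighted entropy of the children
    produced by splitting on one categorical feature."""
    n = len(labels)
    groups = {}
    for v, label in zip(feature_values, labels):
        groups.setdefault(v, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

A perfectly informative split recovers the full parent entropy as gain; an uninformative split yields zero.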
Generalization and Evaluation
- IAML8.1 Generalization in machine learning
- IAML8.2 Overfitting and underfitting
- IAML8.3 Examples of overfitting and underfitting
- IAML8.4 How to control overfitting
- IAML8.5 Generalization error
- IAML8.6 Estimating the generalization error
- IAML8.7 Confidence interval for generalization
- IAML8.8 Why we need validation sets
- IAML8.9 Cross-validation
- IAML8.10 Leave-one-out cross-validation
- IAML8.11 Stratified sampling
- IAML8.12 Evaluating classification and regression
- IAML8.13 False positives and false negatives
- IAML8.14 Classification error and accuracy
- IAML8.15 When classification error is wrong
- IAML8.16 Recall, precision, miss and false alarm
- IAML8.17 Classification cost and utility
- IAML8.18 Receiver Operating Characteristic (ROC) curve
- IAML8.19 Evaluating regression: MSE, MAE, CC
- IAML8.20 Mean squared error and outliers
- IAML8.21 Mean absolute error (MAE)
- IAML8.22 Correlation coefficient
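The confusion-matrix quantities from IAML8.13–8.16 (false positives/negatives, accuracy, recall, precision) reduce to a few counts; a minimal sketch for binary labels (0/1 encoding is my assumption):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall from true/predicted binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # miss
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```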
k-Nearest Neighbor Algorithm
- kNN.1 Overview
- kNN.2 Intuition for the nearest-neighbor method
- kNN.3 Voronoi cells and decision boundary
- kNN.4 Sensitivity to outliers
- kNN.5 Nearest-neighbor classification algorithm
- kNN.6 MNIST digit recognition
- kNN.7 Nearest-neighbor regression algorithm
- kNN.8 Nearest-neighbor regression example
- kNN.9 Number of nearest neighbors to use
- kNN.10 Similarity / distance measures
- kNN.11 Breaking ties between nearest neighbors
- kNN.12 Parzen windows, kernels and SVM
- kNN.13 Pros and cons of nearest-neighbor methods
- kNN.14 Computational complexity of finding nearest-neighbors
- kNN.15 K-d tree algorithm
- kNN.16 Locality sensitive hashing (LSH)
- kNN.17 Inverted index
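The classification algorithm from kNN.5, in a bare-bones form (Euclidean distance and majority vote; the tie-breaking of kNN.11 is simplified to whichever label `Counter` returns first):

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = sorted((math.dist(x, xt), yt) for xt, yt in zip(X_train, y_train))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

Note this is the brute-force O(n) search per query; the k-d tree and LSH lectures (kNN.15–16) are about avoiding exactly this cost.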
K-means Clustering
- Clustering 1: monothetic vs. polythetic
- Clustering 2: soft vs. hard clustering
- Clustering 3: overview of methods
- Clustering 4: K-means clustering: how it works
- Clustering 5: K-means objective and convergence
- Clustering 6: how many clusters?
- Clustering 7: intrinsic vs. extrinsic evaluation
- Clustering 8: alignment and pair-based evaluation
- Clustering 9: image representation
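The K-means procedure from Clustering 4–5 alternates two steps until the centers stop moving; a compact sketch (fixed iteration count instead of a convergence test, which is my simplification):

```python
import math

def kmeans(points, centers, iters=20):
    """Lloyd's algorithm: assign each point to its nearest center, then
    move each center to the mean of its assigned points."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            clusters[i].append(p)
        # empty clusters keep their old center
        centers = [tuple(sum(c) / len(c) for c in zip(*cl)) if cl else ctr
                   for cl, ctr in zip(clusters, centers)]
    return centers
```

Each step can only decrease the sum of squared distances to the assigned centers, which is why the objective converges (to a local optimum, not necessarily the global one).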
IR15 Web Search and PageRank
- Web search 1: more data = higher precision
- Web search 2: big data beats clever algorithms
- Web search 3: introduction to PageRank
- Web search 4: PageRank algorithm: how it works
- Web search 5: PageRank at convergence
- Web search 6: PageRank using MapReduce
- Web search 7: sink nodes in PageRank
- Web search 8: hubs and authorities
- Web search 9: link spam
- Web search 10: anchor text
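The PageRank iteration from Web search 4–5, with the sink-node fix from Web search 7 (sink mass spread uniformly), can be sketched as:

```python
def pagerank(links, d=0.85, iters=50):
    """Power iteration for PageRank with damping factor d.
    `links` maps each page to its list of out-links; pages with no
    out-links (sinks) distribute their rank uniformly over all pages."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        sink = sum(rank[p] for p in pages if not links[p])
        new = {p: (1 - d) / n + d * sink / n for p in pages}
        for p in pages:
            for q in links[p]:
                new[q] += d * rank[p] / len(links[p])
        rank = new
    return rank
```

The ranks stay a probability distribution (they sum to 1), and at convergence a page's rank is the stationary probability of the random surfer being there.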
IR7 Inverted Indexing
- Indexing 1: what makes google fast
- Indexing 2: inverted index
- Indexing 3: sparseness and linear merge
- Indexing 4: phrases and proximity
- Indexing 5: XML, structure and metadata
- Indexing 6: delta encoding (compression)
- Indexing 7: v-byte encoding (compression)
- Indexing 8: doc-at-a-time query execution
- Indexing 9: doc-at-a-time worst case
- Indexing 10: term-at-a-time query execution
- Indexing 11: query execution tradeoffs
- Indexing 12: expected cost of execution
- Indexing 13: heuristics for faster search
- Indexing 14: structured query execution
- Indexing 15: index construction
- Indexing 16: MapReduce
- Indexing 17: distributed search
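The core structures from Indexing 2–3 (inverted index, linear merge of postings) fit in a few lines; a toy sketch with whitespace tokenization (a simplification of real tokenizers):

```python
def build_index(docs):
    """Map each term to a sorted postings list of document ids."""
    index = {}
    for doc_id, text in enumerate(docs):
        for term in set(text.split()):
            index.setdefault(term, []).append(doc_id)
    return index

def intersect(p1, p2):
    """Linear merge of two sorted postings lists (an AND query):
    advance whichever pointer holds the smaller doc id."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i]); i += 1; j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out
```

The merge is O(len(p1) + len(p2)), which is why keeping postings sorted (and compressed, per Indexing 6–7) matters.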
IR13 Evaluating Search Engines
- Evaluation 1: overview
- Evaluation 2: research hypotheses
- Evaluation 3: effectiveness vs. efficiency
- Evaluation 4: Cranfield paradigm
- Evaluation 5: relevance judgments
- Evaluation 6: precision and recall
- Evaluation 7: why we can’t use accuracy
- Evaluation 8: F-measure
- Evaluation 9: when recall/precision is misleading
- Evaluation 10: recall and precision over ranks
- Evaluation 11: interpolated recall-precision plot
- Evaluation 12: mean average precision
- Evaluation 13: MAP vs NDCG
- Evaluation 14: query logs and click deviation
- Evaluation 15: binary preference and Kendall tau
- Evaluation 16: hypothesis testing
- Evaluation 17: statistical significance test
- Evaluation 18: the sign test
- Evaluation 19: training / testing splits
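Average precision (Evaluation 12), the per-query quantity whose mean over queries gives MAP, can be sketched as:

```python
def average_precision(ranked, relevant):
    """AP: average of precision@k over the ranks k at which a relevant
    document appears, divided by the total number of relevant documents
    (so unretrieved relevant docs count as zero)."""
    hits = 0
    total = 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0
```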
IR10 Crawling the Web
- Web crawling 1: sources of data
- Web crawling 2: blogs, tweets, news feeds
- Web crawling 3: the algorithm
- Web crawling 4: inside an HTTP request
- Web crawling 5: robots.txt
- Web crawling 6: keeping index fresh
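The crawling loop from Web crawling 3 is a frontier of URLs plus a seen-set; a sketch where `fetch` is a stand-in for a real HTTP GET (robots.txt checks and politeness delays from the later lectures are omitted):

```python
from collections import deque

def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl: pop a URL from the frontier, fetch it,
    enqueue any out-links not seen before.
    `fetch(url)` is assumed to return (page_text, out_links)."""
    frontier = deque(seed_urls)
    seen = set(seed_urls)
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        text, links = fetch(url)
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages
```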