{"id":10765,"date":"2020-03-25T13:48:45","date_gmt":"2020-03-25T13:48:45","guid":{"rendered":"http:\/\/blog.bachi.net\/?p=10765"},"modified":"2020-05-04T14:42:52","modified_gmt":"2020-05-04T14:42:52","slug":"victor-lavrenko-k-means-clustering","status":"publish","type":"post","link":"https:\/\/blog.bachi.net\/?p=10765","title":{"rendered":"Victor Lavrenko: YouTube"},"content":{"rendered":"<p><!-- ------------------------------------------------------------------------------- --><\/p>\n<h2>Naive Bayes Classifier<\/h2>\n<p><a href=\"https:\/\/www.youtube.com\/playlist?list=PLBv09BD7ez_6CxkuiFTbL3jsn2Qd1IU7B\">Naive Bayes Classifier<\/a><\/p>\n<ul>\n<li>IAML5.1: Overview<\/li>\n<li>IAML5.2: Bayesian classification<\/li>\n<li>IAML5.3: Class model and the prior<\/li>\n<li>IAML5.4: Role of denominator in Naive Bayes<\/li>\n<li>IAML5.5: Probabilistic classifiers: generative vs discriminative<\/li>\n<li>IAML5.6: Independence assumption in Naive Bayes<\/li>\n<li>IAML5.7: Mutual independence vs conditional independence<\/li>\n<li>IAML5.8: Naive Bayes for real-valued data<\/li>\n<li>IAML5.9: Gaussian Naive Bayes classifier<\/li>\n<li>IAML5.10: Naive Bayes decision boundary<\/li>\n<li>IAML5.11: Example where Naive Bayes fails<\/li>\n<li>IAML5.12: Naive Bayes for spam detection<\/li>\n<li>IAML5.13: The zero-frequency problem<\/li>\n<li>IAML5.14: Missing values in Naive Bayes<\/li>\n<\/ul>\n<p><!-- ------------------------------------------------------------------------------- --><\/p>\n<h2>Decision Tree Learning<\/h2>\n<p><a href=\"https:\/\/www.youtube.com\/playlist?list=PLBv09BD7ez_4_UoYeGrzvqveIR_USBEKD\">Decision Tree Learning<\/a><\/p>\n<ul>\n<li>IAML7.1 Decision Trees: an introduction<\/li>\n<li>IAML7.2 Decision tree example<\/li>\n<li>IAML7.3 Quinlan&#8217;s ID3 algorithm<\/li>\n<li>IAML7.4 Decision tree: split purity<\/li>\n<li>IAML7.5 Decision tree entropy<\/li>\n<li>IAML7.6 Information gain<\/li>\n<li>IAML7.7 Overfitting in decision trees<\/li>\n<li>IAML7.8 Decision tree pruning<\/li>\n<li>IAML7.9 Information gain ratio<\/li>\n<li>IAML7.10 Decision trees are DNF formulas<\/li>\n<li>IAML7.11 Decision trees and real-valued data<\/li>\n<li>IAML7.12 Decision tree regression<\/li>\n<li>IAML7.13 Pros and cons of decision trees<\/li>\n<li>IAML7.14 Random forest algorithm<\/li>\n<li>IAML7.15 Summary<\/li>\n<\/ul>\n<p><!-- ------------------------------------------------------------------------------- --><\/p>\n<h2>Generalization and Evaluation<\/h2>\n<p><a href=\"https:\/\/www.youtube.com\/playlist?list=PLBv09BD7ez_50pj5kcKFYee7QPcg3ImCV\">Generalization and Evaluation<\/a><\/p>\n<ul>\n<li>IAML8.1 Generalization in machine learning<\/li>\n<li>IAML8.2 Overfitting and underfitting<\/li>\n<li>IAML8.3 Examples of overfitting and underfitting<\/li>\n<li>IAML8.4 How to control overfitting<\/li>\n<li>IAML8.5 Generalization error<\/li>\n<li>IAML8.6 Estimating the generalization error<\/li>\n<li>IAML8.7 Confidence interval for generalization<\/li>\n<li>IAML8.8 Why we need validation sets<\/li>\n<li>IAML8.9 Cross-validation<\/li>\n<li>IAML8.10 Leave-one-out cross-validation<\/li>\n<li>IAML8.11 Stratified sampling<\/li>\n<li>IAML8.12 Evaluating classification and regression<\/li>\n<li>IAML8.13 False positives and false negatives<\/li>\n<li>IAML8.14 Classification error and accuracy<\/li>\n<li>IAML8.15 When classification error is wrong<\/li>\n<li>IAML8.16 Recall, precision, miss and false alarm<\/li>\n<li>IAML8.17 Classification cost and utility<\/li>\n<li>IAML8.18 Receiver Operating Characteristic (ROC) 
<h2>Decision Tree Learning</h2>
<p><a href="https://www.youtube.com/playlist?list=PLBv09BD7ez_4_UoYeGrzvqveIR_USBEKD">Decision Tree Learning</a></p>
<ul>
<li>IAML7.1 Decision Trees: an introduction</li>
<li>IAML7.2 Decision tree example</li>
<li>IAML7.3 Quinlan’s ID3 algorithm</li>
<li>IAML7.4 Decision tree: split purity</li>
<li>IAML7.5 Decision tree entropy</li>
<li>IAML7.6 Information gain</li>
<li>IAML7.7 Overfitting in decision trees</li>
<li>IAML7.8 Decision tree pruning</li>
<li>IAML7.9 Information gain ratio</li>
<li>IAML7.10 Decision trees are DNF formulas</li>
<li>IAML7.11 Decision trees and real-valued data</li>
<li>IAML7.12 Decision tree regression</li>
<li>IAML7.13 Pros and cons of decision trees</li>
<li>IAML7.14 Random forest algorithm</li>
<li>IAML7.15 Summary</li>
</ul>

<h2>Generalization and Evaluation</h2>
<p><a href="https://www.youtube.com/playlist?list=PLBv09BD7ez_50pj5kcKFYee7QPcg3ImCV">Generalization and Evaluation</a></p>
<ul>
<li>IAML8.1 Generalization in machine learning</li>
<li>IAML8.2 Overfitting and underfitting</li>
<li>IAML8.3 Examples of overfitting and underfitting</li>
<li>IAML8.4 How to control overfitting</li>
<li>IAML8.5 Generalization error</li>
<li>IAML8.6 Estimating the generalization error</li>
<li>IAML8.7 Confidence interval for generalization</li>
<li>IAML8.8 Why we need validation sets</li>
<li>IAML8.9 Cross-validation</li>
<li>IAML8.10 Leave-one-out cross-validation</li>
<li>IAML8.11 Stratified sampling</li>
<li>IAML8.12 Evaluating classification and regression</li>
<li>IAML8.13 False positives and false negatives</li>
<li>IAML8.14 Classification error and accuracy</li>
<li>IAML8.15 When classification error is wrong</li>
<li>IAML8.16 Recall, precision, miss and false alarm</li>
<li>IAML8.17 Classification cost and utility</li>
<li>IAML8.18 Receiver Operating Characteristic (ROC) curve</li>
<li>IAML8.19 Evaluating regression: MSE, MAE, CC</li>
<li>IAML8.20 Mean squared error and outliers</li>
<li>IAML8.21 Mean absolute error (MAE)</li>
<li>IAML8.22 Correlation coefficient</li>
</ul>

<h2>k-Nearest Neighbor Algorithm</h2>
<p><a href="https://www.youtube.com/playlist?list=PLBv09BD7ez_48heon5Az-TsyoXVYOJtDZ">k-Nearest Neighbor Algorithm</a></p>
<ul>
<li>kNN.1 Overview</li>
<li>kNN.2 Intuition for the nearest-neighbor method</li>
<li>kNN.3 Voronoi cells and decision boundary</li>
<li>kNN.4 Sensitivity to outliers</li>
<li>kNN.5 Nearest-neighbor classification algorithm</li>
<li>kNN.6 MNIST digit recognition</li>
<li>kNN.7 Nearest-neighbor regression algorithm</li>
<li>kNN.8 Nearest-neighbor regression example</li>
<li>kNN.9 Number of nearest neighbors to use</li>
<li>kNN.10 Similarity / distance measures</li>
<li>kNN.11 Breaking ties between nearest neighbors</li>
<li>kNN.12 Parzen windows, kernels and SVM</li>
<li>kNN.13 Pros and cons of nearest-neighbor methods</li>
<li>kNN.14 Computational complexity of finding nearest-neighbors</li>
<li>kNN.15 K-d tree algorithm</li>
<li>kNN.16 Locality sensitive hashing (LSH)</li>
<li>kNN.17 Inverted index</li>
</ul>

<h2>K-means Clustering</h2>
<p><a href="https://www.youtube.com/playlist?list=PLBv09BD7ez_6cgkSUAqBXENXEhCkb_2wl">K-means Clustering</a></p>
<ul>
<li>Clustering 1: monothetic vs. polythetic</li>
<li>Clustering 2: soft vs. hard clustering</li>
<li>Clustering 3: overview of methods</li>
<li>Clustering 4: K-means clustering: how it works</li>
<li>Clustering 5: K-means objective and convergence</li>
<li>Clustering 6: how many clusters?</li>
<li>Clustering 7: intrinsic vs. extrinsic evaluation</li>
<li>Clustering 8: alignment and pair-based evaluation</li>
<li>Clustering 9: image representation</li>
</ul>
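<p>The K-means playlist walks through the alternating assignment and update steps and why they converge (Clustering 4&ndash;5). Below is a bare-bones Lloyd's-algorithm sketch of that loop, written as an assumption-laden illustration with invented data, not the lecture's code.</p>
<pre><code># Bare-bones k-means (Lloyd's algorithm) sketch (cf. Clustering 4-5).
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct random data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: the objective can no longer decrease
        centroids = new_centroids
    return centroids, labels

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
print(kmeans(X, k=2)[1])  # e.g. [0 0 1 1] (cluster ids may be permuted)
</code></pre>
<p>Both steps can only lower the sum of squared distances to the assigned centroids, which is why the loop terminates; the playlist covers this under "objective and convergence".</p>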
<h2>IR15 Web Search and PageRank</h2>
<p><a href="https://www.youtube.com/playlist?list=PLBv09BD7ez_6b7342XmYBMzPXwJ2YAWRN">IR15 Web Search and PageRank</a></p>
<ul>
<li>Web search 1: more data = higher precision</li>
<li>Web search 2: big data beats clever algorithms</li>
<li>Web search 3: introduction to PageRank</li>
<li>Web search 4: PageRank algorithm: how it works</li>
<li>Web search 5: PageRank at convergence</li>
<li>Web search 6: PageRank using MapReduce</li>
<li>Web search 7: sink nodes in PageRank</li>
<li>Web search 8: hubs and authorities</li>
<li>Web search 9: link spam</li>
<li>Web search 10: anchor text</li>
</ul>
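<p>To make the PageRank videos concrete (how it works, convergence, and sink nodes), here is a toy power-iteration sketch. The damping factor, the tiny graph, and all names are my own illustrative assumptions, not taken from the playlist.</p>
<pre><code># Toy power-iteration PageRank sketch (cf. Web search 4, 5 and 7).
import numpy as np

def pagerank(links, n, d=0.85, tol=1e-10):
    """links: list of (src, dst) edges over nodes 0..n-1."""
    out_deg = np.zeros(n)
    for s, _ in links:
        out_deg[s] += 1
    r = np.full(n, 1.0 / n)  # start with uniform rank
    while True:
        new = np.zeros(n)
        # Each page splits its rank evenly across its outlinks.
        for s, t in links:
            new[t] += r[s] / out_deg[s]
        # Sink nodes (no outlinks) donate their rank to everyone equally.
        new += r[out_deg == 0].sum() / n
        # Damping: with prob. 1-d the surfer jumps to a random page.
        new = (1 - d) / n + d * new
        if np.abs(new - r).sum() &lt; tol:
            return new
        r = new

# Made-up graph: 0->1, 0->2, 1->2, 2->0, 2->3; node 3 is a sink.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3)]
print(pagerank(edges, n=4))
</code></pre>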
<h2>IR7 Inverted Indexing</h2>
<p><a href="https://www.youtube.com/playlist?list=PLBv09BD7ez_448q9kRfZRxYb3cbeEanRb">IR7 Inverted Indexing</a></p>
<ul>
<li>Indexing 1: what makes google fast</li>
<li>Indexing 2: inverted index</li>
<li>Indexing 3: sparseness and linear merge</li>
<li>Indexing 4: phrases and proximity</li>
<li>Indexing 5: XML, structure and metadata</li>
<li>Indexing 6: delta encoding (compression)</li>
<li>Indexing 7: v-byte encoding (compression)</li>
<li>Indexing 8: doc-at-a-time query execution</li>
<li>Indexing 9: doc-at-a-time worst case</li>
<li>Indexing 10: term-at-a-time query execution</li>
<li>Indexing 11: query execution tradeoffs</li>
<li>Indexing 12: expected cost of execution</li>
<li>Indexing 13: heuristics for faster search</li>
<li>Indexing 14: structured query execution</li>
<li>Indexing 15: index construction</li>
<li>Indexing 16: MapReduce</li>
<li>Indexing 17: distributed search</li>
</ul>

<h2>IR13 Evaluating Search Engines</h2>
<p><a href="https://www.youtube.com/playlist?list=PLBv09BD7ez_6nqE9YU9bQXpjJ5jJ1Kgr9">IR13 Evaluating Search Engines</a></p>
<ul>
<li>Evaluation 1: overview</li>
<li>Evaluation 2: research hypotheses</li>
<li>Evaluation 3: effectiveness vs. efficiency</li>
<li>Evaluation 4: Cranfield paradigm</li>
<li>Evaluation 5: relevance judgments</li>
<li>Evaluation 6: precision and recall</li>
<li>Evaluation 7: why we can’t use accuracy</li>
<li>Evaluation 8: F-measure</li>
<li>Evaluation 9: when recall/precision is misleading</li>
<li>Evaluation 10: recall and precision over ranks</li>
<li>Evaluation 11: interpolated recall-precision plot</li>
<li>Evaluation 12: mean average precision</li>
<li>Evaluation 13: MAP vs NDCG</li>
<li>Evaluation 14: query logs and click deviation</li>
<li>Evaluation 15: binary preference and Kendall tau</li>
<li>Evaluation 16: hypothesis testing</li>
<li>Evaluation 17: statistical significance test</li>
<li>Evaluation 18: the sign test</li>
<li>Evaluation 19: training / testing splits</li>
</ul>

<h2>IR10 Crawling the Web</h2>
<p><a href="https://www.youtube.com/playlist?list=PLBv09BD7ez_5_6dtunw6nNqyQwvLeixNz">IR10 Crawling the Web</a></p>
<ul>
<li>Web crawling 1: sources of data</li>
<li>Web crawling 2: blogs, tweets, news feeds</li>
<li>Web crawling 3: the algorithm</li>
<li>Web crawling 4: inside an HTTP request</li>
<li>Web crawling 5: robots.txt</li>
<li>Web crawling 6: keeping index fresh</li>
</ul>
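<p>Returning to the IR7 Inverted Indexing playlist above, here is a tiny sketch of an inverted index queried with the linear (two-pointer) merge of sorted postings lists (Indexing 2&ndash;3). The toy documents and query are made up for illustration.</p>
<pre><code># Tiny inverted index + linear merge sketch (cf. Indexing 2-3).
from collections import defaultdict

def build_index(docs):
    """Map each term to a sorted list of doc ids containing it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            index[term].append(doc_id)  # doc ids arrive in increasing order
    return index

def intersect(p1, p2):
    """Linear merge of two sorted postings lists (AND query)."""
    i = j = 0
    out = []
    while i &lt; len(p1) and j &lt; len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i]); i += 1; j += 1
        elif p1[i] &lt; p2[j]:
            i += 1
        else:
            j += 1
    return out

docs = ["the cat sat", "the dog sat", "a cat and a dog"]
idx = build_index(docs)
print(intersect(idx["cat"], idx["sat"]))  # -> [0]
</code></pre>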
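<p>And for the IR13 evaluation playlist, a small illustration of precision, recall and average precision over a ranked list (Evaluation 6, 10 and 12); averaging the last quantity over many queries gives MAP. The ranking and relevance judgments below are invented.</p>
<pre><code># Precision, recall and average precision sketch (cf. Evaluation 6, 10, 12).
def precision_recall(ranked, relevant):
    """Precision and recall once the full ranked list is retrieved."""
    hits = len([d for d in ranked if d in relevant])
    return hits / len(ranked), hits / len(relevant)

def average_precision(ranked, relevant):
    """Average precision@k over the ranks k where a relevant doc appears."""
    hits, total = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)  # mean over queries would give MAP

ranked = ["d3", "d1", "d7", "d2"]        # system output, best first
relevant = {"d1", "d2", "d9"}            # ground-truth judgments
print(precision_recall(ranked, relevant))   # (0.5, 0.666...)
print(average_precision(ranked, relevant))  # (1/2 + 2/4) / 3 = 0.333...
</code></pre>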