Victor Lavrenko: YouTube

K-means Clustering

  • Clustering 1: monothetic vs. polythetic
  • Clustering 2: soft vs. hard clustering
  • Clustering 3: overview of methods
  • Clustering 4: K-means clustering: how it works
  • Clustering 5: K-means objective and convergence
  • Clustering 6: how many clusters?
  • Clustering 7: intrinsic vs. extrinsic evaluation
  • Clustering 8: alignment and pair-based evaluation
  • Clustering 9: image representation

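The fourth and fifth videos cover how K-means alternates assignment and update steps and why its objective converges. As a quick companion, here is a minimal NumPy sketch of that loop; it is my own illustration, not code from the lectures:

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate hard assignments and centroid updates."""
    rng = np.random.default_rng(seed)
    # Initialise centroids as k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points.
        updated = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
        # Both steps can only lower the sum-of-squared-distances objective,
        # so the loop terminates once the centroids stop moving.
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids

# Two obvious blobs; expect one centroid near each.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
print(kmeans(X, k=2))
```
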
IR15 Web Search and PageRank

  • Web search 1: more data = higher precision
  • Web search 2: big data beats clever algorithms
  • Web search 3: introduction to PageRank
  • Web search 4: PageRank algorithm: how it works
  • Web search 5: PageRank at convergence
  • Web search 6: PageRank using MapReduce
  • Web search 7: sink nodes in PageRank
  • Web search 8: hubs and authorities
  • Web search 9: link spam
  • Web search 10: anchor text

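Videos 3 through 7 walk through the PageRank iteration, its convergence, and the problem of sink nodes. The sketch below, my own rather than anything from the course, shows plain power iteration with damping and sink-mass redistribution over an adjacency-list graph:

```python
def pagerank(links, d=0.85, tol=1e-8, max_iter=100):
    """Power iteration on an adjacency dict {page: [outgoing links]}."""
    n = len(links)
    rank = {page: 1.0 / n for page in links}
    for _ in range(max_iter):
        # Rank mass sitting on sink nodes is redistributed over all pages.
        sink_mass = sum(rank[p] for p in links if not links[p])
        new_rank = {p: (1 - d) / n + d * sink_mass / n for p in links}
        for p, outgoing in links.items():
            for q in outgoing:
                new_rank[q] += d * rank[p] / len(outgoing)
        # Converged once the total change falls below the tolerance.
        if sum(abs(new_rank[p] - rank[p]) for p in links) < tol:
            return new_rank
        rank = new_rank
    return rank

# Tiny web graph; C is a sink node (no outgoing links).
web = {"A": ["B", "C"], "B": ["C"], "C": []}
print(pagerank(web))
```
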
IR7 Inverted Indexing

  • Indexing 1: what makes Google fast
  • Indexing 2: inverted index
  • Indexing 3: sparseness and linear merge
  • Indexing 4: phrases and proximity
  • Indexing 5: XML, structure and metadata
  • Indexing 6: delta encoding (compression)
  • Indexing 7: v-byte encoding (compression)
  • Indexing 8: doc-at-a-time query execution
  • Indexing 9: doc-at-a-time worst case
  • Indexing 10: term-at-a-time query execution
  • Indexing 11: query execution tradeoffs
  • Indexing 12: expected cost of execution
  • Indexing 13: heuristics for faster search
  • Indexing 14: structured query execution
  • Indexing 15: index construction
  • Indexing 16: MapReduce
  • Indexing 17: distributed search

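Videos 2, 3, 6 and 7 are the natural ones to try in code. The sketch below (illustrative only, not course code) builds an inverted index, intersects two sorted postings lists with a linear merge, and compresses a postings list with delta gaps plus v-byte; the high-bit-terminates-the-number byte layout used here is one common v-byte convention:

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to a sorted postings list of document IDs."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def linear_merge(p1, p2):
    """Intersect two sorted postings lists in O(len(p1) + len(p2)) time."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result

def vbyte_encode(postings):
    """Delta-encode a postings list, then v-byte each gap
    (7 payload bits per byte; high bit set on the final byte)."""
    gaps = [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]
    out = bytearray()
    for gap in gaps:
        chunk = [gap & 0x7F]
        gap >>= 7
        while gap:
            chunk.append(gap & 0x7F)
            gap >>= 7
        chunk.reverse()
        chunk[-1] |= 0x80  # terminator bit marks the end of this number
        out.extend(chunk)
    return bytes(out)

docs = ["the cat sat", "the dog sat", "the cat ran"]
index = build_index(docs)
print(linear_merge(index["the"], index["cat"]))   # -> [0, 2]
print(vbyte_encode([3, 7, 21]).hex())             # gaps 3, 4, 14 -> 83 84 8e
```
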
IR13 Evaluating Search Engines

  • Evaluation 1: overview
  • Evaluation 2: research hypotheses
  • Evaluation 3: effectiveness vs. efficiency
  • Evaluation 4: Cranfield paradigm
  • Evaluation 5: relevance judgments
  • Evaluation 6: precision and recall
  • Evaluation 7: why we can’t use accuracy
  • Evaluation 8: F-measure
  • Evaluation 9: when recall/precision is misleading
  • Evaluation 10: recall and precision over ranks
  • Evaluation 11: interpolated recall-precision plot
  • Evaluation 12: mean average precision
  • Evaluation 13: MAP vs. NDCG
  • Evaluation 14: query logs and click deviation
  • Evaluation 15: binary preference and Kendall tau
  • Evaluation 16: hypothesis testing
  • Evaluation 17: statistical significance test
  • Evaluation 18: the sign test
  • Evaluation 19: training / testing splits

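The core measures from videos 6, 8 and 12 are short enough to state directly in code. Here is a small sketch, mine rather than the lecturer's, of precision, recall, F-measure and average precision for one ranked list; mean average precision is just this last value averaged over queries:

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    hits = len(set(retrieved) & set(relevant))
    return hits / len(retrieved), hits / len(relevant)

def f_measure(p, r, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta=1)."""
    return (1 + beta**2) * p * r / (beta**2 * p + r) if p + r else 0.0

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank holding a relevant doc."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

ranked = ["d3", "d1", "d7", "d2"]
relevant = ["d1", "d2"]
p, r = precision_recall(ranked, relevant)
print(p, r, f_measure(p, r), average_precision(ranked, relevant))
```
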
IR10 Crawling the Web

  • Web crawling 1: sources of data
  • Web crawling 2: blogs, tweets, news feeds
  • Web crawling 3: the algorithm
  • Web crawling 4: inside an HTTP request
  • Web crawling 5: robots.txt
  • Web crawling 6: keeping index fresh

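Tying videos 3 and 5 together, here is a toy breadth-first crawler built on Python's standard urllib.robotparser; the regex link extraction and single-threaded frontier are deliberate simplifications of what the videos describe:

```python
import re
import urllib.request
import urllib.robotparser
from collections import deque
from urllib.parse import urljoin, urlparse

def crawl(seed, max_pages=10, agent="*"):
    """Toy breadth-first crawler that checks robots.txt before each fetch."""
    frontier, seen, robots = deque([seed]), {seed}, {}
    while frontier and max_pages > 0:
        url = frontier.popleft()
        host = "{0.scheme}://{0.netloc}".format(urlparse(url))
        if host not in robots:
            # Fetch and cache robots.txt once per host.
            rp = urllib.robotparser.RobotFileParser(host + "/robots.txt")
            try:
                rp.read()
            except OSError:
                pass  # unreachable robots.txt: the parser denies by default
            robots[host] = rp
        if not robots[host].can_fetch(agent, url):
            continue  # disallowed for this user agent
        try:
            page = urllib.request.urlopen(url, timeout=5)
            html = page.read().decode("utf-8", errors="replace")
        except OSError:
            continue
        max_pages -= 1
        print("fetched:", url)
        # Naive link extraction; a real crawler would parse the HTML.
        for href in re.findall(r'href="([^"]+)"', html):
            link = urljoin(url, href)
            if link.startswith("http") and link not in seen:
                seen.add(link)
                frontier.append(link)
```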