CS 4422/7263 - Information Retrieval (2021 Fall)

Reading materials:
  • IIR: Introduction to Information Retrieval
  • SE: Search Engines - Information Retrieval in Practice

Schedule

Week of Topics Lecture / Reading Assignment
Aug. 16th
  • IR introduction
    • Search engines
    • Document retrieval pipeline
    • Challenges in IR
  • Crawling basics
    • HTTP Request
    • Web exploration strategies
    • Politeness policy
    • Duplicates
[slides:IR-intro]
[slides:crawler]
[SE] chapter 1
[IIR] chapter 20
Aug. 23rd
  • Text processing
    • Zipf's law
    • Heaps' law
    • Regular expressions
    • Parsing structured documents
    • Tokenization / Lemmatization / Stemming
  • Indexing Basics
    • Inverted index
    • Dictionary
  • Index construction
    • Index compression
    • Sort-based, Merge sort
    • Parallelism and distribution
[slides:text-processing]
Porter's Algorithm
[IIR] ch. 2
PA1
Aug. 30th
  • Search with an inverted index
  • Set-based retrieval models
    • Boolean retrieval
    • Scoring documents
  • Algebraic retrieval models
    • Document-Term matrix
    • Term weighting using TF.IDF
    • Vector space models (VSM)
    • Cosine similarity
    • Latent semantic analysis (LSA)
[slides:indexing]
[slides:retrieval-model-I]
[IIR] ch. 4, 5
PS1
Sep. 6th
No class on the 6th
  • Elasticsearch
[slides:elasticsearch]
PA2
Sep. 13th
  • Probabilistic retrieval models
    • Bayes' rule
    • Binary independence model (BIM)
    • Okapi BM25
    • Language models
    • Query-likelihood model
    • Smoothing techniques
[slides:retrieval-model-II]
[IIR] ch. 6
Sep. 20th
  • Evaluation metrics for IR
    • confusion matrix
    • precision, recall, F-score
    • precision@K, AP, R-prec
    • ROC, Precision-recall curves
    • nDCG
  • Test collections
    • Cranfield paradigm
    • TREC
[slides:evaluation]
TREC evaluation measures
[IIR] ch. 8
PS2
Sep. 27th
  • Link graph
    • Webgraph
    • Markov Chains
    • PageRank
  • Information extraction and knowledge graph
[slides:link-analysis]
[slides:IE-KG]
[slides:KDD'20 DM-KG]
[IIR] ch. 21
Oct. 4th
  • Relevance feedback and query transformation
    • spell checking and suggestions
    • query expansion
    • relevance feedback
    • controlled vocabulary
[slides:query-manipulation]
Survey of Automatic Spelling Correction
[IIR] ch. 9
PS3
Oct. 11th
  • Text classification
    • ML approaches for text
    • Naive Bayes classifier
    • kNN, Rocchio
    • Decision tree
[slides:text-classification]
[slides:Support Vector Machines]
[YT: Information theory A B C]
[IIR] ch. 14, 15
PA3
Oct. 18th
  • Document clustering
    • clustering
    • topic modeling
    • LDA, PCA
[slides:clustering]
[IIR] ch. 16, 17
PS4
Oct. 25th
  • Recommender Systems
    • collaborative filtering
    • content-based filtering
[slides:Recommenders]
Nov. 1st
  • Conversational Recommender Systems (CRS)
[slides:CRS]
Nov. 8th
  • Introduction to Neural Networks
    • Word vectors
    • Backpropagation
    • Neuron implementation
  • NN architectures
    • CNN, RNN, attention mechanism, transformer
    • General neural language models
[slides:WordVectors]
[slides:NN-intro]
[slides:NN-models-PyTorch]
PS5
Nov. 15th
  • Traditional Learning to rank (L2R) methods
  • NN for IR
    • Semantic matching
    • Query transformation
    • Question answering
    • Document summarization
    • Named entity recognition
  • Conversational Information Retrieval (CIR)
[slides:NN4IR]
[slides:CIR]
Nov. 22nd No classes
Nov. 29th
  • Information Retrieval Summary
Final exam question pool
(Below is for CS7263 only)
IR/NLP literature review
Presentation slides
Dec. 6th
Final Exam Week
We follow the University Final Exam Schedule:
  • CS4422: December 13, 2021 (1:00 PM - 3:00 PM)
  • CS7263: December 8, 2021 (8:30 PM - 10:30 PM)