Text Analysis

  1. Text statistics. [slides] [nbviewer] [ipynb] [vldb.txt]
  2. Good-Turing Smoothing. [nbviewer] [ipynb]. Reading: Good-Turing Smoothing Without Tears
  3. Boolean retrieval model. Slides (pdf)
  4. Vector space model. [slides] [nbviewer] [ipynb]
  5. Word embedding. [nbviewer] [ipynb] [WS353 nbviewer] [WS353 ipynb] [WS353-Sim.txt]. Readings: Mikolov et al., Distributed Representations of Words and Phrases and their Compositionality; GloVe
  6. Classification: Naive Bayes. Slides (pdf) [nbviewer] [vldb_train] [icse_train] [vldb_test] [icse_test]
  7. Evaluation. Slides. IIR chapter 8
  8. Language modelling. Slides. IIR chapter 12
  9. Word co-occurrence. [nbviewer] [ipynb]. Reading: O. Levy and Y. Goldberg, Neural Word Embedding as Implicit Matrix Factorization
  10. LSI (Latent Semantic Indexing) and SVD (Singular Value Decomposition). [slides]. IIR chapter 18, IIR web page
  11. Clustering. [K-means slides] [HAC slides] [IIR chapter 16] [IIR chapter 17]
  12. Text processing basics (slides).

Graph Analysis

  • Graph embedding. [nbviewer] [ipynb]. Reading: DeepWalk paper, ShortWalk.

Search Engine Construction

Textbook

Other reference books: