Instructor: Professor Jianguo Lu.
Email: jlu at u windsor. Office: LT 5111. Office hours: Tuesday and Thursday 12:00-1:00.
- Lecture time and place: Monday and Wednesday 10:00-11:20 DH 253
Hands-on tutorials are about 30 minutes long. Please post your slides and detailed instructions on your web page.
- Installing a web server (tomcat) (Shane)
- Running the PageRank algorithm at large scale (Liu and Li)
You are required to submit a project report for each stage. The final project report
should not exceed four pages, using the ACM sig-alternate.cls
formatting style. Here is a sample LaTeX file
using this style.
In the report, describe the details of your project, including your data, experiments, results, and analysis. Include the link to your web site, which serves as a demo of your search engine and contains more details of your project. You may consider reporting on the following aspects. Note that you are not REQUIRED to cover all of these subtopics; you can select a few of them and report on the topics you have worked on.
- Indexing and searching: How do you index your data? Did you change any default settings? Why did you apply those changes? Did they improve the search experience?
- Data: Which data do you use? For the Citeseer data, do you use the metadata to improve your results? Do you use the citation network? Do you use other data?
- Relevance ranking: Which relevance function do you use (tf-idf or its variants)? Have you changed any relevance functions? Which relevance function performs better?
- PageRank: How do you implement the PageRank algorithm? Are there scalability issues? Any interesting results?
- Classification: Do you use Naive Bayes? How do you normalize the text? Have you selected features, using mutual information or chi-square? What is the impact of the number of features on F1? Report precision, recall, and F1, and plot F1 as a function of the number of features.
- Clustering: Do you cluster the search results? Which algorithm(s) do you use? How do you evaluate the clusters, if at all?
- SVD: How do you run SVD? How does it scale?
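For the classification item above, the following is a minimal sketch of how precision, recall, and F1 can be computed for one class from gold and predicted labels. The function name `precision_recall_f1` and the labels "ir" and "db" are hypothetical, chosen only for illustration; your own project may well use a library routine instead.

```python
def precision_recall_f1(true_labels, predicted_labels, positive):
    """Return (precision, recall, F1), treating `positive` as the target class."""
    pairs = list(zip(true_labels, predicted_labels))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never occurs.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels for five documents (assumed data, not course results).
gold = ["ir", "ir", "ir", "db", "db"]
pred = ["ir", "ir", "db", "ir", "db"]
p, r, f1 = precision_recall_f1(gold, pred, positive="ir")
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

To plot F1 as a function of the number of features, one possible approach is to rerun the classifier for each feature count and record the F1 value returned by a function like this one.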
- [IIR] Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze. Cambridge University Press, 2008. book website
Other reference books:
- [SE] Search Engines: Information Retrieval in Practice, by Bruce Croft, Donald Metzler and Trevor Strohman.
- [MIR] Modern Information Retrieval, by R. Baeza-Yates and B. Ribeiro-Neto.
- [LA] Lucene in Action, by Michael McCandless, Erik Hatcher, and Otis Gospodnetić. 2010.
- [SA] Solr in Action, by Trey Grainger and Timothy Potter. sample chapter one, sample chapter three.
- [MMD] Mining of Massive Datasets, by Anand Rajaraman and Jeff Ullman. 2013.