TU/e

2ID26 Web Information Retrieval and Data Mining


Last update: 19 Aug 2014; if you notice any outdated or wrong information on this webpage, please e-mail 2ID26.Teachers@gmail.com

In the 2015-2016 academic year, 2ID26 will become 2IMW15.

Announcements:

  • Please mind the important deadlines of September 18th and October 18th and plan your studies accordingly.
Program: BIS, CSE, ES, EIT-SDE

Course Info: 5 ECTS, OWInfo. Please note that the list of topics is indicative and some adjustments are possible during the course.

Lecturer: Mykola Pechenizkiy
Contacting:
via e-mail: 2ID26.Teachers@gmail.com with a meaningful subject;
in person:
  • option 1: please, do not hesitate to approach me during the lecture breaks on Mondays, Wednesdays and Thursdays in Paviljoen U46 or B2 correspondingly;
  • option 2: on Mondays 10.00 - 12.00 I have office hours in MF7.099 dedicated to educational activities; a list of available time slots hangs by the door;
  • option 3: if you cannot make it during the lecture breaks or my office hours, please send a meeting request to 2ID26.Teachers@gmail.com indicating your availability for the corresponding period.
Course Materials: Handouts, reading materials and guidelines will be available via the Sakai Learning Management System. Please log in to Sakai using your TU/e login and password, and join 2ID26.

Course Syllabus

Date, Time, and Room | Lecture Title and Contents | Introduction to IR book (draft available online)

1 Sep 2014
Monday
15:45 - 17:30
Pav. u46
Lecture 1: Introduction to the course
  • Basic IR terminology, ideas, architecture
  • Course overview, practicalities
  • Overview of possible group project assignments
Ch. 1
2 Sep 2014
Tuesday
15:45 - 17:30
AUD 2
Lecture 2: Boolean IR and Document indexing
  • Boolean information retrieval
  • Inverted, skip and positional index
Ch. 1,
Ch. 2,
Ch. 3 (optional),
Ch. 4 (optional)
8 Sep 2014
Monday
13:45 - 15:30
Pav. u46
Lecture 3: Vector space retrieval
  • From Boolean to vector space retrieval
  • Latent-Concept Models
  • Relevance feedback and query expansion
Ch. 6,
Ch. 18,
Ch. 9
9 Sep 2014
Tuesday
15:45 - 17:30
AUD 2
Lecture 4: Data mining for IR. Classification
  • Naive Bayes, Nearest Neighbour, Decision tree learning
  • Ensemble learning
  • Evaluation of classification
Ch. 13, Ch. 14, & Ch. 15
(or IDM: Ch. 4)
15 Sep 2014
Monday
13:45 - 15:30
Pav. u46
Lecture 5: Clustering and data/dimensionality reduction
  • Partitioning (k-means) vs. hierarchical (AHC) clustering
  • Density-based clustering (DBSCAN)
  • Sampling, feature selection and feature transformation approaches
Ch. 16 & Ch. 17
(or IDM: Ch. 5),
Ch. 18
16 Sep 2014
Tuesday
15:45 - 17:30
AUD 2
Lecture 6: Peculiarities of Classification and Clustering in IR/AS
  • Availability and independence of labels
  • Semi-supervised learning
  • Labeling of clustering, evaluation of clustering in general vs. in IR
  • Drifting data
Links to reading material
18 Sep 2014 Deadline: Submit your group project proposal in Sakai. E-mail a copy to 2ID26.Teachers@gmail.com cc-ing everyone in your project group.
22 Sep 2014
Monday
13:45 - 15:30
Pav. u46
Lecture 7: Link mining for Information retrieval
  • Web spam, SEO
  • Google’s PageRank, hubs and authorities (HITS)
  • Link mining for better ranking on SERP.
Ch. 19,
Ch. 21
23 Sep 2014

Tuesday
15:45 - 17:30
AUD 2
Lecture 8: Web usage mining
  • Association analysis: Itemset mining, Apriori principle
  • Usage data and its utility for IR
IDM: Ch. 6
29 Sep 2014
Monday
13:45 - 15:30
Pav. u46
Lecture 9: Personalization with user modelling
  • Basic ideas and the current state-of-the-art
  • Adaptive news access as an example
  • Challenges being tackled in research community
Links to reading material
30 Sep 2014
Tuesday
15:45 - 17:30
AUD 2
Lecture 10: IR Evaluation
  • Basic evaluation principles
  • Metrics, experimentation protocols, benchmarking
Ch. 8
6 Oct 2014
Monday
13:45 - 15:30
Pav. u46
Lecture 11: Probabilistic IR
  • Probability ranking principle
  • Language models
  • Understanding the commonalities and differences of IR models
Ch. 11,
Ch. 12
7 Oct 2014
Tuesday
15:45 - 17:30
AUD 2
Lecture 12: Multimedia retrieval
  • Automatic content-based analysis
  • GEMINI and time-series mining view
  • Semantic gap
Handouts and Introduction to the Multimedia Retrieval book
10 Oct 2014
Friday
The trial exam is available on Sakai. Complete it at any time before 21:00 on Oct 13th. On Oct 14th we will discuss common mistakes (if any).
13 Oct 2014
Monday
13:45 - 15:30
Pav. u46
Lecture 13: Past, Present and Future of IR: Closing Lecture
  • Brief summary of the course and of topics not covered
  • Advanced R&D issues in IR, current trends
  • Info on project poster and demo presentation, project deliverables
Links to reading material
14 Oct 2014
Tuesday
15:45 - 17:30
AUD 2
QA session:
  • Come with questions if any (do the trial exam by Monday)
  • Feedback on trial exam
  • Q&A session
Links to reading material
17 Oct 2014
17:30
Deadline: Submit your group project report. E-mail a copy to 2ID26.Teachers@gmail.com cc-ing everyone in your project group.
  • Detailed instructions on deliverables and how the projects will be evaluated can be found in Sakai.
  • Important: besides the overall architecture and achievements, the group report should contain a clearly identifiable DM part and IR part for each group member.
20 Oct 2014
12:45 - 15:30
Pav. u46
Groups 1-6 project presentations:
  • Demo plus poster presentation,
  • Group project grade contributes 70% to your final course grade.
21 Oct 2014
15:45 - 18:30
AUD 2
Groups 7-12 project presentations:
  • Demo plus poster presentation,
  • Group project grade contributes 70% to your final course grade.
31 Oct 2014
9:00-12:00, location to be confirmed
2XD26 Partial Exam
  • You must bring a laptop that has access to the TU/e network
  • Do not forget to register on OWInfo for 2XD26 (partial exam grade) and 2ID26 (course grade).
  • The results will be available by Nov 14.
  • You can come and discuss your results on Nov 17, 10.00-12.00
28 January 2014
13:30-16:30
Delayed group project presentations. Presenting on this date counts as your 2nd attempt: the 1st attempt is considered failed regardless of whether you used it.

Modes of study and evaluation

  • 7 weeks x 2 face-to-face lectures
  • self-study of the literature
  • Project assignment (includes group and individual work)
    • literature study, WIR system development and evaluation;
    • must include 1 ML and 1 IR individual assignment for each member of the project team;
      • development/implementation/evaluation of elements of machine learning and information retrieval modules.
    • In week 8 we have poster presentation + demo of the group project work.
    • Final report (main part about 10 pages + appendices): must be submitted by the deadline, before your presentation
  • 2XD26 partial exam (registration required)
  • Final grade (100 points) = 2XD26 partial exam (30 points) + Group Project (70 points)
  • Group Project grade (70 points) = ML individual assignment (15 points) + IR individual assignment (15 points) + group work (40 points). Group work covers the quality of the project output as a whole (15 points), the quality of the report (10 points), the poster + demo presentation (10 points) and the evaluation of other projects (5 points).
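
As an illustration of the grading arithmetic above, the following is a minimal sketch in Python. The maximum points per component come from the scheme above; the dictionary keys, the function name final_grade and the example scores are hypothetical and are not part of any official grading tool.

```python
# Maximum points per component, as listed in the grading scheme above.
# The key names are illustrative assumptions, not official identifiers.
MAX_POINTS = {
    "partial_exam": 30,      # 2XD26 partial exam
    "ml_individual": 15,     # ML individual assignment
    "ir_individual": 15,     # IR individual assignment
    "project_output": 15,    # quality of the project output as a whole
    "report": 10,            # quality of the report
    "poster_demo": 10,       # poster + demo presentation
    "peer_evaluation": 5,    # evaluation of other projects
}

# Everything except the partial exam belongs to the group project grade.
PROJECT_COMPONENTS = [name for name in MAX_POINTS if name != "partial_exam"]


def final_grade(points):
    """Return (group_project_points, final_points) for one student."""
    for name, maximum in MAX_POINTS.items():
        if name not in points:
            raise KeyError(f"missing component: {name}")
        if not 0 <= points[name] <= maximum:
            raise ValueError(f"{name} must be between 0 and {maximum}")
    project = sum(points[name] for name in PROJECT_COMPONENTS)
    return project, project + points["partial_exam"]


# Made-up scores for one student, for illustration only.
example = {
    "partial_exam": 24,
    "ml_individual": 12,
    "ir_individual": 13,
    "project_output": 11,
    "report": 8,
    "poster_demo": 9,
    "peer_evaluation": 4,
}

project, total = final_grade(example)
print(f"Group project: {project}/70, final grade: {total}/100")
```

With these made-up scores the group project part is 57 out of 70 and the final grade is 81 out of 100.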

Handouts and course materials are available in Sakai Learning Management System.

Remarks:

  • IIR: Introduction to Information Retrieval book (by Manning, Raghavan and Schütze), accessible online.
  • IDM: Introduction to Data Mining book (by Tan, Steinbach, Kumar), accessible online.
  • Please note that this schedule is indicative and some changes may still be possible. We may accumulate a delay of one lecture; in this case .
  • Please note that Probabilistic IR and Multimedia IR are moved to the very end of the course for a simple reason: these topics are usually not popular in group projects and are not prerequisites for studying the other topics.