TU/e

2IMW15 Web Information Retrieval and Data Mining


Last update: 30 Aug 2015; if you notice any outdated or wrong information on this webpage, please e-mail 2imw15.teachers@gmail.com

Announcements:
  • We've got larger lecture rooms: on Sep 2nd, 8.45 - 10.30 in AUD 08 and 10.45 - 12.30 in IPO 0.98. On Sep 7th we move to PAV B2.
  • Please note that on August 31st we have only 1 lecture, 13.45 - 15.30, because of the official opening of the 2015-2016 academic year. The second lecture will take place on September 2nd, 8.45 - 10.30 in AUD 08.
  • Please mind the important group project deadlines of September 14th and October 26th and plan your studies accordingly.
  • Please note that there is no resit for the partial exam (30% of your final grade) of October 14th.
Program: BIS, CSE, BPI, DSE, ES, EIT-SDE

Course Info: 5 ECTS, OWInfo. Please note that the list of topics is indicative and some adjustments are possible during the course.

Lecturer: Mykola Pechenizkiy

Contacting:
in person:
  • option 1: please do not hesitate to approach me during the lecture breaks and after the lectures on Mondays and Wednesdays in IPO 0.98, 14;
  • option 2: on Mondays 9:30 - 11.00 I have office hours in MF7.099 dedicated to educational activities; you may come without making an appointment;
  • option 3: upon request we can organize question-answering sessions on Wednesdays 9.00 - 10.30 in IPO 0.98;
  • option 4: in the unlikely case that options 1-3 do not work for you, please e-mail a meeting request to 2imw15.teachers@gmail.com indicating your availability for the corresponding period.
via e-mail:
  • 2imw15.teachers@gmail.com with a meaningful subject; I typically reply to e-mails within 48 hours. If you have not received a reply, it is likely that I did not receive your message; please resend it.
Course Materials: Handouts, reading materials and guidelines will be available via the Sakai Learning Management System. Please log in to Sakai using your TU/e login and password, and join 2IMW15.

Course contents and schedule: Please note that there could be deviations from the current plan, depending on our pace.

Date, Time, and Room | Lecture Title and Contents | Introduction to IR book (draft available online)

31 Aug 2015
Monday
13:45 - 15:30
PAV B2
Lecture 1: Introduction to the course
  • Basic IR terminology, ideas, architecture
  • Course overview, practicalities
  • Overview of possible group project assignments
Ch. 1
31 Aug 2015
Monday
15:45 - 17:30
No lecture: start of the academic year.
  • The lecture is rescheduled to 2 Sep, 8.45 - 10.30, AUD 08
2 Sep 2015
Wednesday
8:45 - 10:30
AUD 08
Lecture 2: Boolean IR and Document indexing
  • Boolean information retrieval
  • Inverted, skip and positional index
Ch. 1,
Ch. 2,
Ch. 3 (optional),
Ch. 4 (optional)
2 Sep 2015
Wednesday
10:45 - 12:30
IPO 0.98
Lecture 3: Vector space retrieval
  • From Boolean to vector space retrieval
  • Latent-Concept Models
  • Relevance feedback and query expansion
Ch. 6,
Ch. 18,
Ch. 9
7 Sep 2015
Monday
13:45 - 15:30
PAV B2
Lecture 4: Probabilistic IR
  • Probability ranking principle
  • Language models
  • Understanding the commonalities and differences of IR models
Ch. 11,
Ch. 12
7 Sep 2015
Monday
15:45 - 17:30
PAV B2
Lecture 5: IR Evaluation: basics
  • Basic evaluation principles
  • Metrics, experimentation protocols, benchmarking
  • Experimentation culture in IR, R&D
Ch. 8
9 Sep 2015
Wednesday
10:45 - 12:30
No lecture: unsupervised group work on project proposals
  • You have 2-4 hours of dedicated in-class time (IPO 0.98 is available from 8:45) to work on this
  • In Sakai\Resources\Project there are templates and examples
14 Sep 2015
9:30 (no meeting)
Deadline: Submit your group project proposal - e-mail to 2imw15.teachers@gmail.com cc-ing everyone in your project group.
14 Sep 2015
Monday
13:45 - 15:30
PAV B2
Lecture 6: Classification: Basic algorithms
  • Naive Bayes, Nearest Neighbour, Decision tree learning, SVM
  • Ensemble learning
  • Evaluation of classification
Ch. 13, Ch. 14, & Ch. 15
(or IDM: Ch. 4)
14 Sep 2015
Monday
15:45 - 17:30
PAV B2
Lecture 7: Use of classification in IR
  • Application aspects of classification
  • Retrieval models, relevance feedback and classification
  • Cost-sensitive classification
  • Connections to learning to rank
Links to reading material
16 Sep 2015
Wednesday
9:00 - 10:30
IPO 0.98
Feedback on group proposals: general and/or per group
16 Sep 2015
Wednesday
10:45 - 12:30
IPO 0.98
Lecture 8: Learning to Rank in IR
  • General framework
  • Popular approaches
  • Evaluation
LTR in IR book
21 Sep 2015
Monday
13:45 - 15:30
PAV B2
Lecture 9: Pattern mining, clustering and data/dimensionality reduction in IR
  • Partitioning (kMeans) vs. hierarchical (AHC) clustering, density-based clustering (DBSCAN)
  • Evaluation of clustering, cluster labeling
  • Web usage mining - frequent pattern mining
  • Sampling, feature selection and feature transformation approaches
Ch. 16 & Ch. 17
(or IDM: Ch. 5),
Ch. 18
IDM: Ch. 6
21 Sep 2015
Monday
15:45 - 17:30
PAV B2
Lecture 10: Link mining for Information retrieval
  • Web spam, SEO
  • Google's PageRank, hubs and authorities (HITS)
  • Link mining for better ranking on SERP.
Ch. 21
23 Sep 2015
Wednesday
10:45 – 12:30
Helix 1
Lecture 11: Search Engines.
  • Web dragons and beyond
  • Web search basics
  • SE architecture and Web crawling
Ch. 19,
Ch. 20
28 Sep 2015
Monday
13:45 - 15:30
PAV B2
Lecture 12: Personalization with user modelling
  • Dealing with a variety of information needs and information resources
  • Social Search and Personalization
  • IR and recommender systems: popular content- and collaborative-based approaches, plus various hybrids
  • Adaptive news access as an example
  • Design issues, issues of trust and vulnerability
Links to reading material
28 Sep 2015
Monday
15:45 - 17:30
PAV B2
Lecture 13: Brief summary of the course and (not) covered topics
  • Matching, ranking, filtering, and use of user signal revisited
  • Advanced R&D issues in IR, current trends
30 Sep 2015
Wednesday
10:45 - 12:30
IPO 0.98
FAQ: Partial exam and group project deliverables and presentation
14 Oct 2015
Wednesday
9:00-12:00
location tbc
Partial Exam. There is no resit for the partial exam.
  • You must bring a laptop that can access the TU/e network.
  • The results will be available by October 28.
  • You can come and discuss your results on November 5th, 10.00-12.00.
26 Oct 2015
8:30 (am)
Deadline: Submit your group project report - send e-mail to 2imw15.teachers@gmail.com cc-ing everyone in your project group. I would highly appreciate it if you submitted it before the weekend.
  • Detailed instructions on deliverables and how the projects will be evaluated can be found in Sakai.
  • Important: the group report, besides the overall architecture and achievements, should contain a clearly identifiable DM part and IR part for each group member.
28 Oct 2015
9:00 - 12:30
AUD (tbc)
Groups 1-10 project presentations:
  • Demo plus poster presentation.
  • Group project grade contributes 70% to your final course grade.
28 Oct 2015
13:30 - 17:00
AUD (tbc)
Groups 11-20 project presentations:
  • Demo plus poster presentation.
  • Group project grade contributes 70% to your final course grade.
29 Oct 2015
17:30
Deadline: Submit your peer-evaluation summary - send e-mail to 2imw15.teachers@gmail.com cc-ing everyone in your project group.
  • Detailed instructions on peer evaluation can be found in Sakai.
  • This is an optional activity. You can earn bonus points for the evaluation.
15 Jan 2016
13:30
Deadline for delayed projects: Submit your group project report - send e-mail to 2imw15.teachers@gmail.com cc-ing everyone in your project group.
  • Detailed instructions on deliverables and how the projects will be evaluated can be found in Sakai.
  • Important: the group report, besides the overall architecture and achievements, should contain a clearly identifiable DM part and IR part for each group member.
20 Jan 2016
18:00-21:00
Delayed group project presentations. This counts as your 2nd attempt: the 1st attempt is considered failed regardless of whether you used it.

Colour agenda:

Regular lectures

Partial exam and group project presentations and related activities

Important deadlines

Modes of study and evaluation

  • 13 face-to-face lectures
  • Face-to-face question-answering sessions and group project guidance upon request
  • Self-study of the literature (for the project and for the exam)
  • Project work (includes group and individual parts)
    • literature study, WebIR system development and evaluation;
    • must include 1 ML/DM and 1 IR individual assignment for each member of the project team;
  • Partial exam (no registration required)
  • Final exam = group project presentation (registration is required).
    • On Oct 28th we have the poster presentation + demo of the group project work.
    • Final report (the main part about 10-15 pages + appendixes) - must be submitted by the deadline of Oct 26th, i.e. before your presentation.
    • Group project grade (max: 70 points + 7 bonus) = ML individual assignment (15 points) + IR individual assignment (15 points) + group work, which covers the quality of the project output as a whole (15 points), the quality of the report (10 points), and the poster + demo presentation and discussion (15 points). Evaluating other projects can earn up to 7 bonus points.
  • Final grade (100 points) = Partial exam (30 points) + Group Project (70 points)
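
The point breakdown above can be checked with a small calculation. The following Python sketch is only an illustration, not an official grading tool: the component names, function names and example scores are hypothetical, and it assumes that bonus points are simply added to the total, since this page does not say whether the final grade is capped at 100.

```python
# Maximum points per group project component, taken from the breakdown above.
# The dictionary keys are hypothetical names chosen for this sketch.
PROJECT_MAX = {
    "ml_individual": 15,   # ML/DM individual assignment
    "ir_individual": 15,   # IR individual assignment
    "project_output": 15,  # quality of the project output as a whole
    "report": 10,          # quality of the report
    "presentation": 15,    # poster + demo presentation and discussion
}
BONUS_MAX = 7              # evaluation of other projects (optional)
PARTIAL_EXAM_MAX = 30


def project_grade(scores, bonus=0):
    """Sum the five project components plus the optional bonus points."""
    if set(scores) != set(PROJECT_MAX):
        raise ValueError("scores must cover exactly the five project components")
    for name, points in scores.items():
        if not 0 <= points <= PROJECT_MAX[name]:
            raise ValueError(f"{name} must be between 0 and {PROJECT_MAX[name]}")
    if not 0 <= bonus <= BONUS_MAX:
        raise ValueError(f"bonus must be between 0 and {BONUS_MAX}")
    return sum(scores.values()) + bonus


def final_grade(partial_exam, scores, bonus=0):
    """Partial exam points plus group project points.

    Assumption: bonus points are added without a cap.
    """
    if not 0 <= partial_exam <= PARTIAL_EXAM_MAX:
        raise ValueError(f"partial_exam must be between 0 and {PARTIAL_EXAM_MAX}")
    return partial_exam + project_grade(scores, bonus)


# Made-up example scores, for illustration only.
example = {
    "ml_individual": 12,
    "ir_individual": 13,
    "project_output": 11,
    "report": 8,
    "presentation": 12,
}
print(project_grade(example, bonus=3))    # 59
print(final_grade(24, example, bonus=3))  # 83
```

The five component maxima add up to the 70 points of the group project, and together with the 30 points of the partial exam they give the 100-point total.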

Handouts and course materials are available in the Sakai Learning Management System. Log in with your TU/e account and join 2IMW15.

Remarks:

  • IIR: Introduction to Information Retrieval book (by Manning, Raghavan and Schütze), accessible online from here.
  • IDM: Introduction to Data Mining book (by Tan, Steinbach, Kumar), accessible online from here.
  • Please note that this schedule is indicative and some changes may still be possible. We may accumulate some delay or be ahead of schedule depending on our pace.