TU/e

2IID0. Web Analytics

(2014-2015, Semester A, Quartile 2)


Last update: 17 Nov 2014; if you notice any outdated or incorrect information on this webpage, please e-mail 2IID0.Teachers@gmail.com

Announcements:
  • 17.11.14: No lecture 8.45-10.30 on Nov 19th. Please come directly to the instructions starting at 10.45 in Helix 1.
  • 17.11.14: Joep Fennema will help you with the first homework, on the bidding competition.
  • 17.11.14: Guest lecture on Dec 3rd: "Persuasion Profiling in Data Streams" by Maurits Kaptein
  • 9.11.14: We start the course with an invited talk by Thijs Putman from StudyPortals.eu
    We look forward to seeing you on Monday, November 10th, 2014 in Pav m23, starting 13.45!
  • 1.09.14: This course is meant for the 3rd year bachelor Web Science (both major and Bachelor college), Software Science and Web Technology programs.
  • 1.09.14: Course and examination information and registration on OWInfo.
Lectures: Mykola Pechenizkiy
Instructions: Mykola Pechenizkiy and Joaquin Vanschoren
Contacting teachers:
via e-mail:
  • Send all correspondence to 2IID0.Teachers@gmail.com with a meaningful subject; it is fine to start with Hi, Hello or Dear FirstName.
  • Please do not send requests to our personal e-mail addresses. There is also no need to cc the teachers' personal e-mail addresses.
  • We will try to answer all your requests as soon as possible. However, if you have not received a reply within 3 working days please do not hesitate to resend your request.
in person:
  • option 1: please do not hesitate to approach the teachers during the lecture breaks on Mondays (Pav m23, 13.45-15.30) and Wednesdays (Pav m23, 8.45-10.30), and during the instructions on Wednesdays (Helix 1, 10.45-12.30);
  • option 2: on Mondays 10.00-12.00 we have office hours in MF7.099 dedicated to educational activities;
  • option 3: if you cannot make it during the lecture breaks or our dedicated office hours, please send a meeting request to 2IID0.Teachers@gmail.com indicating your availability for the corresponding period.
Modes of study and evaluation:
  • 8 weeks x 2 times per week lectures
  • 8 weeks x 1 time per week instructions (all students are in the same instructions group)
  • Self-study of the literature
  • 4 homeworks, done in groups of 4. Please form groups by the end of week 1.
  • Question answering sessions
  • Written exam.

Final grade:

  • 50% homeworks (2IID2) and 50% written exam (2IID1).
  • You have to get at least 5.5 as a grand average to pass the course. An additional constraint imposed by the Bachelor college is that you need to score at least 5.0 for the exam and at least 5.0 for the homeworks to pass the course.
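
The pass rule above can be written out as a small check. This is only a minimal sketch under stated assumptions: grades are taken to be on a 10-point scale, no rounding is applied, and the function names (final_grade, passes_course) are hypothetical and not part of any official grading tool.

```python
def final_grade(homework_avg: float, exam: float) -> float:
    """Weighted average: 50% homeworks (2IID2) and 50% written exam (2IID1)."""
    return 0.5 * homework_avg + 0.5 * exam


def passes_course(homework_avg: float, exam: float) -> bool:
    """Apply the three conditions stated above.

    Assumption: unrounded grades on a 10-point scale are compared directly
    with the thresholds; the actual rounding policy is not specified here.
    """
    return (
        final_grade(homework_avg, exam) >= 5.5  # overall average of at least 5.5
        and homework_avg >= 5.0                 # Bachelor College minimum for homeworks
        and exam >= 5.0                         # Bachelor College minimum for the exam
    )


if __name__ == "__main__":
    # A high homework average cannot compensate for an exam grade below 5.0.
    print(passes_course(homework_avg=9.0, exam=4.5))
    # Both parts at the 5.0 minimum still fail on the overall average.
    print(passes_course(homework_avg=5.0, exam=5.0))
    # 6.0 for homeworks and 5.0 for the exam gives exactly 5.5.
    print(passes_course(homework_avg=6.0, exam=5.0))
```

For instance, under these assumptions a homework average of 9.0 with an exam grade of 4.5 gives an overall 6.75 but still does not pass, because the exam is below 5.0.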
Course Materials:
  • There is no single textbook that covers the topics you will study in this course. However, there are several good book chapters that cover some of the topics. These chapters are available online:
    • MMDS: Mining of Massive Datasets (by Rajaraman, Ullman, Leskovec) accessible online from here.
    • IDM: Introduction to Data Mining book (by Tan, Steinbach, Kumar), chapters accessible online from here.
    • NCM: Networks, Crowds, and Markets: Reasoning About a Highly Connected World (by Easley and Kleinberg), accessible online from here.
    For each covered topic the corresponding book chapter(s) or other reading will be suggested.
  • Lecture slides, reading materials, homework descriptions and guidelines will be available via the Sakai Learning Management System. Please register using your TU/e login, and join 2IID0.
  • Submission of the homeworks will be done via Sakai as well.

Course Syllabus:

Please note that this schedule is indicative and changes may be possible as the course progresses.

Date, Time, and Room Lecture Title and Contents
10 Nov 2014
Monday
13:45-15:30
Pav m23
Week1 Guest Lecture: Web as experimentation platform
  • Invited lecture by Thijs Putman (StudyPortals B.V)
  • A/B testing at MastersPortal.eu
12 Nov 2014
Wednesday
8:45-10:30
Pav m23
Week1 Lecture: Introduction to the course
  • Motivation and historical perspective on the development of web analytics
  • Web analytics ecosystem(s)
  • Overview of the covered topics.
12 Nov 2014
Wednesday
10:45-12:30
Helix 1
Week1 Instructions: overview of the course practicalities
  • Brief overview of homeworks and final (written) exam
  • Grading policies
  • Overview of the covered topics (cont.)
17 Nov 2014
Monday
13:45-15:30
Pav m23
Week2 Lecture: Computational advertisement
  • Display and paid search advertising
  • Ad Auctions
  • Conversion attribution
19 Nov 2014
Wednesday
8:45-10:30
Pav m23
Week2 Lecture: Computational advertisement
  • Click prediction related problem formulations
  • Ad to content/context matching
  • Traffic volume prediction
19 Nov 2014
Wednesday
10:45-12:30
Helix 1
Week2 Instructions: Second-price auction
  • Bidding strategies
  • Simulation tool used for Homework 1.
  • Starting to work on the homework.
24 Nov 2014
Monday
13:45-15:30
Pav m23
Week3 Lecture: Predictive modeling. Classification
  • Generative and discriminative models, ensembles
  • Ideas for improvement
  • Variety of application settings; active learning and semi-supervised learning
26 Nov 2014
Wednesday
8:45-10:30
Pav m23
Week3 Lecture: Application of classification techniques.
  • Web content and spam classification, user modeling
  • How good is it? What are we optimizing for? Evaluation aspects
  • Cost-sensitive classification
26 Nov 2014
Wednesday
10:45-12:30
Helix 1
Week3 Instructions: Classification techniques
  • CTR prediction with WEKA and OpenML environment used for Homework 2.
  • Practicing Naive Bayes, Decision trees and other classification techniques
  • Experiencing class imbalance and cost-sensitive classifier learning
28 Nov 2014
Friday
16:00.
Deadline: submit your solution and report for Homework1 (via Sakai, copy to 2IID0.Teachers@gmail.com)
1 Dec 2014
Monday
13:45-15:30
Pav m23
Week4 Lecture: Computational challenges.
  • Mining data streams. Examples of computing popularity of pages, queries
  • Predicting with thousands of models
  • Distributed pattern mining
  • Dimensionality reduction and sampling
3 Dec 2014
Wednesday
8:45-10:30
Pav m23
Week4 Lecture: Persuasion of users
  • 1st hour: Invited talk: "Persuasion Profiling in Data Streams" by Maurits Kaptein
  • 2nd hour: Utility of Web analytics
  • Predictive models vs. explanatory models and methodological issues of knowledge discovery
  • Causal discovery and targeted learning
  • Predicting causal effect, mining data from A/B testing
3 Dec 2014
Wednesday
10:45-12:30
Helix 1
Week4 Instructions: Distributed analytics
  • Brief introduction to Hadoop stack
  • Examples of writing map-reduce programs
  • Introduction to Homework 3.
  • Feedback on Homework 1.
5 Dec 2014
Friday
16:00.
Deadline: submit your solution and report for Homework2 (via Sakai, copy to 2IID0.Teachers@gmail.com)
8 Dec 2014
Monday
13:45-15:30
Pav m23
Week5 Lecture: Clustering techniques
  • k-means, AHC, DBSCAN and their applications
  • Evaluation of clustering
10 Dec 2014
Wednesday
8:45-10:30
Pav m23
Week5 Lecture: Computing similarities
  • Similarity in metric spaces.
  • Similarity in high-dimensional and sparse data.
  • Matching sequential and time-series data
  • Finding similar nodes in a (labeled) graph
10 Dec 2014
Wednesday
10:45-12:30
Helix 1
Week5 Instructions: Clustering techniques
  • Continuation of Homework 3.
  • Clustering tweets. Cluster labeling.
  • Feedback on Homework 2.
12 Dec 2014
Friday
16:00.
Deadline: submit your solution and report for Homework2 (via Sakai, copy to 2IID0.Teachers@gmail.com)
15 Dec 2014
Monday
13:45-15:30
Pav m23
Week6 Lecture: Recommender systems
  • Content-based, collaborative-based, and hybrid approaches
  • Problems of biased data
  • Exploration-exploitation principle
  • Recommenders on Netflix, LinkedIn, Booking.com
17 Dec 2014
Wednesday
8:45-10:30
Pav m23
Week6 Lecture: Social network analytics
  • Example of simple analytics on MSN messenger data
  • Properties of large-scale networks (degree, diameter, centrality, clustering)
  • Graph sampling
17 Dec 2014
Wednesday
10:45-12:30
Helix 1
Week6 Instructions: SNA and SMA
19 Dec 2014
Friday
16:00.
Deadline: submit your solution and report for Homework3 (via Sakai, copy to 2IID0.Teachers@gmail.com)
5 Jan 2015
Monday
13:45-15:30
Pav m23
Week7 Lecture: Heterogeneous network analytics
  • How networks form and grow: rich-gets-richer, community-guided attachment, Kronecker graphs
  • Influence propagation, viral marketing, acceptance behavior, general contagion model
  • PageRank and HITS, top influencing nodes, ambassadors, etc
5 Jan 2015
Monday
21:00.
Deadline: submit your solution and report for Homework4 (via Sakai, copy to 2IID0.Teachers@gmail.com)
7 Jan 2015
Wednesday
8:45-10:30
Pav m23
Week7 Lecture: Heterogeneous network analytics
  • Querying and clustering heterogeneous networks
  • Community mining
  • Heterogeneous network (re)construction: information extraction, linking and classification
7 Jan 2015
Wednesday
10:45-12:30
Helix 1
Week7 Instructions slot: Trial exam (optional)
  • Feedback on Homework4
  • If we need more time for feedback on homeworks, we may ask you to write the trial exam at home and submit it by e-mail.
12 Jan 2015
Monday
13:45-15:30
Pav m23
Week8 Closing lecture: Summary of the covered topics
  • Ecosystems and (business) problem formulations
  • Data science approach to address these problems
  • Typical KDD problem formulations in Web analytics
  • Major computing paradigms
  • Future of Web analytics
14 Jan 2015
Wednesday
8:45-10:30
Pav m23
Week8 Lecture slot: Solutions and feedback on trial exam
  • QA session. Try to e-mail your questions in advance; we will group them.
14 Jan 2015
Wednesday
10:45-12:30
Helix 1
Week8 Instructions: QA session
  • Feedback on trial exam
  • QA: try to e-mail your questions in advance.
30 Jan 2015
Friday
9:00-12:00
Place t.b.a.
FINAL EXAM
  • Do not forget to register for the exam.
  • The results will be available by Feb 15.
  • You can come and check your results Feb 16, 10.00-12.00
  • Second attempt: 8 Apr 2015, 18:00-21:00


Handouts and course materials will be available via Sakai or another learning management system.
