Joaquin Vanschoren

Welcome. I am an assistant professor of Machine Learning at the Eindhoven University of Technology. My research focuses on the automation of machine learning and networked science. I founded OpenML.org, a collaborative machine learning platform where scientists can automatically log and share data, code, and experiments, and which automatically learns from all this data to help people perform machine learning better and more easily. My other passion is large-scale data analysis on all types of data (social, streams, geo-spatial, sensors, networks, text).

Curriculum Vitae

News

2016.12.09 – Invited Talk @ NIPS 2016 - Challenges in Machine Learning Workshop

This session will be about Gaming and Education. Looking forward to seeing you at NIPS, and thanks to Isabelle Guyon and the other organizers.

2016.11.11 – Keynote Talk @ Dutch Society for Pattern Recognition

A very inspirational event with many examples of machine learning on medical data. Thanks to Veronika Cheplygina for inviting me!

2016.11.09 – OpenML won the Dutch Data Prize!

Thanks so much to the organizers for stimulating open science through this award, and thanks to the fantastic OpenML team for making it all happen!

2016.10.28 – Open Science Radio has interviewed us (me and Heidi Seibold) about OpenML. Have a listen!

Thanks to Matthias Fromm and Konrad Förstner for running a super-interesting podcast, and for giving us the opportunity to talk about OpenML!

2016.06.22 – Talk @ IBM Watson Research Center, NY

Thanks to Meinolf Sellmann, Horst Samulowitz and Josep Pon for a great day and interesting discussions at IBM.

2015.12.16 – Invited Talk @ Data@Sheffield [Slides]

A tutorial on OpenML targeted at scientists from many domains, at the Open Data Science @ Sheffield workshop and Data Hide event. Many, many thanks to Neil Lawrence and the Open Data Science Initiative for a splendid visit and engaging discussions.

2015.11.17 – Talk @ High Tech Campus Technology Seminar

Short introduction of OpenML, with applications in Healthcare, at the High Tech Campus Eindhoven.

2015.10.22 – Horizon Talk @ IDA 2015 [Slides]

In this Horizon talk, I proposed the idea of a data science collaboratory, where scientists across domains can collaborate effortlessly using each other's data and code. Joint work with Bernd Bischl, Frank Hutter, Michele Sebag, Balazs Kegl, Matthias Schmid, Giulio Napolitano, Katy Wolstencroft, Alan R. Williams, and Neil Lawrence.

2015.08.10 – Invited Talk @ RGU IDEA Seminar

I had the opportunity to present OpenML to the Robert Gordon University CS department and BCS Aberdeen. Thanks to Daniel C. Doolan and Farzan Majdani who made my visit possible. Thanks to Norman Bain for the video.

2015.07.21 – Invited Talk @ Statistical Computing 2015

On networked science, OpenML and using OpenML from statistical environments such as R. Followed by a hands-on tutorial by Giuseppe Casalicchio and Bernd Bischl. Thanks to Matthias Schmid.

2015.07.11 – Invited Talk @ ICML 2015 - AutoML Workshop [Slides]

On OpenML and building systems that learn from machine learning experiments, to assist people while analyzing data, or automate the process altogether. Thanks to Balazs Kegl and Frank Hutter.

2014.10.20 – Successful OpenML 2014 Workshop @ TU/e

Including a 4-day hackathon and great presentations. All presentations archived by the TIB (German National Library for Science and Technology).

2014.08.11 – KDnuggets discusses OpenML

Nice article by Ran Bi.

2014.07.04 – Invited Talk @ ECDA 2014 [Slides]

On open science, machine learning, OpenML and the benefits it brings for machine learning research, individual scientists, as well as students and practitioners.

2014.06.17 – Talk @ VIPx Eindhoven

On designed serendipity, or how discoveries are made by openly sharing data and ideas.

2013.09.20 – Invited Talk @ CLADAG 2013

Presenting the first beta version of OpenML. Thanks to John Shawe-Taylor and the PASCAL 2 Network.

2012.10.23 – HARVEST Grant from the PASCAL 2 Network.

The funding received will support work on OpenML, a new system to automatically share and reuse reproducible machine learning experiments. Together with Bernd Bischl, Luis Torgo, KNIME and RapidMiner.

2012.09.26 – Quest magazine covers our large-scale sensor data analysis research.

BiGGrid also interviewed me and made a nice video.

2012.06.26 – Free Competition research grant from the Dutch Scientific Research Foundation.

The funding received will support work on Massively Collaborative Data Mining. Master student Jan N. van Rijn will start his PhD on this topic.

2012.04.12 – Invited Talk @ Dutch Hadoop User Group (NL-HUG)

Presenting work on large-scale sensor data analysis using Hadoop.

2010.12.07 – Best Application Award @ SARA Hadoop training program

For programming Hadoop procedures for terabyte-scale sensor data analysis.

2009.09.11 – Best Demo Award @ ECMLPKDD 2009

For a demonstration of Experiment Databases for machine learning. Together with Hendrik Blockeel.

Research

Publications

Also see Google Scholar

Journals and journal proceedings

  1. Mantovani, R.G., Horvath, T., Cerri, R., Carvalho, A.P.L.F., Vanschoren, J. Hyper-parameter Tuning of a Decision Tree Induction Algorithm. Brazilian Conference on Intelligent Systems (BRACIS 2016)
  2. Eerikainen, L.M., Vanschoren, J., Rooijakkers, M.J., Vullings, R., Aarts, R.M. Reduction of false arrhythmia alarms using signal selection and machine learning. Physiological Measurement, 37 (8), 1204-1216, 2016
  3. Bischl, B., Kerschke, P., Kotthoff, L., Lindauer, M., Malitsky, Y., Frechette, A., Hoos, H., Hutter, F., Leyton-Brown, K., Tierney, K., Vanschoren, J. ASlib: A Benchmark Library for Algorithm Selection. Artificial Intelligence, 237, 41-58, 2016
  4. Gao, B., Berendt, B. and Vanschoren, J. Towards understanding online sentiment expression - An interdisciplinary approach with subgroup comparison and visualization. Social Network Analysis and Mining, 6 (1), 68:1-68:16, 2016
  5. van Rijn, J.N., Holmes, G., Pfahringer, B., Vanschoren, J. Having a Blast: Meta-Learning and Heterogeneous Ensembles for Data Streams. IEEE Proceedings of ICDM 2015
  6. Vanschoren, J., Bischl, B., Hutter, F., Sebag, M., Kegl, B., Schmid, M., Napolitano, G., Wolstencroft, K., Williams, A.R., Lawrence, N. Towards a Data Science Collaboratory. Advances in Intelligent Data Analysis XIV (IDA 2015), Lecture Notes in Computer Science 9385, XIX-XXI
  7. van Rijn, J.N., Abdulrahman, S.M., Brazdil, P. and Vanschoren, J. Fast Algorithm Selection Using Learning Curves. Advances in Intelligent Data Analysis XIV (IDA 2015), Lecture Notes in Computer Science 9385, 298-309
  8. Vanschoren, J., van Rijn, J.N. and Bischl, B. Taking machine learning research online with OpenML. JMLR Workshop and Conference Proceedings (BigMine 2015), 41, 1-4, 2015
  9. Eerikainen, L.M., Vanschoren, J., Rooijakkers, M.J., Vullings, R., Aarts, R.M. Decreasing the False Alarm Rate of Arrhythmias in Intensive Care Using a Machine Learning Approach. IEEE Computing in Cardiology, 42, 293-297, 2015
  10. Gao, B., Berendt, B. and Vanschoren, J. Who is more positive in private? Analyzing sentiment differences across privacy levels and demographic factors in Facebook chats and posts. IEEE/ACM Proceedings of ASONAM 2015, 605-610
  11. Mantovani, R.G., Rossi, A.L.D., Vanschoren, J., Bischl, B. and Carvalho, A.C.P.L.F. To tune or not to tune: Recommending when to adjust SVM hyper-parameters via meta-learning. IEEE Proceedings of IJCNN 2015, 1-8
  12. Mantovani, R.G., Rossi, A.L.D., Vanschoren, J., Bischl, B. and Carvalho, A.C.P.L.F. Effectiveness of Random Search in SVM hyper-parameter tuning. IEEE Proceedings of IJCNN 2015, 1-8
  13. van Rijn, J.N., Holmes, G., Pfahringer, B. and Vanschoren, J. Algorithm Selection on Data Streams. Proceedings of Discovery Science 2014. Lecture Notes in Computer Science 8777, 325-336.
  14. Vanschoren, J., van Rijn, J.N., Bischl, B. and Torgo, L. OpenML: networked science in machine learning. ACM SIGKDD Explorations, 15 (2), 49-60, 2013
  15. Serban, F.*, Vanschoren, J.*, Kietz, J.U. and Bernstein, A. A Survey of Intelligent Assistants for Data Analysis. ACM Computing Surveys, 45 (3), Art. 31, 2013
  16. Vanschoren, J., Blockeel, H., Pfahringer, B. and Holmes, G. Experiment Databases: A new way to share, organize and learn from experiments. Machine Learning, 87(2), 127-158, 2012
  17. van Rijn, J., Bischl, B., Torgo, L., Gao, B., Umaashankar, V., Fischer, S., Winter, P., Wiswedel, B., Berthold, M.R., and Vanschoren, J. OpenML: A Collaborative Science Platform. Proceedings of ECMLPKDD 2013, Lecture Notes in Computer Science 8190, 645-649
  18. Reuttemann, P., Vanschoren, J. Scientific Workflow Management with ADAMS. Proceedings of ECMLPKDD 2012, Lecture Notes in Computer Science 7524, 833-837
  19. Vespier, U., Knobbe, A.J., Nijssen, S., Vanschoren, J. MDL-Based Analysis of Time Series at Multiple Time-Scales. Proceedings of ECMLPKDD 2012, Lecture Notes in Computer Science 7524, 371-386
  20. Leite, R., Brazdil P., Vanschoren, J. Selecting Classification Algorithms with Active Testing. Proceedings of MLDM 2012, Lecture Notes in Computer Science 7376, 117-131
  21. Vespier, U., Knobbe, A., Vanschoren, J., Miao, S., Koopman, A., Obladen, B., and Bosma, C. Traffic Events Modeling for Structural Health Monitoring. Proceedings of IDA 2011, Lecture Notes in Computer Science 7014, 276-387
  22. Vanschoren, J., Blockeel, H. A community-based platform for machine learning experimentation. Proceedings of ECMLPKDD 2009, Lecture Notes In Computer Science 5782, 750-754
  23. Vanschoren, J., Pfahringer, B., Holmes, G. Learning from the past with experiment databases. Proceedings of PRICAI 2008, Lecture Notes in Artificial Intelligence 5351, 485-496
  24. Vanschoren, J., Blockeel, H., Pfahringer, B., Holmes, G. Organizing the world's machine learning information. Proceedings of ISOLA 2008, Communications in Computer and Information Science, 17, 693-708
  25. Vanschoren, J., Blockeel, H. Investigating classifier learning behavior with experiment databases. Proceedings of GfKL 2008, Data Analysis, Machine Learning and Applications, 421-428
  26. Blockeel, H.*, Vanschoren, J.* Experiment databases: Towards an improved experimental methodology in machine learning. Proceedings of ECMLPKDD 2007, Lecture Notes in Computer Science 4702, 6-17
  (* Joint first author)

Peer reviewed conference and workshop proceedings

  1. Zhang, C., van Wissen, A., Lakens, D., Vanschoren, J., de Ruyter, B.E.R., IJsselsteijn, W.A. Anticipating habit formation: a psychological computing approach to behavior change support. UbiComp Adjunct, 2016: 1247-1254
  2. Bischl, B., Bossek, J., Casalicchio, G., Hofner, B., Kerschke, P., Kirchhoff, D., Lang, M., Seibold, H., Vanschoren, J. Connecting R to the OpenML project for Open Machine Learning. useR Conference 2016
  3. Abdulrahman, S, Brazdil, P., van Rijn, J.N., Vanschoren, J. Algorithm Selection via Meta-learning and Sample-based Active Testing. MetaSel Workshop @ PKDD/ECML 2015, CEUR Workshop Proceedings 1455, 55-66
  4. Mantovani, R.G., Rossi, A.L.D., Vanschoren, J., Carvalho, A.C.P.L.F. Meta-learning Recommendation of Default Hyper-parameter Values for SVMs in Classification Tasks. MetaSel Workshop @ PKDD/ECML 2015, CEUR Workshop Proceedings 1455, 80-92
  5. van Rijn, J.N., Vanschoren, J. Sharing RapidMiner Workflows and Experiments with OpenML. MetaSel Workshop @ PKDD/ECML 2015, CEUR Workshop Proceedings 1455, 93-103
  6. Vukicevic, M., Radovanovic, S., Vanschoren, J., Napolitano, G., Delibasic, B. Towards a Collaborative Platform for Advanced Meta-Learning in Healthcare Predictive Analytics. MetaSel Workshop @ PKDD/ECML 2015, CEUR Workshop Proceedings 1455, 112-114
  7. Knobbe A.J., Meeng M., Vanschoren J., Rees Jones S., Merlo Penning S. Reconstructing Medieval Social Networks from English and Latin Charters. Population Reconstruction 2014
  8. van Rijn, J.N., Holmes, G., Pfahringer, B. and Vanschoren, J. Towards Meta-learning on Data Streams. MetaSel Workshop @ ECAI 2014, CEUR Workshop Proceedings, 1201, 37-38
  9. Vanschoren, J., Braun, M. and Ong, C.S. Open science in machine learning. Proceedings of CLADAG 2013, 462-465.
  10. van Rijn, J., Umaashankar, V., Fischer, S., Bischl, B., Torgo, L., Gao, B., Winter, P., Wiswedel, B., Berthold, M.R., and Vanschoren, J. A RapidMiner extension for Open Machine Learning. Proceedings of RCOMM 2013, 59-70.
  11. van Rijn, J. and Vanschoren, J. OpenML: An Open Science Platform for Machine Learning. Machine Learning Conference of Belgium and The Netherlands 2013, 99-100
  12. Miao S., Vespier U., Vanschoren J., Knobbe A.J., De Gouveia da Costa Cachucho R.E. Modeling Sensor Dependencies between Multiple Sensor Types. Machine Learning Conference of Belgium and The Netherlands 2013, p. 66-73
  13. Vanschoren, J. The Experiment Database for machine learning. PlanLearn Workshop @ ECAI 2012, CEUR Workshop Proceedings, 950, 30-37
  14. Leite, R., Brazdil P., Vanschoren, J. Selecting Classification Algorithms with Active Testing on Similar Datasets. PlanLearn Workshop @ ECAI 2012, CEUR Workshop Proceedings, 950, 30-37
  15. Vespier, U., Knobbe, A., Nijssen, S., Vanschoren, J. MDL-Based Identification of Relevant Temporal Scales in Time Series. Workshop on Information Theoretic Methods in Science and Engineering, WITMSE 2012
  16. Gao, B. and Vanschoren, J. Visualizations of Machine Learning Behavior with Dimensionality Reduction Techniques. Machine Learning Conference of Belgium and The Netherlands 2011, 35-42.
  17. Miao, S., Knobbe, A., Vanschoren, J., Vespier, U., Koopman, A., Cachucho, R., Chen, X. A Range of Data Mining Techniques to Correlate Multiple Sensor Types. Dutch-Belgian Database Day 2011, Art.5
  18. Vanschoren, J., Soldatova, S. Exposé: An Ontology for Data Mining Experiments. Workshop on Third Generation Data Mining @ ECMLPKDD 2010, 31-46
  19. Vanschoren, J., Soldatova, S. Collaborative Meta-Learning. Planning to Learn workshop @ ECAI 2010, 37-46
  20. Vanschoren, J., Blockeel, H. Stand on the shoulders of giants: towards a portal for collaborative experimentation in data mining. 3rd Generation Data Mining Workshop @ ECMLPKDD 2009, 88-99
  21. Bauzá, M., Vanschoren, J., Funes, M.P., Barrera, G.M., López De Luise, D. Sistema de Autentificación Facial. Congreso de Inteligencia Computacional Aplicada (CICA) 2009
  22. Vanschoren, J. Experiment databases for machine learning. NIPS Workshop on Machine Learning Open Source Software @ NIPS 2008
  23. Vanschoren, J., Blockeel, H., Pfahringer, B., Holmes, G. Experiment databases: Creating a new platform for meta-learning research. Planning to Learn Workshop @ ICML 2008, 10-15
  24. Vanschoren, J., Van Assche, A., Vens, C., Blockeel, H. Meta-learning from experiment databases: An illustration. Machine Learning Conference of Belgium and The Netherlands 2007, 120-127
  25. Vanschoren, J., Blockeel, H. Towards understanding learning behavior. Machine Learning Conference of Belgium and The Netherlands 2006, 89-96

Book chapters

  1. Lawrynowicz, A., Esteves, D., Panov, P., Soru, T., Dzeroski, S., Vanschoren, J. An Algorithm, Implementation and Execution Ontology Design Pattern. In: Studies on the Semantic Web (forthcoming), 2016
  2. Vanschoren, J., Vespier, U., Miao, S., Cachucho, R. and Knobbe, A. Large-scale sensor network analysis. In: Big Data Management, Technologies, and Applications (W-C. Hu, N. Kaabouch, ed.), IGI Global, 2013
  3. Vanschoren, J. Meta-learning architectures. In: Meta-learning in Computational Intelligence (N. Jankowski, W. Duch, K. Grabczewski, ed.), Springer, 2011
  4. Berendt, B., Vanschoren, J. and Gao, B. Datenanalyse und -visualisierung. In: Handbuch Forschungsdatenmanagement (S. Büttner, H-C. Hobohm, L. Müller, ed.), Bock+Herchen, 2011
  5. Vanschoren, J., Blockeel, H. Experiment Databases. In: Inductive Databases and Constraint-Based Data Mining (S. Dzeroski, B. Goethals, P. Panov, ed.), Springer, 2010

Books and proceedings edited

  1. Vanschoren, J., Brazdil, P., Giraud-Carrier, C.G., Kotthoff, L. (Eds.) Proceedings of the 2015 International Workshop on Meta-Learning and Algorithm Selection @ ECMLPKDD. CEUR Workshop Proceedings 1455, CEUR 2015
  2. Vanschoren, J., Brazdil, P., Soares, C., Kotthoff, L. (Eds.) Proceedings of the 2014 International Workshop on Meta-Learning and Algorithm Selection @ ECAI. CEUR Workshop Proceedings 1201, CEUR 2014
  3. Vanschoren, J., Brazdil, P., Kietz, J-U. (Eds.) Proceedings of the International Workshop on Planning to Learn @ ECAI. CEUR Workshop Proceedings 950, CEUR 2012
  4. Vanschoren, J., Duivesteijn, W. (Eds.) The Silver Lining. Proceedings of the International Workshop on Learning from Unexpected Results @ ECMLPKDD. Leiden University
  5. van der Putten, P.H.W., Veenman, C., Vanschoren, J., Israel, M., Blockeel, H. (Eds.) Proceedings of the 20th Annual Belgian-Dutch Conference on Machine Learning. Leiden University, 2011

Dissertations

  1. Vanschoren, J. Understanding Machine Learning Performance with Experiment Databases. PhD Thesis, Katholieke Universiteit Leuven, 2010
  2. Vanschoren, J. A framework for high-level perception. MSc Thesis, Katholieke Universiteit Leuven, 2005

Invited Talks

  1. OpenML in research and education. Workshop on Challenges in Machine Learning @ NIPS 2016, 9 December 2016
  2. Democratizing and Automating Machine Learning. Dutch Society for Pattern Recognition, 11 November 2016
  3. Collaborative Machine Learning. IBM Watson Research Center, 22 June 2016
  4. Collaborative Machine Learning. Open Data Science Sheffield, 16 December 2015
  5. Towards a Data Science Collaboratory (Horizon Talk). Intelligent Data Analysis 2015, 22 October 2015
  6. Towards Networked and Automated Machine Learning. IDEA Seminar, Robert Gordon University, 10 August 2015
  7. OpenML: Networked Science in Machine Learning. Statistical Computing 2015, 21 July 2015
  8. OpenML: A Foundation for Networked and Automatic Machine Learning. AutoML Workshop @ ICML 2015, 11 July 2015
  9. OpenML: Networked science in machine learning. Université Paris-Saclay, INRIA, 4 November 2014
  10. OpenML: Open science in machine learning. ECDA 2014, 4 July 2014
  11. OpenML: Open science in machine learning. TU Dortmund, CS Department, 30 January 2014
  12. Open science in machine learning. CLADAG 2013, 20 September 2013
  13. Data Science and sensor data. Dutch Hadoop User Group, 12 April 2012

Awards

  1. Best Demo Award, ECMLPKDD 2009
  2. Best Application Award, SARA Hadoop Day 2010

Service

PhD Jury Membership

  1. Jakub Smid, Charles University Prague, Sep 2016
  2. Bo Gao, Katholieke Universiteit Leuven, Dec 2015

Conference organization

  1. General Chair, Learning and Intelligent Optimization Conference (LION 2016)
  2. Associate Chair, European Conference on Machine Learning (ECMLPKDD 2013)
  3. Program Chair, Machine Learning Conference of Belgium and the Netherlands (Benelearn 2011)
  4. Program Chair, Machine Learning Conference of Belgium and the Netherlands (Benelearn 2010)

Workshop chair

  1. Configuration and Selection of Algorithms (COSEAL 2016)
  2. Open Machine Learning Developer Workshop (OpenMLdev 2016)
  3. Automatic Machine Learning Workshop (AutoML)
  4. Open Machine Learning @ Lorentz Center (OpenML 2016)
  5. Open Machine Learning (OpenML 2015)
  6. Metalearning and Algorithm Selection @ ECMLPKDD 2015 (MetaSel 2015)
  7. Open Machine Learning (OpenML 2014)
  8. Metalearning and Algorithm Selection @ ECAI 2014 (MetaSel 2014)
  9. The Silver Lining, Learning from Unexpected Results @ ECMLPKDD 2012 (Silver 2012)
  10. Planning to Learn @ ECAI 2012 (PlanLearn 2012)

Journal referee

  • Machine Learning Journal (MLJ)
  • Journal of Machine Learning Research (JMLR)
  • Data Mining and Knowledge Discovery (DaMi)
  • Semantic Web Journal (SWJ)
  • Computational Intelligence (COIN)

Programme committee member

  • ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2016)
  • European Conference on Machine Learning (ECMLPKDD 2012-2015)
  • Extended Semantic Web Conference (ESWC 2011 2015)
  • European Conference on Artificial Intelligence (ECAI 2014)
  • Knowledge Discovery and Information Retrieval (KDIR 2010-2012)

Research visits

  • Robert Gordon University, Aberdeen, UK (August 9-12, 2015)
  • University of Bournemouth, UK (February 16-19, 2015)
  • INRIA-Saclay, Paris, France (November 3-7, 2014)
  • University of Dortmund, Germany (January 27-31, 2014)
  • University of Waikato, New Zealand (February-March 2011)
  • Universities of Geneva and Zurich, Switzerland (June 14-18, 2010)
  • University of Porto, Portugal (June 7-11, 2010)
  • University of Aberystwyth, UK (July-August, 2009)
  • Jozef Stefan Institute, Slovenia (July 4-11, 2009)
  • University of Waikato, New Zealand (March-June, 2008)
  • University of Indiana, USA (August 2004)

Web Technology

The web today is a growing universe of interlinked web pages and web apps, teeming with interactive content. It is the result of the ongoing efforts of an open web community that helps define web technologies, like HTML5, CSS3, Javascript libraries and Web frameworks. This course provides the student with knowledge of and insight into the rapidly evolving field of web technology. The focus is on hands-on experience with a wide variety of these technologies, enabling students to develop their own web applications, from small interactive sites to the next Facebook.

People

  • dr. ir. Joaquin Vanschoren (j.vanschoren@tue.nl) MF 7.104a - Lecturer
  • dr. Natalia Stash (n.v.stash@tue.nl) MF 7.118 - Lecturer, Instructor
  • Henry He (s.he@student.tue.nl) MF 5.141 - Student Assistant
  • Niels de Jong (n.d.jong.1@student.tue.nl) MF 5.141 - Student Assistant
  • Robin Mennens (r.j.p.mennens@student.tue.nl) MF 5.141 - Student Assistant
  • Bram Mulders (b.a.mulders@student.tue.nl) MF 5.141 - Student Assistant
  • Anja Syring (a.f.syring@student.tue.nl) MF 5.141 - Student Assistant
We are all here to help you. For questions about the course organization, please contact the course lecturers or instructors directly. The student assistants also keep dedicated office hours to answer your questions on Thursdays, 12:00-14:00, in room MF 5.141 (it is nice to email beforehand). For very practical issues (e.g., problems with your localhost setup or hosting), please contact the student assistants first.

Learning objectives

By the end of this course, you should be able to:

  • Create responsive web pages that adapt to the screen on which they are shown (e.g. mobile phones)
  • Program interactive webpages using Javascript
  • Host and maintain a website, and build web applications collaboratively using Git.
  • Download data from REST services all over the web, and include it in your webpage
  • Build functional client-side (frontend) web applications using Javascript
  • Build server-side (backend) web applications (e.g. using Django or Node.js)
  • Build your own REST APIs to make data available to others
  • Design, implement and test your own web application using the appropriate technologies
  • Review existing web applications, and develop web applications as a team.
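
As a rough illustration of the two REST-related objectives above (making data available and downloading it again), here is a minimal sketch that uses only the Python standard library. The `/api/courses` path, the `COURSES` payload, and the names `CourseHandler`, `fetch_courses`, and `demo` are hypothetical and chosen for this example only; in the course itself you would typically use a framework such as Django or Node.js.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical data that this tiny API exposes.
COURSES = [{"id": 1, "name": "Web Technology"}]


class CourseHandler(BaseHTTPRequestHandler):
    """Answers GET /api/courses with JSON; every other path gets a 404."""

    def do_GET(self):
        if self.path == "/api/courses":
            body = json.dumps(COURSES).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404, "Not found")

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_courses(host, port):
    """Client side: request the resource and decode the JSON response."""
    conn = HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/api/courses")
        response = conn.getresponse()
        return json.loads(response.read().decode("utf-8"))
    finally:
        conn.close()


def demo():
    """Start the server on a free local port, query it once, shut it down."""
    server = HTTPServer(("127.0.0.1", 0), CourseHandler)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch_courses("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The server and the client only share the URL and the JSON format, which is why the same data could just as well be consumed from Javascript in a web page.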

Required prior knowledge: It is highly recommended to have experience with HTML (e.g., from the course DBL Hypermedia), and proficiency in programming is required. While this course does start from the basics, without prior knowledge you will need to work hard to learn web programming in the first few weeks. We have weekly deadlines to ensure that all students are keeping up with the pace of this course.

Course Structure

The course has the following weekly contact hours:

  • Tuesdays, 10:45 - 12:30: Feedback/Q&A session (Matrix 1.44 and 1.46, Flux 1.09 and 1.11)
  • Fridays, 13:45 - 15:30: Plenary Lectures (Auditorium 5)
  • Fridays, 15:45 - 17:30: Instructions (StudyHub 2)

Self study (Flipped Classroom). Part of the class is self-study: you will receive instructions for material to study (from Codecademy), and you can ask questions during the feedback sessions. At the same time, you are expected to apply what you have learned in the individual assignments. In the feedback sessions we will discuss your (partially built) assignments in class in a collaborative fashion, so that you can learn from each other.

Assignments

Self study deadlines: You will need to complete the Codecademy modules on time, with deadlines in weeks 2 and 4 (see below). There is a grade penalty if you fail to do so. On the other hand, students who complete all modules before 24 November can obtain a grade bonus and can do the team assignment in groups of 2, rather than 4. You can apply for an exemption from the Codecademy modules if you submit prior work (e.g. a web app) that shows mastery of Javascript and/or Python.

Individual Assignments (40% of the final grade) You will receive 2 assignments that you must turn in individually. The first is a well-designed, interactive, responsive website; the second is a complete web app with data storage. You will get 3 weeks for each of these assignments.

Team Assignment (60% of the final grade) You must work in a team to create a modern web application (with frontend, backend, and data storage). The assignment is evaluated on design decisions, choice and use of web technologies, as well as efficiency, usefulness and creativity. Moreover, you are required to collaborate on this web application using GitHub, so that it is clear how much each of you has contributed to the overall project. Hence, you should all make regular updates to the repository (do not delegate this to one person). We will set up a GitHub organisation for all projects. Note that you will be able to see the work of other teams (and learn from it), but you are not allowed to just copy/steal it (see the note on plagiarism below). We will be able to check this in your commit history.

Peer assessment

The individual assignments will be peer reviewed by your fellow classmates, so that you can learn from each other, and learn how to review other people's work. You will need to submit your own assignment (anonymously) and review the assignments of 3 other students, giving constructive criticism and scores on different aspects (a rubric is provided). You get to see all reviews, but not who wrote them, nor the (intermediate) scores.

The assignment will ultimately be graded by the course instructors. However, the peer assessment can influence your grade:

  • If you fail to do the peer assessment, you don't get points for the individual assignment.
  • The peer reviews will be checked by the course instructors. If you produce good, helpful reviews, you can gain a bonus point for that assignment (1/25). However, if you produce useless reviews, you can lose points.

Materials

We use Canvas for posting announcements, assignments, lecture slides, etc. When you go to Canvas, log in with your TUE credentials. It is your responsibility to keep up to date with postings and activities on Canvas. If you have difficulties with the system, please contact dr. Stash.

Schedule

Week of | Tue 3-4 (Q&A, Feedback) | Fri 5-6 (Lecture) | Fri 7-8 (Instructions)
Nov 15 | HTML, CSS. Environment setup. | Web architecture, Responsive web design | Building and hosting a responsive website
Nov 22 | Javascript | Javascript and Javascript frameworks | Building interactive websites
Nov 29 | Python | Web Frameworks (Django) | Building web apps
Dec 6 | Django | Building web apps (continued)
Dec 13 | Django | APIs and REST, handle data with Python | Using and building APIs
Dec 20 | Midterm presentations (Matrix 1.44) | Midterm presentations (Auditorium 5) | Midterm presentations (Auditorium 5)
Jan 3 | Q&A / Q&A
Jan 10 | Q&A / Q&A
Jan 17 | Final presentations (Matrix 1.44) | Final presentations (Auditorium 5) | Final presentations (Auditorium 5)

Deadlines:

  • Thu Nov 24: Codecademy modules on 'Learn Javascript' (5h), jQuery (3h)
  • Thu Dec 1: Register your team for the team assignment.
  • Thu Dec 1: Individual assignment 1.
  • Thu Dec 8: Codecademy module on Python (13h)
  • Thu Jan 5: Individual assignment 2.
  • Mon Jan 16: Team Assignment.

Course Policies

Participation. As this class endeavors to teach professional skills, we ask that students act professionally and treat all course participants with respect. We also encourage you to offer your ideas and thoughts to the class and to question the material presented.

Assignments. Assignments are due at the time and in the manner specified in the assignment description. Late work will lose 33% of its original point-value for each day late, and once solutions are posted or discussed, late submissions will not be accepted.
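
To make the late-work rule concrete, here is a small sketch of the arithmetic. It assumes that the deduction is 33% of the original point value per started day and that the score cannot drop below zero; how partial days are counted is not specified in this policy, so that part is an assumption, and the function name `late_score` is hypothetical.

```python
import math


def late_score(points, days_late, penalty_percent=33):
    """Points left after the late deduction.

    Assumptions (not stated in the policy): every started day counts
    as a full day, and the result is floored at zero.
    """
    days = math.ceil(max(0, days_late))
    remaining_percent = max(0, 100 - penalty_percent * days)
    return points * remaining_percent / 100


if __name__ == "__main__":
    for days in range(5):
        print(days, late_score(25, days))
```

Under these assumptions, work originally worth 25 points keeps 16.75 points after one day, 8.5 after two, 0.25 after three, and nothing from the fourth day on.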

Plagiarism. Plagiarism and cheating will not be tolerated. University policy will be adhered to in all such cases. There is a difference between collaboration and plagiarism. Plagiarism is the act of using another’s work without giving them credit for it. Collaboration is the exchange of ideas, the debate of issues and the examination of readings among each other that enables you to arrive at your own independent thoughts and designs.

Foundations of Data Mining

Machine learning is the science of making computers act without being explicitly programmed. Instead, algorithms are used to find patterns in data. It is so pervasive today that you probably use it dozens of times a day without knowing it, for instance in web search, speech recognition, and (soon) self-driving cars. It is also a crucial component of data-driven industry (Big Data), scientific discovery, and modern healthcare. In this class, you will learn the foundations of how data mining and machine learning work internally, understand when and how to use key concepts and techniques, and gain hands-on experience in getting them to work for yourself. You'll learn about the theoretical underpinnings of data analysis, and leverage that to quickly and powerfully apply this knowledge to tackle new problems.

This course on Canvas.

This course on OASE.

People

  • dr. ir. Joaquin Vanschoren (j.vanschoren@tue.nl) MF 7.104a - Responsible Lecturer
  • dr. Mykola Pechenizkiy (m.pechenizkiy@tue.nl) MF 7.099 - Lecturer
  • dr. Anne Driemel (a.driemel@tue.nl) MF 7.073 - Lecturer

Learning objectives

By the end of this course, you should be able to:

  • Understand how data mining algorithms work: how they find patterns in data.
  • Reason about when and how to use them, and apply them successfully in practice.
  • Understand the mathematical foundations of data mining techniques, and use this to derive fundamental properties.
  • Run practical experiments to experience first-hand how data mining algorithms behave on real data.
  • Explore how algorithm parameters and data properties affect the effectiveness of predictive models, and how better models can be built.
  • Formulate data analysis problems in the terminology of data mining.
  • Understand the challenges and common problems that occur when approaching data mining/machine learning problems (such as overfitting, curse of dimensionality) and how to counter these challenges (bias/variance trade-off, dimensionality reduction).
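
Overfitting, mentioned in the last objective, can be seen in a few lines of code. The sketch below is illustrative only and is not taken from the course material: it trains a 1-nearest-neighbor classifier on synthetic data with deliberately noisy labels. The data generator, the 30% noise rate, and all function names are assumptions made for this example.

```python
import random


def make_data(n, noise, rng):
    """Points in the unit square; the true label is 1 when x > 0.5.
    A fraction `noise` of the labels is flipped at random."""
    data = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label
        data.append(((x, y), label))
    return data


def predict_1nn(train, point):
    """Return the label of the training point closest to `point`."""
    def sq_dist(item):
        (x, y), _ = item
        return (x - point[0]) ** 2 + (y - point[1]) ** 2
    return min(train, key=sq_dist)[1]


def accuracy(train, data):
    """Fraction of `data` whose label matches the 1-NN prediction."""
    hits = sum(predict_1nn(train, p) == label for p, label in data)
    return hits / len(data)


rng = random.Random(0)
train = make_data(200, noise=0.3, rng=rng)
test = make_data(200, noise=0.3, rng=rng)

train_acc = accuracy(train, train)  # each point is its own nearest neighbor
test_acc = accuracy(train, test)    # unseen points reveal the memorized noise
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The model reproduces every training label, including the flipped ones, so its training accuracy says nothing about how well it generalizes. Evaluating on held-out data, as in cross-validation, exposes the gap.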

Required prior knowledge: While there are no strict requirements, it is highly recommended to have a working knowledge of statistics, and to have programming experience. Programming is part of the assignments. The course will mostly feature examples from R, but languages such as Python can also be used.

Course Structure

The course has the following weekly contact hours:

  • Mondays, 9:30 - 10:30: Q&A session (PAV J17)
  • Mondays, 10:45 - 12:30: Plenary Lectures (PAV J17)
  • Thursdays, 13:45 - 15:30: Plenary Lectures (AUD 16)

Evaluation

There is no exam. Students are evaluated using a series of 3 problem sets, containing both theoretical and practical assignments. Students work in teams of 2 people, and teams are rotated between problem sets.

Materials

We use Canvas for posting announcements, assignments, lecture slides, etc. It is your responsibility to keep up to date with postings and activities, but these will also clearly be announced in class or by email.

Schedule

This schedule is preliminary. The order may change and parts of lectures may be removed (or added).

Feb 1 Introduction to Data Mining
An overview of the field
Vanschoren
Feb 4 Similarity and Distances
Nearest neighbor, Jaccard similarity, Locality sensitive hashing, MinHashing, Nearest Neighbor search
Driemel
Spring break
Feb 15 Clustering
Lloyd's algorithm (k-means), Gonzalez's algorithm
Driemel
Feb 18 Dimensionality Reduction
High-dimensional spaces, Random projections, PCA, Multidimensional scaling
Driemel
Feb 22 Metric embeddings
IsoMap, Fréchet’s embedding, Bourgain’s embedding
Driemel
Feb 25 Machine Learning software
Interactive workshop on machine learning with R, Python and OpenML
Vanschoren
Feb 29 Rules and decision trees (Symbolic Learning)
Rule learning, separate-and-conquer, covering algorithm. Growing decision trees, information gain, regularization (pruning). Overfitting and other issues. First-Order rules, inverse deduction.
Vanschoren
Mar 3 Evaluation and optimization
Avoiding overfitting. Cross-validation. ROC analysis, Bias-Variance analysis. Optimizing hyperparameters.
Pechenizkiy
Mar 7 Instance-based learning (Learning by Analogy 1)
k-Nearest Neighbor, Locally weighted regression
Vanschoren
Mar 10 Kernel methods (Learning by Analogy 2)
Linear models, least-squares, Support Vector Machines, maximal margin, Kernel methods.
Vanschoren
Mar 14 Ensemble Learning (Cancelled due to illness)
Bagging, Random Forests, Boosting, AdaBoost
Pechenizkiy
Mar 17 Ensemble Learning (Cancelled due to illness)
Gradient boosting, Stacking
Pechenizkiy
Mar 21 Neural Networks (Connectionist Learning)
The perceptron. Single-layer neural networks.
Vanschoren
Mar 24 Neural Networks (Connectionist Learning)
Multi-layer neural networks, backpropagation. Deep learning, autoencoders.
Vanschoren
Mar 28 No lecture (TU/e closed)
Mar 31 Ensemble Learning (Catch up on cancelled lectures)
Vanschoren

Deadlines:

  • Assignment 1a: Feb 18
  • Assignment 1b: Feb 25
  • Assignment 1c: Mar 3
  • Assignment 2a: Mar 10
  • Assignment 2b: Mar 24
  • Assignment 2c: Mar 31
  • Assignment 3: Apr 14

Course Policies

Participation. As this class endeavors to teach professional skills, we ask that students act professionally and treat all course participants with respect. We also encourage you to offer your ideas and thoughts to the class and to question the material presented.

Assignments. Assignments are due at the time and in the manner specified in the assignment description. Late work will lose 33% of its original point-value for each day late, and once solutions are posted or discussed, late submissions will not be accepted.
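
As an illustration of the late penalty described above, here is a minimal Python sketch. The function name `late_score` is hypothetical, and two details are assumptions not stated in the policy: lateness is counted in whole days, and a score cannot drop below zero. The rule that submissions are refused once solutions are posted or discussed is not modeled here.

```python
def late_score(points: float, days_late: int, penalty_per_day: float = 0.33) -> float:
    """Return the points left after the late penalty (hypothetical helper).

    Assumptions: days_late is a whole number of days, and the result
    is floored at zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    # Each day removes a fixed share of the ORIGINAL point value,
    # so the deduction grows linearly rather than compounding.
    remaining = points * (1 - penalty_per_day * days_late)
    return max(0.0, remaining)


if __name__ == "__main__":
    # Show how a 10-point assignment degrades over four days.
    for days in range(5):
        print(days, round(late_score(10, days), 2))
```

Under these assumptions, work worth 10 points would keep roughly 6.7 points after one day and 3.4 after two, and would be worth almost nothing from the third day on.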

Plagiarism. Plagiarism and cheating will not be tolerated. University policy will be adhered to in all such cases. There is a difference between collaboration and plagiarism. Plagiarism is the act of using another’s work without giving them credit for it. Collaboration is the exchange of ideas, the debate of issues, and the examination of readings with one another that enables you to arrive at your own independent thoughts and designs.

People

Current students

Graduate students and PhD students are the heart of the creative research and development work. At present I’m fortunate to work with the following PhD and Master students:

  • Jan N. Van Rijn, PhD Student, Meta-learning on Stream data and OpenML
  • Rafael Mantovani, PhD Student, Meta-learning and Optimization
  • Chao Zhang, PhD Student, e-Coaching for Continuous Personal Health
  • Bo Gao, PhD Student, Social Networks and Privacy
  • Sjoerd van Bavel, Master Student, Predicting Heat Capacity in Greenhouses, 2016-2017
  • Roy Haanen, Master Student, Predicting Aircraft Performance on Final Approach, 2015-2016

Former Doctoral students

  • Karthik Srinivasan (PDEng), Preventing Burglaries and Other Incidents, TU Eindhoven, 2014-2015.

Former Master students

  • Chung-Kit Lee, Burglary Prediction Model, 2015-2016
  • Hilda F. Bernard, Enhanced Sleepiness Prediction with Improved Algorithm Selection and Hyperparameter optimization, 2015-2016
  • Mikhail Evchenko, Frugal Learning: Applying Machine Learning with Minimal Resources, 2015-2016
  • Kris van Tienhoven, Gamification for OpenML, 2015-2016
  • Ruben Moonen, Object Recognition Framework using information retrieval and machine learning techniques, 2013-2014
  • Anton den Hoed, MapReduce Algorithms for Time Series Data, 2011-2012
  • Mohammed Alaeikhanehshir, Data mining to improve customer service, 2011-2012
  • Thomas De Craemer, Algorithm for a Recommendation Engine, 2010-2011
  • Wouter Deroey, Semi-automated Corpus-based Ontology Population, 2010-2011
  • Xushuang Gao, Active meta-learning, 2009-2010
  • Bo Gao, Advanced visualizations for learning behavior, 2009-2010
  • Jeroen Peelaerts, Visualizing learning behavior, 2007-2008
  • Jan Callewaert, Simulating Biologically Inspired Brood Sorting in Ant-Like Agents, 2005-2006
  • Anton Dries, DM_square, Analysis of Data Mining Results Through Data Mining, 2005-2006