JBG030 - Data Challenge 1

Data Challenge 1 is part of a four-course series in the JADS Bachelor Data Science program offered by TU Eindhoven and Tilburg University.

The objective of the Data Challenge courses is to teach students how to perform large-scale, data-driven analyses themselves, combining the technical skills acquired earlier in the Data Science program with insights gained in methodological courses. Data Challenge 1 is the first course in this series and familiarizes students with conducting a large-scale analysis on their own. The focus is on learning how to acquire and use tools and libraries for data collection, data enrichment, and data analysis in an independent, reproducible manner.

In Data Challenge 1, students have the opportunity to apply the methods and techniques acquired during the first year of the program to a large, complex dataset. The students are given a large, structured dataset, several specific analysis questions about this dataset, and a proposed analysis approach for each question (i.e., particular analysis techniques to apply). The task for the students is to technically realize these analyses by identifying and familiarizing themselves with the right software tools, implementing each analysis in a repeatable form, and reflecting on the validity of their results and the suitability of their analysis approach. An important element of this course is the actual handling of large data stored in various formats (files, relational databases, object databases, etc.), the pre-processing of data to make it usable for the analysis, and the storage of analysis results in a suitable data format.


After taking the course, students are able to:

  • independently apply and follow established data science research methods for a given problem and dataset
  • access, process, and reason about a large, complex dataset given in various data formats
  • independently find and familiarize themselves with programming languages, libraries, programs, and software packages for a specific purpose
  • implement a repeatable data analysis that makes use of existing libraries and programs in a self-chosen technical environment
  • independently validate the results of their own and other students' analyses using scientific techniques
  • present their analysis and their findings in a presentation/report/poster suitable for a given audience

Staff involved