Copyright 2012-2013 - University of Mons (Mathieu Goeminne and Tom Mens) and Eindhoven University of Technology (Bogdan Vasilescu and Alexander Serebrenik).
This replication package accompanies our research article "On the variation and specialisation of workload: A case study of the Gnome ecosystem community", authored by Bogdan Vasilescu, Alexander Serebrenik, Mathieu Goeminne and Tom Mens, and published in 2013 in the Empirical Software Engineering journal (Springer, ISSN 1382-3256 and ISSN 1573-7616). If you use this replication package as part of a publication, you must cite this research article in your work.
This webpage provides access to the data and tooling used in the above-mentioned article. All the files described below are part of this archive. Use of this archive is subject to the following licenses:
First, a list of all GNOME projects (gnome_projects-list.csv) was extracted from http://git.gnome.org/browse/. Then, CVSAnalY was used to extract information from the source code repository logs and to store it in MySQL databases. These databases were queried to create one raw data file per project, recording for each contributor the number of touches to files of each activity type. The following table illustrates the contents of such a fullLife.csv file for a given GNOME project; each cell is a number of file touches. The files are available in the data/raw subfolder.
author | code | documentation | build | ...
Bogdan | 0 | 20 | 4 | ...
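The aggregation step can be illustrated with a minimal Python sketch. It assumes that the individual file touches have already been extracted from the CVSAnalY database as (author, activity) pairs; the function name aggregate_touches, the input format and the comma-separated output are illustrative assumptions, not the actual queries or scripts used for the article.

```python
import csv
import io
from collections import Counter, defaultdict


def aggregate_touches(touches, activities):
    """Count file touches per author and activity type (hypothetical helper).

    `touches` is an iterable of (author, activity) pairs, one per file touch.
    Returns CSV text with a header row and one row per author.
    """
    counts = defaultdict(Counter)
    for author, activity in touches:
        counts[author][activity] += 1

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["author"] + list(activities))
    for author in sorted(counts):
        # A Counter returns 0 for activities the author never touched.
        writer.writerow([author] + [counts[author][a] for a in activities])
    return out.getvalue()


# Reproduces the example row of the table above.
touches = [("Bogdan", "documentation")] * 20 + [("Bogdan", "build")] * 4
print(aggregate_touches(touches, ["code", "documentation", "build"]), end="")
# author,code,documentation,build
# Bogdan,0,20,4
```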
Identity matching was then performed, resulting in a mapping table (alias2idName.csv) that lists each GNOME alias (first column) together with the identity to which it corresponds (second column). All aliases of the same person map to the same identity name. To ease further processing, each identity name was assigned a unique integer identifier, as recorded in the idName2integerId.csv mapping table.
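Chaining the two mapping tables gives a direct alias-to-integer lookup. The following Python sketch assumes that both files are comma-separated with two columns and no header row; the helper names and the aliases in the example are hypothetical.

```python
import csv
import io


def load_mapping(text):
    """Read a two-column CSV (key, value) into a dict; assumes no header row."""
    return {row[0]: row[1] for row in csv.reader(io.StringIO(text)) if row}


def alias_to_integer_id(alias2id_text, id2int_text):
    """Compose alias -> identity name -> integer id (hypothetical helper)."""
    alias2id = load_mapping(alias2id_text)
    id2int = {name: int(i) for name, i in load_mapping(id2int_text).items()}
    return {alias: id2int[name] for alias, name in alias2id.items()}


# Hypothetical contents of alias2idName.csv and idName2integerId.csv.
alias2id_text = "alice,Alice\nalice@example.org,Alice\nbob,Bob\n"
id2int_text = "Alice,1\nBob,2\n"
mapping = alias_to_integer_id(alias2id_text, id2int_text)
print(mapping)  # → {'alice': 1, 'alice@example.org': 1, 'bob': 2}
```

Because every alias of a person points to the same identity name, all of that person's aliases end up with the same integer id.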
The fullLife.csv files were then cleaned to take into account the identity matching. All aliases were replaced by the corresponding integer ids, and if the same person used multiple aliases within the same project (the same fullLife.csv file), all their contributions were aggregated. This resulted in a fullLife_clean.csv file for each GNOME project.
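The cleaning step can be sketched in Python as follows. This is not the script used for the article; it assumes that fullLife.csv is comma-separated, has a header row whose first column holds the alias, and that aggregating contributions means summing the per-activity counts. The function name clean_full_life and the example data are hypothetical.

```python
import csv
import io
from collections import defaultdict


def clean_full_life(full_life_text, alias2int):
    """Replace aliases by integer ids and sum the rows of the same person."""
    rows = csv.reader(io.StringIO(full_life_text))
    header = next(rows)
    n_activities = len(header) - 1
    totals = defaultdict(lambda: [0] * n_activities)
    for row in rows:
        if not row:
            continue  # skip blank lines
        person = alias2int[row[0]]
        for i, value in enumerate(row[1:]):
            totals[person][i] += int(value)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for person in sorted(totals):
        writer.writerow([person] + totals[person])
    return out.getvalue()


full_life = (
    "author,code,documentation,build\n"
    "alice,3,0,1\n"
    "alice@example.org,2,5,0\n"
    "bob,0,1,0\n"
)
alias2int = {"alice": 1, "alice@example.org": 1, "bob": 2}
print(clean_full_life(full_life, alias2int), end="")
# author,code,documentation,build
# 1,5,5,1
# 2,0,1,0
```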
The file rules.txt describes how files are mapped to activity types.
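The exact syntax of rules.txt is not described here. To illustrate the general idea, the following Python sketch assumes that each rule pairs a regular expression on the file path with an activity type, and that the first matching rule wins. The RULES list below is hypothetical and does not reproduce the actual rules.

```python
import re

# Hypothetical rules for illustration only; the real rules are in rules.txt.
RULES = [
    (re.compile(r"(^|/)(Makefile\.am|configure\.ac)$"), "build"),
    (re.compile(r"(^|/)(README|NEWS|AUTHORS)$|\.txt$"), "documentation"),
    (re.compile(r"\.(c|h|py)$"), "code"),
]


def classify(path, rules=RULES, default="unknown"):
    """Return the activity type of the first rule whose pattern matches path."""
    for pattern, activity in rules:
        if pattern.search(path):
            return activity
    return default


for path in ["src/main.c", "doc/NEWS", "Makefile.am", "logo.png"]:
    print(path, classify(path))
```

With ordered rules, more specific patterns (such as build files) should come before broader ones, since only the first match is used.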
The Python script extract-metrics.py was used to compute the metrics. The results are available in the data/metrics subfolder.
The statistical analyses were performed and the plots were generated using the R scripts in the scripts subfolder. The names of the figures in the article correspond to the filenames of the R scripts. Some of the scripts rely on an implementation of the T procedure kindly provided by Dr. Frank Konietschke; since this implementation has not yet been published, it is not included in our replication package.
Due to space constraints in the article, the p-values and the lower and upper bounds of the confidence intervals obtained by applying the T procedure are published in the replication package instead (in the p-values subfolder).