This graph shows the F-score values over three different runs for different categories of models.
This graph shows the F-score values over three different runs for the situations without and with optional tasks. For example, it shows that if there are no optional tasks the F-score may be 100% (4 logs), but adding optional tasks typically causes the accuracy to drop to 0%.