Data sets with concept drift

This page contains the well-known SEA concepts and rotating hyperplane data sets, which we used in our study (Tsymbal et al., 2006).

The data sets are in ARFF format and can be downloaded together as a zip archive (4.2 MB) or individually.
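ARFF is the plain-text format used by Weka: a header of @attribute declarations followed by an @data section with one comma-separated instance per line. For readers who want to inspect the files without Weka, the following is a minimal sketch of a reader in Python. The function name read_arff and the miniature sample file are hypothetical and serve only as an illustration; the sketch assumes dense (non-sparse) rows and attribute names without spaces, and it has not been checked against the files on this page.

```python
import csv
import io

# ARFF type keywords that denote numeric attributes
NUMERIC_TYPES = {"numeric", "real", "integer"}


def read_arff(lines):
    """Parse a dense ARFF document into (attribute_names, rows).

    Simplified on purpose: sparse rows and quoted attribute names
    containing spaces are not supported.
    """
    names, numeric, rows = [], [], []
    in_data = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and % comments
        if in_data:
            fields = next(csv.reader([line], skipinitialspace=True))
            # Convert numeric attributes to float; keep nominal values
            # and missing values ("?") as strings
            rows.append([float(v) if is_num and v != "?" else v
                         for v, is_num in zip(fields, numeric)])
        elif line.lower().startswith("@attribute"):
            # "@attribute <name> <type>"
            _, name, kind = line.split(None, 2)
            names.append(name)
            numeric.append(kind.strip().lower() in NUMERIC_TYPES)
        elif line.lower().startswith("@data"):
            in_data = True
    return names, rows


# Hypothetical miniature file, NOT the contents of any file on this page
sample = """\
% demo only
@relation demo
@attribute f1 numeric
@attribute f2 numeric
@attribute class {0,1}
@data
1.5,2.5,1
7.0,9.0,0
"""

names, rows = read_arff(io.StringIO(sample))
print(names)
print(rows)
```

To read a downloaded file instead of the sample, pass an open file object, for example read_arff(open("sea.arff")), assuming the file is in the working directory.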

  • sea.arff - data set with the four 'SEA' concepts described in (Street and Kim, 2001).
    Four test data sets, one per concept:
    sea_tst1.arff
    sea_tst2.arff
    sea_tst3.arff
    sea_tst4.arff

  • hyperplaneX.arff - rotating hyperplane data sets generated with different drift parameters k and t; see (Fan, 2004):
    hyperplane1.arff   k=2   t=0.1
    hyperplane2.arff   k=2   t=0.5
    hyperplane3.arff   k=2   t=1.0
    hyperplane4.arff   k=5   t=0.1
    hyperplane5.arff   k=5   t=0.5
    hyperplane6.arff   k=5   t=1.0
    hyperplane7.arff   k=8   t=0.1
    hyperplane8.arff   k=8   t=0.5
    hyperplane9.arff   k=8   t=1.0

  • References

    W. Fan, Systematic data selection to mine concept-drifting data streams, in: KDD'04, 10th International Conference on Knowledge Discovery and Data Mining, Seattle, WA, August 2004, pp. 128-137.

    W. Street, Y. Kim, A streaming ensemble algorithm (SEA) for large-scale classification, in: KDD'01, 7th International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, August 2001, pp. 377-382.

    A. Tsymbal, M. Pechenizkiy, P. Cunningham, S. Puuronen, Dynamic integration of classifiers for handling concept drift, Information Fusion (2006).
    The prepublished version of this paper is available here