Online Discovery of Motifs in Time Series

 

KMOTIF  monitors time series data and discovers meaningful motifs in an online manner.

Below, you can find some extra information:

  About KMOTIF
  KMOTIF finds the top-k most similar motifs in time series data in an online manner. A motif in time series data is the repeated sequences. For example, the following picture shows an example of top-2 motifs discovered in the brain activity data:
   
 

        Figure 1: An example of top-2 motifs in brain activity data. Motif are subsequences with a similar shape.

   

Main results

  Our main research results can be summarized as follows:
  • A lower-bound of wlogw bits is derived for any exact algorithm solving the top-k most similar motifs discovery problem.
  • We propose two algorithms nMotif and kMotif solving the  top-k most similar motifs discovery problem, which are near-optimal in terms of space and are several time faster than the existing algorithm oMotif  proposed by Mueen and Keogh in the specific case k=1 

The comparison of kMotif and nMotif  to the existing algorithm by Mueen and Keogh oMotif is illustrated in the following table:

k Algorithms Space (floats) Time per Update operation
k=1 nMotif O(w^3/2) O(wm)
oMotif O(w) O(wm)
k>1 nMotif O(kw) O(wm+wlogk)
kMotif O(w) O(wm+|V|(k+log|V|))
   

Applications in Brain Activity Tracking

   

       

     
In brain activity a motif may be in the form of the largest connected components (brain areas) preceding a seizure. Therefore, finding the significant motifs can be valuable for predicting the seizure periods. In other words, the strong similarity in motif occurrences in seizure data not only can be used in forecasting the unobserved outcome but also can help to isolate groups of neurons that trigger identical activity during the epileptic attacks [Sauer et al.].

In order to identify significant motifs we need powerful visualization tools which show the similarity structure between the discovered motifs. For instance, we carried out a simple experiment with the brain activity datasets from epileptic patients during epileptic attacks [Yankov et al.]. This dataset consists of 100 time series of length 4087 each. The motif length m is set to 174 (the number of recordings per sec.) and the sliding window length is set to w = 4087.


Figure 2: Motifs discovered by kMoif algorithm in the brain activity data


In  Figure 2 , an example of the top-5 motifs found in the 4th ( Figure 2.a) and the 8th  (Figure 2.b) time series are displayed. Figure c highlights the strong similarity of the top-5 motifs not only in the same time series but also over the different time series. Particularly, Figure c on the left shows the similarity between the top-1 and the top-2 motif detected in Figure a, and Figure c  in the middle shows the similarity between the top-3 motif in Figure a and the top-2 motif in Figure b. Finally, Figure c on the right shows the similarity of the top-1 motif and the top-5 motif detected in b. These interpretable and visually meaningful motifs provide biologists with further insight on the motifs structure. Choosing the most significant one is easier with this powerful visualization.

 
Download the code
The source code was carefully cross checked by the authors of the work. You can download the code  here. Please kindly send us an email for the password to extract the Zip files. If you have any question regarding the usage and the possible bugs that you discover in the code, please also contact us for possible assistant.
Download the data
The datasets we use in our work including EOG, EEG, Insect, Random Walk and the Brain Activity  all can be downloaded on A. Mueen's websites
Paper
You can download the paper here.
Acknowledgements
We would like to thank NWO for their generous funding for our COMPASS project. We  deeply thank A. Mueen and professor  E. Keogh for their released datasets, source code and useful discussion in the early stage of the project. We also thank all the anonymous reviewers for their useful comments which help us improve our work significantly. 
 

Copyright  TU/e 2010  by Hoang Thanh Lam, Toon Calders and Pham Dang Ninh