How can I incorporate context info into the profile vector in Trace Clustering using MCL clustering?

I want to know whether there is any need to incorporate position information (beyond the bag of terms) into the profile vector (built from case attributes) in Trace Clustering using MCL clustering. I cannot think of an application. I would be grateful if someone could clarify this requirement.
Thanks and Regards,
Gayan Buddhika


  • bhompes Posts: 10
    edited November 2016
    Dear Gayan,

    The perspectives that you need to use depend on your application. It might well be that (as answered in your other question) you want to group cases based on control-flow attributes, or, alternatively, on data attributes, or both.

    What exactly do you mean by position information? Do you mean information about the case or about the vector dimensions?

    One application of having control flow information in the profile vectors is when you're using case/trace clustering for outlier detection. In this case, you might want to group (in order to filter) those cases that share a missing activity for example.

    Bart Hompes - Eindhoven University of Technology
  • GayanBuddhika Posts: 6
    edited November 2016
    Dear bhompes,
    Sorry if my question was confusing. By position information I meant the order of the values in the selected case dimension. I am not considering the event attributes, but the case attributes. Are there any occasions where we need to consider the order of the values in a profile vector built from the selected case attributes?
    Gayan Buddhika
  • If I understand correctly, you want to know whether the location of the case attributes (that you choose as your clustering perspectives) in the profile vectors is important. In other words, whether the order of terms in the feature vector matters.

    The answer is that it depends on your similarity measure. For the default similarity measure (cosine similarity), the order does not matter. It might matter if you want to use another similarity measure, though. I'll try to explain a bit more below.

    In broad terms what the technique does is the following:
    - Choose clustering perspectives -> these will define the dimensions of the profile vectors
    - Map every case to a profile vector -> result is a vector of equal length for every case
    - Calculate similarity between the profile vectors -> result is a similarity matrix
    - Apply MCL algorithm on this similarity matrix -> result is a clustering
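
    The four steps above can be sketched roughly as follows. This is only a minimal illustration, not the ProM plugin's implementation: the function names, the department-count attributes, and the simplified MCL loop (fixed number of iterations, inflation of 2) are all assumptions made for the example, and the similarities are assumed to be non-negative.

```python
import numpy as np

def profile_vectors(cases, dimensions):
    """Step 2: map every case to a vector of equal length (missing attributes become 0)."""
    return np.array([[case.get(d, 0.0) for d in dimensions] for case in cases], dtype=float)

def cosine_similarity_matrix(X):
    """Step 3: pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # avoid division by zero for empty profiles
    U = X / norms
    S = U @ U.T
    np.fill_diagonal(S, 1.0)         # self-loops, so no column of S sums to zero
    return S

def mcl(similarity, inflation=2.0, iterations=50, tol=1e-6):
    """Step 4: simplified Markov clustering (expansion + inflation), hypothetical parameters."""
    M = similarity / similarity.sum(axis=0, keepdims=True)   # column-stochastic matrix
    for _ in range(iterations):
        M = M @ M                                            # expansion
        M = M ** inflation                                   # inflation
        M = M / M.sum(axis=0, keepdims=True)                 # renormalise columns
    clusters = set()
    for row in M:                                            # non-empty rows are attractors
        members = tuple(int(i) for i in np.nonzero(row > tol)[0])
        if members:
            clusters.add(members)
    return sorted(clusters)

# Step 1: the chosen perspective defines the dimensions (made-up department counts per case).
dimensions = ["cardiology", "radiology", "oncology", "surgery"]
cases = [
    {"cardiology": 2, "radiology": 1},
    {"cardiology": 1, "radiology": 1},
    {"oncology": 1, "surgery": 1},
    {"oncology": 3, "surgery": 1},
]

X = profile_vectors(cases, dimensions)
S = cosine_similarity_matrix(X)
clusters = mcl(S)
print(clusters)
```

    In this toy data the first two cases share no departments with the last two, so the similarity matrix is block-diagonal and the two groups end up in separate clusters.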

    So, your similarity measure can work pairwise or set-wise to construct the similarity matrix, and it may need an ordering over the vector dimensions (i.e. the perspectives). However, by default the similarity matrix is constructed using pairwise cosine similarity between the profile vectors, and for this technique the ordering of terms in the vector is lost. For more information, see section 4 of the paper.
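
    As a small check of the claim about cosine similarity, the sketch below reorders the dimensions of two profile vectors in the same way and compares the result. The vectors and the permutation are made up for illustration.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Two hypothetical profile vectors over the same four dimensions.
a = np.array([2.0, 0.0, 1.0, 3.0])
b = np.array([1.0, 1.0, 0.0, 2.0])

# Reorder the dimensions identically in both vectors.
perm = np.array([3, 0, 2, 1])

original = cosine(a, b)
shuffled = cosine(a[perm], b[perm])
print(abs(original - shuffled) < 1e-12)
```

    The dot product and the norms are sums over dimensions, so any reordering applied consistently to all vectors leaves the similarity matrix unchanged. A measure that compares positions, such as an edit distance over the vector, would not have this property.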

    I hope this answers your question.
    Bart Hompes - Eindhoven University of Technology
  • Dear bhompes,
    Thanks for your response  :)
    Yes, you've understood me. I was motivated to raise this question after reading that part of the paper. I have included it below.
    "A typical downside of vector similarity measures is that the order of terms is lost. This problem can be solved by incorporating order in the perspectives, such as the occurrence of frequent patterns."
    The solution proposed here is not clear to me.
    Suppose the departments involved in the curation process (healthcare) have to be considered in a particular order in the perspectives (dimensions) in MCL. Can I incorporate this in the profile vector? What do you mean by the occurrence of frequent patterns? Do you mean something like adding feature sets (maximal repeats etc.), as proposed in the control-flow literature, to the profile vector? Please guide me. Thanks again.