ProM5 - Hamming distance in Social Network Miner (Similar task)

edited September 2010 in - Usage
Hi all,
While trying to understand how the social network miner works, I'm having some trouble with the "Similar task" metrics. In particular, the Hamming distance implementation seems to differ from the definition given in the corresponding paper.

In practice, instead of the distance function defined in the paper:

d(x,y) = 0 if ((x > 0 && y > 0) || (x == 0 && y == 0))
= 1 otherwise

in UtilOperation.java, line 110, the following distance function is implemented:

d'(x,y) = 0 if (x == y)
= 1 otherwise

For example, if A performs activity act once and B performs the same activity twice, their distance with respect to act is 0 under d but 1 under d'.
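
To make the difference concrete, here is a minimal Java sketch of the two per-activity functions. The class and method names are hypothetical and this is not the ProM source; d' in particular is only my reading of UtilOperation.java, and I assume x and y are non-negative counts of how often each originator performed the activity.

```java
// Hypothetical sketch, not the ProM source: all names are made up for illustration.
class HammingSketch {

    // d from the paper: 0 when both originators performed the activity
    // at least once, or when neither performed it; 1 otherwise.
    static int paperDistance(int x, int y) {
        boolean bothPerformed = x > 0 && y > 0;
        boolean neitherPerformed = x == 0 && y == 0;
        return (bothPerformed || neitherPerformed) ? 0 : 1;
    }

    // d' as I read the implementation: 0 only when the two counts are equal.
    static int implementedDistance(int x, int y) {
        return (x == y) ? 0 : 1;
    }

    public static void main(String[] args) {
        // A performs act once, B performs it twice.
        System.out.println("d(1,2)  = " + paperDistance(1, 2));
        System.out.println("d'(1,2) = " + implementedDistance(1, 2));
    }
}
```

The two functions agree whenever the counts are equal or exactly one of them is zero; they disagree only when both counts are positive and different.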

Is my observation correct? I am not sure about d', which I extracted from the code. If I am right, why does the implemented distance differ from the one in the paper?

Moreover, the calculation of the overall distance between originators A and B also seems strange to me. It is computed as

(column - temp) / column

(line 114), so a greater temp (which counts how many activities differ) leads to a smaller value, while I would expect

temp/column

which assigns a greater value to pairs that differ on more activities.
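
A small Java sketch of the two normalizations, to show the inversion. The names temp and column come from the code I looked at; the class, the method names, and the sample profiles are hypothetical, and I assume temp is counted with d' (a position differs when the two counts are not equal).

```java
// Hypothetical sketch, not the ProM source: only `temp` and `column` are taken from the post.
class OverallDistanceSketch {

    // temp: number of activities on which the two originators' counts differ.
    static int countDiffering(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("profiles must have the same length");
        }
        int temp = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                temp++;
            }
        }
        return temp;
    }

    // Value as computed at line 114: shrinks as temp grows, so it behaves like a similarity.
    static double implementedValue(int temp, int column) {
        return (double) (column - temp) / column;
    }

    // Value I would expect for a distance: grows as temp grows.
    static double expectedDistance(int temp, int column) {
        return (double) temp / column;
    }

    public static void main(String[] args) {
        int[] a = {1, 0, 2, 0}; // made-up activity counts for originator A
        int[] b = {1, 0, 2, 1}; // made-up activity counts for originator B
        int column = a.length;
        int temp = countDiffering(a, b);
        System.out.println("implemented: " + implementedValue(temp, column));
        System.out.println("expected:    " + expectedDistance(temp, column));
    }
}
```

The two values always sum to 1, so one is just the complement of the other.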

In this case it is simply a matter of inversion, but again I think there is a discrepancy between the paper and the implementation. It is not a real problem, except that a user cannot be sure what a greater or lower value on the edge from A to B means (maybe the convention is written down somewhere and I just didn't find it).
