mapping strings

ankur
edited July 2011 in Process Mining

Everybody, I want to map some strings (words) to numbers. The more similar two strings are, the closer their mapped numbers should be. The positions of the letters should also affect the mapping: the mapping function should be a function of the letters, their positions (so that words such as "pit" and "tip" get different values), and the number of letters.

Well, I will give some examples: starter, stater, stapler, startler and tstarter are some words. These words have the format "(*optional)sta(*optional)*er", where * denotes some sort of variable; in our case it is either 't' or 'l' (i.e. in the case of starter and staler). These should all be mapped INDIVIDUALLY, without reference to the other words, such that their values do not differ much. Later on, when creating groups, I can choose appropriate ranges of numbers to differentiate the groups.

So when mapping the strings, similar strings should get similar values. There are many words, so comparing each one with every other would be complex. The idea is to map each word independently to a numeric value, put the similar strings (as they have similar values) into a group, and then later find these patterns by other means.

So, for now I need to look for existing mapping methods such that similar strings (I guess I have to clarify the term 'similar' for my context) have similar values, and these values differ from those of dissimilar strings. Again, I emphasize that the number of strings would be huge, and comparing each with every other is practically impossible (or computationally expensive and very slow). SO WHAT I THINK IS TO DEVISE AN ALGORITHM (taking help from existing ones) FOR MAPPING EACH WORD (STRING) ON ITS OWN.
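
A minimal sketch of the kind of per-word mapping described above, assuming the simplest possible choice: read each word as a base-27 fraction, so that the value depends on the letters and on their positions. The function name `positional_value` and the base are hypothetical choices for illustration, not an existing method.

```python
def positional_value(word: str, base: int = 27) -> float:
    """Map a word to a number in [0, 1) by reading it as a base-27 fraction.

    Assumption: only the letters a-z carry information; any other
    character contributes the digit 0.
    """
    value = 0.0
    weight = 1.0
    for ch in word.lower():
        weight /= base  # each later position counts 27 times less
        digit = ord(ch) - ord("a") + 1 if "a" <= ch <= "z" else 0
        value += digit * weight
    return value


if __name__ == "__main__":
    for w in ["starter", "stater", "stapler", "startler", "tstarter", "pit", "tip"]:
        print(w, positional_value(w))
```

With this choice "pit" and "tip" get different values, and words sharing the prefix "sta" land close together. It also shows the weak spot of a purely positional mapping: an extra leading letter, as in "tstarter", shifts every position and moves the value far away from "starter".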


Have I made myself clear? Please give me some ideas to start with, and some terms to search for and research.
Thank you all

Comments

  • Hi Ankur,

    First of all, welcome aboard!

    I think I understand your question, but this might not be the best place for it (since we are more into process mining algorithms and not as much into string similarity/clustering algorithms).

    There exists an open source Java project in which multiple string similarity metrics are implemented. Unfortunately, at the moment the main webpage cannot be accessed, but you might want to try the following webpages:
    http://en.wikipedia.org/wiki/SimMetrics
    http://sourceforge.net/projects/simmetrics/
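
    To give an idea of what such a string similarity metric computes, here is a minimal sketch of one widely used metric, the Levenshtein (edit) distance. It is written in plain Python for illustration only and is not the SimMetrics API; the function name `levenshtein` is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    for other in ["stater", "stapler", "startler", "tstarter"]:
        print("starter", other, levenshtein("starter", other))
```

    Note that a metric like this compares two strings at a time, so it does not by itself avoid the pairwise comparisons you want to get around.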

    Good luck!
    Joos Buijs

    Senior Data Scientist and process mining expert at APG (Dutch pension fund executor).
    Previously Assistant Professor in Process Mining at Eindhoven University of Technology
  • Ohh... sorry sir, thank you very much for replying and for the suggestions. Sorry again for the irrelevant post. Do I need to remove it?

  • Hi Ankur,

    No, no need to remove it, it might help other people that are searching for string matching.

    Good luck!!!
    Joos Buijs
