mapping strings
  • ankur
    Posts: 2
    Everybody, I want to map some strings (words) to numbers: the more similar two strings are, the nearer their mapped values should be. The positional combination of the letters should also affect the mapping. The mapping function should be a function of the letters, their positions (the order of the letters should matter, so that words such as "pit" and "tip" are mapped differently), and the number of letters.

    Here are some examples: starter, stater, stapler, startler and tstarter. These words have the format "(*optional)sta(*opt)*er", where * denotes some sort of variable, in our case either 't' or 'l' (i.e. in the case of starter and staler). Each of these should be mapped INDIVIDUALLY, without reference to the others, such that their values do not differ much. Later on, when creating groups, I can choose appropriate ranges of numbers to differentiate the groups.

    So the mapped values of similar strings should be similar. There are many words, so comparing each one with every other would be complex. Instead, I want to map each word to a numeric value independently, put the similar strings (which have similar values) in a group, and later find the patterns by other means.

    So, for now I need to look up existing mapping methods such that similar strings (I guess I have to clarify the term 'similar' for my context) get similar values, and these values differ from those of dissimilar strings. Again, I emphasize that the number of strings would be huge, and comparing each with every other is practically impossible (or computationally expensive and very slow). SO WHAT I THINK IS TO DEVISE AN ALGORITHM (taking help from existing ones) FOR MAPPING EACH WORD (STRING) ON ITS OWN.


    Have I made myself clear? Please give me some ideas to start with, and some terms to search for and research.
    Thank you all
    Post edited by ankur at 2011-07-10 08:54:25
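
One family of techniques that matches the question above (map each word on its own, then group by the mapped value) is locality-sensitive hashing. The following is only a minimal sketch, assuming SimHash over character bigrams; the function names `char_ngrams`, `simhash` and `hamming` are made up for illustration, and the choices of bigrams, MD5 and 64 bits are assumptions, not something proposed in the thread. Note that closeness is measured here by the number of differing bits between two fingerprints, not by their numeric difference.

```python
import hashlib

BITS = 64  # fingerprint width (an arbitrary choice for this sketch)


def char_ngrams(word, n=2):
    """Return the character n-grams of a word, in order.

    The word is lowercased and padded with '^' and '$' so that the first
    and last letters also contribute. Because n-grams keep local letter
    order, "pit" and "tip" produce different n-grams.
    """
    padded = "^" + word.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def simhash(word, n=2):
    """Map one word, independently of all others, to a 64-bit fingerprint."""
    counts = [0] * BITS
    for gram in char_ngrams(word, n):
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # first 64 bits of the hash
        for b in range(BITS):
            # Each n-gram votes +1 or -1 on every bit position.
            counts[b] += 1 if (h >> b) & 1 else -1
    # A bit is set in the fingerprint if the votes for it are positive.
    return sum(1 << b for b in range(BITS) if counts[b] > 0)


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    for w in ["starter", "stater", "stapler", "startler", "tstarter", "pit", "tip"]:
        print(w, format(simhash(w), "016x"))
```

Words that share most of their bigrams tend to get fingerprints that differ in few bits, although this is a statistical tendency and not a guarantee. Grouping can then be done by bucketing on parts of the fingerprint instead of comparing every pair. Terms worth searching for: locality-sensitive hashing, SimHash, MinHash, character n-grams.
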
  • JBuijs
    Posts: 389
    Hi Ankur,

    First of all, welcome aboard!

    I think I understand your question, but this might not be the best place for it (since we are more into process mining algorithms and not as much into string similarity/clustering algorithms).

    There is an open source Java project in which multiple string similarity metrics are implemented. Unfortunately, the main webpage cannot be accessed at the moment, but you might want to try the following webpages:
    http://en.wikipedia.org/wiki/SimMetrics
    http://sourceforge.net/projects/simmetrics/

    Good luck!
    Joos Buijs
    Forum Admin
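
For readers unfamiliar with string similarity metrics, edit (Levenshtein) distance is one widely used example. The sketch below is a plain Python illustration and not the API of the Java project mentioned above; the helper names `levenshtein` and `similarity` are hypothetical. Unlike a per-word mapping, this kind of metric compares two strings at a time.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalise the distance to a score in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(levenshtein("starter", "stater"))  # one deletion
    print(levenshtein("pit", "tip"))         # two substitutions
```

Since every comparison involves a pair of strings, a metric like this is better suited to checking candidates inside a small group than to comparing all words in a very large collection.
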
  • ankur
    Posts: 2
    Ohh... sorry sir, thank you very much for the reply and the suggestions. Sorry again for the irrelevant post. Should I remove it?

  • JBuijs
    Posts: 389
    Hi Ankur,

    No, no need to remove it; it might help other people who are searching for string matching.

    Good luck!!!
    Joos Buijs
    Forum Admin
