Calculating Levenshtein distance within a list in Python




I have a list of strings and want to filter out the ones that are too similar, based on Levenshtein distance. For example, if lev(list[0], list[10]) < 50, then del list[10]. Is there a way I can calculate this distance between every pair of strings in the list more efficiently? Thanks!

    data2 = []
    for i in data:
        for index, j in enumerate(data):
            s = levenshtein(i, j)
            if s < 50:
                del data[index]
        data2.append(i)

The rather dumb code above is taking too long to compute...

What if you kept only the indexes of the hit strings and skipped them later? I am ignoring how much enumerate() and del() weigh, as well as the percentage of hits (i.e. how many strings must be removed from your dataset).

    # levenshtein(a, b) is assumed to be defined elsewhere, as in the question
    threshold = 50
    data = ["hel", "how", "are", "you"]  # replace with your dataset
    tbr = {}  # holds the indexes of the strings to be removed
    idx = 0
    for i in data:
        for j in range(len(data)):  # xrange in Python 2
            if j != idx and levenshtein(i, data[j]) < threshold:
                tbr[j] = True
        idx += 1
    # print(tbr)
    data2 = []
    idx = -1
    for d in data:
        idx += 1
        if idx in tbr:
            continue  # skip this string
        data2.append(d)
    # print(data2)
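A variation on the same idea, as a rough sketch rather than a definitive solution: compare each string only against the strings kept so far, and skip the distance computation whenever the length difference alone already reaches the threshold (the Levenshtein distance can never be smaller than the difference in lengths). The names filter_similar and kept are made up for this illustration, and the levenshtein function here is a plain two-row dynamic-programming version written for the example, not necessarily the one used in the question. Note that, unlike the loop above, which flags both strings of a close pair, this keeps the first string of each group of similar strings.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def filter_similar(strings, threshold):
    """Keep a string only if it is at least `threshold` edits away from every kept string."""
    kept = []
    for s in strings:
        for k in kept:
            # The distance is never smaller than the length difference,
            # so this pair cannot fall below the threshold: skip the costly call.
            if abs(len(s) - len(k)) >= threshold:
                continue
            if levenshtein(s, k) < threshold:
                break  # s is too close to a string we already kept
        else:
            kept.append(s)  # no kept string was too close
    return kept


# Hypothetical sample data; replace with your dataset and threshold
data = ["hello", "hallo", "help", "world", "word"]
data2 = filter_similar(data, threshold=2)
print(data2)
```

In the worst case this is still quadratic in the number of strings, but dropped strings are never compared again and pairs with very different lengths cost almost nothing.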

python levenshtein-distance edit-distance
