Calculating Levenshtein distance within a list in Python
I have a list of strings and want to filter out the strings that are too similar, based on Levenshtein distance: if lev(list[0], list[10]) < 50, then del list[10]. Is there a way I can calculate such a distance between every pair of strings in the list more efficiently? Thanks!
    data2 = []
    for i in data:
        for index, j in enumerate(data):
            s = levenshtein(i, j)
            if s < 50:
                del data[index]
        data2.append(i)

The rather dumb code above is taking too long to compute...
What if you kept only the indexes of the hit strings and skipped them later? I am ignoring here how much enumerate() and del() weigh, and the percentage of hits (i.e. how many strings must be removed from your dataset).
    threshold = 50
    data = ["hel", "how", "are", "you"]  # replace with your dataset

    tbr = {}  # holds the indexes of the strings to be removed
    idx = 0
    for i in data:
        for j in range(len(data)):  # xrange in Python 2
            # skip comparing a string with itself
            if j != idx and levenshtein(i, data[j]) < threshold:
                tbr[j] = True
        idx += 1
    # print(tbr)

    data2 = []
    idx = -1
    for d in data:
        idx += 1
        if idx in tbr:
            continue  # skip this string
        data2.append(d)
    # print(data2)
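A possible variation, offered only as a sketch: the loop above compares every string with every other one and marks both members of a close pair, so with a loose threshold nothing may survive. If the goal is to keep the first string of each similar group, it is enough to compare each candidate against the strings already kept. The distance can also never be smaller than the difference in length, so pairs whose lengths differ by at least the threshold can be skipped without computing anything. The names `filter_similar` and the pure-Python `levenshtein` below are illustrative assumptions, not part of the original code; a C-backed library would likely be faster on real data.

```python
def levenshtein(a, b):
    """Edit distance between two strings (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the inner row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        previous = current
    return previous[-1]


def filter_similar(strings, threshold):
    """Keep a string only if it is at least `threshold` edits away
    from every string kept so far (hypothetical helper)."""
    kept = []
    for s in strings:
        for k in kept:
            # Lower bound: distance >= difference in length,
            # so such a pair cannot be below the threshold.
            if abs(len(s) - len(k)) >= threshold:
                continue
            if levenshtein(s, k) < threshold:
                break  # too close to a kept string: drop s
        else:
            kept.append(s)  # no kept string was too close
    return kept


data = ["hello", "hallo", "help", "world", "word"]
data2 = filter_similar(data, threshold=2)
print(data2)  # ['hello', 'help', 'world']
```

The result depends on input order, since the first string of each group is the one that survives. The worst case is still quadratic when few strings are removed; the savings come from the shrinking `kept` list and the length check.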