Did you know ... | Search Documentation: |
library(isub): isub: a string similarity measure |
The library(isub)
implements a similarity measure
between strings, i.e., something similar to the Levenshtein distance.
This method is based on the length of common substrings.
?- isub('E56.Language', 'languange', D, [normalize(true)]). D = 0.4226950354609929. % [-1,1] range ?- isub('E56.Language', 'languange', D, [normalize(true),zero_to_one(true)]). D = 0.7113475177304964. % [0,1] range ?- isub('E56.Language', 'languange', D, []). % without normalization D = 0.19047619047619047. % [-1,1] range ?- isub(aa, aa, D, []). % does not work for short substrings D = -0.8. ?- isub(aa, aa, D, [substring_threshold(0)]). % works with short substrings D = 1.0. % but may give unwanted values % between e.g. 'store' and 'spore'. ?- isub(joe, hoe, D, [substring_threshold(0)]). D = 0.5315315315315314. ?- isub(joe, hoe, D, []). D = -1.0.
This is a new version of isub/4 which replaces the old version while providing backwards compatibility. This new version allows several options to tweak the algorithm.
Text1 | and Text2 are either an atom, string or a list of characters or character codes. |
Similarity | is a float in the range [-1,1.0], where 1.0 means most similar. The range can be set to [0,1] with the zero_to_one option described below. |
Options | is a list with elements described
below. Please note that the options are processed at compile time using
goal_expansion to provide much better speed. Supported options are:
|