创建一种算法来定义 Javascript 中字符串的匹配概率

Question

我正在制作 google 地图并试图找到指定的地点。但是，有时我会得到更多指定地点的结果（因为我试图在整个城市中搜索带有名称的地点）。所以我将输出限制为前三个匹配的建议，并排除了除此之外的所有结果。但问题是，前 3 个匹配的建议仍然有可能在同一个地方，我只想显示他们的一个建议以使建议准确。例如：

我搜索过"Pizza Hut"
我得到前三个结果如下：
1. 斋浦尔 MI 路必胜客 --- 第一个建议
2. 必胜客，MI 路，Ajmeri 门，斋浦尔 --- 第二个建议
3. 必胜客，Malviya Nagar，斋浦尔 --- 第三条建议

第一个和第二个建议表示同一个地方。

现在我想应用预测方法（使用概率）来发现第一个和第二个建议是相同的地方，而第三个是不同的地方。

我的方法是什么：-

var p = [], ap = []; //p -- places & ap -- array of splitted strings
    p[0] = "Pizza Hut, MI Road, Jaipur";
    p[1] = "Pizza Hut, MI Road, Ajmeri Gate, Jaipur";
    p[2] = "Pizza Hut, Malviya Nagar, Jaipur";

    //split all places
    ap[0] = p[0].split(",");
    ap[1] = p[1].split(",");
    ap[2] = p[2].split(",");

    /*
     --- Theoretically ---
     ### Splitting strings into symbols ###
     string_symbols_1 = a1, a2, a3;  --- a1 = "Pizza Hut", a2 = "MI Road", a3="Jaipur"; 
     string_symbols_2 = b1, b2, b3, b4; --- b1 = "Pizza Hut", b2 = "MI Road", b3 = "Ajmeri 
     Gate", b4 = "Jaipur"
     string_symbols_3 = c1, c2, c3; --- c1 = "Pizza Hut", c2 = "Malviya Nagar", c3 = 
     "Jaipur"

     ### On Prediction Basis ###
     I am trying to evaluate that if 60% of the symbols match with 
     another string symbols then there is probability that both strings are 
     same. 
     From above case I am considering if I am able to find >40% unique 
     symbols in both strings (that is being compared) then there is 
     probability that both strings are unique. (It will reduce the 60% 
     comparison to 40% comparison in best cases).
     Once found the unique strings return their indexes;
    */

    //pseudo implementation
    function findUniquePlaces(ap){
       //stuck here..
       //now match the splitted string arrays to find the unique places 
       //what should be the logic
       return index(0 and 2)
    }

我知道如何实现它。但我想知道什么是最好的实现方式。我想确保这个任务一定不是计算密集型任务。我听说过 map reduce 技术。我应该使用 map reduce 技术还是其他一些计算成本更低的技术。

Answer 1

这背后的一般理论是字符串度量 https://en.wikipedia.org/wiki/String_metric 计算字符串之间的距离，然后只使用现成的解决方案之一：

https://www.npmjs.com/package/fast-levenshtein

https://www.npmjs.com/package/js-levenshtein

https://www.npmjs.com/package/string-similarity

或寻找您可以在 google npm string distance

中找到的任何其他内容

这些速度非常快，所以您的问题可能更多是尺寸问题而不是速度问题。

或者你可以使用模糊搜索https://glench.github.io/fuzzyset.js/

此外，您获得的这个列表可能已经在下面使用了这些算法，所以先拿吧？

创建一种算法来定义 Javascript 中字符串的匹配概率

Creating an algorithm to define matching probability of strings in Javascript

javascript

probability

string-comparison