What is the logic of approximate string matching?
Does anyone know the reason for the results in the following example:
agrepl("cold", "cool")
#> [1] FALSE
agrepl("cool", "cold")
#> [1] TRUE
This is because max.distance defaults to:
If cost is not given, all defaults to 10%, and the other transformation number bounds default to all. The component names can be abbreviated.
and is:
Expressed either as integer, or as a fraction of the pattern length times the maximal transformation cost (will be replaced by the smallest integer not less than the corresponding fraction)
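So for a pattern of length 4, the default maximum number of transformations is 1 (10% of 4 is 0.4, which is rounded up to 1).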
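As a rough check of that arithmetic, here is a minimal sketch of the default bound as the quoted documentation describes it. The helper default_bound is hypothetical (it is not part of base R), and it assumes every transformation has cost 1:

```r
# Hypothetical helper, not a base R function: the default bound implied by
# the quoted documentation, assuming all transformation costs are 1.
default_bound <- function(pattern, fraction = 0.1) {
  # fraction of the pattern length, replaced by the smallest integer
  # not less than it
  ceiling(fraction * nchar(pattern))
}

default_bound("cold")  # 0.1 * 4 = 0.4, rounded up
default_bound("cool")  # same length, so the same bound
```

Both patterns have length 4, so under these assumptions both calls give a bound of 1.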
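The pattern cool matches the col at the start of cold using just 1 deletion. Changing cold to match cool would take at least two transformations (two substitutions, or one deletion and one insertion).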
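One way to check these counts is utils::adist() with partial = TRUE, which its documentation describes as the approximate string distance used by agrep. This is only a sketch; the variable names are illustrative:

```r
# partial = TRUE: the first argument (the pattern) only has to match
# a substring of the second argument, as in agrep/agrepl.
d_cool_in_cold <- utils::adist("cool", "cold", partial = TRUE)[1, 1]
d_cold_in_cool <- utils::adist("cold", "cool", partial = TRUE)[1, 1]

d_cool_in_cold  # cost of matching pattern "cool" somewhere inside "cold"
d_cold_in_cool  # cost of matching pattern "cold" somewhere inside "cool"
```

The first distance should be 1 and the second 2, so only the first stays within the default bound of 1.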
These examples may explain it further:
agrepl("cold", "cool", max.distance = 1) # two changes necessary
#> [1] FALSE
agrepl("cold", "cool", max.distance = 2)
#> [1] TRUE
agrepl("cold", "coold") # just one addition necessary
#> [1] TRUE