What is the logic of approximate string matching?

Question

Does anyone know the reason for the results in the following example?

agrepl("cold", "cool")
#> [1] FALSE
agrepl("cool", "cold")
#> [1] TRUE

Answer 1

This follows from the default for max.distance, which the documentation describes as:

If cost is not given, all defaults to 10%, and the other transformation number bounds default to all. The component names can be abbreviated.

and:

Expressed either as integer, or as a fraction of the pattern length times the maximal transformation cost (will be replaced by the smallest integer not less than the corresponding fraction)

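For a pattern of length 4, the default maximum distance is 1 (10% of 4 is 0.4, which is rounded up to the next integer). The pattern cool matches the col at the start of cold with a single deletion. Changing cold to match cool takes at least two transformations (two substitutions, or one deletion and one insertion), which exceeds that default.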
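The default bound can be reproduced by hand. This is only a sketch of the arithmetic described in the quoted documentation, assuming the default costs (every transformation costs 1); the variable names are made up for illustration.

```r
# 10% of the pattern length times the maximal transformation cost,
# replaced by the smallest integer not less than that fraction.
pattern <- "cool"
max_cost <- 1            # assumption: default costs, each transformation costs 1
default_fraction <- 0.1  # documented default when max.distance is not given
bound <- ceiling(default_fraction * nchar(pattern) * max_cost)
bound
#> [1] 1
```

Both cool and cold have four characters, so either pattern gets the same bound of 1.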

These examples may explain it further:

agrepl("cold", "cool", max.distance = 1) # two changes necessary
#> [1] FALSE
agrepl("cold", "cool", max.distance = 2)
#> [1] TRUE
agrepl("cold", "coold") # just one addition necessary
#> [1] TRUE
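The asymmetry can also be inspected with adist() from base R, whose partial = TRUE mode is documented as corresponding to the distance agrep uses by default. The following is a sketch under default costs, not a description of how agrepl is implemented internally; the variable names are illustrative.

```r
# Full edit distance: the whole strings must match, so it is symmetric.
full <- as.vector(adist("cool", "cold"))
full
#> [1] 2

# Partial distance: the pattern (first argument) only has to match
# some substring of the target (second argument), so order matters.
cool_in_cold <- as.vector(adist("cool", "cold", partial = TRUE))
cold_in_cool <- as.vector(adist("cold", "cool", partial = TRUE))
cool_in_cold
#> [1] 1
cold_in_cool
#> [1] 2
```

With the default bound of 1, only the first partial distance is within the limit, which is consistent with the TRUE and FALSE results in the question.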

r

approximate

agrep