Compare two strings in R and see additions, deletions
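I would like to compare two character values in R and see which characters were added and which were deleted, so that I can display them later, similar to git diff --color-words=. (see the screenshot below).
For example: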
a <- "hello world"
b <- "helo world!"
diff <- FUN(a, b)
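where diff would somehow show that an l was deleted and a ! was added.
The ultimate goal is to construct an HTML string like this: hel<span class="deleted">l</span>o world<span class="added">!</span>.
I know about diffobj, but so far I have not been able to get it to return character-level differences, only differences between elements.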
[Screenshot: output of git diff --color-words=.]
I found a solution that uses diffobj::ses_dat() after first splitting each string into individual characters.
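get_html_diff <- function(a, b) {
  # Split both strings into single characters
  aa <- strsplit(a, "")[[1]]
  bb <- strsplit(b, "")[[1]]
  # Shortest edit script: one row per character, op is Match/Insert/Delete
  s <- diffobj::ses_dat(aa, bb)
  # Start a new group whenever the operation changes
  # (head(x, -1) also behaves when there is only one row)
  op <- as.integer(s$op)
  m <- cumsum(op != c(Inf, head(op, -1)))
  res <- paste(
    sapply(split(seq_along(s$op), m), function(i) {
      val <- paste(s$val[i], collapse = "")
      # Wrap inserted and deleted runs; matched runs stay as plain text
      if (s$op[i[[1]]] == "Insert")
        val <- paste0("<span class=\"add\">", val, "</span>")
      if (s$op[i[[1]]] == "Delete")
        val <- paste0("<span class=\"del\">", val, "</span>")
      val
    }),
    collapse = "")
  res
}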
get_html_diff("hello world", "helo World!")
#> [1] "hel<span class=\"del\">l</span>o <span class=\"del\">w</span><span class=\"add\">W</span>orld<span class=\"add\">!</span>"
Created on 2022-05-31 by the reprex package (v2.0.1)
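Base R has a function, adist, that computes the generalized Levenshtein distance. With counts = TRUE and partial = FALSE, the attribute "trafos" is set to the sequence of matches, insertions and deletions needed to go from one string to the other. From the Value section of the documentation (emphasis mine):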
If counts is TRUE, the transformation counts are returned as the "counts" attribute of this matrix, as a 3-dimensional array with dimensions corresponding to the elements of x, the elements of y, and the type of transformation (insertions, deletions and substitutions), respectively. Additionally, if partial = FALSE, the transformation sequences are returned as the "trafos" attribute of the return value, as character strings with elements ‘M’, ‘I’, ‘D’ and ‘S’ indicating a match, insertion, deletion and substitution, respectively. If partial = TRUE, the offsets (positions of the first and last element) of the matched substrings are returned as the "offsets" attribute of the return value (with both offsets -1 in case of no match).
a <- "hello world"
b <- "helo world!"
attr(adist(a, b, counts = TRUE), "trafos")
#> [,1]
#> [1,] "MMDMMMMMMMMI"
Created on 2022-05-31 by the reprex package (v2.0.1)
There is a deletion at the 3rd character and an insertion at the end of the string.
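To get from the "trafos" string to the HTML the question asks for, one option is to walk the operations while keeping a cursor into each input string. The following is only a minimal sketch under a few assumptions: trafos_to_html is a hypothetical helper name, the span classes "add" and "del" are borrowed from the diffobj answer, a substitution ('S') is rendered as a deletion followed by an insertion, and the inputs are single non-empty strings whose characters need no HTML escaping.

```r
# Hypothetical helper (not part of base R): turn adist()'s "trafos"
# attribute into an HTML string with <span> markers.
trafos_to_html <- function(a, b) {
  # One letter per edit operation: M(atch), I(nsert), D(elete), S(ubstitute)
  trafo <- attr(adist(a, b, counts = TRUE), "trafos")[1, 1]
  ops <- strsplit(trafo, "")[[1]]
  aa <- strsplit(a, "")[[1]]
  bb <- strsplit(b, "")[[1]]

  del <- function(x) paste0("<span class=\"del\">", x, "</span>")
  add <- function(x) paste0("<span class=\"add\">", x, "</span>")

  i <- 0L  # cursor into a (advanced by M, D and S)
  j <- 0L  # cursor into b (advanced by M, I and S)
  out <- character(length(ops))
  for (k in seq_along(ops)) {
    op <- ops[k]
    if (op == "M") {
      i <- i + 1L; j <- j + 1L
      out[k] <- aa[i]
    } else if (op == "D") {
      i <- i + 1L
      out[k] <- del(aa[i])
    } else if (op == "I") {
      j <- j + 1L
      out[k] <- add(bb[j])
    } else {
      # "S": assumed here to display as old character removed, new one added
      i <- i + 1L; j <- j + 1L
      out[k] <- paste0(del(aa[i]), add(bb[j]))
    }
  }
  paste(out, collapse = "")
}

trafos_to_html("hello world", "helo world!")
```

Because the edit script above places the deletion at the 3rd character, this sketch marks the first l of "hello" as deleted rather than the second; the two are equivalent edits. Each character gets its own span, so adjacent spans of the same class would still need merging if that matters for display.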
We use diffobj to compare configuration files (in a more or less production environment), and it works well. In your case, isn't diffobj::diffChr what you want?
diffobj::diffChr("hello world", "helo world!", color.mode = 'rgb')