How can I make these visually identical strings computationally equal?

Question

Context:

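I'm trying to join two tibbles on a character vector, but something happens between write.csv() and read.csv() that makes the values non-equivalent. In the reprex below, stri_cmp() returns 0 (a 'match'), but in my actual project it returns -1 (no match). I don't know why this is.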

In any case, stri_cmp() isn't much use to me, because dplyr::left_join joins on equal values and can't take a comparison function.

How can I change one of these strings so that str1 == str2 returns TRUE? I need to be able to do this for a whole character vector, so that I can do the following:

dplyr::left_join(tibble1, tibble2, by = c("charVector1" = "charVector2"))

charVector1 and charVector2 are the columns from which str1 and str2 were extracted.

Reprex:

#DL 19/10/30
## Tryna work out why these strings aren't the same
#####################################################################

#Get strings from GitHub repo ---------------------------------------
read.table(
  "https://raw.githubusercontent.com/davelovellCARU/stringHelp/master/string1.txt"
) ->
  str1

read.table(
  "https://raw.githubusercontent.com/davelovellCARU/stringHelp/master/string2.txt"
) ->
  str2

# The strings are not equal -----------------------------------------
str1 == str2
#>       x
#> 1 FALSE
# But they look the same and the computer knows it ------------------
stringi::stri_cmp(str1, str2)
#> [1] 0

Created on 2019-10-30 by the reprex package (v0.3.0)
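One way to see why two strings that print identically are not equal is to look at their Unicode code points. The sketch below uses base R only. The strings `a` and `b` and the helper `code_points()` are made up for illustration; the actual difference in string1.txt and string2.txt may be something else entirely (a no-break space, a different encoding, and so on).

```r
# Hypothetical example strings (not the ones from the question):
# both print as "café" but are built from different code points.
a <- "caf\u00e9"    # precomposed "é" (U+00E9)
b <- "cafe\u0301"   # plain "e" followed by a combining acute accent (U+0301)

# They are not equal, and they do not even have the same length
a == b
nchar(a)
nchar(b)

# Illustrative helper: list the code points of a single string in U+XXXX form
code_points <- function(x) sprintf("U+%04X", utf8ToInt(x))

cp_a <- code_points(a)
cp_b <- code_points(b)
cp_a
cp_b
```

Running the same helper on one value from each of the real columns should show where they diverge, which in turn suggests what kind of cleaning is needed.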

Answer 1

Got it!

There's a neat function that does this: textclean::replace_non_ascii(string). I ran it on both strings and now they are identical. Just put it inside a mutate and the tibbles will join.
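The "clean inside a mutate, then join" pattern might look roughly like the sketch below. It is not the exact code from this answer: the data is invented, and it uses stringi::stri_trans_nfc() (Unicode NFC normalization) as a stand-in cleaner, which only fixes composed versus decomposed characters. The cleaning function named in the answer could be used in the same position inside mutate().

```r
library(dplyr)

# Hypothetical data: the key looks identical in both tibbles, but tibble1
# stores a precomposed "é" and tibble2 stores "e" + combining accent.
tibble1 <- tibble(charVector1 = "caf\u00e9", value1 = 1)
tibble2 <- tibble(charVector2 = "cafe\u0301", value2 = 2)

# Without cleaning, the keys do not match, so value2 comes back as NA
raw_join <- left_join(tibble1, tibble2,
                      by = c("charVector1" = "charVector2"))

# Apply the same cleaning function to both key columns
# (stand-in cleaner: NFC normalization from stringi)
tibble1_clean <- mutate(tibble1,
                        charVector1 = stringi::stri_trans_nfc(charVector1))
tibble2_clean <- mutate(tibble2,
                        charVector2 = stringi::stri_trans_nfc(charVector2))

# Now the keys are byte-for-byte equal and the join finds the match
clean_join <- left_join(tibble1_clean, tibble2_clean,
                        by = c("charVector1" = "charVector2"))
clean_join
```

The important part is that both columns go through the same function before the join, since left_join only matches values that are exactly equal.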
