UTF-8 encoding not used although it is set in source()

Question

I don't understand what is going on here (using RStudio on Windows):

Save the script test_abc.R:

a <- "ä"
b <- "ü"
c <- "ö"

Then run the following script, Test.R:

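compare_text <- function() {
  l <- list()
  if (a != a2) {
    l[[1]] <- c(a, a2)
  }
  if (b != b2) {
    l[[1]] <- c(b, b2)
  }
  # No explicit return: the function yields the value of this last `if`,
  # i.e. c(c, c2) when c differs from c2 and NULL otherwise.
  if (c != c2) {
    l[[1]] <- c(c, c2)
  }
}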

a <- "ä"
b <- "ü"
c <- "ö"
a2 <- "ä"
b2 <- "ü"
c2 <- "ö"

out_text <- compare_text()
# The next active "source-line" overwrites a, b and c!
source("path2/test2_abc.R") # called "V1" OR
# source("path2/test2_abc.R", encoding = "UTF-8") # called "V2"
out_text2 <- compare_text()
print(out_text)
print(out_text2)

If you run version "V1" of the script test.R, you get

source('~/Desktop/test1.R', encoding = 'UTF-8')
# NULL
# [1] "Ã¶" "ö"

even though the console line states that the script was run with UTF-8 encoding.
If you run version "V2" of the script test.R, you get

source('~/Desktop/test1.R', encoding = 'UTF-8') 
# NULL
# NULL

I'm not sure whether the related questions are of any help here.

Answer 1

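In V1, you source a file (test_abc.R) without specifying that file's encoding. The encoding = 'UTF-8' shown in the console applies only to the outer script that RStudio sources, not to the nested source() call inside it. The "Encoding" section of the help page for source says: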

By default the input is read and parsed in the current encoding of the R session. This is usually what it required, but occasionally re-encoding is needed, e.g. if a file from a UTF-8-using system is to be read on Windows (or vice versa).

The umlauts are therefore not read correctly, and the function compare_text returns c(c, c2) because c != c2 is TRUE.
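The garbled "Ã¶" in the question's output is what the two UTF-8 bytes of "ö" look like when each byte is decoded as a separate Windows-1252 character. A minimal sketch of that misreading, assuming iconv() on your platform accepts the encoding name "CP1252" (the variable names are only illustrative):

```r
# Build "ö" from its Unicode escape so the snippet does not depend on
# the encoding of the file it is saved in.
o_umlaut <- "\u00f6"

# The two bytes UTF-8 uses for "ö".
utf8_bytes <- charToRaw(enc2utf8(o_umlaut))
print(utf8_bytes)
# [1] c3 b6

# Treat those same bytes as Windows-1252 text, one character per byte,
# and convert the result to UTF-8 so it can be compared and printed.
misread <- iconv(rawToChar(utf8_bytes), from = "CP1252", to = "UTF-8")
print(misread)

# The misread string has two characters (U+00C3 and U+00B6), not one.
print(misread == o_umlaut)
```

This is the same mismatch that V1 runs into: the file holds UTF-8 bytes, but they are interpreted in the session's single-byte encoding.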

In V2, the umlauts are read correctly and compare_text returns NULL (no differences found).

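Inside source(), it is R itself that reads the file, and R uses the operating system's default encoding. On Windows this is (mostly?) "Windows code page 1252", which is not the same as UTF-8. You can check this on your machine with Sys.getlocale(). That is why you have to tell R that the file you want to source is encoded in UTF-8.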
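To see which encoding your own session uses, something along these lines should work. It is only a sketch: source_utf8 is a hypothetical convenience wrapper, not part of R, and the exact locale string differs between systems.

```r
# Character-type locale of the current session. On a Windows-1252
# system the string typically ends in ".1252"; on a UTF-8 system it
# typically ends in ".UTF-8" or similar.
ctype <- Sys.getlocale("LC_CTYPE")
print(ctype)

# l10n_info() reports, among other things, whether the session's
# native encoding is UTF-8.
info <- l10n_info()
is_utf8_session <- isTRUE(info[["UTF-8"]])
print(is_utf8_session)

# Hypothetical helper (an assumption for illustration): always declare
# the file's encoding instead of relying on the session default.
source_utf8 <- function(path, ...) {
  source(path, encoding = "UTF-8", ...)
}
```

If is_utf8_session is FALSE, any nested source() call on a UTF-8 file needs encoding = "UTF-8" spelled out, as in V2.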

r

utf-8

rstudio