Interpreting new line \n character when reading from file using fread in r

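I can't get fread in the data.table package to handle new lines (\n) as expected. They come out as "\n" instead of as new lines (head shows "\\n" instead of "\n"). From the following post I understood that fread should be able to handle this: fread and a quoted multi-line column value

I have tried quoting ("string") the value column, with the same result. Am I missing a simple solution or argument? Should the new lines be escaped somehow? Here is an example that illustrates the problem, together with my implementation:

[Edit:] Some clarification, so you don't need to read the code. The contents of strings.txt are shown in the code comments under # strings.txt below. The file is a tab-delimited text file with four columns and three rows plus a header row. The first entry in the file, strMsg1, is identical to strAsIntended. However, fread adds an extra backslash to \n when reading the file, which turns the newline character into a literal \n. How can I avoid this? I just need to be able to encode new lines in my strings. Hopefully this makes sense.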
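To make the problem reproducible without the original file, here is a minimal sketch. It assumes the data.table package is installed, and the temporary file is only a stand-in for strings.txt. The sketch writes an equivalent tab-delimited file and checks which characters fread() actually stores in the first value:

```r
# Sketch: recreate a strings.txt-like file in a temporary location (the path is
# illustrative, not the poster's file) and inspect what fread() returns.
library(data.table)

path <- tempfile(fileext = ".txt")
writeLines(c(
  "scope\torder\tkey\tvalue",
  # In R source, "\\n" is written to disk as two characters: a backslash and an "n"
  "test_gui\t1\tstrMsg1\tText with new line characters:\\n1) The first point and the\\n2) second point should be on separate lines\\n\\nThen perhaps some text below, separated by an empty line.",
  "test_gui\t2\tstrMsg2\tSome text does not contain new line characters.",
  "test_gui\t3\tstrMsg3\tExpand window to see text and button widgets"
), path)

dt <- fread(file = path, sep = "\t", encoding = "UTF-8")

value1 <- dt$value[1]
has_backslash <- grepl("\\", value1, fixed = TRUE)  # is a real backslash character present?
has_newline <- grepl("\n", value1, fixed = TRUE)     # is a real newline character present?
cat("contains backslash:", has_backslash, "\n")
cat("contains newline:  ", has_newline, "\n")
```

If the file on disk holds backslash + n, the value read back holds exactly those two characters and no newline.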

[Edit2:] The result is shown in the image.

library(data.table)
library(gWidgets2)

# strings.txt
# scope order   key value
# test_gui  1   strMsg1 Text with new line characters:\n1) The first point and the\n2) second point should be on separate lines\n\nThen perhaps some text below, separated by an empty line.
# test_gui  2   strMsg2 Some text does not contain new line characters.
# test_gui  3   strMsg3 Expand window to see text and button widgets

strAsIntended <- "Text with new line characters:\n1) The first point and the\n2) second point should be on separate lines\n\nThen perhaps some text below, separated by an empty line."
filePath <- "C:/path/to/strings.txt" # forward slashes: "\p" is not a valid escape in an R string

# Read file.
dt <- fread(file = filePath, sep = "\t", encoding = "UTF-8")
head(dt) # \n has become \\n

# Set key column.
setkey(dt, key = "key")

# Get strings for the specific function.
dt <- dt[dt$scope == "test_gui", ]

# Get strings.
strText <- dt["strMsg1"]$value
strButton <- dt["strMsg2"]$value
strWinTitle <- dt["strMsg3"]$value

# Construct gui.
w <- gwindow(title = strWinTitle)
g <- ggroup(horizontal = FALSE, container = w, expand = TRUE, fill = "both")
gtext(text = strText, container = g)
gtext(text = strAsIntended, container = g)
gbutton(text = strButton, container = g)

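[Edit3:] @user2554330 Thanks for the explanation and the solution. It really wasn't what I thought.

Here is an updated, working code example, with a screenshot below: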

library(data.table)
library(gWidgets2)

# strings.txt
# scope order   key value
# test_gui  1   strMsg1 Text with new line characters:\n1) The first point and the\n2) second point should be on separate lines\n\nThen perhaps some text below, separated by an empty line.
# test_gui  2   strMsg2 Some text does not contain new line characters.
# test_gui  3   strMsg3 Expand window to see text and button widgets

strAsIntended <- "Text with new line characters:\n1) The first point and the\n2) second point should be on separate lines\n\nThen perhaps some text below, separated by an empty line."
filePath <- "C:/Users/oskar/OneDrive/Dokument/R/win-library/3.6/strvalidator/extdata/languages/strings.txt"

# Read file.
dt <- fread(file = filePath, sep = "\t", encoding = "UTF-8")

# Check data. Not identical.
print(dt[1]$value) # print() shows the backslash escaped: \\n
print(strAsIntended) # print() shows the newline as \n
cat(dt[1]$value) # cat() prints the characters as-is: \n
cat(strAsIntended) # cat() prints an actual new line

# Set key column.
setkey(dt, key = "key")

# Get strings for the specific function.
dt <- dt[dt$scope == "test_gui", ]

# Fix new line character.
dt[ , value := gsub("\\n", "\n", value, fixed = TRUE)] # literal backslash + n -> newline

# Check data. Now identical, and print() shows \n.
print(dt[1]$value)
print(strAsIntended) 
# Now identical and prints with a new line.
cat(dt[1]$value)
cat(strAsIntended)

# Get strings.
strText <- dt["strMsg1"]$value
strButton <- dt["strMsg2"]$value
strWinTitle <- dt["strMsg3"]$value

# Construct gui.
w <- gwindow(title = strWinTitle)
g <- ggroup(horizontal = FALSE, container = w, expand = TRUE, fill = "both")
gtext(text = strText, container = g)
gtext(text = strAsIntended, container = g)
gbutton(text = strButton, container = g)

Running:

R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

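You are misunderstanding what fread is doing. Your input file contains a backslash followed by an n, and that is exactly what the string read by fread contains. However, when you print a string that contains a backslash, print() shows the backslash doubled. (Use cat() to print it if you don't want that.) Your strAsIntended variable doesn't contain a backslash at all; it contains a single newline character, which print() displays as \n.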
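The difference can be checked in base R alone. In this small sketch, the two strings are hypothetical stand-ins for the value read by fread and for strAsIntended:

```r
# Two strings that print similarly but contain different characters.
from_file <- "a\\nb"  # like the fread() value: "a", backslash, "n", "b"
intended  <- "a\nb"   # like strAsIntended: "a", newline, "b"

nchar(from_file)  # [1] 4
nchar(intended)   # [1] 3

print(from_file)  # [1] "a\\nb"  (print() escapes the backslash)
print(intended)   # [1] "a\nb"   (print() escapes the newline)
cat(from_file, "\n")  # writes a\nb literally
cat(intended, "\n")   # writes "a", then "b" on the next line
```

Counting characters with nchar() is a quick way to tell which kind of string you actually have, independent of how print() displays it.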

If you want to convert the \n sequences in the input file into newlines, use gsub or another substitution function. For example,

dt$value <- gsub("\\n", "\n", dt$value, fixed = TRUE)
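As a reusable sketch, the substitution can be wrapped in a small helper. The name unescape_newlines is hypothetical and not part of any package. The example also shows the equivalent regular-expression form, which needs one more level of escaping because the pattern is no longer literal:

```r
# Hypothetical helper: replace each two-character sequence backslash + "n"
# with a real newline. fixed = TRUE treats the pattern "\\n" as literal text.
unescape_newlines <- function(x) gsub("\\n", "\n", x, fixed = TRUE)

raw <- "Text with new line characters:\\n1) The first point"  # as read from the file
converted <- unescape_newlines(raw)

identical(converted, "Text with new line characters:\n1) The first point")  # TRUE

# Regex form (no fixed = TRUE): "\\\\n" is the regex \\n, i.e. one literal backslash + n
identical(gsub("\\\\n", "\n", raw), converted)  # TRUE
```

In a data.table, the same helper can be applied by reference, for example `dt[, value := unescape_newlines(value)]`. The fixed = TRUE version is usually easier to read, because the pattern is written exactly as it appears in R source.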