Interpreting new line \n character when reading from file using fread in R
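I can't get fread in the data.table package to handle new lines (\n) as intended. They come out as a literal "\n" rather than as new lines (head shows "\\n" instead of "\n"). From the post below, I understood that fread should be able to handle this situation:
fread and a quoted multi-line column value
I have tried quoting ("string") the value column, with the same result. Am I missing an easy solution or a parameter? Should the characters be escaped somehow? Here is an example illustrating the problem, together with my implementation:
[Edit:] Some clarifications, so that you don't need to read the code. The contents of strings.txt are shown in the code comments below, under # strings.txt. The file is a tab-delimited text file with four columns and three rows plus a header row. The first entry in the file, strMsg1, is identical to strAsIntended. However, fread appears to add an extra backslash to \n when reading the file, which turns the new line character into a literal \n. How can I avoid this? I just need to be able to encode new lines in my strings. Hopefully this is understandable.
[Edit2:] The result is shown in the picture.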
library(data.table)
library(gWidgets2)
# strings.txt
# scope order key value
# test_gui 1 strMsg1 Text with new line characters:\n1) The first point and the\n2) second point should be on separate lines\n\nThen perhaps some text below, separated by an empty line.
# test_gui 2 strMsg2 Some text does not contain new line characters.
# test_gui 3 strMsg3 Expand window to see text and button widgets
strAsIntended <- "Text with new line characters:\n1) The first point and the\n2) second point should be on separate lines\n\nThen perhaps some text below, separated by an empty line."
filePath <- "C:\\path\\to\\strings.txt"
# Read file.
dt <- fread(file = filePath, sep = "\t", encoding = "UTF-8")
head(dt) # \n is displayed as \\n
# Set key column.
setkey(dt, key = "key")
# Get strings for the specific function.
dt <- dt[dt$scope == "test_gui", ]
# Get strings.
strText <- dt["strMsg1"]$value
strButton <- dt["strMsg2"]$value
strWinTitle <- dt["strMsg3"]$value
# Construct gui.
w <- gwindow(title = strWinTitle)
g <- ggroup(horizontal = FALSE, container = w, expand = TRUE, fill = "both")
gtext(text = strText, container = g)
gtext(text = strAsIntended, container = g)
gbutton(text = strButton, container = g)
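[Edit3:] @user2554330 Thanks for the explanation and the solution. It was indeed not what I thought.
Here is an updated working code example, with a screenshot below: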
library(data.table)
library(gWidgets2)
# strings.txt
# scope order key value
# test_gui 1 strMsg1 Text with new line characters:\n1) The first point and the\n2) second point should be on separate lines\n\nThen perhaps some text below, separated by an empty line.
# test_gui 2 strMsg2 Some text does not contain new line characters.
# test_gui 3 strMsg3 Expand window to see text and button widgets
strAsIntended <- "Text with new line characters:\n1) The first point and the\n2) second point should be on separate lines\n\nThen perhaps some text below, separated by an empty line."
filePath <- "C:\\Users\\oskar\\OneDrive\\Dokument\\R\\win-library\\3.6\\strvalidator\\extdata\\languages\\strings.txt"
# Read file.
dt <- fread(file = filePath, sep = "\t", encoding = "UTF-8")
# Check data. Not identical.
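print(dt[1]$value) # print shows \\n: a literal backslash, escaped on display
print(strAsIntended) # print shows \n: a real newline, escaped on display
cat(dt[1]$value) # cat writes the literal characters \n
cat(strAsIntended) # cat writes an actual line break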
# Set key column.
setkey(dt, key = "key")
# Get strings for the specific function.
dt <- dt[dt$scope == "test_gui", ]
# Fix new line character.
dt[ , value := gsub("\\n", "\n", value, fixed = TRUE)]
# Check data. Now identical; print shows \n for both.
print(dt[1]$value)
print(strAsIntended)
# Now identical and prints with a new line.
cat(dt[1]$value)
cat(strAsIntended)
# Get strings.
strText <- dt["strMsg1"]$value
strButton <- dt["strMsg2"]$value
strWinTitle <- dt["strMsg3"]$value
# Construct gui.
w <- gwindow(title = strWinTitle)
g <- ggroup(horizontal = FALSE, container = w, expand = TRUE, fill = "both")
gtext(text = strText, container = g)
gtext(text = strAsIntended, container = g)
gbutton(text = strButton, container = g)
Running on:
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
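You misunderstand what fread is doing. Your input file contains a backslash followed by n, and that is exactly what the string read by fread contains. However, when you print a string containing a backslash, the backslash is shown doubled. (Use cat() to print it if you don't want that.) Your strAsIntended variable doesn't contain a backslash at all; it contains a single newline character, which is displayed as \n when printed.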
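To make the distinction concrete, here is a minimal base R sketch. The two variable names are made up for illustration and do not come from the question; the strings are shortened stand-ins for the file contents and for strAsIntended.

```r
# A literal backslash followed by "n": the two characters stored in the file.
from_file <- "a\\nb"
# A real newline: a single character, as in strAsIntended.
intended <- "a\nb"

# The strings differ in length, so they are not the same string.
nchar(from_file)  # 4 characters: a, backslash, n, b
nchar(intended)   # 3 characters: a, newline, b

# print() shows an escaped representation of each string.
print(from_file)  # displays "a\\nb"
print(intended)   # displays "a\nb"

# cat() writes the characters themselves.
cat(from_file, "\n")  # one line containing a\nb
cat(intended, "\n")   # a and b on separate lines
```

Comparing nchar() results is a quick way to tell which of the two cases a string falls into, without being misled by how print() escapes it.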
If you want to convert the \n in your input file into a newline character, use gsub or another substitution function. For example,
dt[, value := gsub("\\n", "\n", value, fixed = TRUE)]
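As a self-contained check of the replacement approach, the sketch below writes a small temporary file and reads it back. The file contents and the two-column layout are hypothetical and much simpler than the strings.txt in the question; header = TRUE is set explicitly so the example does not rely on fread's header detection.

```r
library(data.table)

# Write a tab-separated file whose value column holds a literal
# backslash followed by "n" (hypothetical contents, for illustration only).
tmp <- tempfile(fileext = ".txt")
writeLines(c("key\tvalue", "strMsg1\tfirst line\\nsecond line"), tmp)

dt <- fread(file = tmp, sep = "\t", header = TRUE)

# fread keeps the two characters exactly as they are in the file.
has_literal <- grepl("\\n", dt$value, fixed = TRUE)  # backslash + n present?
has_newline <- grepl("\n", dt$value, fixed = TRUE)   # real newline present?

# Replace the two-character sequence with a real newline, by reference.
dt[, value := gsub("\\n", "\n", value, fixed = TRUE)]

# cat() now writes the text on two lines.
cat(dt$value, "\n")

unlink(tmp)
```

With fixed = TRUE the pattern is matched as plain text rather than as a regular expression, so "\\n" in R source means exactly one backslash followed by n.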