read_tsv parsing data incorrectly into R

Question

I am currently working on Mac OS and trying to use read_tsv from the tidyverse to read the txt file below:

igg oxygen
881 34.6
1290    45
2147    62.3
1909    58.9
1282    42.5
1530    44.3
2067    67.9
1982    58.5
1019    35.6
1651    49.6
752 33
1687    52
1782    61.4
1529    50.2
969 34.1
1660    52.5
2121    69.9
1382    38.8
1714    50.6
1959    69.4
1158    37.4
965 35.1
1456    43
1273    44.1
1418    49.8
1743    54.4
1997    68.5
2177    69.5
1965    63
1264    43.2

However, when I try to read the file in, I get the following problem:

exerimmun <- read_tsv(file = "./exerimmun.txt")

── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────
cols(
  i = col_logical(),
  col_logical()
)

Warning: 124 parsing failures.
row col           expected    actual                                        file
  1  i  1/0/T/F/TRUE/FALSE                                     './exerimmun.txt'
  1  -- 2 columns          1 columns                           './exerimmun.txt'
  2  i  1/0/T/F/TRUE/FALSE                                     './exerimmun.txt'
  2     1/0/T/F/TRUE/FALSE                                     './exerimmun.txt'
  3  i  1/0/T/F/TRUE/FALSE                                     './exerimmun.txt'
... ... .................. ......... ...........................................
See problems(...) for more details.

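As far as I can tell, the data is laid out correctly in the txt file, so I am not sure why I run into problems when reading it into R. Here is the result of calling problems(exerimmun):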

> problems(exerimmun)
# A tibble: 124 x 5
     row col   expected           actual      file                                       
   <int> <chr> <chr>              <chr>       <chr>                                      
 1     1 "i"   1/0/T/F/TRUE/FALSE ""          './exerimmun.txt'
 2     1  NA   2 columns          "1 columns" './exerimmun.txt'
 3     2 "i"   1/0/T/F/TRUE/FALSE ""          './exerimmun.txt'
 4     2 ""    1/0/T/F/TRUE/FALSE ""          './exerimmun.txt'
 5     3 "i"   1/0/T/F/TRUE/FALSE ""          './exerimmun.txt'
 6     3  NA   2 columns          "1 columns" './exerimmun.txt'
 7     4 "i"   1/0/T/F/TRUE/FALSE ""          './exerimmun.txt'
 8     4 ""    1/0/T/F/TRUE/FALSE ""          './exerimmun.txt'
 9     5 "i"   1/0/T/F/TRUE/FALSE ""          './exerimmun.txt'
10     5  NA   2 columns          "1 columns" './exerimmun.txt'
# … with 114 more rows

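To me this should work fine, since the data has only two columns. After going through the documentation on how to read txt files, I am not sure what I am missing.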

Edit: I tried read.table("./exerimmun.txt") and got the following:

Error in type.convert.default(data[[i]], as.is = as.is[i], dec = dec,  : 
  invalid multibyte string at '<ff><fe>i'
In addition: Warning messages:
1: In read.table(file = "./exerimmun.txt") :
  line 1 appears to contain embedded nulls
2: In read.table(file = "./exerimmun.txt") :
  line 2 appears to contain embedded nulls
3: In read.table(file = "./exerimmun.txt") :
  line 3 appears to contain embedded nulls
4: In read.table(file = "./exerimmun.txt") :
  line 4 appears to contain embedded nulls
5: In read.table(file = "./exerimmun.txt") :
  line 5 appears to contain embedded nulls
6: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  embedded nul(s) found in input

Thanks in advance.

Answer 1

Perhaps the simplest thing for now is to avoid that function? You can always convert to a tibble afterwards.

Here I saved your data as /tmp/data.tsv and used plain base R to read it:

> x <- read.table("/tmp/data.tsv", header=TRUE)
> str(x)
'data.frame':   30 obs. of  2 variables:
 $ igg   : int  881 1290 2147 1909 1282 1530 2067 1982 1019 1651 ...
 $ oxygen: num  34.6 45 62.3 58.9 42.5 44.3 67.9 58.5 35.6 49.6 ...
> summary(x)
      igg           oxygen    
 Min.   : 752   Min.   :33.0  
 1st Qu.:1275   1st Qu.:42.6  
 Median :1590   Median :50.0  
 Mean   :1558   Mean   :50.6  
 3rd Qu.:1946   3rd Qu.:60.8  
 Max.   :2177   Max.   :69.9  
>
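
One more thing worth checking: the read.table error in the question begins with '<ff><fe>', which matches the UTF-16 little-endian byte order mark, and the "embedded nulls" warnings are consistent with that. If that really is how exerimmun.txt is encoded (an assumption, since I only have the pasted text and not the file), passing the encoding explicitly may be enough. Below is a minimal sketch; the temporary file and its three data rows are a hypothetical stand-in for the real file.

```r
# Hypothetical stand-in: write a small tab-separated file as UTF-16LE,
# preceded by the byte order mark ff fe, to mimic what the error suggests.
path <- tempfile(fileext = ".txt")
txt <- "igg\toxygen\n881\t34.6\n1290\t45\n2147\t62.3\n"
body <- iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), body), path)

# Step 1: look at the first two bytes of the file.
# ff fe at the start points to UTF-16 little-endian.
first_bytes <- readBin(path, what = "raw", n = 2)
print(first_bytes)

# Step 2: tell read.table which encoding to decode from.
x <- read.table(path, header = TRUE, sep = "\t", fileEncoding = "UTF-16LE")
str(x)
```

With your own file you would skip the first block and point path at "./exerimmun.txt". The resulting data frame can then be passed to tibble::as_tibble() if you want a tibble, as mentioned above.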
