When I read in a large table using fread, it slightly changes the numbers in one of the columns

Question

I have a large file that looks like this:

region              type    coeff      p-value  distance    count
82365593523656436   A      -0.9494     0.050    -16479472.5 8
82365593523656436   B      0.47303     0.526    57815363.0  8
82365593523656436   C      -0.8938     0.106    42848210.5  8

When I read it with fread, I suddenly can no longer find 82365593523656436:

correlations <- data.frame(fread('all_to_all_correlations.txt'))
> "82365593523656436" %in% correlations$region
[1] FALSE

I can find a slightly different number instead:

> "82365593523656432" %in% correlations$region
[1] TRUE

But this number is not in the actual file:

grep 82365593523656432 all_to_all_correlations.txt

returns no results, while

grep 82365593523656436 all_to_all_correlations.txt

does.

When I try to read the small sample file shown above instead of the full file, I get:

Warning message:
In fread("test.txt") :
  Some columns have been read as type 'integer64' but package bit64 isn't loaded.
Those columns will display as strange looking floating point data.
There is no need to reload the data.
Just require(bit64) to obtain the integer64 print method and print the data again.

and the data looks like this:

     region type    coeff       p.value  distance      count
1 3.758823e-303    A -0.94940   0.050    -16479472     8
2 3.758823e-303    B  0.47303   0.526     57815363     8
3 3.758823e-303    C -0.89380   0.106     42848210     8

So I think 82365593523656436 is being changed to 82365593523656432 while the file is read. How can I prevent this?

Answer 1

IDs (which the first column apparently is) should usually be read as character:

correlations <- setDF(fread('region              type    coeff      p-value  distance    count
                                 82365593523656436   A      -0.9494     0.050    -16479472.5 8
                                 82365593523656436   B      0.47303     0.526    57815363.0  8
                                 82365593523656436   C      -0.8938     0.106    42848210.5  8',
                            colClasses = c(region = "character")))
str(correlations)
#'data.frame':  3 obs. of  6 variables:
# $ region  : chr  "82365593523656436" "82365593523656436" "82365593523656436"
# $ type    : chr  "A" "B" "C"
# $ coeff   : num  -0.949 0.473 -0.894
# $ p-value : num  0.05 0.526 0.106
# $ distance: num  -16479473 57815363 42848211
# $ count   : int  8 8 8
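
As for why the digits change: a likely explanation, assuming the region column ended up as ordinary double-precision numbers in the full file, is that doubles represent integers exactly only up to 2^53 (about 9.0e15). The ID here is larger than that, so it is rounded to the nearest representable value. The base R sketch below illustrates this with the two numbers from the question; the variable names are made up for the illustration.

```r
# Illustration only: variable names are hypothetical, values come from the question.
id_in_file    <- "82365593523656436"  # what grep finds in the file
id_after_read <- "82365593523656432"  # what shows up in the data after reading

# The ID is beyond the range in which doubles can hold every integer exactly.
exceeds_exact_range <- as.numeric(id_in_file) > 2^53

# Converted to double, both strings map to the same stored value.
same_double <- as.numeric(id_in_file) == as.numeric(id_after_read)

# Kept as character strings, the two IDs remain distinct.
same_string <- identical(id_in_file, id_after_read)

print(exceeds_exact_range)  # TRUE
print(same_double)          # TRUE
print(same_string)          # FALSE
```

Reading the column as character, as above, avoids the conversion entirely. fread also has an integer64 argument that accepts "character", which may be convenient when several columns hold IDs of this size.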

r

fread