When I read in a large table using fread it slightly changes the numbers in one of the columns
I have a large file that looks like this:
region type coeff p-value distance count
82365593523656436 A -0.9494 0.050 -16479472.5 8
82365593523656436 B 0.47303 0.526 57815363.0 8
82365593523656436 C -0.8938 0.106 42848210.5 8
When I read it with fread, I suddenly can't find 82365593523656436 anymore:
correlations <- data.frame(fread('all_to_all_correlations.txt'))
> "82365593523656436" %in% correlations$region
[1] FALSE
I can find a slightly different number instead:
> "82365593523656432" %in% correlations$region
[1] TRUE
but that number is not in the actual file:
grep 82365593523656432 all_to_all_correlations.txt
returns nothing, whereas
grep 82365593523656436 all_to_all_correlations.txt
does return matches.
When I read the small sample shown above instead of the full file, I get:
Warning message:
In fread("test.txt") :
Some columns have been read as type 'integer64' but package bit64 isn't loaded.
Those columns will display as strange looking floating point data.
There is no need to reload the data.
Just require(bit64) to obtain the integer64 print method and print the data again.
and the data look like this:
region type coeff p.value distance count
1 3.758823e-303 A -0.94940 0.050 -16479472 8
2 3.758823e-303 B 0.47303 0.526 57815363 8
3 3.758823e-303 C -0.89380 0.106 42848210 8
So I think 82365593523656436 is being changed to 82365593523656432 during reading. How can I prevent this?
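The shift from ...436 to ...432 is what you would expect if the ID passed through a double (R's numeric type) at some point. Above 2^53 = 9007199254740992 a double cannot represent every integer, and at this magnitude (between 2^56 and 2^57) neighbouring doubles are 16 apart, so 82365593523656436 rounds to the nearest multiple of 16, which is 82365593523656432. A small base-R check, assuming only that the ID ended up as a double somewhere (the exact step in the large-file read is not shown above):

```r
# The ID from the question, parsed as an R numeric (IEEE 754 double).
id <- 82365593523656436

# Beyond 2^53, consecutive integers are no longer all representable.
id > 2^53                   # TRUE

# The stored value is the nearest representable double, not the literal.
sprintf("%.0f", id)         # "82365593523656432"

# Between 2^56 and 2^57 the gap between neighbouring doubles is 2^(56 - 52) = 16,
# so adding 4 rounds straight back to the same value, while adding 16 does not.
id + 4 == id                # TRUE
id + 16 == id               # FALSE
```

Because the rounding happens when the text becomes a number, no later conversion can recover the original digits; the ID has to be kept out of double entirely.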
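An ID like this (clearly the first column here) should generally be read as character, since it is a label rather than a quantity you do arithmetic on: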
correlations <- setDF(fread('region type coeff p-value distance count
82365593523656436 A -0.9494 0.050 -16479472.5 8
82365593523656436 B 0.47303 0.526 57815363.0 8
82365593523656436 C -0.8938 0.106 42848210.5 8',
colClasses = c(region = "character")))
str(correlations)
#'data.frame': 3 obs. of 6 variables:
# $ region : chr "82365593523656436" "82365593523656436" "82365593523656436"
# $ type : chr "A" "B" "C"
# $ coeff : num -0.949 0.473 -0.894
# $ p-value : num 0.05 0.526 0.106
# $ distance: num -16479473 57815363 42848211
# $ count : int 8 8 8
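A quick way to confirm the fix is to repeat the lookup from the question. The sketch below also shows fread's integer64 argument, which reads every column that would otherwise become integer64 as character; that is handy when several large-integer ID columns exist and you would rather not name each one in colClasses. It assumes a data.table version recent enough to support fread(text = ...):

```r
library(data.table)

# Same three-row sample as above, passed inline via `text =`.
sample_txt <- "region type coeff p-value distance count
82365593523656436 A -0.9494 0.050 -16479472.5 8
82365593523656436 B 0.47303 0.526 57815363.0 8
82365593523656436 C -0.8938 0.106 42848210.5 8"

# Option 1: force only the named ID column to character.
by_name <- setDF(fread(text = sample_txt, colClasses = c(region = "character")))

# Option 2: read every column detected as a 64-bit integer as character.
by_type <- setDF(fread(text = sample_txt, integer64 = "character"))

# The lookup that failed in the question now succeeds with either option.
"82365593523656436" %in% by_name$region   # TRUE
"82365593523656436" %in% by_type$region   # TRUE
```

If you do need the IDs as numbers, loading bit64 (as the warning suggests) keeps them as exact integer64 values; just avoid converting them to plain numeric afterwards.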