Current status of colClasses argument in function ff:read.csv.ffdf (ff - R package)
The error vmode 'character' not implemented
is raised because of the argument colClasses=c("id"="character")
in the following code:
df <- read.csv.ffdf('TenGBsample.csv',
colClasses=c("id"="character"), VERBOSE=TRUE)
read.table.ffdf 1..1000 (1000) csv-read=0.02sec
Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered, :
vmode 'character' not implemented
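The first column of TenGBsample.csv is 'id'. It consists of 30-digit numbers, which exceed the largest number my 64-bit system (Windows) can represent, so I want to handle them as characters. The second column contains small numbers and needs no adjustment.
I checked, and vmode does list a 'character' mode: http://127.0.0.1:16624/library/ff/html/vmode.html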
Note the following from help(read.csv.ffdf):
... read.table.ffdf has been designed to behave as much like read.table as possible. However, note the following differences:
- character vectors are not supported, character data must be read as one of the following colClasses: 'Date', 'POSIXct', 'factor', 'ordered'. By default character columns are read as factors. Accordingly arguments 'as.is' and 'stringsAsFactors' are not allowed.
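Since the excerpt above lists 'factor' among the accepted colClasses, one way to keep the ids as text is to read the column as a factor. The following is only a minimal sketch under stated assumptions: the two-row file, its column name value, and the id strings are made up for illustration and stand in for TenGBsample.csv. Factor levels are held in RAM, so with a very large number of distinct ids this may not be practical.

```r
library(ff)

# Hypothetical two-row stand-in for TenGBsample.csv (values are made up)
csvfile <- tempfile(fileext = ".csv")
writeLines(c("id,value",
             "123456789012345678901234567890,1",
             "123456789012345678901234567891,2"),
           csvfile)

# Read 'id' as a factor rather than as character
ffd <- read.csv.ffdf(file = csvfile,
                     colClasses = c(id = "factor", value = "integer"))

# Storage modes chosen by ff for the two columns
vmode(ffd)

# Pull the column into RAM (as a factor) and convert it to character
id_text <- as.character(ffd$id[])
id_text
```

The strings come back digit for digit because they are stored as factor levels and never pass through a numeric type.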
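In short, you cannot read the values in as character. But if the id column in the file already holds numeric values, you can read them in as double and then reformat them. format(x, scientific = FALSE) prints x in standard notation.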
Here is a sample dataset x in which id is numeric and has 30 digits.
library(ff)
x <- data.frame(
id = (267^12 + (102:106)^12),
other = paste0(LETTERS[1:5],letters[1:5])
)
## create a csv file with 'x'
csvfile <- tempPathFile(path = getOption("fftempdir"), extension = "csv")
write.csv(
format(x, scientific = FALSE),
file = csvfile, row.names = FALSE, quote = 2
)
## read in the data without colClasses
ffx <- read.csv.ffdf(file = csvfile)
vmode(ffx)
# id other
# "double" "integer"
Now we can use ffx[,] to coerce ffx to class data.frame and reformat the id column.
df <- within(ffx[,], id <- format(id, scientific = FALSE))
class(df$id)
# [1] "character"
df
# id other
# 1 131262095302921040298042720256 Aa
# 2 131262252822013319483345600512 Bb
# 3 131262428093345052649582493696 Cc
# 4 131262622917452503293152460800 Dd
# 5 131262839257598318815163187200 Ee
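One caveat worth checking before relying on the double route: a double holds roughly 15 to 17 significant decimal digits, so a 30-digit id survives the round trip only if it happens to be exactly representable. The sample ids above were computed in double arithmetic to begin with, which is why they print back unchanged. The base R sketch below uses two made-up id strings (an assumption for illustration, not values from the original file) to show what can happen with arbitrary 30-digit ids.

```r
# Two distinct, hypothetical 30-digit ids that differ only in the last digit
ids <- c("123456789012345678901234567890",
         "123456789012345678901234567891")

# Convert to double, as happens when the column is read as numeric
as_double <- as.numeric(ids)

# Both strings collapse to the same double
as_double[1] == as_double[2]   # TRUE

# Formatting back to standard notation does not recover the original digits
round_trip <- format(as_double, scientific = FALSE)
round_trip == ids
```

If distinct ids must stay distinct, it may be safer to verify this on a sample of the real data before choosing the double route.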