Import big CSV files at once in R
I have 70 CSV files with identical columns in one folder, each 0.5 GB.
I want to import them into a single data frame in R.
Normally I import a single file correctly like this:
df <- read_delim("file.csv",
                 "|", escape_double = FALSE,
                 col_types = cols(pc_no = col_character(),
                                  id_key = col_character()),
                 trim_ws = TRUE)
When I try to import all of them with the code below, I get this error:
argument "delim" is missing, with no default
tbl <-
list.files(pattern = "*.csv") %>%
map_df(~read_delim("|", escape_double = FALSE, col_types = cols(pc_no = col_character(), id_key = col_character()), trim_ws = TRUE))
With read_csv the files are imported, but only a single column appears, containing all the columns and values.
tbl <-
list.files(pattern = "*.csv") %>%
map_df(~read_csv(., col_types = cols(.default = "c")))
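In your second code block you are missing the . (the placeholder for the current file name inside the ~ formula), so read_delim interprets your arguments as read_delim(file = "|", delim = <nothing provided>, ...). Try: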
tbl <- list.files(pattern = "*.csv") %>%
map_df(~ read_delim(., delim = "|", escape_double = FALSE,
col_types = cols(pc_no = col_character(), id_key = col_character()),
trim_ws = TRUE))
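To make the corrected call easy to try without the real 35 GB of data, here is a minimal, self-contained sketch. The folder name pipe_demo, the two tiny files, and the value column are made up for illustration; only pc_no and id_key come from the question. It assumes readr, purrr, and dplyr (which map_df needs for row binding) are installed.

```r
library(readr)
library(purrr)

# Hypothetical demo data: two small pipe-delimited files in a temporary folder
demo_dir <- file.path(tempdir(), "pipe_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("pc_no|id_key|value", "001|a1|10", "002|a2|20"),
           file.path(demo_dir, "part1.csv"))
writeLines(c("pc_no|id_key|value", "003|a3|30"),
           file.path(demo_dir, "part2.csv"))

# list.files() takes a regular expression, so "\\.csv$" matches names ending in .csv
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# .x is the current file path; it has to be passed as the first argument
tbl <- map_df(files, ~ read_delim(.x, delim = "|", escape_double = FALSE,
                                  col_types = cols(pc_no = col_character(),
                                                   id_key = col_character()),
                                  trim_ws = TRUE))

print(tbl)
```

Because pc_no is read as character, leading zeros such as 001 should survive the import. As far as I know, readr 2.0 and later also accept a vector of file paths directly as the file argument, which may remove the need for map_df altogether; treat that as something to check against your installed version.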
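In the corrected call I name delim= explicitly, though that is not strictly necessary. However, had you done so in your first attempt, you would have seen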
readr::read_delim(delim = "|", escape_double = FALSE,
col_types = cols(pc_no = col_character(), id_key = col_character()),
trim_ws = TRUE)
# Error in read_delimited(file, tokenizer, col_names = col_names, col_types = col_types, :
# argument "file" is missing, with no default
which is more indicative of the actual problem.
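The difference between the two error messages comes from how R matches unnamed arguments by position. The sketch below illustrates this with read_demo, a hypothetical stand-in that only shares its first two parameter names with read_delim; it is not readr code.

```r
# Hypothetical stand-in with the same first two parameters as read_delim
read_demo <- function(file, delim, ...) {
  paste("file =", file, "; delim =", delim)
}

# An unnamed "|" fills the first parameter (file), so delim is reported missing
msg_positional <- tryCatch(read_demo("|"),
                           error = function(e) conditionMessage(e))

# Naming delim leaves file unfilled, so the error now points at file
msg_named <- tryCatch(read_demo(delim = "|"),
                      error = function(e) conditionMessage(e))

print(msg_positional)
print(msg_named)
```

The first message names delim, which sends you looking in the wrong place; the second names file, which is the argument that was actually forgotten.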