Need to read some (or all) columns as.character when combining multiple .csv files with tidyr functions
I'm reading many large .csv files that all have the same column names, and row-binding them with the following code (as suggested in https://serialmentor.com/blog/2016/6/13/reading-and-combining-many-tidy-data-files-in-R):
library(readr) # for read_csv()
library(purrr) # for map(), reduce() and the %>% pipe
# find all file names ending in .csv (pattern is a regular expression, not a glob)
files <- dir(pattern = "\\.csv$")
files
data <- files %>%
  map(read_csv) %>% # read in all the files individually, using
                    # the function read_csv() from the readr package
  reduce(rbind)     # reduce with rbind into one dataframe
data
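However, one column in my data needs to be read as character, because its entries are strings of numbers separated by ","; otherwise read_csv converts the column to a number and drops the commas.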
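How can I
1.) specify that only one column (preferably by name) is read as character?
or
2.) simply read all columns as character?
The second option isn't ideal, because I would then have to convert many columns back to numeric.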
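Both options can be expressed through the `col_types` argument of `read_csv()`. A minimal sketch on a single file, assuming a hypothetical temporary file whose column `ids` holds the comma-separated numbers (the column names and values are illustrative only):

```r
library(readr)

# Hypothetical sample file: "ids" holds comma-separated numbers in quotes.
path <- tempfile(fileext = ".csv")
writeLines(c('id,value,ids', '1,2.5,"1,2,3"', '2,3.5,"4,5"'), path)

# Option 1: name only the column that must stay character;
# all other columns keep readr's default type guessing.
one_char <- read_csv(path, col_types = cols(ids = col_character()))

# Option 2: make character the default type for every column.
all_char <- read_csv(path, col_types = cols(.default = col_character()))
```

`"c"` is the compact shorthand for `col_character()`, so `cols(ids = "c")` is equivalent to the Option 1 spec.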
I tried using:
col_types = cols(.default = "c")
as described in https://github.com/tidyverse/readr/issues/148 and https://github.com/tidyverse/readr/issues/292.
My approach looked like this:
data <- files %>%
map(read_csv( col_types = cols(.default = "c" ))) %>%
reduce(rbind)
data
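However, this doesn't work, because read_csv() needs a `file` input (i.e. the .csv file path), and `read_csv(col_types = ...)` is called immediately, with no file, before map() ever receives the file names. It throws this error: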
Error in read_delimited(file, tokenizer, col_names = col_names, col_types = col_types, :
argument "file" is missing, with no default
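A minimal sketch of two ways to hand the extra argument to `read_csv()` without calling it early, assuming two small hypothetical files in a temporary directory: `map()` forwards any extra named arguments after `.f` to that function, or a purrr formula lambda can receive each path as `.x`.

```r
library(readr)
library(purrr) # map(), reduce() and the re-exported %>% pipe

# Two hypothetical sample files in a temporary directory.
dir_path <- tempfile("csvs")
dir.create(dir_path)
writeLines(c('id,ids', '1,"1,2"'), file.path(dir_path, "a.csv"))
writeLines(c('id,ids', '2,"3,4"'), file.path(dir_path, "b.csv"))
files <- list.files(dir_path, pattern = "\\.csv$", full.names = TRUE)

# Fix 1: pass read_csv itself (not a call to it) to map();
# col_types is forwarded to read_csv() for every file.
data <- files %>%
  map(read_csv, col_types = cols(.default = "c")) %>%
  reduce(rbind)

# Fix 2: an anonymous function (purrr formula) that gets each path as .x.
data2 <- files %>%
  map(~ read_csv(.x, col_types = cols(.default = "c"))) %>%
  reduce(rbind)
```

Both forms defer the call until map() supplies a file path; the formula version is handier when the arguments depend on the file.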
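Solution: for nine (or any other number of) columns, where every .csv file has the same column names and only two columns ("scan_start" and "scan_end" in this case) should be read as numeric and all others as character: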
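library(readr)
library(purrr)
files <- dir(pattern = "\\.csv$")
# map_df() row-binds the per-file results (it needs dplyr installed)
metadata <- files %>%
  map_df(~read_csv(., col_types = cols(.default = "c",
                                       scan_end = "n", scan_start = "n")))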
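A self-contained check of that column spec, assuming two hypothetical files with a `label` and an `ids` column alongside the two scan columns. It uses `reduce(rbind)` instead of `map_df()` so it runs without dplyr; `"n"` is `col_number()`, which yields doubles.

```r
library(readr)
library(purrr)

# Hypothetical sample files: two scan columns plus character columns.
dir_path <- tempfile("scans")
dir.create(dir_path)
writeLines(c('label,scan_start,scan_end,ids', 'a,1,10,"1,2"'),
           file.path(dir_path, "one.csv"))
writeLines(c('label,scan_start,scan_end,ids', 'b,11,20,"3,4,5"'),
           file.path(dir_path, "two.csv"))
files <- list.files(dir_path, pattern = "\\.csv$", full.names = TRUE)

# Everything defaults to character; only the two scan columns are numeric.
spec <- cols(.default = "c", scan_start = "n", scan_end = "n")

metadata <- files %>%
  map(~ read_csv(.x, col_types = spec)) %>%
  reduce(rbind)
```

Because `.default` covers every unnamed column, the spec does not change when more character columns are added to the files.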