How to use select helpers such as starts_with() when using readr::read_csv()
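I have a fairly large dataset to read in, with more than 1000 missing values at the top, but all the variable names follow the same pattern. Is there a way to use starts_with() to force certain variables to be parsed correctly?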
MWE:
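library(tidyverse)  # attaches readr, so a separate library(readr) is not needed
mwe_df <- data.frame(id = c("a", "b"),  # not where I actually get the data from
                     amount1 = c(NA, 20),
                     currency1 = c(NA, "USD"))
readr::write_csv(mwe_df, "mwe.csv")  # write the example to disk so read_csv() has a file to read
mwe <- readr::read_csv("mwe.csv", guess_max = 1)  # guess_max = 1 for example purposes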
I would like to be able to do
mwe <- read_csv("mwe.csv", guess_max = 1,
                col_types = cols(starts_with("amount") = "d",    # not valid R, this is what I wish worked
                                 starts_with("currency") = "c"))
> mwe
# A tibble: 2 x 3
  id    amount1 currency1
  <chr>   <dbl> <chr>
1 a          NA NA
2 b          20 USD
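But I get the error "unexpected '=' in: read_csv". Any ideas? I can't hard-code the column types because the number of columns changes regularly, but the pattern (amountN) will stay the same. There will also be other columns that are not id or amount/currency. For speed reasons I would rather not increase the guess_max option.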
The answer is to cheat!
mwe <- read_csv("mwe.csv", n_max = 0)  # read the header only; we just need the column names
cnames <- attr(mwe, "spec")  # grab the column specification, which carries the names
ctype <- rep("?", ncol(mwe))  # one col_types abbreviation per column, "?" means guess
currency <- grepl("^currency", names(cnames$cols))  # which ones are currency?
# or use base::startsWith(names(cnames$cols), "currency")
ctype[currency] <- "c"  # do not guess on currency columns, use character
amount <- grepl("^amount", names(cnames$cols))  # repeat the two steps above as needed,
ctype[amount] <- "d"  # e.g. parse the amount columns as double
mwe <- read_csv("mwe.csv", col_types = paste(ctype, collapse = ""))
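If there are several prefixes to handle, the same trick can be wrapped in a small helper. The following is only a sketch: cols_by_prefix() is a hypothetical name (it is not part of readr), and it assumes the file has a header row. It builds a cols() specification from the header instead of a compact string, so any column that matches no prefix is still guessed through .default.

```r
library(readr)

# Hypothetical helper (not a readr function): map column-name prefixes to
# col_types abbreviations and build a cols() specification from the header.
cols_by_prefix <- function(file, prefixes) {
  # n_max = 0 reads only the header row, so this stays cheap on large files;
  # forcing character here avoids any type guessing for the header read
  header <- names(read_csv(file, n_max = 0,
                           col_types = cols(.default = col_character())))
  spec <- list()
  for (prefix in names(prefixes)) {
    hits <- header[startsWith(header, prefix)]  # columns starting with this prefix
    spec[hits] <- prefixes[[prefix]]            # e.g. "d" for double, "c" for character
  }
  # columns without a matching prefix fall back to guessing
  do.call(cols, c(spec, list(.default = col_guess())))
}

# Usage on a throwaway copy of the example data
tmp <- tempfile(fileext = ".csv")
write_csv(data.frame(id = c("a", "b"),
                     amount1 = c(NA, 20),
                     currency1 = c(NA, "USD")), tmp)

mwe <- read_csv(tmp, guess_max = 1,
                col_types = cols_by_prefix(tmp, c(amount = "d", currency = "c")))
```

Because the types for the amount and currency columns are stated explicitly, guess_max only affects the remaining columns and can stay small.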