What are permissible column objects of the form "col_*()" used in readr?
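readr::read_csv is misreading some of the column types in a file I am loading, so I want to set them manually using cols.
In ?read_csv, it says the col_types argument should be "One of 'NULL', a 'cols()' specification, or a string. See 'vignette("column-types")' for more details". Well, vignette("column-types") gives vignette("column-types") not found, so I tried ?cols. It says it accepts "column objects created by 'col_*()' or their abbreviated character names".
What are the acceptable functions or abbreviated character names, and where can I find this information? This is readr 1.1.1, by the way.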
This may not be the complete list of available col_*() suffixes, but it is close:
_logical
_integer
_double
_number
_character
_datetime
_date
_time
_factor
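One way to check the list against the installed version is to ask R for every exported readr function whose name starts with col_. This is only a sketch, assuming readr is installed and attached; the exact set of names returned depends on the readr version.

```r
library(readr)

# List every function exported by readr whose name starts with "col_".
# The result varies by readr version, so no fixed output is shown here.
col_funs <- ls("package:readr", pattern = "^col_")
print(col_funs)

# Each col_*() call returns a "collector" object, which is what cols() accepts.
int_collector <- col_integer()
print(class(int_collector))
```

Any name in col_funs can then be looked up with ?, for example ?col_integer.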
If you want to manually specify the column types, you can start by copying and pasting this code, and then tweaking it to fix the parsing problems.
df3 <- read_csv(
readr_example("challenge.csv"),
col_types = cols(
x = col_double(),
y = col_date(format = "")
)
)
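This article focuses on the different types of parsers, enumerated by section (atomic vectors, dates/times, etc.). For each parse_*() function there is an equivalent col_*() function: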
Each parse_() is coupled with a col_() function, which will be used in the process of parsing a complete tibble.
These include col_double, col_integer, col_character, col_date, col_factor, etc.
library(readr)
mtcars <- read_csv(readr_example("mtcars.csv"), col_types =
cols(
mpg = col_double(),
cyl = col_integer(),
disp = col_double(),
hp = col_integer(),
drat = col_double(),
vs = col_integer(),
wt = col_double(),
qsec = col_double(),
am = col_integer(),
gear = col_integer(),
carb = col_integer()
)
)
mtcars
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <int> <int> <int> <int>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ... with 22 more rows
Alternatively, you can use a compact string representation where each character represents one column: c = character, i = integer, n = number, d = double, l = logical, D = date, T = date time, t = time, ? = guess, or _/- to skip the column.
mtcars_select <- read_csv(readr_example("mtcars.csv"),
col_types = cols_only(mpg = 'd', cyl = 'i', hp = 'i',
qsec = 'd', gear = 'i'),
na = c("NA", "N/A", "-9999", "-999"))
mtcars_select
#> # A tibble: 32 x 5
#> mpg cyl hp qsec gear
#> <dbl> <int> <int> <dbl> <int>
#> 1 21 6 110 16.5 4
#> 2 21 6 110 17.0 4
#> 3 22.8 4 93 18.6 4
#> 4 21.4 6 110 19.4 3
#> 5 18.7 8 175 17.0 3
#> 6 18.1 6 105 20.2 3
#> 7 14.3 8 245 15.8 3
#> 8 24.4 4 62 20 4
#> 9 22.8 4 95 22.9 4
#> 10 19.2 6 123 18.3 4
#> # ... with 22 more rows
Or even shorter:
mtcars <- read_csv(readr_example("mtcars.csv"), col_types = "di_i__d__i_")
mtcars
#> # A tibble: 32 x 5
#> mpg cyl hp qsec gear
#> <dbl> <int> <int> <dbl> <int>
#> 1 21 6 110 16.5 4
#> 2 21 6 110 17.0 4
#> 3 22.8 4 93 18.6 4
#> 4 21.4 6 110 19.4 3
#> 5 18.7 8 175 17.0 3
#> 6 18.1 6 105 20.2 3
#> 7 14.3 8 245 15.8 3
#> 8 24.4 4 62 20 4
#> 9 22.8 4 95 22.9 4
#> 10 19.2 6 123 18.3 4
#> # ... with 22 more rows
References:
https://cran.r-project.org/web/packages/readr/vignettes/readr.html
https://www.rdocumentation.org/packages/readr/versions/1.1.1/topics/cols
I also think this is not obviously documented. You can read the source code of col_types.R in readr, which tells you the abbreviations:
"_" = ,
"-" = col_skip(),
"?" = col_guess(),
c = col_character(),
D = col_date(),
d = col_double(),
i = col_integer(),
l = col_logical(),
n = col_number(),
T = col_datetime(),
t = col_time()
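To see the mapping in practice, the sketch below builds the same specification twice, once with col_*() objects and once with the one-letter abbreviations, and prints both. The column names a and b are made up for illustration, and the $cols element used to inspect the result reflects how the specification object is laid out in the readr versions I have looked at, so treat that part as an assumption.

```r
library(readr)

# The same specification written two ways; "a" and "b" are hypothetical columns.
spec_long  <- cols(a = col_integer(), b = col_character())
spec_short <- cols(a = "i", b = "c")

print(spec_long)
print(spec_short)

# Assumption: the specification stores its per-column collectors in $cols.
short_a_class <- class(spec_short$cols$a)
print(short_a_class)
```

Both forms can be passed to col_types, so the choice is mostly about readability.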
The way to set the column types is to pass named column objects to cols():
col_types = cols(column_1 = col_integer(), column2 = col_character())
Or, if you use col_names, just pass a vector of the same length.
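If the reason for overriding the defaults is that read_csv guesses the types incorrectly, you can use spec_csv and allow more rows to be used for guessing the types (1,000 are used by default). For example: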
x <- spec_csv(filename, guess_max = 2000)
read_csv(filename, col_types = x)
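A runnable variant of the same idea, using the mtcars.csv example file bundled with readr in place of filename, might look like the sketch below. Overriding a single entry through spec$cols is an assumption about the structure of the specification object rather than something the readr documentation quoted above describes.

```r
library(readr)

# Example file shipped with readr; substitute your own path here.
path <- readr_example("mtcars.csv")

# Guess the column types from up to 2000 rows instead of the default 1000.
spec <- spec_csv(path, guess_max = 2000)
print(spec)

# Assumption: individual guesses can be overridden through spec$cols.
spec$cols$cyl <- col_integer()

# Read the file with the adjusted specification.
df <- read_csv(path, col_types = spec)
print(class(df$cyl))
```

Printing spec first shows what was guessed, so only the columns that were guessed wrongly need to be overridden.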