Why is the unite function not accepting my column names?
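I'm confused. This code doesn't work on my dataset, but it works on dummy data. As far as I can tell, there is no important difference in structure between the two datasets. Why am I getting an error about undefined columns?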
> packageVersion('tidyr')
[1] ‘1.2.0’
> str(test)
'data.frame': 229 obs. of 9 variables:
$ Response : chr "presence" "presence" "presence" "presence" ...
$ Predictor : chr "tussock_gram" "wet_sedge" "nontussock_gram" "dry_gram_dwarf_shrub" ...
$ Estimate : num 1.03 2.77 2.02 13.73 -6.69 ...
$ Std.Error : chr "1.6469" "1.7951" "8.5393" "14.6206" ...
$ DF : num 844 844 844 844 844 844 844 844 844 844 ...
$ Crit.Value : num 0.628 1.542 0.236 0.939 -0.761 ...
$ P.Value : num 0.53 0.123 0.813 0.348 0.447 ...
$ Std.Estimate: num 0.0233 0.0536 0.0177 0.1019 -0.1441 ...
$ : chr "" "" "" "" ...
> dput(head(test))
structure(list(Response = c("presence", "presence", "presence",
"presence", "presence", "presence"), Predictor = c("tussock_gram",
"wet_sedge", "nontussock_gram", "dry_gram_dwarf_shrub", "low_shrub",
"high_shrub"), Estimate = c(1.035, 2.7687, 2.0189, 13.7295, -6.6858,
12.4353), Std.Error = c("1.6469", "1.7951", "8.5393", "14.6206",
"8.7873", "3.5288"), DF = c(844, 844, 844, 844, 844, 844), Crit.Value = c(0.6285,
1.5424, 0.2364, 0.9391, -0.7608, 3.524), P.Value = c(0.5297,
0.123, 0.8131, 0.3477, 0.4467, 0.0004), Std.Estimate = c(0.0233,
0.0536, 0.0177, 0.1019, -0.1441, 0.1436), c("", "", "", "", "",
"***")), row.names = c(NA, 6L), class = "data.frame")
> test <- test %>%
unite("Relationship", c(Response, Predictor), sep = "~")
Error in `[.data.frame`(out, setdiff(names(out), names(from_vars))) :
undefined columns selected
> df <- as.data.frame(expand_grid(Response = c("a", NA), Predictor = c("b", NA)))
> str(df)
'data.frame': 4 obs. of 2 variables:
$ Response : chr "a" "a" NA NA
$ Predictor: chr "b" NA "b" NA
> df <- df %>%
unite("Relationship", c(Response, Predictor), sep = "~")
# works fine
In the updated dput, one column has a blank name (""). We need to remove it:
library(dplyr)
library(tidyr)
test %>%
select(-"") %>%
unite(Relationship, Response, Predictor, sep = "~")
Relationship Estimate Std.Error DF Crit.Value P.Value Std.Estimate
1 presence~tussock_gram 1.0350 1.6469 844 0.6285 0.5297 0.0233
2 presence~wet_sedge 2.7687 1.7951 844 1.5424 0.1230 0.0536
3 presence~nontussock_gram 2.0189 8.5393 844 0.2364 0.8131 0.0177
4 presence~dry_gram_dwarf_shrub 13.7295 14.6206 844 0.9391 0.3477 0.1019
5 presence~low_shrub -6.6858 8.7873 844 -0.7608 0.4467 -0.1441
6 presence~high_shrub 12.4353 3.5288 844 3.5240 0.0004 0.1436
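If select(-"") is not accepted by your dplyr version (I only have the output above to go on), a base R alternative is to drop blank-named columns with a logical index instead of by name. A minimal sketch on a hypothetical toy data frame (toy, keep, and toy_clean are illustrative names, not from the question):

```r
# Hypothetical toy data with one blank-named column, mimicking `test`.
toy <- data.frame(Response = c("a", "b"),
                  Predictor = c("c", "d"),
                  z = c("", "***"))
names(toy)[3] <- ""

# Build a logical mask of usable names; indexing by mask never looks up "".
keep <- !(is.na(names(toy)) | names(toy) == "")
toy_clean <- toy[keep]

print(names(toy_clean))
```

The cleaned data frame can then be passed to unite in the same way as above.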
The issue is in the unite source code, at this step:
...
out <- out[setdiff(names(out), names(from_vars))]
...
It triggers the error because selecting a column whose name is blank ("") returns an error:
> names(test)
[1] "Response" "Predictor" "Estimate" "Std.Error" "DF" "Crit.Value" "P.Value" "Std.Estimate" ""
> test[""]
Error in `[.data.frame`(test, "") : undefined columns selected
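The same failure can be reproduced without tidyr, which also suggests a quick check for blank names before calling unite. A small sketch, assuming a hypothetical toy data frame (toy, res, and blank_idx are illustrative names):

```r
# Toy data frame whose second column has a blank name (hypothetical data).
toy <- data.frame(a = 1:2, b = c("x", "y"))
names(toy)[2] <- ""

# Character indices never match the empty string, so name-based
# selection of "" fails; capture the message instead of stopping.
res <- tryCatch(toy[""], error = function(e) conditionMessage(e))
print(res)

# Positions of blank (or missing) names, found without indexing by name.
blank_idx <- which(is.na(names(toy)) | names(toy) == "")
print(blank_idx)
```

If blank_idx is non-empty, the column names need fixing before the data frame is passed on.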
If there are unusual column names, either run make.names (from base R):
> make.names(names(test))
[1] "Response" "Predictor" "Estimate" "Std.Error" "DF" "Crit.Value" "P.Value" "Std.Estimate" "X"
or use clean_names from janitor:
> janitor::clean_names(test)
response predictor estimate std_error df crit_value p_value std_estimate x
1 presence tussock_gram 1.0350 1.6469 844 0.6285 0.5297 0.0233
2 presence wet_sedge 2.7687 1.7951 844 1.5424 0.1230 0.0536
3 presence nontussock_gram 2.0189 8.5393 844 0.2364 0.8131 0.0177
4 presence dry_gram_dwarf_shrub 13.7295 14.6206 844 0.9391 0.3477 0.1019
5 presence low_shrub -6.6858 8.7873 844 -0.7608 0.4467 -0.1441
6 presence high_shrub 12.4353 3.5288 844 3.5240 0.0004 0.1436 ***
Thus, updating the column names ensures that unite runs (without removing the '' column):
names(test) <- make.names(names(test))
test %>%
unite(Relationship, Response, Predictor, sep = "~")
Relationship Estimate Std.Error DF Crit.Value P.Value Std.Estimate X
1 presence~tussock_gram 1.0350 1.6469 844 0.6285 0.5297 0.0233
2 presence~wet_sedge 2.7687 1.7951 844 1.5424 0.1230 0.0536
3 presence~nontussock_gram 2.0189 8.5393 844 0.2364 0.8131 0.0177
4 presence~dry_gram_dwarf_shrub 13.7295 14.6206 844 0.9391 0.3477 0.1019
5 presence~low_shrub -6.6858 8.7873 844 -0.7608 0.4467 -0.1441
6 presence~high_shrub 12.4353 3.5288 844 3.5240 0.0004 0.1436 ***
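Since the blank column here appears to hold significance stars, a descriptive name may be more useful than the generic X. A sketch that renames only the blank column, assuming a name such as Signif (my choice, not from the question) and a hypothetical toy data frame:

```r
# Hypothetical toy data with a blank-named column of significance stars.
toy <- data.frame(P.Value = c(0.5297, 0.0004), z = c("", "***"))
names(toy)[2] <- ""

# Rename only the blank column; the other names are left untouched.
names(toy)[names(toy) == ""] <- "Signif"
print(names(toy))

# With several blank names, unique = TRUE keeps the repaired names distinct.
fixed <- make.names(c("a", "", ""), unique = TRUE)
print(fixed)
```

Renaming in place keeps the original capitalisation of the other columns, whereas clean_names rewrites all of them.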