Why is the unite function not accepting my column names?

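I'm confused. This code does not work on my dataset, but it works on dummy data. As far as I can tell, there is no important difference in structure between the two datasets. Why am I getting an error about undefined columns?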

> packageVersion('tidyr')
[1] ‘1.2.0’


> str(test)
'data.frame':   229 obs. of  9 variables:
 $ Response    : chr  "presence" "presence" "presence" "presence" ...
 $ Predictor   : chr  "tussock_gram" "wet_sedge" "nontussock_gram" "dry_gram_dwarf_shrub" ...
 $ Estimate    : num  1.03 2.77 2.02 13.73 -6.69 ...
 $ Std.Error   : chr  "1.6469" "1.7951" "8.5393" "14.6206" ...
 $ DF          : num  844 844 844 844 844 844 844 844 844 844 ...
 $ Crit.Value  : num  0.628 1.542 0.236 0.939 -0.761 ...
 $ P.Value     : num  0.53 0.123 0.813 0.348 0.447 ...
 $ Std.Estimate: num  0.0233 0.0536 0.0177 0.1019 -0.1441 ...
 $             : chr  "" "" "" "" ...

> dput(head(test))
structure(list(Response = c("presence", "presence", "presence", 
"presence", "presence", "presence"), Predictor = c("tussock_gram", 
"wet_sedge", "nontussock_gram", "dry_gram_dwarf_shrub", "low_shrub", 
"high_shrub"), Estimate = c(1.035, 2.7687, 2.0189, 13.7295, -6.6858, 
12.4353), Std.Error = c("1.6469", "1.7951", "8.5393", "14.6206", 
"8.7873", "3.5288"), DF = c(844, 844, 844, 844, 844, 844), Crit.Value = c(0.6285, 
1.5424, 0.2364, 0.9391, -0.7608, 3.524), P.Value = c(0.5297, 
0.123, 0.8131, 0.3477, 0.4467, 0.0004), Std.Estimate = c(0.0233, 
0.0536, 0.0177, 0.1019, -0.1441, 0.1436), c("", "", "", "", "", 
"***")), row.names = c(NA, 6L), class = "data.frame")



> test <- test %>%
  unite("Relationship", c(Response, Predictor), sep = "~") 

Error in `[.data.frame`(out, setdiff(names(out), names(from_vars))) : 
  undefined columns selected


> df <- as.data.frame(expand_grid(Response = c("a", NA), Predictor = c("b", NA)))

> str(df)
'data.frame':   4 obs. of  2 variables:
 $ Response : chr  "a" "a" NA NA
 $ Predictor: chr  "b" NA "b" NA


> df <- df %>%
  unite("Relationship", c(Response, Predictor), sep = "~")

# works fine



The updated dput shows that one column has a blank name (""). We need to remove it:

library(dplyr)
library(tidyr)
test %>% 
   select(-"") %>% 
   unite(Relationship, Response, Predictor, sep = "~")
                   Relationship Estimate Std.Error  DF Crit.Value P.Value Std.Estimate
1         presence~tussock_gram   1.0350    1.6469 844     0.6285  0.5297       0.0233
2            presence~wet_sedge   2.7687    1.7951 844     1.5424  0.1230       0.0536
3      presence~nontussock_gram   2.0189    8.5393 844     0.2364  0.8131       0.0177
4 presence~dry_gram_dwarf_shrub  13.7295   14.6206 844     0.9391  0.3477       0.1019
5            presence~low_shrub  -6.6858    8.7873 844    -0.7608  0.4467      -0.1441
6           presence~high_shrub  12.4353    3.5288 844     3.5240  0.0004       0.1436

The issue comes from this line in the unite source code:

...
 out <- out[setdiff(names(out), names(from_vars))]
...

It triggers the error because subsetting a data frame by a blank ("") column name fails:

> names(test)
[1] "Response"     "Predictor"    "Estimate"     "Std.Error"    "DF"           "Crit.Value"   "P.Value"      "Std.Estimate" ""       
> test[""]
Error in `[.data.frame`(test, "") : undefined columns selected
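
To spot such columns before calling unite, one option is to look for empty or missing names directly. The following is a minimal base R sketch; the demo data frame and its values are made up for illustration and are not the asker's data.

```r
# Hypothetical data frame whose third column has a blank name,
# mimicking the structure of `test` in the question
demo <- data.frame(Response = "presence", Predictor = "wet_sedge", sig = "***",
                   stringsAsFactors = FALSE)
names(demo)[3] <- ""

# Positions of columns whose name is missing or an empty string
bad <- which(is.na(names(demo)) | names(demo) == "")
bad

# Indexing by position still works where indexing by the name "" fails
blank_col <- demo[bad]
```

If bad has length zero, blank names are probably not the cause and the error lies elsewhere.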

If there are unusual column names, either run make.names (from base R)

> make.names(names(test))
[1] "Response"     "Predictor"    "Estimate"     "Std.Error"    "DF"           "Crit.Value"   "P.Value"      "Std.Estimate" "X"    

or use clean_names from janitor

> janitor::clean_names(test)
  response            predictor estimate std_error  df crit_value p_value std_estimate   x
1 presence         tussock_gram   1.0350    1.6469 844     0.6285  0.5297       0.0233    
2 presence            wet_sedge   2.7687    1.7951 844     1.5424  0.1230       0.0536    
3 presence      nontussock_gram   2.0189    8.5393 844     0.2364  0.8131       0.0177    
4 presence dry_gram_dwarf_shrub  13.7295   14.6206 844     0.9391  0.3477       0.1019    
5 presence            low_shrub  -6.6858    8.7873 844    -0.7608  0.4467      -0.1441    
6 presence           high_shrub  12.4353    3.5288 844     3.5240  0.0004       0.1436 ***
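
One caveat worth checking with make.names: if more than one column name is blank, the default call maps all of them to the same name. A small sketch with an illustrative names vector (not taken from the question's data) shows the difference that unique = TRUE makes.

```r
# Illustrative names vector with two blank entries
nms <- c("Response", "", "")

# Default: both blanks become "X", so the resulting names collide
plain <- make.names(nms)
plain
# [1] "Response" "X"        "X"

# unique = TRUE appends a numeric suffix to duplicates
uniq <- make.names(nms, unique = TRUE)
uniq
# [1] "Response" "X"        "X.1"
```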

So updating the column names ensures that unite runs (without removing the "" column):

names(test) <- make.names(names(test))
test %>%  
    unite(Relationship, Response, Predictor, sep = "~")
                   Relationship Estimate Std.Error  DF Crit.Value P.Value Std.Estimate   X
1         presence~tussock_gram   1.0350    1.6469 844     0.6285  0.5297       0.0233    
2            presence~wet_sedge   2.7687    1.7951 844     1.5424  0.1230       0.0536    
3      presence~nontussock_gram   2.0189    8.5393 844     0.2364  0.8131       0.0177    
4 presence~dry_gram_dwarf_shrub  13.7295   14.6206 844     0.9391  0.3477       0.1019    
5            presence~low_shrub  -6.6858    8.7873 844    -0.7608  0.4467      -0.1441    
6           presence~high_shrub  12.4353    3.5288 844     3.5240  0.0004       0.1436 ***
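
If the generic name X is not descriptive enough, the blank column can instead be given an explicit name before calling unite. This is only a sketch on a made-up data frame; the name Signif is an assumed label for what looks like a significance-stars column, not something stated in the question.

```r
# Hypothetical stand-in for `test`: the third column has a blank name
demo <- data.frame(Response = "presence", Predictor = "high_shrub", stars = "***",
                   stringsAsFactors = FALSE)
names(demo)[3] <- ""

# Replace the blank name with an explicit one ("Signif" is an assumed label)
names(demo)[names(demo) == ""] <- "Signif"
names(demo)

# Name-based subsetting now works, which is what unite relies on internally
signif_col <- demo["Signif"]
```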