Storing and calling variables in a column in dplyr within a function

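I would like to store some variable names in the cells of a tibble column. I then want to call that column and either paste the names of those variables together, or paste together the values of the columns those names refer to. All of this happens inside a function, and it is the only hard-coded part left, so I would really like to find a solution.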

library("tidyverse") 
myData<-tibble("c1"=c("a","b","c"),
"c2"=c("1","2","3"),
"c3"=c("A","B","C"),
factors=c(list(c("c1","c2")),list(c("c2","c3")),list(c("c1","c2","c3"))))

myData%>%mutate(factors1=interaction(!!!quos(factors),sep=":",lex.order=TRUE))
# A tibble: 3 x 5
  c1    c2    c3    factors   factors1
  <chr> <chr> <chr> <list>    <fct>   
1 a     1     A     <chr [2]> c1:c2:c1
2 b     2     B     <chr [2]> c2:c3:c2
3 c     3     C     <chr [3]> c1:c2:c3

So this lets me concatenate the variable names, but as you can see, if one list is longer than the others, the shorter ones get recycled.

For the second problem, I want to use the factors column to call the values of the other columns. I can hard-code it like this:

myData%>%
mutate(factors2=interaction(!!!syms(c("c1","c2")),sep=":",lex.order=TRUE))
# A tibble: 3 x 5
 c1    c2    c3    factors   factors2
 <chr> <chr> <chr> <list>    <fct>   
1 a     1     A     <chr [2]> a:1     
2 b     2     B     <chr [2]> b:2     
3 c     3     C     <chr [3]> c:3  

However, if I try this:

myData%>%
mutate(factors2=interaction(!!!syms(factors),sep=":",lex.order=TRUE))

Error in lapply(.x, .f, ...) : object 'factors' not found

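The same thing happens if I try to unlist factors or use other rlang expressions. I have also tried nesting rlang expressions, but so far I have not found one that works the way I expect.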

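I feel this should be possible, but so far I have not found a Stack Overflow question or a tutorial showing that it is, so maybe I am on a wild goose chase. Thank you all for your time and help.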

My complete code:

library("tidyverse") 

myData<-tibble("c1"=c("a","b","c"),
"c2"=c("1","2","3"),
"c3"=c("A","B","C"),
factors=c(list(c("c1","c2")),list(c("c2","c3")),list(c("c1","c2","c3"))))%>%
mutate(factors1=interaction(!!!quos(factors),sep=":",lex.order=TRUE))%>%
mutate(factors2=interaction(!!!syms(factors),sep=":",lex.order=TRUE))

My desired output is:

# A tibble: 3 x 6
  c1    c2    c3    factors   factors1 factors2
  <chr> <chr> <chr> <list>    <fct>    <fct>
1 a     1     A     <chr [2]> c1:c2    a:1
2 b     2     B     <chr [2]> c2:c3    2:B
3 c     3     C     <chr [3]> c1:c2:c3 c:3:C

Your first problem can be solved with the purrr::map and purrr::lift families of functions:

myData %>%
  mutate( factors1 = map(factors, lift_dv(interaction, sep=":", lex.order=TRUE)) ) %>%
  mutate_at( "factors1", lift(fct_c) )
# # A tibble: 3 x 5
#   c1    c2    c3    factors   factors1
#   <chr> <chr> <chr> <list>    <fct>
# 1 a     1     A     <chr [2]> c1:c2
# 2 b     2     B     <chr [2]> c2:c3
# 3 c     3     C     <chr [3]> c1:c2:c3
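
As a rough illustration of what the lifted function does for a single cell, the sketch below calls interaction through do.call in base R. The helper name paste_names is hypothetical and not part of the answer above. As far as I know, the lift_* functions were deprecated in purrr 1.0.0, so a base-R equivalent along these lines may be handy on newer installations.

```r
# Hypothetical helper: spread a character vector into separate arguments
# of interaction(), similar in spirit to lift_dv(interaction, ...).
paste_names <- function(x) {
  do.call(interaction, c(as.list(x), list(sep = ":", lex.order = TRUE)))
}

two   <- paste_names(c("c1", "c2"))
three <- paste_names(c("c1", "c2", "c3"))
print(as.character(two))    # "c1:c2"
print(as.character(three))  # "c1:c2:c3"
```

Because each call only ever sees one cell of the factors column, a longer vector in another row cannot cause recycling.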

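The second problem is trickier, because !!! evaluates its argument immediately, which sometimes leads to unintuitive operator precedence in dplyr chains. The cleanest approach is to define a standalone function that composes your interaction expression: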

f <- function(fct) {expr( interaction(!!!syms(fct), sep=":", lex.order=TRUE) )}

# Example usage
f( myData$factors[[1]] )    # interaction(c1, c2, sep = ":", lex.order = TRUE)
f( myData$factors[[2]] )    # interaction(c2, c3, sep = ":", lex.order = TRUE)

myData %>% mutate( e = map(factors, f) )
# # A tibble: 3 x 5
#   c1    c2    c3    factors   e
#   <chr> <chr> <chr> <list>    <list>
# 1 a     1     A     <chr [2]> <language>
# 2 b     2     B     <chr [2]> <language>
# 3 c     3     C     <chr [3]> <language>
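
To make the "evaluates immediately" point concrete, here is a small sketch using only rlang. The names cols and no_such_object are made up for the illustration; the second call is meant to mimic what happens when factors is looked up outside the data frame.

```r
library(rlang)

# `!!!` is resolved as soon as expr() captures its input, using whatever
# `cols` is in the current environment. No data frame is consulted.
cols <- c("c1", "c2")
built <- expr(interaction(!!!syms(cols), sep = ":"))
print(built)

# With no object of that name in scope, the same construction fails
# before any data is involved. `no_such_object` is deliberately undefined.
res <- tryCatch(
  expr(interaction(!!!syms(no_such_object), sep = ":")),
  error = function(e) "lookup failed"
)
print(res)  # "lookup failed"
```

Wrapping the construction in a function such as f sidesteps this, because by the time syms runs, fct is an ordinary character vector.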

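Unfortunately, we cannot evaluate e directly, because that would feed the entire columns c1, c2 and c3 to the expression, whereas you only want the single values on the same row as the expression. We therefore need to package columns c1 through c3 in a row-wise fashion.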

X <- myData %>% mutate( e = map(factors, f) ) %>%
  rowwise() %>% mutate( d = list(tibble(c1,c2,c3)) ) %>% ungroup()   # tibble() replaces the deprecated data_frame()
# # A tibble: 3 x 6
#   c1    c2    c3    factors   e          d
#   <chr> <chr> <chr> <list>    <list>     <list>
# 1 a     1     A     <chr [2]> <language> <tibble [1 × 3]>
# 2 b     2     B     <chr [2]> <language> <tibble [1 × 3]>
# 3 c     3     C     <chr [3]> <language> <tibble [1 × 3]>

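Now e holds the expressions that need to be applied to the data in d, so from here it is just a simple map2 traversal. Putting everything together and cleaning up, we get: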

myData %>%
  mutate( factors1 = map(factors, lift_dv(interaction, sep=":", lex.order=TRUE)) ) %>%
  mutate( e = map(factors, f) ) %>%
  rowwise() %>% mutate( d = list(tibble(c1,c2,c3)) ) %>% ungroup() %>%   # tibble() replaces the deprecated data_frame()
  mutate( factors2 = map2( e, d, rlang::eval_tidy ) ) %>%
  mutate_at( vars(factors1,factors2), lift(fct_c) ) %>%
  select( -e, -d )
# # A tibble: 3 x 6
#   c1    c2    c3    factors   factors1 factors2
#   <chr> <chr> <chr> <list>    <fct>    <fct>
# 1 a     1     A     <chr [2]> c1:c2    a:1
# 2 b     2     B     <chr [2]> c2:c3    2:B
# 3 c     3     C     <chr [3]> c1:c2:c3 c:3:C
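
To see what the map2 step does for one row, the following sketch evaluates a single expression against a single one-row data frame with rlang::eval_tidy. The objects vars, e and d are stand-ins built by hand here (mirroring the second row of myData), not values pulled from the pipeline above.

```r
library(rlang)

# Expression for one row, built the same way f() builds it.
vars <- c("c2", "c3")
e <- expr(interaction(!!!syms(vars), sep = ":", lex.order = TRUE))

# One-row data standing in for a single element of the d column.
d <- data.frame(c1 = "b", c2 = "2", c3 = "B", stringsAsFactors = FALSE)

# eval_tidy() looks up c2 and c3 in `d` rather than in the global environment.
res <- eval_tidy(e, data = d)
print(as.character(res))  # "2:B"
```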

Here is an approach using map and imap:

library(tidyverse)

myData %>%
  mutate(factor1 = factors %>% map(~interaction(as.list(.), sep=':', lex.order = TRUE)) %>% unlist(),
         factor2 = factors %>% imap(~interaction(myData[.y, match(.x, names(myData))], sep=":", lex.order = TRUE)) %>% unlist())

For factor1, instead of splicing the arguments into dots, I pass a list to interaction.

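For factor2, I match the factors in each row against the names of myData, and use the column index (match(.x, names(myData))) together with the row index (.y in imap) to subset the appropriate elements to feed into interaction.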

Both factor1 and factor2 need an unlist because map and imap return lists.

Output:

# A tibble: 3 x 6
  c1    c2    c3    factors   factor1  factor2
  <chr> <chr> <chr> <list>    <fct>    <fct>  
1 a     1     A     <chr [2]> c1:c2    a:1    
2 b     2     B     <chr [2]> c2:c3    2:B    
3 c     3     C     <chr [3]> c1:c2:c3 c:3:C
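
The row and column indexing used for factor2 can be checked in isolation with base R. This is only a sketch: df, wanted and row_idx are illustrative names, and a plain data.frame stands in for the tibble.

```r
# Same three value columns as myData, as a plain data.frame.
df <- data.frame(
  c1 = c("a", "b", "c"),
  c2 = c("1", "2", "3"),
  c3 = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

wanted  <- c("c2", "c3")              # what the factors cell of row 2 holds
row_idx <- 2                          # plays the role of .y in imap
col_idx <- match(wanted, names(df))   # positions of the wanted columns

# Subset one row and the matched columns, then let interaction()
# treat the resulting one-row data.frame as a list of factors.
cell <- df[row_idx, col_idx]
out  <- interaction(cell, sep = ":", lex.order = TRUE)
print(col_idx)            # 2 3
print(as.character(out))  # "2:B"
```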