Tabulate using tabyl by grouping variable using group_split and group_map

Question

To get quick frequencies (tabulations) of one or more columns at once, I use the tabyl function, like so:

library(janitor)
library(tidyverse)

#tabulate one column at a time
iris %>% 
  tabyl(Petal.Width)

#tabulate multiple columns at once using map
iris %>% 
  select(Petal.Width, Petal.Length) %>% 
  map(tabyl)

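I am trying to replicate both cases, but with the output broken down by a grouping variable, Species in this example. I want the simplest solution, and I would like to try the newer group_split and group_map commands for this.

I have been able to produce a similar kind of output in data frame format (although for the multiple-variable case, the simple list that tabyl produces is what I actually want):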

#works 
iris %>%
  group_by(Species) %>%
  nest() %>% 
  mutate(out = map(data, ~ tabyl(.x$Petal.Width) %>% 
                     as_tibble)) %>% 
  select(-data) %>%
  unnest(out)

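This works, but I think it could be a bit simpler, like my column approach. I was thinking of something like this, with one table per level of the grouping variable: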

#by group for one column
iris %>% 
  group_by(Species) %>% 
  group_split() %>% 
  map(~tabyl(Petal.Width))

For multiple columns, I am not sure whether I need the select line. Maybe group_map could reduce it to a single line?

#by group for multiple columns
iris %>% 
  #do i need to select grouping variable and variables of interest?
  select(Species, Petal.Width, Petal.Length) %>% 
  group_by(Species) %>% 
  group_split() %>% 
  map(~tabyl())   #could I use group_map and select the columns at once?

Any suggestions?

Answer 1

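iris %>%
  # use split(.$Species) instead if you need a list with names
  group_split(Species) %>%
  map(~ imap(
    select(.x, Species, Petal.Width, Petal.Length),
    function(x, y) {
      out <- tabyl(x)        # x is the column vector, y is its name
      colnames(out)[1] <- y  # label the first column with the variable name
      out
    }
  ))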

If the default column name for the first column is all you need, you can simply use:

iris %>% group_split(Species) %>% map(~map(.x, tabyl))
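
Since the question also asks about group_map, here is a minimal sketch of how it might look. It assumes dplyr 0.8.1 or later, where group_map() returns a plain list and passes each group's rows as .x without the grouping column. The object names one_col, multi_col and as_df are only illustrative. The last variant uses group_modify() as a possible shortcut for the nest/unnest approach in the question; as_tibble() is applied there, as in the question, so that a plain tibble is returned for each group.

```r
library(dplyr)
library(purrr)
library(janitor)

# One column per group: .x holds the rows of one Species (grouping column dropped)
one_col <- iris %>%
  group_by(Species) %>%
  group_map(~ tabyl(.x, Petal.Width))

# Several columns per group: tabulate each selected column within each group
multi_col <- iris %>%
  group_by(Species) %>%
  group_map(~ map(select(.x, Petal.Width, Petal.Length), tabyl))

# Data frame output: group_modify() binds the per-group results back together
# and adds the Species key as a column
as_df <- iris %>%
  group_by(Species) %>%
  group_modify(~ as_tibble(tabyl(.x, Petal.Width)))
```

Note that the list returned by group_map() is unnamed, so the split(.$Species) variant mentioned above is still the one to use when the group labels are needed as list names.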

group-by

r

dplyr

purrr

janitor