Grouping by multiple factors and summarizing counts of factors

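I have a bunch of categorical ship "Type" data, e.g. passenger, fishing, cargo, etc., recorded within different distances offshore (DOS, e.g. 0-12 nautical miles, 0-25 nautical miles, etc.) for different months of the year.

Initially I want to count the number of each type, e.g. passenger ships, for each DOS across the whole year/data set. Then I want to do the same for each month of the year.

I imagine this should be some sort of group_by call followed by summarise? But I'm having trouble getting the output, as I'm not yet very adept with dplyr.

Some things I have tried: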

ships <- df %>% group_by(DOS, Type)
shipc <- summarize(ships, count = n())

df1 <- gather(df, Type, DOS) %>% count(Type, DOS) %>% spread(DOS, n, fill = 0)

But I'm fairly sure it isn't working because I haven't understood the syntax properly.

Here is some dummy data:

df <- structure(list(Type = c("Cargo ship", "Cargo ship", "Cargo ship", 
 "Cargo ship", "Cargo ship", "Cargo ship", "Fishing", "Fishing", 
 "Fishing", "Fishing", "Fishing", "Cargo ship", "Cargo ship", 
 "Cargo ship", "Cargo ship", "Cargo ship", "Fishing", "Fishing", 
 "Fishing", "Fishing", "Fishing", "Fishing", "Fishing", "Fishing", 
 "Fishing", "Cargo ship:DG,HS,MP(A)", "Cargo ship", "Cargo ship", 
 "Fishing", "Fishing", "Fishing", "Fishing", "Fishing", "Tanker", 
 "Cargo ship", "Cargo ship", "Fishing", "Fishing", "Cargo ship:DG,HS,MP(A)", 
 "Cargo ship:DG,HS,MP(D)", "Cargo ship:DG,HS,MP(D)", "Cargo ship:DG,HS,MP(D)", 
 "Cargo ship"), DOS = c("0-100", "0-50", "0-25", "0-100", "0-50", 
 "0-25", "0-100", "0-25", "0-12", "0-50", "0-100", "0-50", "0-100", 
 "0-25", "0-50", "0-100", "0-50", "0-25", "0-50", "0-100", "0-25", 
 "0-100", "0-100", "0-50", "0-25", "0-100", "0-100", "0-50", "0-100", 
 "0-50", "0-25", "0-100", "0-100", "0-100", "0-50", "0-100", "0-100", 
 "0-100", "0-100", "0-25", "0-50", "0-100", "0-100"), Month = c("May", 
 "May", "May", "May", "May", "May", "May", "May", "May", "May", 
 "June", "June", "June", "June", "June", "June", "June", "June", 
 "June", "June", "June", "August", "August", "August", "August", 
 "August", "August", "August", "August", "August", "August", "August", 
 "January", "January", "January", "January", "January", "January", 
 "January", "January", "January", "January", "January"), Year = c(2018, 
 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 
 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 
 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 
 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019)), row.names = c(NA, 
-43L), class = c("tbl_df", "tbl", "data.frame"))

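What I want is the Type category, the DOS, and the total number of ships that fall under each unique combination of the two. I'd then like to group further by Month and Year.

The expected output isn't clear. Based on the description, group by all columns (group_by_all), get the frequency count (n()), and spread to 'wide' format: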

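library(dplyr)
library(tidyr)  # spread() comes from tidyr, not dplyr
df %>% 
   group_by_all() %>%          # group by Type, DOS, Month and Year
   summarise(n = n()) %>%      # one count per combination
   spread(DOS, n, fill = 0)    # one column per DOS band, 0 where absent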

Or use count (shorthand for group_by + summarise) and spread:

df %>% 
  dplyr::count(Type, DOS, Month, Year) %>% 
  spread(DOS, n, fill = 0)
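
For the two steps described in the question (counts per Type and DOS over the whole data set first, then the same broken down by Month and Year), here is a minimal sketch. It assumes tidyr 1.0.0 or later and uses pivot_wider in place of spread. The `df_small` tibble and the `overall` / `monthly` names are made up for illustration and are not the question's data.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the question's df (same columns, fewer rows)
df_small <- tibble(
  Type  = c("Cargo ship", "Cargo ship", "Fishing", "Fishing", "Fishing", "Tanker"),
  DOS   = c("0-100", "0-50", "0-100", "0-100", "0-25", "0-100"),
  Month = c("May", "May", "May", "June", "June", "January"),
  Year  = c(2018, 2018, 2018, 2018, 2018, 2019)
)

# Step 1: number of ships per Type and DOS across the whole data set
overall <- df_small %>%
  count(Type, DOS) %>%
  pivot_wider(names_from = DOS, values_from = n, values_fill = list(n = 0))

# Step 2: the same counts, split further by Month and Year
monthly <- df_small %>%
  count(Type, Month, Year, DOS) %>%
  pivot_wider(names_from = DOS, values_from = n, values_fill = list(n = 0))

print(overall)
print(monthly)
```

Swapping `df_small` for `df` should give the same layout on the real data: one row per Type (or per Type/Month/Year) and one column per DOS band.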