Grouping by multiple variables and summarizing character frequencies

Question

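I am trying to group my data set by multiple variables and build a frequency table of how many times each value of a character variable appears. Here is an example data set: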

Location    State     County  Job       Pet
            Ohio      Miami   Data      Dog
Urban       Ohio      Miami   Business  Dog, Cat
Urban       Ohio      Miami   Data      Cat
Rural       Kentucky  Clark   Data      Cat, Fish
City        Indiana   Shelby  Business  Dog
Rural       Kentucky  Clark   Data      Dog, Fish
            Ohio      Miami   Data      Dog, Cat
Urban       Ohio      Miami   Business  Dog, Cat
Rural       Kentucky  Clark   Data      Fish
City        Indiana   Shelby  Business  Cat

I would like my output to look like this:

Location    State   County  Job      Frequency  Pet:Cat Pet:Dog Pet:Fish
            Ohio    Miami   Data        2         1        2       0
 Urban      Ohio    Miami   Business    2         2        2       0
 Urban      Ohio    Miami   Data        1         1        0       0
 Rural    Kentucky  Clark   Data        3         1        1       3
 City     Indiana   Shelby  Business    2         1        1       0

I have tried different iterations of the following code. I am close, but it is not quite right:

Output<-df%>%group_by(Location, State, County, Job)%>%
  dplyr::summarise(
    Frequency= dplyr::n(),
    Pet:Cat = count(str_match(Pet, "Cat")),
    Pet:Dog = count(str_match(Pet, "Dog")),
    Pet:Fish = count(str_match(Pet, "Fish")),
    )

Any help would be appreciated! Thank you in advance.

Answer 1

Try this:

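library(dplyr)
library(tidyr)
#Code
new <- df %>%
  # Count the original rows per group before Pet is split into one row per pet
  add_count(Location, State, County, Job, name = "Freq") %>%
  # One row per pet; trim the space left behind after each comma
  separate_rows(Pet, sep = ",") %>%
  mutate(Pet = trimws(Pet)) %>%
  # Count how often each pet occurs within each group
  group_by(Location, State, County, Job, Freq, Pet) %>%
  summarise(N = n(), .groups = "drop") %>%
  mutate(Pet = paste0("Pet:", Pet)) %>%
  group_by(Location, State, County, Job) %>%
  # One column per pet; pets absent from a group get 0
  pivot_wider(names_from = Pet, values_from = N, values_fill = 0)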

Output:

# A tibble: 5 x 8
# Groups:   Location, State, County, Job [5]
  Location State    County Job       Freq `Pet:Cat` `Pet:Dog` `Pet:Fish`
  <chr>    <chr>    <chr>  <chr>    <int>     <int>     <int>      <int>
1 ""       Ohio     Miami  Data         2         1         2          0
2 "City"   Indiana  Shelby Business     2         1         1          0
3 "Rural"  Kentucky Clark  Data         3         1         1          3
4 "Urban"  Ohio     Miami  Business     2         2         2          0
5 "Urban"  Ohio     Miami  Data         1         1         0          0
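The step that makes the pipeline above work is `separate_rows()`, which turns a comma-separated `Pet` value into one row per pet so that ordinary counting applies. A minimal sketch of that step in isolation, using a hypothetical one-row data frame (`one_row` and `long` are illustrative names, not part of the original answer):

```r
library(tidyr)

# Hypothetical one-row data frame, used only to illustrate the reshaping step
one_row <- data.frame(Job = "Business", Pet = "Dog, Cat", stringsAsFactors = FALSE)

# Split on the comma so each pet gets its own row; the other columns are repeated
long <- separate_rows(one_row, Pet, sep = ",")

# Splitting on "," leaves a leading space on " Cat", so trim it
long$Pet <- trimws(long$Pet)

long
```

The result has two rows, one for `Dog` and one for `Cat`, both with `Job` equal to `Business`. Without the `trimws()` call, `"Cat"` and `" Cat"` would be counted as different pets.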

Data used:

#Data
df <- structure(list(Location = c("", "Urban", "Urban", "Rural", "City", 
"Rural", "", "Urban", "Rural", "City"), State = c("Ohio", "Ohio", 
"Ohio", "Kentucky", "Indiana", "Kentucky", "Ohio", "Ohio", "Kentucky", 
"Indiana"), County = c("Miami", "Miami", "Miami", "Clark", "Shelby", 
"Clark", "Miami", "Miami", "Clark", "Shelby"), Job = c("Data", 
"Business", "Data", "Data", "Business", "Data", "Data", "Business", 
"Data", "Business"), Pet = c("Dog", "Dog, Cat", "Cat", "Cat, Fish", 
"Dog", "Dog, Fish", "Dog, Cat", "Dog, Cat", "Fish", "Cat")), row.names = c(NA, 
-10L), class = "data.frame")
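For comparison, here is a sketch that stays closer to the attempt in the question. It assumes the set of pets is known in advance and that stringr is available. Two things differ from the original attempt: column names containing a colon need backticks, and `sum(str_detect(...))` counts the rows whose `Pet` value contains the pattern, whereas `count()` expects a data frame rather than a vector. The data frame below restates the same ten rows in a more compact form.

```r
library(dplyr)
library(stringr)

# Same ten rows as the sample data in this thread
df <- data.frame(
  Location = c("", "Urban", "Urban", "Rural", "City",
               "Rural", "", "Urban", "Rural", "City"),
  State    = c("Ohio", "Ohio", "Ohio", "Kentucky", "Indiana",
               "Kentucky", "Ohio", "Ohio", "Kentucky", "Indiana"),
  County   = c("Miami", "Miami", "Miami", "Clark", "Shelby",
               "Clark", "Miami", "Miami", "Clark", "Shelby"),
  Job      = c("Data", "Business", "Data", "Data", "Business",
               "Data", "Data", "Business", "Data", "Business"),
  Pet      = c("Dog", "Dog, Cat", "Cat", "Cat, Fish", "Dog",
               "Dog, Fish", "Dog, Cat", "Dog, Cat", "Fish", "Cat"),
  stringsAsFactors = FALSE
)

Output <- df %>%
  group_by(Location, State, County, Job) %>%
  summarise(
    Frequency  = n(),                          # original rows in the group
    `Pet:Cat`  = sum(str_detect(Pet, "Cat")),  # rows mentioning Cat
    `Pet:Dog`  = sum(str_detect(Pet, "Dog")),  # rows mentioning Dog
    `Pet:Fish` = sum(str_detect(Pet, "Fish")), # rows mentioning Fish
    .groups = "drop"
  )

Output
```

This version hard-codes the pet names and uses substring matching, so a pattern such as `"Cat"` would also match a longer value that contains it. The `separate_rows()` approach avoids both limitations because it discovers the pets from the data.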

Tags: r, plyr, stringr, dplyr, data-wrangling