按多个变量分组并汇总字符频率
Grouping by Multiple variables and summarizing character frequencies
我正在尝试按多个变量对我的数据集进行分组,并建立字符变量出现次数的频率 table。这是一个示例数据集:
Location State County Job Pet
Ohio Miami Data Dog
Urban Ohio Miami Business Dog, Cat
Urban Ohio Miami Data Cat
Rural Kentucky Clark Data Cat, Fish
City Indiana Shelby Business Dog
农村肯塔基州克拉克数据狗、鱼
俄亥俄州迈阿密数据狗,猫
Urban Ohio Miami Business 狗,猫
农村肯塔基州克拉克数据鱼
城市印第安纳谢尔比商业猫
我希望我的输出看起来像这样:
Location State County Job Frequency Pet:Cat Pet:Dog Pet:Fish
Ohio Miami Data 2 1 2 0
Urban Ohio Miami Business 2 2 2 0
Urban Ohio Miami Data 1 1 0 0
Rural Kentucky Clark Data 3 1 1 3
City Indiana Shelby Business 2 1 1 0
我尝试了以下代码的不同迭代,我接近了,但不太正确:
Output<-df%>%group_by(Location, State, County, Job)%>%
dplyr::summarise(
Frequency= dplyr::n(),
Pet:Cat = count(str_match(Pet, "Cat")),
Pet:Dog = count(str_match(Pet, "Dog")),
Pet:Fish = count(str_match(Pet, "Fish")),
)
如有任何帮助,我们将不胜感激!提前谢谢你
试试这个:
library(dplyr)
library(tidyr)
#Code
new <- df %>%
separate_rows(Pet,sep=',') %>%
mutate(Pet=trimws(Pet)) %>%
group_by(Location,State,County,Job,Pet) %>%
summarise(N=n()) %>%
mutate(Pet=paste0('Pet:',Pet)) %>%
group_by(Location,State,County,Job,.drop = F) %>%
mutate(Freq=n()) %>%
pivot_wider(names_from = Pet,values_from=N,values_fill=0)
输出:
# A tibble: 5 x 8
# Groups: Location, State, County, Job [5]
Location State County Job Freq `Pet:Cat` `Pet:Dog` `Pet:Fish`
<chr> <chr> <chr> <chr> <int> <int> <int> <int>
1 "" Ohio Miami Data 2 1 2 0
2 "City" Indiana Shelby Business 2 1 1 0
3 "Rural" Kentucky Clark Data 3 1 1 3
4 "Urban" Ohio Miami Business 2 2 2 0
5 "Urban" Ohio Miami Data 1 1 0 0
使用了一些数据:
#Data
df <- structure(list(Location = c("", "Urban", "Urban", "Rural", "City",
"Rural", "", "Urban", "Rural", "City"), State = c("Ohio", "Ohio",
"Ohio", "Kentucky", "Indiana", "Kentucky", "Ohio", "Ohio", "Kentucky",
"Indiana"), County = c("Miami", "Miami", "Miami", "Clark", "Shelby",
"Clark", "Miami", "Miami", "Clark", "Shelby"), Job = c("Data",
"Business", "Data", "Data", "Business", "Data", "Data", "Business",
"Data", "Business"), Pet = c("Dog", "Dog, Cat", "Cat", "Cat, Fish",
"Dog", "Dog, Fish", "Dog, Cat", "Dog, Cat", "Fish", "Cat")), row.names = c(NA,
-10L), class = "data.frame")
我正在尝试按多个变量对我的数据集进行分组,并建立字符变量出现次数的频率 table。这是一个示例数据集:
Location State County Job Pet
Ohio Miami Data Dog
Urban Ohio Miami Business Dog, Cat
Urban Ohio Miami Data Cat
Rural Kentucky Clark Data Cat, Fish
City Indiana Shelby Business Dog
农村肯塔基州克拉克数据狗、鱼 俄亥俄州迈阿密数据狗,猫 Urban Ohio Miami Business 狗,猫 农村肯塔基州克拉克数据鱼 城市印第安纳谢尔比商业猫
我希望我的输出看起来像这样:
Location State County Job Frequency Pet:Cat Pet:Dog Pet:Fish
Ohio Miami Data 2 1 2 0
Urban Ohio Miami Business 2 2 2 0
Urban Ohio Miami Data 1 1 0 0
Rural Kentucky Clark Data 3 1 1 3
City Indiana Shelby Business 2 1 1 0
我尝试了以下代码的不同迭代,我接近了,但不太正确:
Output<-df%>%group_by(Location, State, County, Job)%>%
dplyr::summarise(
Frequency= dplyr::n(),
Pet:Cat = count(str_match(Pet, "Cat")),
Pet:Dog = count(str_match(Pet, "Dog")),
Pet:Fish = count(str_match(Pet, "Fish")),
)
如有任何帮助,我们将不胜感激!提前谢谢你
试试这个:
library(dplyr)
library(tidyr)
#Code
new <- df %>%
separate_rows(Pet,sep=',') %>%
mutate(Pet=trimws(Pet)) %>%
group_by(Location,State,County,Job,Pet) %>%
summarise(N=n()) %>%
mutate(Pet=paste0('Pet:',Pet)) %>%
group_by(Location,State,County,Job,.drop = F) %>%
mutate(Freq=n()) %>%
pivot_wider(names_from = Pet,values_from=N,values_fill=0)
输出:
# A tibble: 5 x 8
# Groups: Location, State, County, Job [5]
Location State County Job Freq `Pet:Cat` `Pet:Dog` `Pet:Fish`
<chr> <chr> <chr> <chr> <int> <int> <int> <int>
1 "" Ohio Miami Data 2 1 2 0
2 "City" Indiana Shelby Business 2 1 1 0
3 "Rural" Kentucky Clark Data 3 1 1 3
4 "Urban" Ohio Miami Business 2 2 2 0
5 "Urban" Ohio Miami Data 1 1 0 0
使用了一些数据:
#Data
df <- structure(list(Location = c("", "Urban", "Urban", "Rural", "City",
"Rural", "", "Urban", "Rural", "City"), State = c("Ohio", "Ohio",
"Ohio", "Kentucky", "Indiana", "Kentucky", "Ohio", "Ohio", "Kentucky",
"Indiana"), County = c("Miami", "Miami", "Miami", "Clark", "Shelby",
"Clark", "Miami", "Miami", "Clark", "Shelby"), Job = c("Data",
"Business", "Data", "Data", "Business", "Data", "Data", "Business",
"Data", "Business"), Pet = c("Dog", "Dog, Cat", "Cat", "Cat, Fish",
"Dog", "Dog, Fish", "Dog, Cat", "Dog, Cat", "Fish", "Cat")), row.names = c(NA,
-10L), class = "data.frame")