How do I use two categorical columns to create one percentage column in R?

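I have a df with two categorical variables: team and home_win. I want the percentage of home games each team won (1 = home_win; 2 = home_loss). However, I don't know how to use two categorical variables to create a percentage.

Please help!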

team     home_win total_games
"red"    1        3
"blue"   1        1
"orange" 2        1
"red"    1        3
"red"    2        3

# Reproducible version of the data above
df <- data.frame(
  team = c("red", "blue", "orange", "red", "red"),
  home_win = c(1, 1, 2, 1, 2),
  total_games = c(3, 1, 1, 3, 3)
)

Desired output:

team     home_win total_games percentage
"red"    1        3           66.66
"blue"   1        1           100
"orange" 2        1           0
"red"    1        3           66.66
"red"    2        3           66.66

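You can try this. If we replace the 2s with 0s, the mean of home_win within each team is that team's win rate, so we can simply take the mean per team.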

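library(dplyr)
# dat is the data frame from the question
# Recode losses (2) as 0 so the mean of home_win is the win rate
dat$home_win <- ifelse(dat$home_win == 2, 0, dat$home_win)
dat %>% group_by(team) %>% summarise(win_perc = mean(home_win) * 100)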
# A tibble: 3 × 2
  team   win_perc
  <chr>     <dbl>
1 blue      100  
2 orange      0  
3 red        66.7

Or, if you want to keep the other columns:

dat$home_win <- ifelse(dat$home_win == 2, 0, dat$home_win)  # recode losses (2) as 0
dat %>% group_by(team) %>% mutate(win_perc = mean(home_win) * 100)
# A tibble: 5 × 4
# Groups:   team [3]
  team   home_win total_games win_perc
  <chr>     <dbl>       <dbl>    <dbl>
1 red           1           3     66.7
2 blue          1           1    100  
3 orange        0           1      0  
4 red           1           3     66.7
5 red           0           3     66.7
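
The recoding step can also be skipped by comparing home_win to 1 directly, since the mean of a logical vector is the share of TRUE values. A minimal sketch, assuming the column names from the question; the `games` and `result` names are made up here for illustration:

```r
library(dplyr)

# Sample data from the question (1 = home win, 2 = home loss)
games <- data.frame(
  team = c("red", "blue", "orange", "red", "red"),
  home_win = c(1, 1, 2, 1, 2)
)

# home_win == 1 is TRUE for a win; its mean within each team is the win rate
result <- games %>%
  group_by(team) %>%
  mutate(win_perc = mean(home_win == 1) * 100) %>%
  ungroup()

print(result)
```

This leaves the original 1/2 coding in home_win untouched, which may be preferable if other code depends on it.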

Here is a base R approach:

Convert the 2s in the data frame to 0 (0 meaning a loss):

df$home_win[df$home_win == 2] <- 0

Then use tapply:

Avg <- tapply(df$home_win, df$team, mean)        # win rate per team
percentage <- Avg[as.character(df$team)] * 100   # look up each row's team by name
cbind(df, percentage)

#    team home_win percentage
#1    red        1   66.66667
#2   blue        1  100.00000
#3 orange        0    0.00000
#4    red        1   66.66667
#5    red        0   66.66667

Note: you don't need the total_games column, since the mean already divides by the number of rows per team; you can drop it...
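
To illustrate the note above, here is a rough base R sketch that uses only team and home_win. It swaps tapply for ave(), which returns a per-row result directly; the `games` name is made up for this example, and the data is the sample from the question:

```r
# Sample data from the question, without total_games
games <- data.frame(
  team = c("red", "blue", "orange", "red", "red"),
  home_win = c(1, 1, 2, 1, 2)
)

# 1 for a win, 0 for a loss; ave() replaces each value with its team's mean
is_win <- as.numeric(games$home_win == 1)
games$percentage <- ave(is_win, games$team, FUN = mean) * 100

# Rows per team, which is what total_games held in the question
games_per_team <- table(games$team)

print(games)
```

Because the mean divides by the number of rows in each group, the game count never has to be stored separately, assuming each row is one home game.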