How do I use two categorical columns to create one percentage column in R?

Question

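I have a df with two categorical variables: team and home_win. I want to get the percentage of home games that each team won (1 = home_win; 2 = home_loss). However, I can't figure out how to build a percentage from two categorical variables.

Please help!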

team	home_win	total_games
"red"	1	3
"blue"	1	1
"orange"	2	1
"red"	1	3
"red"	2	3

  df <- data.frame(
    team = c("red", "blue", "orange", "red", "red"),
    home_win = c(1, 1, 2, 1, 2),
    total_games = c(3, 1, 1, 3, 3)
  )

team	home_win	total_games	percentage
"red"	1	3	66.66
"blue"	1	1	100
"orange"	2	1	0
"red"	1	3	66.66
"red"	2	3	66.66

Answer 1

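You can try this. If we recode 2 as 0, we can simply take the mean of home_win for each team.

library(dplyr)
# Recode losses from 2 to 0 so that the mean of home_win is the win rate
dat$home_win <- ifelse(dat$home_win == 2, 0, dat$home_win)
dat %>% group_by(team) %>% summarise(win_perc = mean(home_win) * 100)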

# A tibble: 3 × 2
  team   win_perc
  <chr>     <dbl>
1 blue      100  
2 orange      0  
3 red        66.7

Or, if you want to keep the other columns:

# Same recoding as above; mutate() keeps every row and column
dat$home_win <- ifelse(dat$home_win == 2, 0, dat$home_win)
dat %>% group_by(team) %>% mutate(win_perc = mean(home_win) * 100)

# A tibble: 5 × 4
# Groups:   team [3]
  team   home_win total_games win_perc
  <chr>     <dbl>       <dbl>    <dbl>
1 red           1           3     66.7
2 blue          1           1    100  
3 orange        0           1      0  
4 red           1           3     66.7
5 red           0           3     66.7
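
If you would rather not overwrite home_win, a possible variant is to average the logical comparison home_win == 1 directly, since TRUE counts as 1 and FALSE as 0. This is only a minimal sketch, assuming the data frame is called dat as above and that 1 always codes a home win; the name res is just illustrative.

```r
library(dplyr)

# Example data from the question (dat is the name used in this answer)
dat <- data.frame(
  team = c("red", "blue", "orange", "red", "red"),
  home_win = c(1, 1, 2, 1, 2),
  total_games = c(3, 1, 1, 3, 3)
)

# mean() of a logical vector is the share of TRUE values,
# so no recoding of 2 to 0 is needed
res <- dat %>%
  group_by(team) %>%
  mutate(win_perc = mean(home_win == 1) * 100) %>%
  ungroup()

res
```

The ungroup() call at the end simply returns an ungrouped tibble, which tends to be safer if more steps follow.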

Answer 2

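Here is a base R approach:

First convert the 2s in the data frame to 0, so that 0 denotes a loss:

df$home_win[df$home_win == 2] <- 0

Then use tapply:

# Mean of home_win per team (a vector named by team)
Avg <- tapply(df$home_win, df$team, mean)
# Look up each row's team mean by name and scale it to a percentage
percentage <- Avg[as.character(df$team)] * 100
cbind(df, percentage)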

#    team home_win percentage
#1    red        1   66.66667
#2   blue        1  100.00000
#3 orange        0    0.00000
#4    red        1   66.66667
#5    red        0   66.66667

Note: you don't need the total_games column, so you can drop it.
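
Along the same lines, here is a base R sketch that skips both the recoding step and total_games. ave() returns the group mean aligned to each row, so the result can be assigned straight to a new column. It assumes the data frame is named df and that 1 codes a home win; the column name percentage is simply the one used in the question.

```r
# Example data from the question, without total_games
df <- data.frame(
  team = c("red", "blue", "orange", "red", "red"),
  home_win = c(1, 1, 2, 1, 2)
)

# ave() applies mean() (its default FUN) within each team
# and returns one value per row, in the original row order
df$percentage <- ave(as.numeric(df$home_win == 1), df$team) * 100

df
```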

r

percentage

categorical-data