How do I use two categorical columns to create one percentage column in R?
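I have a df with two categorical variables: team and home_win.
I want to get the percentage of home games each team won (1 = home_win; 2 = home_loss).
However, I don't know how to use two categorical variables to create a percentage.
Please help!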
team | home_win | total_games |
---|---|---|
"red" | 1 | 3 |
"blue" | 1 | 1 |
"orange" | 2 | 1 |
"red" | 1 | 3 |
"red" | 2 | 3 |
dat <- data.frame(
team = c("red", "blue", "orange", "red", "red"),
home_win = c(1, 1, 2, 1, 2),
total_games = c(3, 1, 1, 3, 3)
)
Desired output:
team | home_win | total_games | percentage |
---|---|---|---|
"red" | 1 | 3 | 66.66 |
"blue" | 1 | 1 | 100 |
"orange" | 2 | 1 | 0 |
"red" | 1 | 3 | 66.66 |
"red" | 2 | 3 | 66.66 |
You can try this. If we replace the 2s with 0s, we can simply take the mean for each team.
library(dplyr)
# Recode losses (2) as 0 so that each team's mean of home_win is its share of home wins
dat$home_win <- as.numeric(gsub(2, 0, dat$home_win))
dat %>% group_by(team) %>% summarise(win_perc = mean(home_win) * 100)
# A tibble: 3 × 2
#   team   win_perc
#   <chr>     <dbl>
# 1 blue      100
# 2 orange      0
# 3 red        66.7
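A variant that skips the recoding step, sketched here with the question's sample data: `home_win == 1` produces a logical vector, and the mean of a logical vector is the proportion of `TRUE` values, so the original 1/2 codes can stay untouched. The object name `win_summary` is just for illustration.

```r
library(dplyr)

# Sample data from the question (home_win: 1 = home win, 2 = home loss)
dat <- data.frame(
  team = c("red", "blue", "orange", "red", "red"),
  home_win = c(1, 1, 2, 1, 2),
  total_games = c(3, 1, 1, 3, 3)
)

# home_win == 1 is TRUE/FALSE; mean() of a logical vector is the share of TRUEs,
# so no recoding of the original column is needed
win_summary <- dat %>%
  group_by(team) %>%
  summarise(win_perc = mean(home_win == 1) * 100)

print(win_summary)
```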
Or, if you want to keep the other columns:
# Recode 2 -> 0 (harmless if already done above)
dat$home_win <- as.numeric(gsub(2, 0, dat$home_win))
# mutate() keeps every row and adds the per-team percentage to each one
dat %>% group_by(team) %>% mutate(win_perc = mean(home_win) * 100)
# A tibble: 5 × 4
# Groups:   team [3]
#   team   home_win total_games win_perc
#   <chr>     <dbl>       <dbl>    <dbl>
# 1 red           1           3     66.7
# 2 blue          1           1    100
# 3 orange        0           1      0
# 4 red           1           3     66.7
# 5 red           0           3     66.7
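The same per-row result can also be built from explicit counts, which makes the numerator and denominator visible. This is a sketch; the extra columns `wins` and `games` are illustrative additions, not part of the original answer. In this sample data `n()` (rows per team) equals `total_games`, which is why that column isn't strictly needed.

```r
library(dplyr)

# Sample data from the question (home_win: 1 = home win, 2 = home loss)
dat <- data.frame(
  team = c("red", "blue", "orange", "red", "red"),
  home_win = c(1, 1, 2, 1, 2),
  total_games = c(3, 1, 1, 3, 3)
)

dat_with_perc <- dat %>%
  group_by(team) %>%
  mutate(
    wins = sum(home_win == 1),     # number of home wins for this team
    games = n(),                   # number of rows (games) for this team
    win_perc = wins / games * 100  # percentage of home games won
  ) %>%
  ungroup()

print(dat_with_perc)
```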
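Here is a base R approach (assuming the sample data is stored in `df`).
First, convert the 2s in the data frame to 0s (since 0 means a loss):
df$home_win[df$home_win == 2] <- 0
Then use tapply: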
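# Per-team mean of the 0/1 column, i.e. each team's share of home wins
Avg <- tapply(df$home_win, df$team, mean)
# Look up each row's team in the named vector and scale to a percentage
percentage <- Avg[df$team] * 100
cbind(df, percentage)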
# team home_win percentage
#1 red 1 66.66667
#2 blue 1 100.00000
#3 orange 0 0.00000
#4 red 1 66.66667
#5 red 0 66.66667
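A shorter base R sketch of the same idea uses `ave()`, which applies a function (by default `mean`) within each group and returns a vector aligned with the original rows, so it can be assigned directly as a new column. Here the 0/1 indicator is computed from `home_win == 1`, so the 2s don't need to be recoded first.

```r
# Sample data from the question (home_win: 1 = home win, 2 = home loss)
df <- data.frame(
  team = c("red", "blue", "orange", "red", "red"),
  home_win = c(1, 1, 2, 1, 2),
  total_games = c(3, 1, 1, 3, 3)
)

# ave() computes the mean of the 0/1 indicator within each team,
# repeated for every row of that team
df$percentage <- ave(as.numeric(df$home_win == 1), df$team) * 100

print(df)
```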
Note: you don't need the total_games column; you can drop it.
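Because both columns are categorical, a cross-tabulation is another way to see it. A minimal base R sketch: `table()` counts each team/result combination, and `prop.table(..., margin = 1)` divides each row by its row total, so the column for code `1` holds each team's home-win percentage (one value per team rather than per row).

```r
# Sample data from the question (home_win: 1 = home win, 2 = home loss)
dat <- data.frame(
  team = c("red", "blue", "orange", "red", "red"),
  home_win = c(1, 1, 2, 1, 2),
  total_games = c(3, 1, 1, 3, 3)
)

# Rows = team, columns = home_win code ("1" = win, "2" = loss)
counts <- table(dat$team, dat$home_win)

# margin = 1 gives proportions within each row (team)
pct <- prop.table(counts, margin = 1) * 100

# Keep only the win column: a named vector, one entry per team
win_perc <- pct[, "1"]
print(win_perc)
```

If you need the value on every row, `win_perc[dat$team]` looks it up by team name.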