R: Group Over Multiple Rows and Columns
I am working with the R programming language. I have the following dataset:
set.seed(123)
# 20 games with two rows (players) each, so 40 rows in total
Game = c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,16,16,17,17,18,18,19,19,20,20)
# id and coin hold only 20 values; data.frame() recycles them to fill 40 rows
id = c(3,4,3,4,3,4,3,4,3,4,3,4, 3,4,3,4,3,4,3,4)
c <- c("1", "2")
coin <- sample(c, 20, replace=TRUE, prob=c(0.5,0.5))
winner <- c("win", "win", "lose", "lose", "tie", "tie", "lose", "lose", "win", "win", "win", "win", "lose", "lose", "tie", "tie", "lose", "lose", "win", "win", "win", "win", "lose", "lose", "tie", "tie", "lose", "lose", "win", "win", "win", "win", "lose", "lose", "tie", "tie", "lose", "lose", "win", "win")
my_data = data.frame(Game, id, coin, winner)
The data ("my_data") looks like this:
  Game id coin winner
1    1  3    2    win
2    1  4    1    win
3    2  3    2   lose
4    2  4    1   lose
5    3  3    1    tie
6    3  4    2    tie
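For this dataset ("my_data"), I want to do the following:
For each unique value of the "Game" variable (e.g. Game = 1, Game = 2, etc.), find the frequency of each unique "coin" combination. For example, maybe coin = (2,1) occurs 5 times, coin = (2,2) occurs 11 times, and so on.
Next, for each of these unique "coin" combinations, find the breakdown of "win", "lose" and "tie". For example, for coin = (2,2) there might be 5/11 wins, 3/11 losses and 3/11 ties.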
I tried to do this with the following code:
Part 1: (Manually) find the unique coin combination for each game (e.g. 1,1 OR 1,2 OR 2,1 OR 2,2)
for (i in 1:19) {
  for (j in 2:20) {
    my_data$comb = ifelse(my_data[i,3] == "1" & my_data[j,3] == "1", "one,one", ifelse(my_data[i,3] == "2" & my_data[j,3] == "1", "two, one", ifelse(my_data[i,3] == "1" & my_data[j,3] == "2", "one,two", "two,two")))
  }
}
Part 2: (If Part 1 had worked) find the win/tie/loss breakdown for each unique combination from Part 1:
library(dplyr)
my_data %>% group_by(comb) %>% summarise(percent = n() )
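The desired output should look like this (note: 1,2 = 2,1):
At the moment I am importing "my_data" into Microsoft Excel, but can someone show me how to do this in R?
Can someone show me how to obtain the table above?
Thanks!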
You can use the following code:
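library(tidyverse)
# Sort the coins within each game so that "2,1" and "1,2" get the same label
ff <- my_data %>% group_by(Game) %>% arrange(Game, coin) %>%
  do(as.data.frame(t(combn(.[["coin"]], 2)))) %>%
  mutate(coin = paste(V1, V2, sep = ",")) %>% select(Game, coin)
# One row per game, joined to its coin combination (this overwrites my_data)
my_data <- my_data %>% select(Game, winner) %>% distinct() %>% left_join(ff, by = "Game")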
The desired output can then be obtained with:
my_data %>% group_by(coin, winner) %>% summarise(n = n(), .groups = "drop_last") %>%
  mutate(p = 100 * n / sum(n, na.rm = TRUE))
# A tibble: 5 x 4
# Groups:   coin [3]
  coin  winner     n     p
  <chr> <chr>  <int> <dbl>
1 1,1   lose       4 100
2 1,2   lose       2  14.3
3 1,2   tie        4  28.6
4 1,2   win        8  57.1
5 2,2   lose       2 100
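As a cross-check on the table above, the same grouping can be sketched in base R without any packages. This is only a minimal sketch on hypothetical toy data: the `toy` data frame and the names `pair`, `outcome`, `combo_freq`, `counts` and `shares` are made up for illustration and are not part of the question. It assumes, as in the question's data, that both rows of a game carry the same outcome.

```r
# Hypothetical toy data (not the question's my_data): 4 games, 2 rows each
toy <- data.frame(
  Game   = c(1, 1, 2, 2, 3, 3, 4, 4),
  coin   = c("2", "1", "1", "2", "1", "1", "2", "1"),
  winner = c("win", "win", "lose", "lose", "tie", "tie", "win", "win"),
  stringsAsFactors = FALSE
)

by_game <- split(toy, toy$Game)

# One label per game; sort() makes "2,1" and "1,2" identical
pair <- vapply(by_game, function(g) paste(sort(g$coin), collapse = ","), character(1))
# Assumption: both rows of a game share the same outcome, so take the first
outcome <- vapply(by_game, function(g) g$winner[1], character(1))

combo_freq <- table(pair)                       # number of games per combination
counts <- table(coin = pair, winner = outcome)  # win/lose/tie counts per combination
shares <- prop.table(counts, margin = 1)        # proportions within each combination

print(combo_freq)
print(round(shares, 3))
```

Here `combo_freq` covers the first part of the question (how often each combination occurs), and `shares` covers the second part (the win/lose/tie breakdown within each combination).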
Here is my approach. I'm not sure whether this is the intended way:
library(dplyr)
my_data %>%
  group_by(Game) %>%
  mutate(combinations = toString(coin)) %>%
  distinct(combinations, .keep_all = TRUE) %>%
  ungroup() %>%
  group_by(combinations, winner) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))
  combinations winner     n  freq
  <chr>        <chr>  <int> <dbl>
1 1, 1         lose       4 1
2 1, 2         tie        2 0.333
3 1, 2         win        4 0.667
4 2, 1         lose       2 0.25
5 2, 1         tie        2 0.25
6 2, 1         win        4 0.5
7 2, 2         lose       2 1
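Note that `toString(coin)` keeps the row order, so "1, 2" and "2, 1" are counted separately in the output above, whereas the question treats them as the same combination. One possible order-insensitive variant is sketched below. It is a minimal sketch on hypothetical toy data (the `toy` data frame and the `res` name are made up for illustration), and it assumes both rows of a game share the same outcome.

```r
library(dplyr)

# Hypothetical toy data (not the question's my_data): 4 games, 2 rows each
toy <- data.frame(
  Game   = c(1, 1, 2, 2, 3, 3, 4, 4),
  coin   = c("2", "1", "1", "2", "1", "1", "2", "1"),
  winner = c("win", "win", "lose", "lose", "tie", "tie", "win", "win"),
  stringsAsFactors = FALSE
)

res <- toy %>%
  group_by(Game) %>%
  # sort() before toString() so "2, 1" and "1, 2" collapse to one label;
  # first(winner) assumes both rows of a game share the same outcome
  summarise(combinations = toString(sort(coin)),
            winner = first(winner),
            .groups = "drop") %>%
  count(combinations, winner) %>%   # games per (combination, outcome)
  group_by(combinations) %>%
  mutate(freq = n / sum(n)) %>%     # share within each combination
  ungroup()

print(res)
```

Collapsing to one row per game with `summarise()` replaces the `mutate()` plus `distinct()` step, which is a design choice rather than a requirement; sorting inside the existing `mutate()` call would work as well.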