Dynamic counting of occurrences
New to R. Here is a small sample of my data.
TeamHome <- c("LAL", "HOU", "SAS", "LAL")
TeamAway <- c("IND", "SAS", "LAL", "HOU")
df <- data.frame(cbind(TeamHome, TeamAway))
df
TeamHome TeamAway
LAL IND
HOU SAS
SAS LAL
LAL HOU
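Imagine these are the first four games of a season with thousands of games. For both the home team and the away team, I want to compute the cumulative number of games played at home, on the road, and in total. So there would be 3 new columns for the home team and 3 for the away team. I would like to get something like this (here I only compute the new variables for the home team):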
TeamHome TeamAway HomeTeamGamesPlayedatHome HomeTeamGamesPlayedRoad HomeTeamTotalgames
1 LAL IND 1 0 1
2 HOU SAS 1 0 1
3 SAS LAL 1 1 2
4 LAL HOU 2 1 3
To compute the first column (HomeTeamGamesPlayedatHome), I managed to do it with:
df$HomeTeamGamesPlayedatHome <- ave(df$TeamHome==df$TeamHome, df$TeamHome, FUN=cumsum)
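But it feels too complicated, and I can't compute the other columns with this approach.
I also thought of using the table function to count the number of occurrences: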
table(df$TeamHome)
But that only gives the overall totals, and I want the result at any given point in time.
Thanks!
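A loop solution:
TeamHome <- c("LAL", "HOU", "SAS", "LAL")
TeamAway <- c("IND", "SAS", "LAL", "HOU")
# Running count of home games per team
df <- data.frame(TeamHome,TeamAway,HomeTeamGamesPlayedatHome=ave(TeamHome==TeamHome, TeamHome, FUN=cumsum))
df$HomeTeamGamesPlayedRoad <- integer(nrow(df))
for (i in seq_len(nrow(df))) {
  curdf <- df[1:i,]
  # Games so far (including game i) in which the current home team was the visitor.
  # A plain sum is used here: summing a per-group cumsum would overcount once a team has two or more road games.
  df$HomeTeamGamesPlayedRoad[i] <- sum(as.character(curdf$TeamAway) == as.character(curdf$TeamHome[i]))
}
df$HomeTeamTotalgames <- df$HomeTeamGamesPlayedatHome + df$HomeTeamGamesPlayedRoad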
TeamHome TeamAway HomeTeamGamesPlayedatHome HomeTeamGamesPlayedRoad HomeTeamTotalgames
1 LAL IND 1 0 1
2 HOU SAS 1 0 1
3 SAS LAL 1 1 2
4 LAL HOU 2 1 3
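library(dplyr)
# Running count of home games per team
df <- df %>% group_by(TeamHome) %>%
  mutate(HomeGames = seq_along(TeamHome)) %>%
  ungroup()
# For each game, count the games so far in which the home team was the visitor
lst <- list()
for(i in 1:nrow(df)) lst[[i]] <- sum(as.character(df$TeamAway[1:i]) == as.character(df$TeamHome[i]))
df$HomeTeamGamesPlayedRoad <- unlist(lst)
df %>% mutate(HomeTeamTotalgames = HomeGames+HomeTeamGamesPlayedRoad)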
TeamHome TeamAway HomeGames HomeTeamGamesPlayedRoad HomeTeamTotalgames
1 LAL IND 1 0 1
2 HOU SAS 1 0 1
3 SAS LAL 1 1 2
4 LAL HOU 2 1 3
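HomeGames is created with seq_along, counting row by row within each TeamHome group. HomeTeamGamesPlayedRoad is created by a loop that checks the values in TeamAway up to and including the current game. The last column is the sum of the other two.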
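The question also asks for the same three counts for the away team, and with thousands of games a row-by-row loop may become slow. As a rough sketch only (not taken from either answer), one option is to stack home and away appearances into one long table in game order and take running sums per team with ave and cumsum. The helper table long and the AwayTeam... column names are made up here for illustration.

```r
TeamHome <- c("LAL", "HOU", "SAS", "LAL")
TeamAway <- c("IND", "SAS", "LAL", "HOU")
df <- data.frame(TeamHome, TeamAway, stringsAsFactors = FALSE)

n <- nrow(df)

# One row per team appearance (hypothetical helper table, not in the original posts)
long <- data.frame(
  game = rep(seq_len(n), times = 2),
  team = c(df$TeamHome, df$TeamAway),
  home = rep(c(1L, 0L), each = n),   # 1 = playing at home, 0 = on the road
  stringsAsFactors = FALSE
)
long <- long[order(long$game, -long$home), ]  # game order, home row first

# Running counts per team, up to and including the current game
long$home_cum <- ave(long$home, long$team, FUN = cumsum)
long$road_cum <- ave(1L - long$home, long$team, FUN = cumsum)

hm <- long[long$home == 1L, ]  # home team's row for each game, in game order
aw <- long[long$home == 0L, ]  # away team's row for each game, in game order

df$HomeTeamGamesPlayedatHome <- hm$home_cum
df$HomeTeamGamesPlayedRoad   <- hm$road_cum
df$HomeTeamTotalgames        <- hm$home_cum + hm$road_cum

# Assumed column names for the away team (the question does not name them)
df$AwayTeamGamesPlayedatHome <- aw$home_cum
df$AwayTeamGamesPlayedRoad   <- aw$road_cum
df$AwayTeamTotalgames        <- aw$home_cum + aw$road_cum

df
```

This assumes the rows of df are already in chronological order and that a team never appears as both home and away in the same game.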