Dynamic counting of occurrences

I'm new to R. Here is a small sample of my data.

TeamHome <- c("LAL", "HOU", "SAS", "LAL")
TeamAway <- c("IND", "SAS", "LAL", "HOU")
df <- data.frame(cbind(TeamHome, TeamAway))
df

   TeamHome TeamAway
     LAL      IND
     HOU      SAS
     SAS      LAL
     LAL      HOU

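Imagine these are the first four games of a season with thousands of games. For both the home team and the away team, I want to compute running counts of home games, road games, and total games played. That means three new columns for the home team and three for the away team. I'd like to get a result like this (here I've only computed the new variables for the home team):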

    TeamHome TeamAway HomeTeamGamesPlayedatHome HomeTeamGamesPlayedRoad HomeTeamTotalgames
1      LAL      IND                         1                       0                  1
2      HOU      SAS                         1                       0                  1
3      SAS      LAL                         1                       1                  2
4      LAL      HOU                         2                       1                  3

To compute the first column (HomeTeamGamesPlayedatHome), I managed to do this:

df$HomeTeamGamesPlayedatHome <- ave(df$TeamHome==df$TeamHome, df$TeamHome, FUN=cumsum)

But it feels overly complicated, and I can't compute the other columns with this approach.

I also thought of using the table function to count occurrences:

 table(df$TeamHome)

But that only gives the overall totals, and I want the counts as of any given point in time. Thanks!

A loop-based solution:

TeamHome <- c("LAL", "HOU", "SAS", "LAL")
TeamAway <- c("IND", "SAS", "LAL", "HOU")
df <- data.frame(TeamHome,TeamAway,HomeTeamGamesPlayedatHome=ave(TeamHome==TeamHome, TeamHome, FUN=cumsum))

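df$HomeTeamGamesPlayedRoad <- 0L
for (i in seq_len(nrow(df))) {
  # Count the games so far (rows 1..i) in which this row's home team was the away team.
  # A plain sum of the matches stays correct when a team has several road games.
  df$HomeTeamGamesPlayedRoad[i] <- sum(as.character(df$TeamAway[1:i]) == as.character(df$TeamHome[i]))
}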
df$HomeTeamTotalgames <- df$HomeTeamGamesPlayedatHome + df$HomeTeamGamesPlayedRoad

      TeamHome TeamAway HomeTeamGamesPlayedatHome HomeTeamGamesPlayedRoad HomeTeamTotalgames
1      LAL      IND                         1                       0                  1
2      HOU      SAS                         1                       0                  1
3      SAS      LAL                         1                       1                  2
4      LAL      HOU                         2                       1                  3
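
The question also asks for the same three columns for the away team. One way to get them, sketched here under the assumption that a simple quadratic-time pass is acceptable, is to wrap the loop's counting step in a small helper. The helper name `running_count`, the data frame name `df_all`, and the `AwayTeam...` column names are illustrative and not from the original post.

```r
TeamHome <- c("LAL", "HOU", "SAS", "LAL")
TeamAway <- c("IND", "SAS", "LAL", "HOU")

# Hypothetical helper: for each game i, count how often team[i]
# appears in pool[1..i] (up to and including the current game).
running_count <- function(team, pool) {
  vapply(seq_along(team),
         function(i) sum(pool[seq_len(i)] == team[i]),
         integer(1))
}

df_all <- data.frame(
  TeamHome, TeamAway,
  HomeTeamGamesPlayedatHome = running_count(TeamHome, TeamHome),
  HomeTeamGamesPlayedRoad   = running_count(TeamHome, TeamAway),
  AwayTeamGamesPlayedatHome = running_count(TeamAway, TeamHome),
  AwayTeamGamesPlayedRoad   = running_count(TeamAway, TeamAway)
)
df_all$HomeTeamTotalgames <- df_all$HomeTeamGamesPlayedatHome + df_all$HomeTeamGamesPlayedRoad
df_all$AwayTeamTotalgames <- df_all$AwayTeamGamesPlayedatHome + df_all$AwayTeamGamesPlayedRoad
df_all
```

Each call rescans the earlier rows, so the work grows with the square of the number of games. For a few thousand games that is probably fine, but it is worth keeping in mind for larger data.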
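
An alternative using dplyr:

library(dplyr)
# Running count of home games within each TeamHome group
df <- df %>%
  group_by(TeamHome) %>%
  mutate(HomeGames = seq_along(TeamHome)) %>%
  ungroup()
# For each game, count the rows so far in which the current home team was the away team
road <- integer(nrow(df))
for (i in seq_len(nrow(df))) road[i] <- sum(df$TeamAway[1:i] == df$TeamHome[i])
df$HomeTeamGamesPlayedRoad <- road
df %>% mutate(HomeTeamTotalgames = HomeGames + HomeTeamGamesPlayedRoad)

  TeamHome TeamAway HomeGames HomeTeamGamesPlayedRoad HomeTeamTotalgames
1      LAL      IND         1                       0                  1
2      HOU      SAS         1                       0                  1
3      SAS      LAL         1                       1                  2
4      LAL      HOU         2                       1                  3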

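HomeGames is created with seq_along, which numbers the rows within each TeamHome group. HomeTeamGamesPlayedRoad is created by a loop that counts how many values of TeamAway, up to and including the current game, match the current home team. The last column is the sum of the other two.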
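
If the per-row loop turns out to be slow on a full season, the same running counts can probably be obtained without it by stacking home and away appearances into one long table and taking cumulative sums per team. The following is only a sketch in base R. The names `long`, `wide`, `home_so_far`, `away_so_far`, and the `AwayTeam...` columns are assumptions introduced for illustration.

```r
TeamHome <- c("LAL", "HOU", "SAS", "LAL")
TeamAway <- c("IND", "SAS", "LAL", "HOU")
n <- length(TeamHome)

# One row per team per game, kept in game order
long <- data.frame(
  game = rep(seq_len(n), times = 2),
  team = c(TeamHome, TeamAway),
  role = rep(c("home", "away"), each = n)
)
long <- long[order(long$game), ]

# Running home and road counts for every team, up to and including each game
long$home_so_far <- ave(as.integer(long$role == "home"), long$team, FUN = cumsum)
long$away_so_far <- ave(as.integer(long$role == "away"), long$team, FUN = cumsum)

# Split back by role; both halves are still in game order
home <- long[long$role == "home", ]
away <- long[long$role == "away", ]

wide <- data.frame(
  TeamHome, TeamAway,
  HomeTeamGamesPlayedatHome = home$home_so_far,
  HomeTeamGamesPlayedRoad   = home$away_so_far,
  HomeTeamTotalgames        = home$home_so_far + home$away_so_far,
  AwayTeamGamesPlayedatHome = away$home_so_far,
  AwayTeamGamesPlayedRoad   = away$away_so_far,
  AwayTeamTotalgames        = away$home_so_far + away$away_so_far
)
wide
```

This relies on the rows already being in chronological order and on a team never appearing as both home and away in the same game.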