随着团队数量的增加动态排名
Dynamic rankings with increasing number of teams
我的数据的小代表:
Date <- as.Date(rep(c("2015-05-14", "2015-05-15","2015-05-16"),c(4,2,1)))
TEAM1 <- c("GSW","SAS","MIL","ATL","GSW","SAC","LAL")
TEAM2 <- c("HOU","MIN","NOP","LAL","SAS","TOR","GSW")
PCW_TEAM1 <- c(0.88,0.72,0.34,0.46,0.87,0.28,0.24)
PCW_TEAM2 <- c(0.67,0.31,0.52,0.23,0.74,0.48,0.90)
df <- data.frame(cbind(Date,TEAM1,TEAM2,PCW_TEAM1,PCW_TEAM2), stringsAsFactors=F)
df
Date TEAM1 TEAM2 PCW_TEAM1 PCW_TEAM2
1 16569 GSW HOU 0.88 0.67
2 16569 SAS MIN 0.72 0.31
3 16569 MIL NOP 0.34 0.52
4 16569 ATL LAL 0.46 0.23
5 16570 GSW SAS 0.87 0.74
6 16570 SAC TOR 0.28 0.48
7 16571 LAL GSW 0.24 0.9
想象一下,这是 NBA 赛季的前 7 场比赛。在第一个日期 (16569) 有四场比赛,因此排名将超出 8。但是,一旦我们添加下一个日期 (16570),就会有两场比赛,而且只有两支新球队,因为 GSW 和 SAS 已经参加了第一场比赛日期。
我想根据最后可用日期的胜率对独特的球队进行排名。输出将如下所示:
Date TEAM1 TEAM2 PCW_TEAM1 PCW_TEAM2 RANK_TEAM1 RANK_TEAM2
1 16569 GSW HOU 0.88 0.67 1 3
2 16569 SAS MIN 0.72 0.31 2 7
3 16569 MIL NOP 0.34 0.52 6 4
4 16569 ATL LAL 0.46 0.23 5 8
5 16570 GSW SAS 0.87 0.74 1 2
6 16570 SAC TOR 0.28 0.48 9 5
7 16571 LAL GSW 0.24 0.9 10 1
请注意,在第 5 行,GSW 的获胜百分比为 0.87,排名为 1。第一行的获胜百分比更高 (0.88),但也是 GSW。
在此示例中,有 7 场比赛和 10 支不同的球队。根据真实数据,有 30 个独特的团队。
unique(c(TEAM1,TEAM2))
[1] "GSW" "SAS" "MIL" "ATL" "SAC" "LAL" "HOU" "MIN" "NOP" "TOR"
我想创建一个向量来收集每个独特团队的最后可用获胜百分比,然后根据该信息对团队进行排名,但不知道该怎么做,也不知道这是否是最佳方法。
TEAMs <- c(TEAM1,TEAM2)
teamsall <- unique(TEAMs)
PCWs <- c(PCW_TEAM1,PCW_TEAM2)
Dates <- c(Date,Date)
u = order(sapply(1:length(teamsall),function(x) {u=match(TEAMs,teamsall)==x;PCWs[u][which.max(Dates[u])]}),decreasing=T)
df$RANK1 = match(TEAM1,teamsall[u])
df$RANK2 = match(TEAM2,teamsall[u])
df
我觉得这可能是其中一种方式
我认为以长格式处理这些数据会更容易。下面代码的总体思路是扩展数据,以便每个团队在每个日期都有一个条目(当他们没有参加比赛时填充 NA 值)。然后,将数据按球队进行分组,将胜率前移填充NA值(使用na.locf
from zoo
),以确定每个日期的排名。然后,转换回宽格式。
## Rearrange the data into a long format
long <- do.call(rbind, lapply(1:2, function(i)
setNames(df[,c("Date", grep(i, names(df), value=T))], c("Date", "TEAM", "PCW"))))
long$index <- rep(1:nrow(df)) # used to transform back to wide
## Expand to include an entry for each team at each date
dat <- merge(expand.grid(Date=unique(long$Date), TEAM=unique(long$TEAM)), long, all=T)
## Fill in the NA values for each team, carrying forward previous win%
library(zoo) # na.locf
dat <- cbind(do.call(rbind, lapply(split(dat, dat$TEAM), function(x)
transform(x, PCW=na.locf(PCW, na.rm=F)))))
## Then, group by date and order (I would leave it in this form)
library(dplyr)
dat %>% group_by(Date) %>%
mutate(RANK=match(TEAM, TEAM[order(PCW, decreasing = T)])) -> res
## Put it back into wide format
out <- do.call(rbind, lapply(split(res[,-which(names(res)=="index")], res$index), function(x)
setNames(cbind(x[1,], x[2,-1]), c("Date", paste0(names(x)[-1], 1), paste0(names(x)[-1], 2)))))
# Date TEAM1 PCW1 RANK1 TEAM2 PCW2 RANK2
# 1 16569 GSW 0.88 1 HOU 0.67 3
# 2 16569 SAS 0.72 2 MIN 0.31 7
# 3 16569 MIL 0.34 6 NOP 0.52 4
# 4 16569 ATL 0.46 5 LAL 0.23 8
# 5 16570 GSW 0.87 1 SAS 0.74 2
# 6 16570 SAC 0.28 9 TOR 0.48 5
# 7 16571 GSW 0.9 1 LAL 0.24 10
请注意,我没有采取预防措施来确保 TEAM1 和 TEAM2 保持正确的顺序,例如,在最后一行中,与您期望的输出相比,团队发生了切换。
我的数据的小代表:
Date <- as.Date(rep(c("2015-05-14", "2015-05-15","2015-05-16"),c(4,2,1)))
TEAM1 <- c("GSW","SAS","MIL","ATL","GSW","SAC","LAL")
TEAM2 <- c("HOU","MIN","NOP","LAL","SAS","TOR","GSW")
PCW_TEAM1 <- c(0.88,0.72,0.34,0.46,0.87,0.28,0.24)
PCW_TEAM2 <- c(0.67,0.31,0.52,0.23,0.74,0.48,0.90)
df <- data.frame(cbind(Date,TEAM1,TEAM2,PCW_TEAM1,PCW_TEAM2), stringsAsFactors=F)
df
Date TEAM1 TEAM2 PCW_TEAM1 PCW_TEAM2
1 16569 GSW HOU 0.88 0.67
2 16569 SAS MIN 0.72 0.31
3 16569 MIL NOP 0.34 0.52
4 16569 ATL LAL 0.46 0.23
5 16570 GSW SAS 0.87 0.74
6 16570 SAC TOR 0.28 0.48
7 16571 LAL GSW 0.24 0.9
想象一下,这是 NBA 赛季的前 7 场比赛。在第一个日期 (16569) 有四场比赛,因此排名将超出 8。但是,一旦我们添加下一个日期 (16570),就会有两场比赛,而且只有两支新球队,因为 GSW 和 SAS 已经参加了第一场比赛日期。
我想根据最后可用日期的胜率对独特的球队进行排名。输出将如下所示:
Date TEAM1 TEAM2 PCW_TEAM1 PCW_TEAM2 RANK_TEAM1 RANK_TEAM2
1 16569 GSW HOU 0.88 0.67 1 3
2 16569 SAS MIN 0.72 0.31 2 7
3 16569 MIL NOP 0.34 0.52 6 4
4 16569 ATL LAL 0.46 0.23 5 8
5 16570 GSW SAS 0.87 0.74 1 2
6 16570 SAC TOR 0.28 0.48 9 5
7 16571 LAL GSW 0.24 0.9 10 1
请注意,在第 5 行,GSW 的获胜百分比为 0.87,排名为 1。第一行的获胜百分比更高 (0.88),但也是 GSW。
在此示例中,有 7 场比赛和 10 支不同的球队。根据真实数据,有 30 个独特的团队。
unique(c(TEAM1,TEAM2))
[1] "GSW" "SAS" "MIL" "ATL" "SAC" "LAL" "HOU" "MIN" "NOP" "TOR"
我想创建一个向量来收集每个独特团队的最后可用获胜百分比,然后根据该信息对团队进行排名,但不知道该怎么做,也不知道这是否是最佳方法。
TEAMs <- c(TEAM1,TEAM2)
teamsall <- unique(TEAMs)
PCWs <- c(PCW_TEAM1,PCW_TEAM2)
Dates <- c(Date,Date)
u = order(sapply(1:length(teamsall),function(x) {u=match(TEAMs,teamsall)==x;PCWs[u][which.max(Dates[u])]}),decreasing=T)
df$RANK1 = match(TEAM1,teamsall[u])
df$RANK2 = match(TEAM2,teamsall[u])
df
我觉得这可能是其中一种方式
我认为以长格式处理这些数据会更容易。下面代码的总体思路是扩展数据,以便每个团队在每个日期都有一个条目(当他们没有参加比赛时填充 NA 值)。然后,将数据按球队进行分组,将胜率前移填充NA值(使用na.locf
from zoo
),以确定每个日期的排名。然后,转换回宽格式。
## Rearrange the data into a long format
long <- do.call(rbind, lapply(1:2, function(i)
setNames(df[,c("Date", grep(i, names(df), value=T))], c("Date", "TEAM", "PCW"))))
long$index <- rep(1:nrow(df)) # used to transform back to wide
## Expand to include an entry for each team at each date
dat <- merge(expand.grid(Date=unique(long$Date), TEAM=unique(long$TEAM)), long, all=T)
## Fill in the NA values for each team, carrying forward previous win%
library(zoo) # na.locf
dat <- cbind(do.call(rbind, lapply(split(dat, dat$TEAM), function(x)
transform(x, PCW=na.locf(PCW, na.rm=F)))))
## Then, group by date and order (I would leave it in this form)
library(dplyr)
dat %>% group_by(Date) %>%
mutate(RANK=match(TEAM, TEAM[order(PCW, decreasing = T)])) -> res
## Put it back into wide format
out <- do.call(rbind, lapply(split(res[,-which(names(res)=="index")], res$index), function(x)
setNames(cbind(x[1,], x[2,-1]), c("Date", paste0(names(x)[-1], 1), paste0(names(x)[-1], 2)))))
# Date TEAM1 PCW1 RANK1 TEAM2 PCW2 RANK2
# 1 16569 GSW 0.88 1 HOU 0.67 3
# 2 16569 SAS 0.72 2 MIN 0.31 7
# 3 16569 MIL 0.34 6 NOP 0.52 4
# 4 16569 ATL 0.46 5 LAL 0.23 8
# 5 16570 GSW 0.87 1 SAS 0.74 2
# 6 16570 SAC 0.28 9 TOR 0.48 5
# 7 16571 GSW 0.9 1 LAL 0.24 10
请注意,我没有采取预防措施来确保 TEAM1 和 TEAM2 保持正确的顺序,例如,在最后一行中,与您期望的输出相比,团队发生了切换。