Creating a square preference matrix from a data.table

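I'm trying to create a square preference or count matrix (which of the two doesn't really matter) from the entries of a data.table.

Suppose I have the following data.table to work with: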

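library(data.table)
library(foreach)  # provides foreach() and %do%, used further down

segment <- c("track","track","track","round","round","sprint","sprint","sprint","sprint")
athlete <- c("gunnar","brandon","raphael","gunnar","ben","brandon","raphael","ben","gunnar")
time <- c(54,56,57,23,25,15,16,16,17)

df <- data.table(athlete, segment, time)

# gap to the fastest time in each segment: 0 for the winner, negative for everyone else
df[, time_diff := min(time) - time, by = segment]

# winner of each segment: the athlete with the fastest time (first one on ties)
df[, winner := athlete[which.min(time)], by = segment]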

    athlete segment time time_diff  winner
 1:  gunnar   track   54         0  gunnar
 2: brandon   track   56        -2  gunnar
 3: raphael   track   57        -3  gunnar
 4: raphael   round   23         0 raphael
 5:     ben   round   25        -2 raphael
 6: brandon   round   28        -5 raphael
 7: brandon  sprint   15         0 brandon
 8: raphael  sprint   16        -1 brandon
 9:     ben  sprint   19        -4 brandon
10:  gunnar  sprint   26       -11 brandon

names <- unique(df$athlete)

[1] "gunnar"  "brandon" "raphael" "ben" 

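Now I want a square matrix over the athletes that shows each athlete's time difference to the winner of each segment, something like this: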

        gunnar  brandon  raphael  ben
gunnar     0     -11        0      0       
brandon   -2       0       -5      0
raphael   -3      -1        0      0
ben       -2      -4        0      0

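I have a few ideas for how to solve this, but nothing seems to work. I come from a MATLAB background, where I would just iterate, but I feel that is not the data.table way at all.

I feel I should be able to do it with a foreach iteration over the athletes, roughly like this: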

foreach(n = 1:length(names)) %do%
  df[athlete == names[n], .(time_diff, winner), by = segment][, .(pref = sum(time_diff)), by = winner]

[[1]]
    winner pref
1:  gunnar    0
2: brandon  -11

[[2]]
    winner pref
1:  gunnar   -2
2: raphael   -5
3: brandon    0

[[3]]
    winner pref
1:  gunnar   -3
2: raphael    0
3: brandon   -1

[[4]]
    winner pref
1: raphael   -2
2: brandon   -4

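But at this point I'm stuck and not sure how to continue. I have some preliminary ideas: create a vector of the appropriate length, vec <- vector(mode="double", length=length(names)), and then index into it using which(names %in% df[,winner,by=IREALLYDONTKNOW]), but as you can see, I'm not clear on how to do this properly.

If anyone could give me some hints on the proper data.table approach, I would be very grateful.

Although running your code does not produce the table you printed, I think what you are looking for is dcast.data.table: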

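# one row per athlete, one column per winner; sum() covers an athlete meeting
# the same winner in several segments, fill = 0 covers pairs that never met
dt_compare <- dcast.data.table(df, athlete ~ winner, value.var = "time_diff",
                               fun.aggregate = sum, fill = 0)
# add zero columns for athletes that did not win
no_wins <- setdiff(dt_compare[["athlete"]], names(dt_compare))
if (length(no_wins) > 0) dt_compare[, (no_wins) := 0]
# you can also reorder columns
setcolorder(dt_compare, c("athlete", dt_compare[["athlete"]]))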
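
If an actual matrix is wanted rather than a wide data.table, the dcast result can be converted in one more step. The following is only a sketch: it rebuilds the sample data from the question's code (not the printed table), the names wide and m are made up for illustration, and it assumes a data.table version whose as.matrix method accepts the rownames argument.

```r
library(data.table)

# Sample data as produced by the code in the question
df <- data.table(
  athlete = c("gunnar","brandon","raphael","gunnar","ben",
              "brandon","raphael","ben","gunnar"),
  segment = c("track","track","track","round","round",
              "sprint","sprint","sprint","sprint"),
  time    = c(54,56,57,23,25,15,16,16,17)
)
df[, time_diff := min(time) - time, by = segment]
df[, winner := athlete[which.min(time)], by = segment]

# Wide table: one row per athlete, one column per winner
wide <- dcast(df, athlete ~ winner, value.var = "time_diff",
              fun.aggregate = sum, fill = 0)

# Zero columns for athletes who never won a segment
no_wins <- setdiff(wide[["athlete"]], names(wide))
if (length(no_wins) > 0) wide[, (no_wins) := 0]
setcolorder(wide, c("athlete", wide[["athlete"]]))

# Numeric matrix with the athlete column moved into the row names
m <- as.matrix(wide, rownames = "athlete")
```

Note that dcast sorts the athletes alphabetically, so the rows and columns of m come out in that order rather than in the order of unique(df$athlete).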

The way I solved it turned out to be quite simple, after a bit of an epiphany:

names <- unique(df$athlete)

vec <- matrix(data = 0,nrow=length(names),ncol=length(names),dimnames=list(names,names))

pref <- foreach(n=1:length(names)) %do% df[athlete==names[n],.(time_diff, winner),by=segment][,.(pref=sum(time_diff)),by=winner]

foreach(n=1:length(names)) %do% (vec[names[n],pref[[n]]$winner] <- pref[[n]]$pref)

> vec
        gunnar brandon raphael ben
gunnar       0     -11       0   0
brandon     -2       0      -5   0
raphael     -3      -1       0   0
ben          0      -4      -2   0
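
For what it's worth, the same matrix can probably be filled without any loop, by aggregating once per (athlete, winner) pair and then indexing the matrix with a two-column character matrix of row and column names. This is just a sketch under the assumption that the data look like the sample in the question's code; ath, pref_dt and mat are illustrative names, not anything from the posts above.

```r
library(data.table)

# Sample data as produced by the code in the question
df <- data.table(
  athlete = c("gunnar","brandon","raphael","gunnar","ben",
              "brandon","raphael","ben","gunnar"),
  segment = c("track","track","track","round","round",
              "sprint","sprint","sprint","sprint"),
  time    = c(54,56,57,23,25,15,16,16,17)
)
df[, time_diff := min(time) - time, by = segment]
df[, winner := athlete[which.min(time)], by = segment]

ath <- unique(df$athlete)

# One row per (athlete, winner) pair with the summed time difference
pref_dt <- df[, .(pref = sum(time_diff)), by = .(athlete, winner)]

# Square matrix of zeros, rows = athletes, columns = winners
mat <- matrix(0, nrow = length(ath), ncol = length(ath),
              dimnames = list(ath, ath))

# A two-column character matrix selects cells by (row name, column name)
mat[cbind(pref_dt$athlete, pref_dt$winner)] <- pref_dt$pref
```

With the data from the question's code, gunnar wins both track and round, so the values differ from the printed output above, which was produced from a different set of times.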