How to calculate an overall score based on matrix positions?

I have a matrix with 12 columns, each holding the top 5 participants for that column. It looks like this:

> top_5
     4         5         8         9          11         12         15         16         19         20        22         23       
[1,] "Nia"     "Hung"    "Hanaaa"  "Ramziyya" "Marissa"  "Jaelyn"   "Shyanne"  "Jaabir"   "Dionicio" "Nia"     "Shyanne"  "Roger"  
[2,] "Razeena" "Husni"   "Bradly"  "Marissa"  "Bradly"   "Muhsin"   "Razeena"  "Dionicio" "Magnus"   "Kelsey"  "Nia"      "Schyler"
[3,] "Shyanne" "Schyler" "Necko"   "Johannah" "Tatiana"  "Glenn"    "Nia"      "Jaelyn"   "Shyanne"  "Hanaaa"  "Mildred"  "German" 
[4,] "Schyler" "German"  "Hung"    "Lubaaba"  "Johannah" "Magnus"   "Dionicio" "German"   "German"   "Razeena" "Dionicio" "Jaabir" 
[5,] "Husni"   "Necko"   "Razeena" "Afeefa"   "Schyler"  "Dionicio" "Jaabir"   "Roger"    "Johannah" "Remy"    "Jaabir"   "Jaelyn" 

(It can be recreated with this):

structure(c("Nia", "Razeena", "Shyanne", "Schyler", "Husni", 
"Hung", "Husni", "Schyler", "German", "Necko", "Hanaaa", "Bradly", 
"Necko", "Hung", "Razeena", "Ramziyya", "Marissa", "Johannah", 
"Lubaaba", "Afeefa", "Marissa", "Bradly", "Tatiana", "Johannah", 
"Schyler", "Jaelyn", "Muhsin", "Glenn", "Magnus", "Dionicio", 
"Shyanne", "Razeena", "Nia", "Dionicio", "Jaabir", "Jaabir", 
"Dionicio", "Jaelyn", "German", "Roger", "Dionicio", "Magnus", 
"Shyanne", "German", "Johannah", "Nia", "Kelsey", "Hanaaa", "Razeena", 
"Remy", "Shyanne", "Nia", "Mildred", "Dionicio", "Jaabir", "Roger", 
"Schyler", "German", "Jaabir", "Jaelyn"), .Dim = c(5L, 12L), .Dimnames = list(
    NULL, c("4", "5", "8", "9", "11", "12", "15", "16", "19", 
    "20", "22", "23")))

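Now, if a participant is in the top row, they ranked first in that column (so for column 1, "Nia" is first, "Razeena" is second, and so on). First place is worth 5 points, second place 4 points, and so on down to 1 point for fifth place. I want to calculate the total score of every participant in the matrix.
My goal is to get the top 5 participants by total score. How can I do that?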

One option is to split the reversed row index by the matrix values into a list, then loop over the list (sapply) to get the sum of each list element:

out <- sapply(split(row(top_5)[nrow(top_5):1, ], top_5), sum)
out
#Afeefa   Bradly Dionicio   German    Glenn   Hanaaa     Hung    Husni   Jaabir   Jaelyn Johannah   Kelsey  Lubaaba   Magnus  Marissa  Mildred   Muhsin 
#       1        8       14        9        3        8        7        5        9        9        6        4        2        6        9        3        4 
#   Necko      Nia Ramziyya  Razeena     Remy    Roger  Schyler  Shyanne  Tatiana 
#       4       17        5       11        1        6       10       16        3 


head(out[order(-out)], 5)
# Nia  Shyanne Dionicio  Razeena  Schyler 
#  17       16       14       11       10 
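
To see why this works, it may help to look at the intermediate pieces on a tiny example. This is only a sketch: the matrix `m` and its names are made up for illustration and are not part of the question's data.

```r
# Hypothetical 3 x 2 matrix (filled column by column); row 1 is first place
m <- matrix(c("A", "B", "C",
              "A", "B", "D"), nrow = 3)

# row(m) holds each cell's row number; reversing the rows turns it into points
pts <- row(m)[nrow(m):1, ]
pts
#      [,1] [,2]
# [1,]    3    3
# [2,]    2    2
# [3,]    1    1

# split() groups the points by the name found in the same cell
groups <- split(pts, m)

# sapply(..., sum) then adds up the points of each participant
scores <- sapply(groups, sum)
```

Here `scores` is a named vector with one total per participant, which is the same shape as `out` above.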

Or another option is rowsum:

rowsum(c(row(top_5)[nrow(top_5):1, ]), group = c(top_5))
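
Note that rowsum returns a one-column matrix with the participants as row names, so one more step is needed to reach the top 5. A minimal sketch of that step, assuming a small made-up matrix `m` (hypothetical data, not the question's `top_5`):

```r
# Hypothetical 3 x 2 matrix; row 1 is first place in each column
m <- matrix(c("A", "B", "C",
              "A", "B", "D"), nrow = 3)

# Points per cell: the first row gets nrow(m) points, the last row gets 1
pts <- nrow(m) + 1 - row(m)

# rowsum() adds up the points per name and returns a one-column matrix
totals <- rowsum(c(pts), group = c(m))

# Take the single column as a named vector, sort descending, keep the top entries
ranked <- sort(totals[, 1], decreasing = TRUE)
head(ranked, 2)
```

With the real data, `head(ranked, 5)` would play the same role as `head(out[order(-out)], 5)` in the sapply version.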

Using tidyverse functions:

library(tidyr)
library(dplyr)

top_5 %>% 
  as.data.frame %>% 
  head(.,5) %>%
  mutate(rank = nrow(.):1) %>% 
  pivot_longer(., -c(rank), values_to = "name", names_to = "col") %>% 
  group_by(name) %>% 
  summarise_at(vars(rank), list(points = sum))

#> # A tibble: 26 x 2
#>    name   points
#>    <fct>   <int>
#>  1 Husni       5
#>  2 Nia        17
#>  3 Razeena    11
#>  4 Schyler    10
#>  5 Shyanne    16
#>  6 German      9
#>  7 Hung        7
#>  8 Necko       4
#>  9 Bradly      8
#> 10 Hanaaa      8
#> # ... with 16 more rows
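
The output above lists every participant, so it still needs to be cut down to the top 5. A possible sketch of the same pipeline with that last step added, assuming dplyr 1.0.0 or later (for `slice_max`) and a small made-up matrix `m` rather than the question's data:

```r
library(dplyr)
library(tidyr)

# Hypothetical 3 x 2 matrix with column names; row 1 is first place
m <- matrix(c("A", "B", "C",
              "A", "B", "D"), nrow = 3,
            dimnames = list(NULL, c("x", "y")))

top <- m %>%
  as.data.frame() %>%
  mutate(points = n():1) %>%                      # row 1 gets the most points
  pivot_longer(-points, names_to = "col", values_to = "name") %>%
  group_by(name) %>%
  summarise(points = sum(points)) %>%             # total per participant
  slice_max(points, n = 2, with_ties = FALSE)     # keep the highest totals
```

For the question's matrix, `n = 5` would give the top 5; `with_ties = FALSE` is a choice made here to always return exactly `n` rows.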

This is a "convert to long then summarise by group" approach similar to M--'s answer, but with data.table:

library(data.table)

df <- as.data.table(top_5)[, points := .N:1]
total_points <- melt(df, 'points')[, .(points = sum(points)), value]
setorder(total_points, -points)

head(total_points, 5)
#       value points
# 1:      Nia     17
# 2:  Shyanne     16
# 3: Dionicio     14
# 4:  Razeena     11
# 5:  Schyler     10

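Or an idea very similar to akrun's, just using tapply instead of sapply + split. Here 6 - row(top_5) turns row 1 into 5 points and row 5 into 1 point: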

out <- sort(tapply(c(6 - row(top_5)), c(top_5), sum), decreasing = TRUE)

head(out, 5)
# Nia  Shyanne Dionicio  Razeena  Schyler 
#  17       16       14       11       10