Convert long format with only unique comparisons to a full square matrix with the diagonal

Suppose my input is a data.frame in long format that contains only the unique comparisons between Species_A and Species_B, like this:

Species_A Species_B values
A B 58
A C 64
A D 78
A E 32
B C 10
B D 12
B E 54
C D 99
C E 84
D E 42

I would like to know how to easily convert this input into a square matrix, with 100 on the diagonal:

    A   B   C   D   E
A   100 58  64  78  32
B   58  100 10  12  54
C   64  10  100 99  84
D   78  12  99  100 42
E   32  54  84  42  100

A solution using tidyverse functions:

library(tidyverse)

cor_data <- tribble(
~Species_A, ~Species_B, ~values,
"A","B",58,
"A","C",64,
"A","D",78,
"A","E",32,
"B","C",10,
"B","D",12,
"B","E",54,
"C","D",99,
"C","E",84,
"D","E",42)

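# take the species from both columns (Species_A alone never contains "E")
species <- sort(unique(c(cor_data[["Species_A"]], cor_data[["Species_B"]])))

# look each pair up in both orientations; the diagonal matches neither, so it gets 100
expand.grid(Var1 = species, Var2 = species, stringsAsFactors = FALSE) %>%
  left_join(cor_data, by = c("Var1" = "Species_A", "Var2" = "Species_B")) %>%
  left_join(cor_data, by = c("Var1" = "Species_B", "Var2" = "Species_A")) %>%
  transmute(Species_A = Var1, Species_B = Var2, values = coalesce(values.x, values.y, 100)) %>%
  spread(Species_B, values)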

I think you can achieve this with matrix subsetting.

# get row/column names of new matrix from columns 1 and 2 of data.frame
myNames <- sort(unique(as.character(unlist(df[1:2]))))

# build matrix of 0s
myMat <- matrix(0, length(myNames), length(myNames), dimnames = list(myNames, myNames))

# fill in upper triangle
myMat[as.matrix(df[c(1,2)])] <- df$values
# fill in the lower triangle
myMat[as.matrix(df[c(2,1)])] <- df$values
# fill in the diagonal
diag(myMat) <- 100

which returns

myMat
    A   B   C   D   E
A 100  58  64  78  32
B  58 100  10  12  54
C  64  10 100  99  84
D  78  12  99 100  42
E  32  54  84  42 100

Note

The lower triangle can also be filled in from the upper one:

myMat[lower.tri(myMat)] <- t(myMat)[lower.tri(myMat)]

data

df <-
structure(list(Species_A = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 
2L, 3L, 3L, 4L), .Label = c("A", "B", "C", "D"), class = "factor"), 
    Species_B = structure(c(1L, 2L, 3L, 4L, 2L, 3L, 4L, 3L, 4L, 
    4L), .Label = c("B", "C", "D", "E"), class = "factor"), values = c(58L, 
    64L, 78L, 32L, 10L, 12L, 54L, 99L, 84L, 42L)), .Names = c("Species_A", 
"Species_B", "values"), class = "data.frame", row.names = c(NA, 
-10L))

OK, I finally got it working:

1/ Add the self comparisons to the data table
2/ Use reshape(df, idvar = "Species_A", timevar = "Species_B", direction = "wide") to build a square matrix with NA for the missing values
3/ Reorder the rows and columns of the matrix by counting NA (in order to get the lower or upper triangular matrix); this gives half_matrix
4/ Fill in the missing part by summing half_matrix and its transpose (the NA entries have to be set to 0 first, because NA + x is NA in R):
square_matrix_full = t(half_matrix) + half_matrix
5/ diag(square_matrix_full) = 100
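
For anyone who wants to try these steps, here is a minimal base R sketch of how they could look. It is not necessarily the exact code used above. It assumes character columns, uses 0 as a placeholder value for the self comparisons (the value is not stated above and is overwritten in step 5 anyway), and reorders rows and columns by species name instead of by counting NA. The names species, self, long and wide are only illustrative.

```r
# Input in long format, as in the question (character columns assumed)
df <- data.frame(
  Species_A = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D"),
  Species_B = c("B", "C", "D", "E", "C", "D", "E", "D", "E", "E"),
  values    = c(58, 64, 78, 32, 10, 12, 54, 99, 84, 42),
  stringsAsFactors = FALSE
)

# 1/ add the self comparisons so every species appears in both columns
#    (0 is an assumed placeholder; the diagonal is set in step 5)
species <- sort(unique(c(df$Species_A, df$Species_B)))
self <- data.frame(Species_A = species, Species_B = species, values = 0,
                   stringsAsFactors = FALSE)
long <- rbind(self, df)

# 2/ long -> wide; pairs that were never listed become NA
wide <- reshape(long, idvar = "Species_A", timevar = "Species_B",
                direction = "wide")

# 3/ convert to a matrix with rows and columns in the same species order
#    (ordering by name here, rather than by counting NA)
half_matrix <- as.matrix(wide[, paste0("values.", species)])
dimnames(half_matrix) <- list(wide$Species_A, species)
half_matrix <- half_matrix[species, species]

# 4/ NA + number is NA, so replace NA by 0 before adding the transpose
half_matrix[is.na(half_matrix)] <- 0
square_matrix_full <- t(half_matrix) + half_matrix

# 5/ set the diagonal
diag(square_matrix_full) <- 100

square_matrix_full
```

Ordering by name keeps step 3 short; counting NA per row should give the same ordering for this input, since row A has no NA, row B has one, and so on.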