Convert long format with only unique comparisons to a full square matrix with the diagonal
Suppose I have input data like this:
a data.frame in long format that contains only the unique comparisons between Species_A and Species_B, as follows:
Species_A Species_B values
A B 58
A C 64
A D 78
A E 32
B C 10
B D 12
B E 54
C D 99
C E 84
D E 42
I would like to know how to easily convert this input into a square matrix:
    A   B   C   D   E
A 100  58  64  78  32
B  58 100  10  12  54
C  64  10 100  99  84
D  78  12  99 100  42
E  32  54  84  42 100
A solution using tidyverse functions:
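library(tidyverse)

cor_data <- tribble(
  ~Species_A, ~Species_B, ~values,
  "A", "B", 58,
  "A", "C", 64,
  "A", "D", 78,
  "A", "E", 32,
  "B", "C", 10,
  "B", "D", 12,
  "B", "E", 54,
  "C", "D", 99,
  "C", "E", 84,
  "D", "E", 42)

# collect species from both columns: E never appears in Species_A
species <- sort(unique(c(cor_data$Species_A, cor_data$Species_B)))

# build every (row, column) pair, then look each pair up in both orientations
expand.grid(Var1 = species, Var2 = species, stringsAsFactors = FALSE) %>%
  left_join(cor_data, by = c("Var1" = "Species_A", "Var2" = "Species_B")) %>%
  left_join(cor_data, by = c("Var1" = "Species_B", "Var2" = "Species_A")) %>%
  # pairs matched in neither orientation are the diagonal, which is set to 100
  transmute(Species_A = Var1, Species_B = Var2,
            values = coalesce(values.x, values.y, 100)) %>%
  spread(Species_B, values)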
I think you can achieve this with matrix subsetting.
# get row/column names of the new matrix from columns 1 and 2 of the data.frame
myNames <- sort(unique(as.character(unlist(df[1:2]))))
# build a matrix of 0s, sized from the number of names rather than hard-coded
myMat <- matrix(0, length(myNames), length(myNames), dimnames = list(myNames, myNames))
# fill in the upper triangle
myMat[as.matrix(df[c(1,2)])] <- df$values
# fill in the lower triangle
myMat[as.matrix(df[c(2,1)])] <- df$values
# fill in the diagonal
diag(myMat) <- 100
which returns
myMat
    A   B   C   D   E
A 100  58  64  78  32
B  58 100  10  12  54
C  64  10 100  99  84
D  78  12  99 100  42
E  32  54  84  42 100
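The assignment myMat[as.matrix(df[c(1,2)])] <- df$values relies on a base R indexing feature: when a matrix with dimnames is indexed by a two-column character matrix, each row of the index is read as a (row name, column name) pair. A minimal sketch on a made-up 3 x 3 matrix, where the names m and idx are illustrative and not part of the answer:

```r
# a 3 x 3 matrix of zeros with named rows and columns
m <- matrix(0, 3, 3, dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# each row of idx names one cell: (row name, column name)
idx <- cbind(c("A", "A", "B"), c("B", "C", "C"))

# assign one value per row of idx (upper triangle)
m[idx] <- c(58, 64, 10)

# swap the two index columns to write the mirrored cells (lower triangle)
m[idx[, 2:1]] <- c(58, 64, 10)

m
```

Swapping the index columns is what df[c(2,1)] does in the answer, so the same values land in the transposed positions.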
Note
The lower triangle can also be filled in with:
myMat[lower.tri(myMat)] <- t(myMat)[lower.tri(myMat)]
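To see why this one-liner works, here is a small sketch on a made-up 3 x 3 matrix (the name m is illustrative). lower.tri() returns a logical mask, and indexing the transpose with that mask picks up the upper-triangle values in the order the lower triangle expects:

```r
m <- matrix(0, 3, 3, dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# fill the upper triangle; values go in column-major order: (A,B), (A,C), (B,C)
m[upper.tri(m)] <- c(58, 64, 10)

# copy each upper-triangle value into its mirrored lower-triangle cell
m[lower.tri(m)] <- t(m)[lower.tri(m)]

m
```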
Data
df <-
structure(list(Species_A = structure(c(1L, 1L, 1L, 1L, 2L, 2L,
2L, 3L, 3L, 4L), .Label = c("A", "B", "C", "D"), class = "factor"),
Species_B = structure(c(1L, 2L, 3L, 4L, 2L, 3L, 4L, 3L, 4L,
4L), .Label = c("B", "C", "D", "E"), class = "factor"), values = c(58L,
64L, 78L, 32L, 10L, 12L, 54L, 99L, 84L, 42L)), .Names = c("Species_A",
"Species_B", "values"), class = "data.frame", row.names = c(NA,
-10L))
OK, I finally got it working:
1/ Add the self comparisons to the data table.
2/ Use reshape(df, idvar = "Species_A", timevar = "Species_B", direction = "wide") to construct a square matrix with NA for the missing values.
3/ Reorder the matrix rows and columns by counting NAs (in order to retrieve the lower or upper triangular matrix); this gives half_matrix.
4/ Fill in the missing part of the matrix by summing half_matrix and its transpose (replace the NA entries with 0 first, since NA + x is NA in R):
square_matrix_full = t(half_matrix) + half_matrix
5/ diag(square_matrix_full) = 100
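The steps above can be sketched in base R as follows. This is one possible reading, not the poster's exact code: the intermediate names species, df_self and wide are assumptions, the self comparisons are given a placeholder value of 0, and step 3 reorders by name instead of counting NAs, which is a simpler substitute that gives the same layout here.

```r
# the unique pairwise comparisons from the question
df <- data.frame(
  Species_A = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D"),
  Species_B = c("B", "C", "D", "E", "C", "D", "E", "D", "E", "E"),
  values    = c(58, 64, 78, 32, 10, 12, 54, 99, 84, 42),
  stringsAsFactors = FALSE
)

# 1/ add self comparisons so every species appears in both columns
species <- sort(unique(c(df$Species_A, df$Species_B)))
df_self <- rbind(
  df,
  data.frame(Species_A = species, Species_B = species, values = 0,
             stringsAsFactors = FALSE)
)

# 2/ long to wide; pairs that are absent from the input become NA
wide <- reshape(df_self, idvar = "Species_A", timevar = "Species_B",
                direction = "wide")
half_matrix <- as.matrix(wide[, -1])
rownames(half_matrix) <- wide$Species_A
colnames(half_matrix) <- sub("^values\\.", "", colnames(half_matrix))

# 3/ put rows and columns in the same order (by name, not by NA count)
half_matrix <- half_matrix[species, species]

# 4/ NA + x is NA, so zero the missing half before adding the transpose
half_matrix[is.na(half_matrix)] <- 0
square_matrix_full <- t(half_matrix) + half_matrix

# 5/ set the diagonal
diag(square_matrix_full) <- 100

square_matrix_full
```

Because the placeholder self comparisons are 0, adding the transpose leaves the off-diagonal values unchanged, and the diagonal is overwritten in the last step anyway.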