Fill in a matrix based on whether the index rows are similar or different
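I have a very large pairwise distance matrix in R. I want to code the cells of the matrix according to whether the row/column names are the same or different.
On a smaller scale, the row/column names would be: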
individuals <- c("apple", "pear", "apple", "cranberry", "peach", "apple")
I want a matrix with a 1 for every comparison that involves apple, except apple-vs-apple comparisons. It would look like:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "0" "1" "1" "1" "1" "1"
[2,] "1" "0" "1" "0" "0" "1"
[3,] "1" "1" "0" "1" "1" "1"
[4,] "1" "0" "1" "0" "0" "1"
[5,] "1" "0" "1" "0" "0" "1"
[6,] "1" "1" "1" "1" "1" "0"
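I know I can get this with:
final.matrix <- matrix(nrow = length(individuals), ncol = length(individuals))  # starts out all NA
final.matrix[grep("apple", individuals), ] <- 1  # rows whose name contains "apple"
final.matrix[, grep("apple", individuals)] <- 1  # columns whose name contains "apple"
diag(final.matrix) <- 0
final.matrix[is.na(final.matrix)] <- 0           # every remaining cell becomes 0
but there must be a cleaner/simpler way. What am I missing?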
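Also, this doesn't work when the row/column names are stored in a tibble, which is how they look in reality. Any suggestions for a solution that works with tibbles?
library(tibble)
tibble_inds <- as_tibble(individuals)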
grep("apple", tibble_inds)
# 1
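A short sketch of why the `grep()` call above returns 1: `grep()` coerces a non-character input with `as.character()`, and for a data frame (a list of columns) that yields one string per column, not one per row. Base `data.frame()` stands in for the tibble here so the sketch runs without extra packages; the assumption is that a tibble, being a data frame subclass, behaves the same, which matches the `# 1` shown in the question. The column name `value` is what `as_tibble()` gives a bare vector.

```r
individuals <- c("apple", "pear", "apple", "cranberry", "peach", "apple")

# data.frame() stands in for the tibble so this runs with base R only
inds_df <- data.frame(value = individuals, stringsAsFactors = FALSE)

# grep() coerces the whole data frame with as.character(): one string per column
grep("apple", inds_df)

# Pulling the column out first gives one element per row
rows <- grep("apple", inds_df$value)       # row indices of the apples
x <- grepl("apple", inds_df[["value"]])    # logical indicator, one per row
rows
x
```

With `tibble_inds`, the same extraction is `tibble_inds$value` or `tibble_inds[[1L]]`.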
It sounds like you want
outer(x, x, function(a, b) as.integer(a + b == 1L))
where
x <- tibble_inds[[1L]] == "apple"
if you only accept "apple", or
x <- grepl("apple", tibble_inds[[1L]])
if you accept any string that contains "apple" as a substring.
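A minimal sketch of where the two definitions of `x` differ. For the vector in the question they agree; the extra label `"pineapple"` is hypothetical and only there to show a substring match. `fixed = TRUE` makes `grepl()` treat the pattern as a literal string rather than a regular expression.

```r
individuals <- c("apple", "pear", "apple", "cranberry", "peach", "apple")

# Exact match: only elements equal to "apple"
x_exact <- individuals == "apple"

# Substring match: any element containing "apple"
x_sub <- grepl("apple", individuals, fixed = TRUE)

identical(x_exact, x_sub)  # the two agree for this vector

# Hypothetical extra label where the two rules disagree
labels2 <- c(individuals, "pineapple")
labels2 == "apple"                     # "pineapple" is not an exact match
grepl("apple", labels2, fixed = TRUE)  # but it does contain "apple"
```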
I'm assuming that your character vector individuals is the first variable in tibble_inds. In that case, outer returns
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0 1 0 1 1 0
## [2,] 1 0 1 0 0 1
## [3,] 0 1 0 1 1 0
## [4,] 1 0 1 0 0 1
## [5,] 1 0 1 0 0 1
## [6,] 0 1 0 1 1 0
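for both choices of x. This result does not match yours because your diag<- call only zeroes the diagonal, so it misses the other apple-vs-apple cells [1,3], [3,1], [3,6], [6,3], [1,6] and [6,1].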
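For a logical `x`, `a + b == 1L` is TRUE exactly when one member of the pair is an apple and the other is not, i.e. an exclusive or. A sketch of two equivalent spellings (not part of the answer above): `outer()` accepts the vectorised `xor()` directly, and for logical vectors `"!="` computes the same thing. Unary `+` turns the logical result into a 0/1 integer matrix.

```r
individuals <- c("apple", "pear", "apple", "cranberry", "peach", "apple")
x <- individuals == "apple"

# Form used above: count the apples in each pair, keep pairs with exactly one
m_sum <- outer(x, x, function(a, b) as.integer(a + b == 1L))

# Exclusive-or spelling: TRUE when exactly one of the pair is an apple
m_xor <- +outer(x, x, xor)

# For logical vectors, "!=" is also an exclusive or
m_neq <- +outer(x, x, "!=")

stopifnot(identical(m_sum, m_xor), identical(m_sum, m_neq))
m_xor
```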
Another possible solution:
individuals <- c("apple", "pear", "apple", "cranberry", "peach", "apple")
m <- matrix(0, length(individuals), length(individuals))
for (i in seq_along(individuals)) {
  for (j in seq_along(individuals)) {
    # 1 if exactly one of the two labels is "apple", otherwise 0
    m[i, j] <- +(sum(c(individuals[i], individuals[j]) == "apple") == 1)
  }
}
m
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0 1 0 1 1 0
#> [2,] 1 0 1 0 0 1
#> [3,] 0 1 0 1 1 0
#> [4,] 1 0 1 0 0 1
#> [5,] 1 0 1 0 0 1
#> [6,] 0 1 0 1 1 0
Or replace the nested for loops with nested sapply calls (the \(i) lambda shorthand needs R >= 4.1):
# sapply() builds the matrix column by column from the returned values
m <- sapply(seq_along(individuals), \(i) sapply(seq_along(individuals),
  \(j) +(sum(c(individuals[i], individuals[j]) == "apple") == 1)))
m
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0 1 0 1 1 0
#> [2,] 1 0 1 0 0 1
#> [3,] 0 1 0 1 1 0
#> [4,] 1 0 1 0 0 1
#> [5,] 1 0 1 0 0 1
#> [6,] 0 1 0 1 1 0
We can try outer as below:
> x <- grepl("apple",individuals)
> +(outer(x, x, `+`) == 1)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 1 0 1 1 0
[2,] 1 0 1 0 0 1
[3,] 0 1 0 1 1 0
[4,] 1 0 1 0 0 1
[5,] 1 0 1 0 0 1
[6,] 0 1 0 1 1 0
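The question starts from a large pairwise distance matrix whose row/column names carry the labels. A sketch, assuming the labels are stored as that matrix's dimnames: build the indicator from `rownames()` so it always lines up with the matrix, then reuse the `outer()` approach. The 1-D coordinates below are made up purely to produce a toy `dist()` matrix; they are not from the question.

```r
individuals <- c("apple", "pear", "apple", "cranberry", "peach", "apple")

# Hypothetical toy distance matrix standing in for the real one
coords <- c(1, 4, 2, 9, 5, 3)
d <- as.matrix(dist(coords))
dimnames(d) <- list(individuals, individuals)

# Indicator taken from the row names of the distance matrix itself
x <- rownames(d) == "apple"

# 1 where exactly one of the pair is "apple"; keep the labels on the result
mask <- +outer(x, x, xor)
dimnames(mask) <- dimnames(d)

# Example use: distances for apple vs non-apple pairs only
d[mask == 1]
```

Leaving off the `+` keeps `outer(x, x, xor)` as a logical matrix, which can index `d` directly.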