
Number of coincident combinations between groups to generate a posterior matrix

I have a df like this:

library(tibble)

id <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C")
type <- c(1, 4, 3, 6, 1, 4, 5, 2, 3, 6)
df <- tibble(id, type)  # tibble() replaces the deprecated data_frame()

I want to count the combinations of type that occur within each group (id).


A = matrix(
  # Taking sequence of elements 
  c(NA, 0, 1, 2, 1, 1, 0, NA, 1, 0,0,1, 1, 1, NA, 1,0,2, 2, 0, 1, NA, 1, 1, 1,0,0,1, NA, 0, 1,1,2,1,0, NA),
  # No of rows
  nrow = 6,  
  # No of columns
  ncol = 6,        
  # By default matrices are filled column-wise,
  # so this parameter makes the fill row-wise instead
  byrow = TRUE
)

# Naming rows
rownames(A) = c("Type 1", "Type 2", "Type 3", "Type 4", "Type 5", "Type 6")

# Naming columns
colnames(A) = c("Type 1", "Type 2", "Type 3", "Type 4", "Type 5", "Type 6")

cat("Number of coincidences between Type by id")


library(dplyr)
library(purrr)
library(tidyr)

# The column in df is named `type` (lowercase), so it is referenced as df$type
intermediate_step <- expand.grid(Variety1 = unique(df$type),    # reshape with a symmetric output
                                 Variety2 = unique(df$type), stringsAsFactors = FALSE) %>%
  mutate(counts = map2_dbl(Variety1, Variety2, ~ length(intersect(df$id[df$type == .x],
                                                                  df$id[df$type == .y])))) %>%
  filter(Variety1 != Variety2)

AA <- spread(intermediate_step, Variety2, counts)


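  1. intermediate_step is not computed correctly.
  2. This approach is computationally very expensive. It works for this toy example, but with my real data (93k entries) RStudio aborts the session.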

... a possible solution to the second problem ...




With the correct data (row 2, column 2 should be 4 rather than 2, i.e. df[2, 2] <- 4), you can do:

`diag<-`(crossprod(table(df)), NA)

type  1  2  3  4  5  6
   1 NA  0  1  2  1  1
   2  0 NA  1  0  0  1
   3  1  1 NA  1  0  2
   4  2  0  1 NA  1  1
   5  1  0  0  1 NA  0
   6  1  1  2  1  0 NA
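
To see why the one-liner above gives the requested counts, here is a minimal base R sketch of the same idea, step by step. table() builds an id-by-type incidence matrix, and crossprod(X) computes t(X) %*% X, so each cell is the number of ids that contain both types. The names inc and co are illustrative only, and clipping the counts to 0/1 is an added assumption, meant to guard against an id that lists the same type more than once.

```r
# Toy data from the question (type[2] is already 4 here)
id   <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C")
type <- c(1, 4, 3, 6, 1, 4, 5, 2, 3, 6)
df   <- data.frame(id, type)

# Incidence matrix: one row per id, one column per type
inc <- table(df$id, df$type)

# Assumption: count each (id, type) pair at most once
inc[inc > 0] <- 1

# t(inc) %*% inc: cell [i, j] = number of ids that have both type i and type j
co <- crossprod(inc)

# The diagonal holds the number of ids per type, which is not a co-occurrence
diag(co) <- NA

print(co)
```

This needs a single matrix product instead of one intersect() call per pair of types, which is probably why it scales better than the expand.grid approach. If the dense id-by-type table is still too large for the real data, xtabs(~ id + type, data = df, sparse = TRUE) returns a sparse matrix (it requires the Matrix package) that can be passed to crossprod() in the same way; that variant is untested here.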