R - Conditionally summarize data from all possible column pairs

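I have a table listing the presence/absence of each organism under several different conditions. My goal is to generate a new table that lists the values of the Venn diagram for every possible pair of organisms.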

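In other words: for each pair of organisms, I want a table summarizing:

  1. The number of conditions they share (organism1 == 1 & organism2 == 1)
  2. The number of conditions unique to organism 1 (organism1 == 1 & organism2 == 0)
  3. The number of conditions unique to organism 2 (organism1 == 0 & organism2 == 1)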

My current approach is shown below, but my real presence/absence table is much larger, so a more concise way to automate this would be great (e.g. a for loop?).

Example presence/absence table (rows = conditions, columns = organisms):

library(data.table)

paData <- data.table(
  Pyro = c(1,1,0,0,1,0,1),
  Anth = c(0,1,0,1,0,1,1),
  Tric = c(1,1,0,1,0,1,1))

paData
   Pyro Anth Tric
1:    1    0    1
2:    1    1    1
3:    0    0    0
4:    0    1    1
5:    1    0    0
6:    0    1    1
7:    1    1    1

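For each pair of organisms (columns), record whether only the first, only the second, or both organisms are present in each condition (row):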

paData$PyroAnth <- ifelse(paData[,1] ==1 & 
                            paData[,2] ==0, "V1alone",
                        ifelse(paData[,1] ==1 & 
                                 paData[,2] ==1, "Overlap",
                               ifelse(paData[,1] ==0 & 
                                        paData[,2] ==1, "V2alone", 
                                            "NA")))

paData$PyroTric <- ifelse(paData[,1] ==1 & 
                           paData[,3] ==0, "V1alone",
                       ifelse(paData[,1] ==1 & 
                                paData[,3] ==1, "Overlap",
                              ifelse(paData[,1] ==0 & 
                                       paData[,3] ==1, "V2alone", 
                                     "NA")))

paData$AnthTric <- ifelse(paData[,2] ==1 & 
                           paData[,3] ==0, "V1alone",
                         ifelse(paData[,2] ==1 & 
                                  paData[,3] ==1, "Overlap",
                                ifelse(paData[,2] ==0 & 
                                         paData[,3] ==1, "V2alone", 
                                       "NA")))

paData
   Pyro Anth Tric PyroAnth PyroTric AnthTric
1:    1    0    1  V1alone  Overlap  V2alone
2:    1    1    1  Overlap  Overlap  Overlap
3:    0    0    0       NA       NA       NA
4:    0    1    1  V2alone  V2alone  Overlap
5:    1    0    0  V1alone  V1alone       NA
6:    0    1    1  V2alone  V2alone  Overlap
7:    1    1    1  Overlap  Overlap  Overlap

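Create the desired output table: for each pair of organisms, count the conditions (rows) in which each organism occurs "alone" or "overlaps" with the second organism.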

DesiredOutput <- data.frame(rbind(list(names(paData[,1]), names(paData[,2]),
                                       nrow(paData[PyroAnth == "V1alone"]),
                                       nrow(paData[PyroAnth == "Overlap"]),
                                       nrow(paData[PyroAnth == "V2alone"])),
                                  list(names(paData[,1]), names(paData[,3]),
                                       nrow(paData[PyroTric == "V1alone"]),
                                       nrow(paData[PyroTric == "Overlap"]),
                                       nrow(paData[PyroTric == "V2alone"])),
                                  list(names(paData[,2]), names(paData[,3]),
                                       nrow(paData[AnthTric == "V1alone"]),
                                       nrow(paData[AnthTric == "Overlap"]),
                                       nrow(paData[AnthTric == "V2alone"]))))

colnames(DesiredOutput) <- c("V1", "V2", "V1alone", "Overlap", "V2alone")

DesiredOutput
    V1   V2 V1alone Overlap V2alone
1 Pyro Anth       2       2       2
2 Pyro Tric       1       3       2
3 Anth Tric       0       4       1

How can I automate this to efficiently create my "DesiredOutput" table for dozens of organisms and hundreds of conditions?

You could try this approach:

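# Count conditions where only v1, both, or only v2 are present (0/1 vectors)
f <- function(v1, v2) list(sum(v1 & !v2), sum(v1 & v2), sum(!v1 & v2))

# One row per pair of organism columns (named V1 and V2);
# paData must contain only the original organism columns here
result <- data.table(t(combn(names(paData), 2)))

result[, c("v1alone", "overlap", "v2alone") := f(paData[[V1]], paData[[V2]]), by = 1:nrow(result)]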

Output:

     V1   V2 v1alone overlap v2alone
1: Pyro Anth       2       2       2
2: Pyro Tric       1       3       2
3: Anth Tric       0       4       1
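
For much larger tables, one possible alternative (not part of the approach above, just a sketch) is to get all pairwise overlaps at once from a matrix cross-product and derive the "alone" counts by subtraction. This assumes the table contains only 0/1 organism columns. The names `m`, `both`, `total`, `idx` and `pairCounts` are illustrative, and a plain data.frame is used here only to keep the example free of dependencies (`as.matrix` works on a data.table too).

```r
# Example presence/absence data (rows = conditions, columns = organisms)
paData <- data.frame(
  Pyro = c(1,1,0,0,1,0,1),
  Anth = c(0,1,0,1,0,1,1),
  Tric = c(1,1,0,1,0,1,1))

# Conditions x organisms 0/1 matrix
m <- as.matrix(paData)

# both[i, j] = number of conditions in which organisms i and j are both present
both <- crossprod(m)             # same as t(m) %*% m

# Number of conditions in which each organism is present at all
total <- unname(colSums(m))

# All unordered pairs of column indices, one pair per row
idx <- t(combn(ncol(m), 2))
shared <- both[idx]              # matrix indexing: one overlap count per pair

pairCounts <- data.frame(
  V1      = colnames(m)[idx[, 1]],
  V2      = colnames(m)[idx[, 2]],
  v1alone = total[idx[, 1]] - shared,   # present in V1 but not in V2
  overlap = shared,                     # present in both
  v2alone = total[idx[, 2]] - shared)   # present in V2 but not in V1

pairCounts
```

On the example data this should give the same three rows of counts as the output above. The cross-product avoids looping over pairs in R code, which may help when there are many organisms, though for a few dozen columns either approach is likely fast enough.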