R - Conditionally summarize data from all possible column pairs
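I have a table that lists the presence/absence of each organism under several different conditions. My goal is to produce a new table listing the values of every possible Venn diagram for each pair of organisms.
...In other words, for each pair of organisms I want a table summarizing:
- the number of conditions they share (organism1 == 1 & organism2 == 1)
- the number of conditions unique to organism1 (organism1 == 1 & organism2 == 0)
- the number of conditions unique to organism2 (organism1 == 0 & organism2 == 1)
My current approach is below, but my real presence/absence table is much larger, so a more concise way to automate this would be great! (e.g. a for loop?!)
Example presence/absence table (rows = conditions, columns = organisms):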
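Put another way, the three counts are three of the four cells of a 2×2 cross-tabulation of the two organisms' columns (the fourth cell, both absent, lies outside the Venn diagram). A minimal base-R sketch for a single pair, using the Pyro and Anth values from the example below re-entered as plain vectors; the `factor(..., levels = 0:1)` calls are an added assumption so that all four cells exist even when one combination never occurs:

```r
# Presence/absence of two organisms across 7 conditions (values from the example table)
Pyro <- c(1, 1, 0, 0, 1, 0, 1)
Anth <- c(0, 1, 0, 1, 0, 1, 1)

# 2x2 contingency table: rows = Pyro (0/1), columns = Anth (0/1)
tab <- table(Pyro = factor(Pyro, levels = 0:1),
             Anth = factor(Anth, levels = 0:1))

# Read the three Venn regions off the table cells
v1alone <- tab["1", "0"]  # Pyro present, Anth absent
overlap <- tab["1", "1"]  # both present
v2alone <- tab["0", "1"]  # Pyro absent, Anth present
print(tab)
```

The remaining cell, `tab["0", "0"]`, counts the conditions where neither organism is present.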
library(data.table)
paData <- data.table(
Pyro = c(1,1,0,0,1,0,1),
Anth = c(0,1,0,1,0,1,1),
Tric = c(1,1,0,1,0,1,1))
paData
Pyro Anth Tric
1: 1 0 1
2: 1 1 1
3: 0 0 0
4: 0 1 1
5: 1 0 0
6: 0 1 1
7: 1 1 1
For each pair of organisms (columns), label whether one, the other, or both organisms are present in each condition (row):
# paData[[i]] extracts column i as a plain vector (paData[, i] would return a one-column data.table)
paData$PyroAnth <- ifelse(paData[[1]] == 1 & paData[[2]] == 0, "V1alone",
                   ifelse(paData[[1]] == 1 & paData[[2]] == 1, "Overlap",
                   ifelse(paData[[1]] == 0 & paData[[2]] == 1, "V2alone",
                          "NA")))
paData$PyroTric <- ifelse(paData[[1]] == 1 & paData[[3]] == 0, "V1alone",
                   ifelse(paData[[1]] == 1 & paData[[3]] == 1, "Overlap",
                   ifelse(paData[[1]] == 0 & paData[[3]] == 1, "V2alone",
                          "NA")))
paData$AnthTric <- ifelse(paData[[2]] == 1 & paData[[3]] == 0, "V1alone",
                   ifelse(paData[[2]] == 1 & paData[[3]] == 1, "Overlap",
                   ifelse(paData[[2]] == 0 & paData[[3]] == 1, "V2alone",
                          "NA")))
paData
Pyro Anth Tric PyroAnth PyroTric AnthTric
1: 1 0 1 V1alone Overlap V2alone
2: 1 1 1 Overlap Overlap Overlap
3: 0 0 0 NA NA NA
4: 0 1 1 V2alone V2alone Overlap
5: 1 0 0 V1alone V1alone NA
6: 0 1 1 V2alone V2alone Overlap
7: 1 1 1 Overlap Overlap Overlap
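The three hand-written `ifelse()` blocks differ only in which two columns they compare, so they can be generated for every pair with `combn()`. A sketch under stated assumptions: it uses a plain base-R `data.frame` copy of the example (organism columns only), `label_pair` is a hypothetical helper name, and it returns a real `NA` instead of the string `"NA"`:

```r
# Example presence/absence data (organism columns only) as a base-R data.frame
pa <- data.frame(Pyro = c(1, 1, 0, 0, 1, 0, 1),
                 Anth = c(0, 1, 0, 1, 0, 1, 1),
                 Tric = c(1, 1, 0, 1, 0, 1, 1))

# Hypothetical helper: label each row for one pair of 0/1 vectors
label_pair <- function(x, y) {
  ifelse(x == 1 & y == 0, "V1alone",
  ifelse(x == 1 & y == 1, "Overlap",
  ifelse(x == 0 & y == 1, "V2alone", NA_character_)))
}

# All unordered pairs of organism names, one pair per column
org_pairs <- combn(names(pa), 2)

# One label column per pair, named by pasting the two organism names
labels <- apply(org_pairs, 2, function(p) label_pair(pa[[p[1]]], pa[[p[2]]]))
colnames(labels) <- apply(org_pairs, 2, paste, collapse = "")
print(labels)
```

With dozens of organisms this produces every pair's label column without writing any of them by hand.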
Create the desired output table: for each pair of organisms, count the conditions (rows) in which each organism is present "alone" or "overlapping" with the second organism.
# Column names here must match the label columns created above (PyroAnth, PyroTric, AnthTric)
DesiredOutput <- data.frame(rbind(list(names(paData)[1], names(paData)[2],
nrow(paData[PyroAnth == "V1alone"]),
nrow(paData[PyroAnth == "Overlap"]),
nrow(paData[PyroAnth == "V2alone"])),
list(names(paData)[1], names(paData)[3],
nrow(paData[PyroTric == "V1alone"]),
nrow(paData[PyroTric == "Overlap"]),
nrow(paData[PyroTric == "V2alone"])),
list(names(paData)[2], names(paData)[3],
nrow(paData[AnthTric == "V1alone"]),
nrow(paData[AnthTric == "Overlap"]),
nrow(paData[AnthTric == "V2alone"]))))
colnames(DesiredOutput) <- c("V1", "V2", "V1alone", "Overlap", "V2alone")
DesiredOutput
V1 V2 V1alone Overlap V2alone
1 Pyro Anth 2 2 2
2 Pyro Tric 1 3 2
3 Anth Tric 0 4 1
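Since the question asks about a for loop: a single loop over the columns of `combn()` produces the DesiredOutput layout directly, without the intermediate label columns. A minimal sketch, assuming a base-R `data.frame` copy of the example containing only the 0/1 organism columns (`pa`, `org_pairs` and `out` are illustrative names):

```r
# Example presence/absence data (organism columns only)
pa <- data.frame(Pyro = c(1, 1, 0, 0, 1, 0, 1),
                 Anth = c(0, 1, 0, 1, 0, 1, 1),
                 Tric = c(1, 1, 0, 1, 0, 1, 1))

org_pairs <- combn(names(pa), 2)  # 2 x n_pairs character matrix

# Pre-allocate one output row per pair
out <- data.frame(V1 = org_pairs[1, ], V2 = org_pairs[2, ],
                  V1alone = NA_integer_, Overlap = NA_integer_,
                  V2alone = NA_integer_, stringsAsFactors = FALSE)

for (k in seq_len(ncol(org_pairs))) {
  x <- pa[[org_pairs[1, k]]] == 1  # organism 1 present?
  y <- pa[[org_pairs[2, k]]] == 1  # organism 2 present?
  out$V1alone[k] <- sum(x & !y)
  out$Overlap[k] <- sum(x & y)
  out$V2alone[k] <- sum(!x & y)
}
print(out)
```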
How can I automate this to efficiently create my "DesiredOutput" table for dozens of organisms and hundreds of conditions?
You can try this approach:
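library(data.table)
# paData must contain only the 0/1 organism columns (not the label columns added above)
# f returns the three Venn counts for one pair of 0/1 vectors
f <- function(v1, v2) list(sum(v1 & !v2), sum(v1 & v2), sum(!v1 & v2))
# One row per unordered pair of organism names (columns V1, V2)
result <- data.table(t(combn(names(paData), 2)))
# Fill the three count columns pair by pair
result[, c("v1alone", "overlap", "v2alone") := f(paData[[V1]], paData[[V2]]), by = 1:nrow(result)]
Output: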
V1 V2 v1alone overlap v2alone
1: Pyro Anth 2 2 2
2: Pyro Tric 1 3 2
3: Anth Tric 0 4 1
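For many organisms, a base-R alternative (not part of the answer above) computes the overlap for every pair in one matrix product: with a 0/1 matrix `M` (conditions × organisms), `crossprod(M)[i, j]` counts the conditions where both organisms i and j are present, and subtracting it from each organism's total gives the two "alone" counts. A sketch assuming strictly 0/1 values with no `NA`s, on a `data.frame` copy of the example:

```r
# Example presence/absence data (organism columns only)
pa <- data.frame(Pyro = c(1, 1, 0, 0, 1, 0, 1),
                 Anth = c(0, 1, 0, 1, 0, 1, 1),
                 Tric = c(1, 1, 0, 1, 0, 1, 1))
M <- as.matrix(pa)  # conditions x organisms, 0/1

# crossprod(M) = t(M) %*% M: entry [i, j] counts conditions where both i and j are 1
both <- crossprod(M)
n_present <- unname(colSums(M))  # conditions where each organism is present

# Indices of every unordered pair, in the same order as combn() on the names
idx <- combn(ncol(M), 2)
i <- idx[1, ]
j <- idx[2, ]
ov <- both[cbind(i, j)]  # matrix indexing: one overlap value per pair

result <- data.frame(V1 = colnames(M)[i], V2 = colnames(M)[j],
                     v1alone = n_present[i] - ov,
                     overlap = ov,
                     v2alone = n_present[j] - ov,
                     stringsAsFactors = FALSE)
print(result)
```

The pairs come out in `combn()` order, so the rows line up with the data.table result; the per-pair work is reduced to a lookup in the precomputed count matrix.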