How to intersect and add score to column?
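I have two datasets and want to find the overlapping (intersecting, common) regions between them. Where there is an overlap, I also want to extract the matching rows from each original table:
Data A: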
chr start end
chr1 25 35
chr1 50 70
chr1 60 85
Data B:
chr start end score
chr1 10 15 24
chr1 55 75 14
chr1 76 82 10
Expected output tables:
Output 1: the common regions
chr start end
chr1 55 70
chr1 70 75
chr1 76 82
Output 2: rows extracted from data A:
chr start end
chr1 50 70
chr1 60 85
Output 3: rows extracted from data B:
chr start end score
chr1 55 75 14
chr1 76 82 10
I have tried several approaches, but I don't know which one is best:
library(GenomicRanges)
enhancer <- with(dataA, GRanges(chr, IRanges(start = start, end = end)))
H3K4me1 <- with(dataB, GRanges(chr, IRanges(start = start, end = end)))
Approach 1:
hits <- findOverlaps(enhancer, H3K4me1)
ranges(enhancer)[queryHits(hits)] <- ranges(H3K4me1)[subjectHits(hits)]
enhancer
H3K4me1
Approach 2:
over <- subsetByOverlaps(enhancer, H3K4me1)
Approach 3:
inter <- intersect(enhancer, H3K4me1)
Approach 4:
library(data.table)
groupA <- data.table(dataA)
setkey(groupA, chr, start, end)
groupB <- data.table(dataB)
setkey(groupB, chr, start, end)
# Interval join; the start/end columns coming from groupA are prefixed with "i."
over <- foverlaps(groupA, groupB, nomatch = 0)
# Clip each overlapping pair to the region both ranges share
over2 <- data.table(
  chr = over$chr,
  start = over[, pmax(start, i.start)],
  end = over[, pmin(end, i.end)])
I'm not sure whether this is what you want. Would you mind creating a reproducible example as described here?
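DataA <- data.frame(chr = c("chr1", "chr1", "chr1"), start = c(25,50,60), end = c(35,70,85))
DataB <- data.frame(chr = c("chr1", "chr1", "chr1"), start = c(10,55,76), end = c(15,75,82), score = c(24,14,10))
# Expand every DataA range into its integer positions, tagged with the row it came from
luA <- Map(`:`, DataA$start, DataA$end)
luA <- data.frame(value = unlist(luA),
                  index = rep(seq_along(luA), lengths(luA)))
# For each DataB row: the first DataA row whose range contains its start (NA if none).
# Note this only catches overlaps where a DataB start falls inside a DataA range.
idx <- luA$index[match(DataB$start, luA$value)]
DataA[idx[!is.na(idx)], ]
DataB[!is.na(idx), ]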
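For comparison, here is a minimal base R sketch that produces all three requested tables and carries the score over to the common regions. It is not the asker's or the answerer's method. It assumes closed intervals (both start and end are included) and that the score should come from data B. The helper columns `rowA` and `rowB` and the result names `common`, `outA` and `outB` are made up for illustration.

```r
# Example data copied from the question
dataA <- data.frame(chr = c("chr1", "chr1", "chr1"),
                    start = c(25, 50, 60),
                    end = c(35, 70, 85))
dataB <- data.frame(chr = c("chr1", "chr1", "chr1"),
                    start = c(10, 55, 76),
                    end = c(15, 75, 82),
                    score = c(24, 14, 10))

# Hypothetical helper columns: remember each row's position in its own table
dataA$rowA <- seq_len(nrow(dataA))
dataB$rowB <- seq_len(nrow(dataB))

# Every A/B pair on the same chromosome; clashing names get .A / .B suffixes
pairs <- merge(dataA, dataB, by = "chr", suffixes = c(".A", ".B"))

# Assumption: closed intervals, which overlap when each starts before the other ends
pairs <- pairs[pairs$start.A <= pairs$end.B & pairs$start.B <= pairs$end.A, ]

# Output 1: the shared part of each overlapping pair, with B's score attached
common <- data.frame(chr = pairs$chr,
                     start = pmax(pairs$start.A, pairs$start.B),
                     end = pmin(pairs$end.A, pairs$end.B),
                     score = pairs$score)
common <- common[order(common$chr, common$start, common$end), ]

# Outputs 2 and 3: rows of each input that take part in at least one overlap
outA <- dataA[sort(unique(pairs$rowA)), c("chr", "start", "end")]
outB <- dataB[sort(unique(pairs$rowB)), c("chr", "start", "end", "score")]
```

On the example data, `outA` and `outB` match outputs 2 and 3 in the question. `common` contains 55-70, 60-75 and 76-82: a pairwise intersection of 60-85 with 55-75 gives 60-75, not the 70-75 listed in output 1. The merge builds every same-chromosome pair, so for large tables an interval join such as `foverlaps` or `findOverlaps` is likely the better choice.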