查找 2 个范围及其重叠区域长度之间的重叠?
Finding overlaps between 2 ranges and their overlapped region lengths?
我需要找到 2 组(gp1 和 gp2)之间相同染色体上重叠区域的长度。 (Whosebug 中的类似问题与我的目标不同,因为我想找到重叠区域而不是 TRUE/FALSE 答案)。
例如:
gp1:
chr start end id1
chr1 580 600 1
chr1 900 970 2
chr3 400 600 3
chr2 100 700 4
gp2:
chr start end id2
chr1 590 864 1
chr3 550 670 2
chr2 897 1987 3
我正在寻找一种方法来比较这两组并获得如下结果:
id1 id2 chr overlapped_length
1 1 chr1 10
3 2 chr3 50
应该为您指明正确的方向:
加载库
# install.packages("BiocManager")
# BiocManager::install("GenomicRanges")
library(GenomicRanges)
library(IRanges)
生成数据
gp1 <- read.table(text =
"
chr start end id1
chr1 580 600 1
chr1 900 970 2
chr3 400 600 3
chr2 100 700 4
", header = TRUE)
gp2 <- read.table(text =
"
chr start end id2
chr1 590 864 1
chr3 550 670 2
chr2 897 1987 3
", header = TRUE)
计算范围
gr1 <- GenomicRanges::makeGRangesFromDataFrame(
gp1,
seqnames.field = "chr",
start.field = "start",
end.field = "end"
)
gr2 <- GenomicRanges::makeGRangesFromDataFrame(
gp2,
seqnames.field = "chr",
start.field = "start",
end.field = "end"
)
计算重叠
hits <- findOverlaps(gr1, gr2)
p <- Pairs(gr1, gr2, hits = hits)
i <- pintersect(p)
结果
> as.data.frame(i)
seqnames start end width strand hit
1 chr1 590 600 11 * TRUE
2 chr3 550 600 51 * TRUE
我需要找到 2 组(gp1 和 gp2)之间相同染色体上重叠区域的长度。 (Whosebug 中的类似问题与我的目标不同,因为我想找到重叠区域而不是 TRUE/FALSE 答案)。
例如:
gp1:
chr start end id1
chr1 580 600 1
chr1 900 970 2
chr3 400 600 3
chr2 100 700 4
gp2:
chr start end id2
chr1 590 864 1
chr3 550 670 2
chr2 897 1987 3
我正在寻找一种方法来比较这两组并获得如下结果:
id1 id2 chr overlapped_length
1 1 chr1 10
3 2 chr3 50
应该为您指明正确的方向:
加载库
# install.packages("BiocManager")
# BiocManager::install("GenomicRanges")
library(GenomicRanges)
library(IRanges)
生成数据
gp1 <- read.table(text =
"
chr start end id1
chr1 580 600 1
chr1 900 970 2
chr3 400 600 3
chr2 100 700 4
", header = TRUE)
gp2 <- read.table(text =
"
chr start end id2
chr1 590 864 1
chr3 550 670 2
chr2 897 1987 3
", header = TRUE)
计算范围
gr1 <- GenomicRanges::makeGRangesFromDataFrame(
gp1,
seqnames.field = "chr",
start.field = "start",
end.field = "end"
)
gr2 <- GenomicRanges::makeGRangesFromDataFrame(
gp2,
seqnames.field = "chr",
start.field = "start",
end.field = "end"
)
计算重叠
hits <- findOverlaps(gr1, gr2)
p <- Pairs(gr1, gr2, hits = hits)
i <- pintersect(p)
结果
> as.data.frame(i)
seqnames start end width strand hit
1 chr1 590 600 11 * TRUE
2 chr3 550 600 51 * TRUE