如何获取 findOverlapped 区域?
How to get findOverlapped region?
您好,我正在使用 GRanges 并使用 IRanges 的 findOverlaps 函数查找重叠。我得到了查询和主题重叠的命中,但我还想知道它们重叠的查询和主题的坐标,这样我就可以检索它的序列。
如何获取主题和查询重叠的坐标。我正在使用以下功能:
library(GenomicRanges)
library(regioneR) # toGRanges
fo <- findOverlaps(query = toGRanges(df1),subject = toGRanges(df2),type = "within")
df1 <- structure(list(df1c = c("chr2", "chr2", "chr2", "chr2"), df1c2 = c(2800,
3600, 3719, 3893), df1c3 = c(3270, 4152, 5092, 4547)), class = "data.frame", row.names = c(NA,
-4L))
df2 <- structure(list(df2c = c("chr2", "chr2", "chr2", "chr2", "chr2L"
), df2c2 = c(263, 342, 424, 846, 1030), df2c3 = c(20091, 17222,
2612, 4265, 11575)), class = "data.frame", row.names = c(NA,
-5L))
The expected output should be like
chr CoDF1 CoDF2
1 100-200 90-210
1 150-280 100-285
CoDF1 = Coordinates of df1 file where its overlapped with df2 reads
CoDF2 = Coordinates of df1 file where its overlapped with df1 reads
你最好使用intersect()
:
> intersect(toGRanges(df1),toGRanges(df2))
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr2 2800-3270 *
[2] chr2 3600-5092 *
-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
但请注意,您的 data.frames colnames 不正确以创建 GRanges 对象,它们应该是 seqnames/start/end
已编辑:
查看所有坐标的所有交点:
intersection = findOverlaps(query = toGRanges(df1), subject = toGRanges(df2), type = "any")
df = data.frame(df1[queryHits(intersection),], df2[subjectHits(intersection),])
df
seqnames start end seqnames.1 start.1 end.1
1 chr2 2800 3270 chr2 263 20091
1.1 chr2 2800 3270 chr2 342 17222
1.2 chr2 2800 3270 chr2 846 4265
2 chr2 3600 4152 chr2 263 20091
2.1 chr2 3600 4152 chr2 342 17222
2.2 chr2 3600 4152 chr2 846 4265
3 chr2 3719 5092 chr2 263 20091
3.1 chr2 3719 5092 chr2 342 17222
3.2 chr2 3719 5092 chr2 846 4265
4 chr2 3893 4547 chr2 263 20091
4.1 chr2 3893 4547 chr2 342 17222
4.2 chr2 3893 4547 chr2 846 4265
您好,我正在使用 GRanges 并使用 IRanges 的 findOverlaps 函数查找重叠。我得到了查询和主题重叠的命中,但我还想知道它们重叠的查询和主题的坐标,这样我就可以检索它的序列。
如何获取主题和查询重叠的坐标。我正在使用以下功能:
library(GenomicRanges)
library(regioneR) # toGRanges
fo <- findOverlaps(query = toGRanges(df1),subject = toGRanges(df2),type = "within")
df1 <- structure(list(df1c = c("chr2", "chr2", "chr2", "chr2"), df1c2 = c(2800,
3600, 3719, 3893), df1c3 = c(3270, 4152, 5092, 4547)), class = "data.frame", row.names = c(NA,
-4L))
df2 <- structure(list(df2c = c("chr2", "chr2", "chr2", "chr2", "chr2L"
), df2c2 = c(263, 342, 424, 846, 1030), df2c3 = c(20091, 17222,
2612, 4265, 11575)), class = "data.frame", row.names = c(NA,
-5L))
The expected output should be like
chr CoDF1 CoDF2
1 100-200 90-210
1 150-280 100-285
CoDF1 = Coordinates of df1 file where its overlapped with df2 reads
CoDF2 = Coordinates of df1 file where its overlapped with df1 reads
你最好使用intersect()
:
> intersect(toGRanges(df1),toGRanges(df2))
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr2 2800-3270 *
[2] chr2 3600-5092 *
-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
但请注意,您的 data.frames colnames 不正确以创建 GRanges 对象,它们应该是 seqnames/start/end
已编辑:
查看所有坐标的所有交点:
intersection = findOverlaps(query = toGRanges(df1), subject = toGRanges(df2), type = "any")
df = data.frame(df1[queryHits(intersection),], df2[subjectHits(intersection),])
df
seqnames start end seqnames.1 start.1 end.1
1 chr2 2800 3270 chr2 263 20091
1.1 chr2 2800 3270 chr2 342 17222
1.2 chr2 2800 3270 chr2 846 4265
2 chr2 3600 4152 chr2 263 20091
2.1 chr2 3600 4152 chr2 342 17222
2.2 chr2 3600 4152 chr2 846 4265
3 chr2 3719 5092 chr2 263 20091
3.1 chr2 3719 5092 chr2 342 17222
3.2 chr2 3719 5092 chr2 846 4265
4 chr2 3893 4547 chr2 263 20091
4.1 chr2 3893 4547 chr2 342 17222
4.2 chr2 3893 4547 chr2 846 4265