Select rows that represent ranges between two positions, keeping only intervals that contain at least one position from another table

I need to keep a row from the "ref" table only when its INTERVAL (start/end) contains at least one "position" (start) from the "map" table:

Mock "ref" table:

ref<-"chr start end
chr1 1 10 
chr1 20 30  
chr1 30 40 
chr1 40 50 
chr2 20 30 
chr2 40 50  
chr2 80 90"
ref<-read.table(text=ref,header=T)

Mock "map" table:

map<-"chr start
chr1 1
chr1 3 
chr1 5
chr1 31
chr1 32
chr2 1
chr2 2
chr2 89"
map<-read.table(text=map,header=T)

I need a final table like this (only the INTERVALS that contain at least one of the values in the "map" table):

final<-"chr start end
chr1 1 10 
chr1 30 40 
chr2 80 90"
final<-read.table(text=final,header=T)

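Note that I also take the chromosome into account. Also, the values considered are the whole range between the "start" and "end" values in "ref", not just the "start" and "end" values themselves.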

To handle the chromosome issue, I treat chr+start and chr+end as "tag" and "tag1", respectively.

ref$tag <- paste0(ref$chr, "-", ref$start)
ref$tag1 <- paste0(ref$chr, "-", ref$end)
map$tag <- paste0(map$chr, "-", map$start)
# Match on the chromosome-aware tags; this only catches positions equal to an interval's start or end
ref[ref$tag %in% map$tag | ref$tag1 %in% map$tag, ]

In more detail:

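rows_to_keep <- ref$tag %in% map$tag | ref$tag1 %in% map$tag
rows_to_keep
# [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE

# Only the first row is kept: chr1 30-40 and chr2 80-90 are missed
# because their "map" positions fall strictly inside the interval
ref[rows_to_keep, c("chr", "start", "end")]
#    chr start end
# 1 chr1     1  10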
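
A test based on `%in%` can only match positions that coincide with an interval endpoint, so it cannot express "position lies anywhere between start and end". As a minimal base R sketch of that containment test, assuming inclusive endpoints and a same-chromosome match (the names `contains_position` and `final_base` are made up for illustration):

```r
# Mock data copied from the question
ref <- read.table(text = "chr start end
chr1 1 10
chr1 20 30
chr1 30 40
chr1 40 50
chr2 20 30
chr2 40 50
chr2 80 90", header = TRUE, stringsAsFactors = FALSE)

map <- read.table(text = "chr start
chr1 1
chr1 3
chr1 5
chr1 31
chr1 32
chr2 1
chr2 2
chr2 89", header = TRUE, stringsAsFactors = FALSE)

# For each "ref" row: is there any "map" position on the same chromosome
# with start <= position <= end? (inclusive bounds are an assumption)
contains_position <- vapply(seq_len(nrow(ref)), function(i) {
  any(map$chr == ref$chr[i] &
      map$start >= ref$start[i] &
      map$start <= ref$end[i])
}, logical(1))

final_base <- ref[contains_position, ]
final_base
```

With the mock data this keeps rows 1, 3 and 7 (chr1 1-10, chr1 30-40, chr2 80-90), which matches the desired table. The loop compares every interval against every position, so it is only reasonable for small inputs.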

According to the thread "Finding overlapping ranges between two interval data": "In general, it's very appropriate to use the bioconductor package IRanges to deal with problems related to intervals." So, here you go:

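library("GenomicRanges")

# Intervals from "ref" as genomic ranges, one per row
gr1 <- with(ref, GRanges(Rle(factor(chr, levels = c("chr1", "chr2"))),
                         IRanges(start, end)))

# Positions from "map" as width-1 ranges (start == end)
gr2 <- with(map, GRanges(Rle(factor(chr, levels = c("chr1", "chr2"))),
                         IRanges(start, start)))

# Keep only the "ref" intervals that overlap at least one "map" position
olaps <- subsetByOverlaps(gr1, gr2)

# Convert back to a data frame and rename the columns
olaps <- as.data.frame(olaps)
col_headings <- c('chr', 'start', 'end', 'width', 'strand')
names(olaps) <- col_headings

final <- subset(olaps, select = c("chr", "start", "end"))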

    > final
       chr start end
    1 chr1     1  10
    2 chr1    30  40
    3 chr2    80  90
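
If you want to cross-check this result without Bioconductor, one option is a plain `merge()` on chromosome followed by a range filter. This is only a sketch under the assumption that endpoints are inclusive; the names `pos`, `candidates`, `hits` and `check` are made up for illustration and are not part of the answer above.

```r
# Mock data copied from the question
ref <- read.table(text = "chr start end
chr1 1 10
chr1 20 30
chr1 30 40
chr1 40 50
chr2 20 30
chr2 40 50
chr2 80 90", header = TRUE, stringsAsFactors = FALSE)

map <- read.table(text = "chr start
chr1 1
chr1 3
chr1 5
chr1 31
chr1 32
chr2 1
chr2 2
chr2 89", header = TRUE, stringsAsFactors = FALSE)

# Rename the position column so it does not clash with ref$start after merging
names(map)[names(map) == "start"] <- "pos"

# Pair every interval with every position on the same chromosome
candidates <- merge(ref, map, by = "chr")

# Keep pairs where the position lies inside the interval (inclusive bounds assumed)
hits <- candidates[candidates$pos >= candidates$start &
                   candidates$pos <= candidates$end,
                   c("chr", "start", "end")]

# One row per interval, in a stable order
check <- unique(hits)
check <- check[order(check$chr, check$start), ]
rownames(check) <- NULL
check
```

The merge builds every interval/position pair within a chromosome before filtering, so memory grows quickly with table size; for real genomic data the GenomicRanges approach is the better fit.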