Counting overlaps as expected with R data.table foverlaps() or IRanges
I'm having trouble counting interval overlaps the way I expect. Here is an R data.table whose intervals are defined by start and end:
> library(data.table)
> dt1 = data.table(start=c(1, 5, 3), end=c(10, 15, 8))
> print(dt1)
   start end
1:     1  10
2:     5  15
3:     3   8
Here is how I would think about the overlaps of these intervals, from 0 to 20:
[0, 1]: 0 (there are no intervals here)
[1, 3]: 1 (there is only one interval here, from [1, 10])
[3, 5]: 2 (two intervals here, both [1, 10] and [3, 8])
[5, 8]: 3
[8, 10]: 1
[10, 15]: 1
[15, 20]: 0
So I'd like to produce this output algorithmically. Something like:
   start end overlaps
1:     0   1        0
2:     1   3        1
3:     3   5        2
4:     5   8        3
5:     8  10        2
6:    10  15        1
7:    15  20        0
However, I can't work out how to do this with foverlaps() from R data.table or with the various functions in IRanges.
> setkey(dt1, start, end)
> foverlaps(dt1, dt1, type="any")
   start end i.start i.end
1:     1  10       1    10
2:     3   8       1    10
3:     5  15       1    10
4:     1  10       3     8
5:     3   8       3     8
6:     5  15       3     8
7:     1  10       5    15
8:     3   8       5    15
9:     5  15       5    15
> foverlaps(dt1, dt1, type="within")
   start end i.start i.end
1:     1  10       1    10
2:     1  10       3     8
3:     3   8       3     8
4:     5  15       5    15
Neither of these seems to count the overlaps within a given interval.
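To make concrete what the self-join is returning, here is a minimal sketch that tallies the rows of the `type="any"` result per query interval. The names `pairs` and `per_interval` are only illustrative. Each of the three intervals overlaps all three (itself included), so the tally is a per-interval neighbour count rather than a depth per sub-interval.

```r
library(data.table)

# Intervals from the question, keyed as foverlaps() requires for y
dt1 <- data.table(start = c(1, 5, 3), end = c(10, 15, 8))
setkey(dt1, start, end)

# The self-join returns one row per pair of overlapping intervals
pairs <- foverlaps(dt1, dt1, type = "any")

# Rows per query interval (i.start, i.end): how many intervals it overlaps
per_interval <- pairs[, .N, by = .(i.start, i.end)]
print(per_interval)   # N is 3 for every interval
```

That is the same information as `countOverlaps(range1, range1)` in IRanges, which is why neither call produces the 0, 1, 2, 3, ... profile.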
Looking at IRanges doesn't give the expected count of overlapping intervals either:
> library(IRanges)
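> # Assumed construction: the original post does not show how range1 was built
> range1 <- IRanges(start = dt1$start, end = dt1$end)
> range1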
IRanges object with 3 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1        10        10
  [2]         3         8         6
  [3]         5        15        11
> countOverlaps(range1, range1)
[1] 3 3 3
> countOverlaps(range1, range1, type="within")
[1] 1 2 1
How can I count the overlapping intervals?
> # Where do the 0 and the 20 come from?
> points <- c(0, sort(c(dt1$start, dt1$end)), 20)
> x <- do.call(IRanges,
+ transpose(Map(c, start=head(points, -1), end=tail(points, -1))))
> x
IRanges object with 7 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         0         1         2
  [2]         1         3         3
  [3]         3         5         3
  [4]         5         8         4
  [5]         8        10         3
  [6]        10        15         6
  [7]        15        20         6
> y <- do.call(IRanges, dt1)
> y
IRanges object with 3 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1        10        10
  [2]         3         8         6
  [3]         5        15        11
> countOverlaps(x, y, type="within")
[1] 0 1 2 3 2 1 0
The fifth result is slightly different, but there really are 2 overlaps there, because [8, 10] overlaps both [1, 10] and [5, 15].
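Since the question also asks about foverlaps(), the same idea can probably be carried over to data.table. This is a rough sketch, not a tested general solution: it builds the elementary segments between consecutive endpoints and joins them against the intervals with `type = "within"`. The names `segs`, `hits` and `n_hits` are made up for illustration, and the outer bounds 0 and 20 are simply taken from the question.

```r
library(data.table)

# Intervals from the question; y in foverlaps() must be keyed
dt1 <- data.table(start = c(1, 5, 3), end = c(10, 15, 8))
setkey(dt1, start, end)

# Elementary segments between consecutive breakpoints
# (0 and 20 are the bounds used in the question, not derived from the data)
points <- c(0, sort(unique(c(dt1$start, dt1$end))), 20)
segs <- data.table(start = head(points, -1), end = tail(points, -1))

# For each segment (x), find the intervals (y) that fully contain it.
# which = TRUE returns row-index pairs (xid, yid); yid is NA when nothing matches.
hits <- foverlaps(segs, dt1, type = "within", which = TRUE, nomatch = NA)

# Count the matches per segment; segments with no match get 0
n_hits <- tabulate(hits$xid[!is.na(hits$yid)], nbins = nrow(segs))
segs[, overlaps := n_hits]
print(segs)   # overlaps: 0 1 2 3 2 1 0
```

The counts agree with the `countOverlaps(x, y, type="within")` result above. `unique()` is added here so that repeated endpoints would not create zero-length segments; with this data it changes nothing.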