Difference between overlap and intersect methods in PyRanges

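The PyRanges class from the package of the same name has two methods with slightly different behavior: intersect and overlap. The description of intersect is very similar to that of overlap: "Return overlapping subintervals." vs "Return overlapping intervals." I can't see the difference between the two (yes, I noticed the "sub" prefix).

Is overlap meant to return the complete intervals that overlap in at least one position?

Setup: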

>>> import pyranges as pr
>>> gr = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [1, 4, 10],
...                    "End": [3, 9, 11], "ID": ["a", "b", "c"]})
>>> gr
+--------------+-----------+-----------+------------+
|   Chromosome |     Start |       End | ID         |
|   (category) |   (int32) |   (int32) | (object)   |
|--------------+-----------+-----------+------------|
|         chr1 |         1 |         3 | a          |
|         chr1 |         4 |         9 | b          |
|         chr1 |        10 |        11 | c          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr2 = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [2, 2, 9], "End": [3, 9, 10]})
>>> gr2
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (category)   |   (int32) |   (int32) |
|--------------+-----------+-----------|
| chr1         |         2 |         3 |
| chr1         |         2 |         9 |
| chr1         |         9 |        10 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.

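With overlap, you get back the intervals in self that overlap at least one interval in other. The intervals are returned whole, with their original Start and End. If an interval overlaps more than once, it is still returned only once (by default):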

>>> gr.overlap(gr2)
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | ID         |
| (category)   |   (int32) |   (int32) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |         1 |         3 | a          |
| chr1         |         4 |         9 | b          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
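
The same semantics can be mimicked in plain Python. This is only a rough sketch, not how PyRanges implements it: the function name `overlap_sketch` and the tuple layout are made up for illustration, and intervals are assumed to be half-open (Start inclusive, End exclusive), which is consistent with the output above, where c (10-11) does not overlap 9-10.

```python
def overlap_sketch(self_ivs, other_ivs):
    """Return each interval of self_ivs that overlaps any interval of
    other_ivs. Every hit is reported once, with its original coordinates.
    Hypothetical helper, assuming half-open intervals [start, end)."""
    result = []
    for start, end, label in self_ivs:
        # Two half-open intervals overlap if each starts before the other ends.
        if any(start < o_end and o_start < end for o_start, o_end in other_ivs):
            result.append((start, end, label))
    return result


# Same data as gr and gr2 above, as plain tuples.
gr_rows = [(1, 3, "a"), (4, 9, "b"), (10, 11, "c")]
gr2_rows = [(2, 3), (2, 9), (9, 10)]

overlap_result = overlap_sketch(gr_rows, gr2_rows)
print(overlap_result)  # [(1, 3, 'a'), (4, 9, 'b')]
```

Note that a is reported once even though it overlaps two intervals of gr2, and that its coordinates (1-3) are unchanged.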

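intersect returns the intersections of the overlapping intervals in self and other, that is, only the part shared by both, so Start and End may be trimmed. By default, one row is returned for every overlap: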

>>> gr.intersect(gr2)
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | ID         |
| (category)   |   (int32) |   (int32) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |         2 |         3 | a          |
| chr1         |         2 |         3 | a          |
| chr1         |         4 |         9 | b          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
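
A plain-Python sketch of the intersect behavior, for comparison. Again, this is not the library's implementation: `intersect_sketch` is a hypothetical helper, and half-open intervals (Start inclusive, End exclusive) are assumed.

```python
def intersect_sketch(self_ivs, other_ivs):
    """Return the shared sub-interval for every overlapping pair
    (one row per pair). Hypothetical helper, assuming half-open
    intervals [start, end)."""
    result = []
    for start, end, label in self_ivs:
        for o_start, o_end in other_ivs:
            # The intersection runs from the later start to the earlier end.
            lo, hi = max(start, o_start), min(end, o_end)
            if lo < hi:  # non-empty intersection means the pair overlaps
                result.append((lo, hi, label))
    return result


# Same data as gr and gr2 above, as plain tuples.
gr_rows = [(1, 3, "a"), (4, 9, "b"), (10, 11, "c")]
gr2_rows = [(2, 3), (2, 9), (9, 10)]

intersect_result = intersect_sketch(gr_rows, gr2_rows)
print(intersect_result)  # [(2, 3, 'a'), (2, 3, 'a'), (4, 9, 'b')]
```

Here a appears twice because it overlaps two intervals of gr2, and both rows are trimmed to the shared region 2-3 rather than the original 1-3.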

See the documentation for details: