Difference between overlap and intersect methods in Pyranges
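The PyRanges class, from the package of the same name, has two methods with slightly different functionality: intersect and overlap.
The description of the intersect method is very similar to that of the overlap method: "Return overlapping subintervals." vs "Return overlapping intervals."
I can't see the difference between the two (yes, I noticed the sub prefix).
Is overlap meant to return the complete intervals that overlap in at least one position?
Setup: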
>>> import pyranges as pr
>>> gr = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [1, 4, 10],
... "End": [3, 9, 11], "ID": ["a", "b", "c"]})
>>> gr
+--------------+-----------+-----------+------------+
| Chromosome | Start | End | ID |
| (category) | (int32) | (int32) | (object) |
|--------------+-----------+-----------+------------|
| chr1 | 1 | 3 | a |
| chr1 | 4 | 9 | b |
| chr1 | 10 | 11 | c |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr2 = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [2, 2, 9], "End": [3, 9, 10]})
>>> gr2
+--------------+-----------+-----------+
| Chromosome | Start | End |
| (category) | (int32) | (int32) |
|--------------+-----------+-----------|
| chr1 | 2 | 3 |
| chr1 | 2 | 9 |
| chr1 | 9 | 10 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
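With overlap, you get back the intervals in self that overlap with an interval in other. If an interval overlaps more than once, it is still returned only once (by default):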
>>> gr.overlap(gr2)
+--------------+-----------+-----------+------------+
| Chromosome | Start | End | ID |
| (category) | (int32) | (int32) | (object) |
|--------------+-----------+-----------+------------|
| chr1 | 1 | 3 | a |
| chr1 | 4 | 9 | b |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
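To make the rule behind that output explicit, here is a minimal plain-Python sketch of the same semantics. It is not how pyranges implements overlap; the function names and the tuple layout are made up for illustration, and it assumes half-open intervals (Start included, End excluded), which is consistent with row c (10 to 11) not matching the 9 to 10 interval above.

```python
# Illustrative sketch only: plain Python, not the pyranges implementation.
# Assumption: intervals are half-open, i.e. Start is included and End is excluded.

def overlaps(a_start, a_end, b_start, b_end):
    """Return True if [a_start, a_end) and [b_start, b_end) share a position."""
    return a_start < b_end and b_start < a_end


def overlap_sketch(self_rows, other_rows):
    """Keep each row of self_rows unchanged, and at most once, if it overlaps any row of other_rows."""
    return [
        row
        for row in self_rows
        if any(overlaps(row[0], row[1], o[0], o[1]) for o in other_rows)
    ]


# Same coordinates as gr and gr2 above, as (Start, End, ID) and (Start, End) tuples.
gr_rows = [(1, 3, "a"), (4, 9, "b"), (10, 11, "c")]
gr2_rows = [(2, 3), (2, 9), (9, 10)]

overlap_result = overlap_sketch(gr_rows, gr2_rows)
print(overlap_result)  # [(1, 3, 'a'), (4, 9, 'b')]
```

Row a overlaps two intervals in gr2 but appears once, and its coordinates are untouched: overlap acts as a filter on self.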
With intersect, the intervals returned are the intersections of the overlapping intervals in self and other. By default, all overlaps are returned:
>>> gr.intersect(gr2)
+--------------+-----------+-----------+------------+
| Chromosome | Start | End | ID |
| (category) | (int32) | (int32) | (object) |
|--------------+-----------+-----------+------------|
| chr1 | 2 | 3 | a |
| chr1 | 2 | 3 | a |
| chr1 | 4 | 9 | b |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
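The same idea can be sketched in plain Python to show where the clipped coordinates and the repeated row come from. Again, this is only an illustration under the half-open interval assumption, with hypothetical function names and tuple layout; it is not the library's code.

```python
# Illustrative sketch only: plain Python, not the pyranges implementation.
# Assumption: intervals are half-open, i.e. Start is included and End is excluded.

def intersect_sketch(self_rows, other_rows):
    """For every overlapping (self, other) pair, emit the shared subinterval with self's ID."""
    result = []
    for start, end, label in self_rows:
        for o_start, o_end in other_rows:
            # Clip the self interval to the part shared with the other interval.
            lo, hi = max(start, o_start), min(end, o_end)
            if lo < hi:  # a non-empty shared region means the pair overlaps
                result.append((lo, hi, label))
    return result


# Same coordinates as gr and gr2 above, as (Start, End, ID) and (Start, End) tuples.
gr_rows = [(1, 3, "a"), (4, 9, "b"), (10, 11, "c")]
gr2_rows = [(2, 3), (2, 9), (9, 10)]

intersect_result = intersect_sketch(gr_rows, gr2_rows)
print(intersect_result)  # [(2, 3, 'a'), (2, 3, 'a'), (4, 9, 'b')]
```

Row a yields two results because it overlaps two intervals in gr2, and both are clipped to 2 to 3. Row b is already contained in the 2 to 9 interval, so its coordinates come back unchanged.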
See the PyRanges documentation for details.