Average interval lengths in BED files with Bioconductor
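I'm trying to perform a very simple operation, but I haven't figured it out yet: I want the average length of all intervals in a particular BED file that I have imported into R. This BED file contains no overlapping intervals. This is what the file looks like: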
GRanges object with 12917252 ranges and 3 metadata columns:
seqnames ranges strand | name score thick
<Rle> <IRanges> <Rle> | <character> <numeric> <IRanges>
[1] chr1 [10524, 10551] + | 1:10524-10551 122 [10538, 10538]
[2] chr1 [11236, 11258] + | 1:11236-11258 43 [11247, 11247]
[3] chr1 [11456, 11474] + | 1:11456-11474 47 [11465, 11465]
[4] chr1 [12054, 12099] + | 1:12054-12099 206 [12077, 12077]
[5] chr1 [12276, 12330] + | 1:12276-12330 249 [12303, 12303]
Any operation would be applied to the ranges column.
Use IRanges::width(), which returns the length of each range:
library(GenomicRanges) #loads IRanges, too.
#dummy data
gr = GRanges("chr1",IRanges(
start = c(11236, 11456, 12054, 12276),
end = c(11258, 11474, 12099, 12330)))
# get each range's length with width(), then take the mean()
mean(width(gr))
# [1] 35.75
?width
width(x): The number of integer values in each range. This is a vector
of non-negative integers of the same length as x.
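To make the "+1" in width() explicit, here is a minimal base-R sketch (no Bioconductor needed) that reuses the dummy ranges from the answer. GRanges stores 1-based closed intervals [start, end], so a length is end - start + 1. A raw BED file instead uses 0-based, half-open coordinates, so the same interval is written with a start one smaller and its length is simply end - start; importing a BED file (for example with rtracklayer::import) performs that shift. The temporary file and the three-column layout below are illustrative assumptions, not the asker's actual file.

```r
# Dummy ranges from the answer, in 1-based closed form (as GRanges shows them)
starts <- c(11236, 11456, 12054, 12276)
ends   <- c(11258, 11474, 12099, 12330)

# For closed intervals [start, end], length = end - start + 1,
# which is what IRanges::width() reports
widths_closed <- ends - starts + 1
mean_closed <- mean(widths_closed)
print(mean_closed)

# Write the same intervals as a hypothetical raw BED file:
# BED is 0-based and half-open, so each start is one smaller
bed_path <- tempfile(fileext = ".bed")
writeLines(
  sprintf("chr1\t%d\t%d", as.integer(starts - 1), as.integer(ends)),
  bed_path
)

# Read the BED columns directly; here length = end - start (no +1)
bed <- read.table(bed_path, sep = "\t",
                  col.names = c("chrom", "start", "end"))
mean_bed <- mean(bed$end - bed$start)
print(mean_bed)
```

Both approaches give 35.75, matching mean(width(gr)) above; the only thing to watch is which coordinate convention the numbers are in before you subtract.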