Get the final entry in an S4 object in R

Question

I know that with a data frame I can easily do:

df[-1,]

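But this doesn't seem to work for S4 objects (I'm working with GRanges objects specifically, but that shouldn't matter). Is there some kind of -1 equivalent?

A workaround is: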

S4[[2]][length(S4)]

Example:

library(GenomicRanges)

gr <- GRanges(
  seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
  ranges = IRanges(101:110, end = 111:120, names = head(letters, 10)),
  strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
  score = 1:10,
  GC = seq(1, 0, length.out = 10))

I want to select the "slots" (?) b through j.

If it were a data frame I would do:

gr[2:-1,]

Answer 1

To learn how to operate on GRanges objects, you should refer to the methods described in ?GRanges. The output you see when printing gr is generated by the show method:

show(gr)
## GRanges object with 10 ranges and 2 metadata columns:
##     seqnames    ranges strand |     score        GC
##        <Rle> <IRanges>  <Rle> | <integer> <numeric>
##   a     chr1   101-111      - |         1  1.000000
##   b     chr2   102-112      + |         2  0.888889
##   c     chr2   103-113      + |         3  0.777778
##   d     chr2   104-114      * |         4  0.666667
##   e     chr1   105-115      * |         5  0.555556
##   f     chr1   106-116      + |         6  0.444444
##   g     chr3   107-117      + |         7  0.333333
##   h     chr3   108-118      + |         8  0.222222
##   i     chr3   109-119      - |         9  0.111111
##   j     chr3   110-120      - |        10  0.000000

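The output gives the impression that gr is a data frame, but it is not: what you see is extracted from the slot values (attributes) of gr and laid out as a rectangle for your convenience.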

names(gr)
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

slotNames(gr)
## [1] "seqnames"        "ranges"          "strand"          "seqinfo"        
## [5] "elementMetadata" "elementType"     "metadata"       

gr@seqnames
## factor-Rle of length 10 with 4 runs
##   Lengths:    1    3    2    4
##   Values : chr1 chr2 chr1 chr3
## Levels(3): chr1 chr2 chr3
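To make the point about slots concrete, here is a minimal sketch using a toy S4 class. The class name Track and its slots start and end are hypothetical and have nothing to do with how GRanges is actually implemented; the sketch only illustrates that printing is controlled by a show method, and that x[-1L] or x[length(x)] work on an S4 object once the class provides length and `[` methods.

```r
library(methods)

# Hypothetical toy class: two parallel integer slots, loosely mimicking ranges.
setClass("Track", representation(start = "integer", end = "integer"))

# length() and `[` are what give x[-1L] and x[length(x)] a meaning.
setMethod("length", "Track", function(x) length(x@start))
setMethod("[", "Track", function(x, i, j, ..., drop = TRUE) {
  new("Track", start = x@start[i], end = x@end[i])
})

# show() decides what printing looks like; here it builds a small table
# from the slots, even though the object itself is not a data frame.
setMethod("show", "Track", function(object) {
  print(data.frame(start = object@start, end = object@end))
})

tr <- new("Track", start = 101:105, end = 111:115)

slot_names <- slotNames(tr)  # the actual storage: "start" and "end"
last <- tr[length(tr)]       # final entry only
rest <- tr[-1L]              # everything except the first entry
```

Classes without a `[` method do not support this kind of subsetting at all, which is why the answer depends on the specific S4 class.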

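There are several ways to subset gr. Do not expect them to behave exactly like the corresponding methods for data frames. To obtain a second GRanges object describing all ranges except the first (here, a), you can use gr[-1L] or gr[-1L, ]: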

gr[-1L]
## GRanges object with 9 ranges and 2 metadata columns:
##     seqnames    ranges strand |     score        GC
##        <Rle> <IRanges>  <Rle> | <integer> <numeric>
##   b     chr2   102-112      + |         2  0.888889
##   c     chr2   103-113      + |         3  0.777778
##   d     chr2   104-114      * |         4  0.666667
##   e     chr1   105-115      * |         5  0.555556
##   f     chr1   106-116      + |         6  0.444444
##   g     chr3   107-117      + |         7  0.333333
##   h     chr3   108-118      + |         8  0.222222
##   i     chr3   109-119      - |         9  0.111111
##   j     chr3   110-120      - |        10  0.000000
##   -------
##   seqinfo: 3 sequences from an unspecified genome; no seqlengths

identical(gr[-1L], gr[-1L, ])
## [1] TRUE

If the ranges have been assigned unique names, you can also subset by name, e.g. gr[names(gr)[-1L]] or gr[names(gr)[-1L], ].
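For the literal question in the title (the final entry), the same positional subsetting can be combined with length. The following is a sketch that rebuilds the gr object from the question and assumes the GenomicRanges package is installed; the variable names last, last_via_tail, b_to_j and b_to_j_neg are only illustrative.

```r
library(GenomicRanges)

gr <- GRanges(
  seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
  ranges = IRanges(101:110, end = 111:120, names = head(letters, 10)),
  strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
  score = 1:10,
  GC = seq(1, 0, length.out = 10))

# Final range only: index with the length of the object.
last <- gr[length(gr)]

# tail() also works on GRanges objects.
last_via_tail <- tail(gr, 1L)

# "Second through last": R has no Python-style 2:-1 slice
# (2:-1 evaluates to c(2, 1, 0, -1)), so spell out the upper bound
# or drop the first element with a negative index.
b_to_j <- gr[2:length(gr)]
b_to_j_neg <- gr[-1L]
```

Note that a negative index in R removes the element at that position rather than counting from the end, so gr[-1L] drops a; it does not return j.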

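The columns to the left of the vertical bar are stored in slots of the same name (run-length encoded in the case of seqnames and strand) and are extracted with accessor methods of the same name: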

identical(gr@seqnames, seqnames(gr))
## [1] TRUE

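The columns to the right of the vertical bar are called "metadata". They are stored together in the slot elementMetadata, and you should extract them with the mcols method: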

mcols(gr)
## DataFrame with 10 rows and 2 columns
##       score        GC
##   <integer> <numeric>
## a         1  1.000000
## b         2  0.888889
## c         3  0.777778
## d         4  0.666667
## e         5  0.555556
## f         6  0.444444
## g         7  0.333333
## h         8  0.222222
## i         9  0.111111
## j        10  0.000000

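The metadata are stored in a DataFrame object. You will find that, with regard to subsetting, DataFrame follows data.frame semantics more faithfully than GRanges does. ?DataFrame explains the differences.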

mcols(gr)$score
## [1]  1  2  3  4  5  6  7  8  9 10
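As a rough illustration of how the metadata can be used when subsetting, the sketch below works on a small, hypothetical three-range object (gr_small is not from the question) and assumes the GenomicRanges package is installed. It takes the last row of the metadata, reads the last value of one metadata column, and filters ranges by that column.

```r
library(GenomicRanges)

# Hypothetical three-range object, kept small for illustration.
gr_small <- GRanges(
  seqnames = "chr1",
  ranges = IRanges(1:3, width = 5, names = c("a", "b", "c")),
  score = 1:3,
  GC = c(0.5, 0.4, 0.3))

meta <- mcols(gr_small)

# Last row of the metadata DataFrame, using data.frame-like [row, ] syntax.
last_meta <- meta[nrow(meta), ]

# `$` on a GRanges object reads a metadata column directly.
last_score <- gr_small$score[length(gr_small)]

# Keep only the ranges whose score exceeds 1.
high <- gr_small[gr_small$score > 1L]
```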
