rpy2 to subset RS4 object (expressionSet)

I'm building an ExpressionSet class with rpy2, using the relevant tutorial as a guide. One of the most common things I do with an Eset object is subsetting, which in native R is as simple as

eset2<-eset1[1:10,1:5] # first ten features, first five samples

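which returns a new ExpressionSet object with both the expression and phenotype data subset using the given indices. Rpy2's RS4 objects don't seem to allow direct subsetting, or to have rx/rx2 attributes, unlike e.g. RS3 vectors. I tried, with about 50% success, adding a "_subset" function (below) that subsets these two datasets separately and assigns them back to the Eset, but is there a more straightforward way that I am missing?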

from rpy2 import (robjects, rinterface)
from rpy2.robjects import (r, pandas2ri, Formula)
from rpy2.robjects.packages import (importr,)
from rpy2.robjects.methods import (RS4,)

class ExpressionSet(RS4):
    # funcs to get the attributes
    def _assay_get(self): # returns an environment, use ['exprs'] key to access
        return self.slots["assayData"]
    def _pdata_get(self): # returns an RS4 object, use .slots["data"] to access
        return self.slots["phenoData"]
    def _feats_get(self): # returns an RS4 object, use .slots["data"] to access
        return self.slots["featureData"]
    def _annot_get(self): # slots returns a tuple, just pick 1st (only) element
        return self.slots["annotation"][0]
    def _class_get(self): # slots returns a tuple, just pick 1st (only) element
        return self.slots["class"][0]

    # funcs to set the attributes
    def _assay_set(self, value):
        self.slots["assayData"] = value
    def _pdata_set(self, value):
        self.slots["phenoData"] = value
    def _feats_set(self,value):
        self.slots["featureData"] = value
    def _annot_set(self, value):
        self.slots["annotation"] = value
    def _class_set(self, value):
        self.slots["class"]  = value

    # funcs to work with the above to get/set the data
    def _exprs_get(self):
        return self.assay["exprs"]
    def _pheno_get(self):
        pdata = self.pData
        return pdata.slots["data"]

    def _exprs_set(self, value):
        assay = self.assay
        assay["exprs"] = value
    def _pheno_set(self, value):
        pdata = self.pData
        pdata.slots["data"] = value


    assay = property(_assay_get, _assay_set, None, "R attribute 'assayData'")
    pData = property(_pdata_get, _pdata_set, None, "R attribute 'phenoData'")    
    fData = property(_feats_get, _feats_set, None, "R attribute 'featureData'")
    annot = property(_annot_get, _annot_set, None, "R attribute 'annotation'")    
    exprs = property(_exprs_get, _exprs_set, None, "R attribute 'exprs'")
    pheno = property(_pheno_get, _pheno_set, None, "R attribute 'pheno'")


    def _subset(self, features=None, samples=None):

        features = features if features else self.exprs.rownames
        samples  = samples if samples else self.exprs.colnames

        fx = robjects.BoolVector([f in features for f in self.exprs.rownames])
        sx = robjects.BoolVector([s in samples for s in self.exprs.colnames])

        self.pheno = self.pheno.rx(sx, self.pheno.colnames)        
        self.exprs = self.exprs.rx(fx,sx) # can't assign back to exprs this way 

When you do

eset2<-eset1[1:10,1:5]

in R, the R S4 method "[" with signature ("ExpressionSet") is fetched and run with the parameter values you provided.

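The documentation suggests using getmethod (see http://rpy2.readthedocs.org/en/version_2.7.x/generated_rst/s4class.html#methods) to make fetching the relevant S4 method easier, but its behavior seems to have changed since the documentation was written (dispatch is no longer resolved through inheritance).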

The following should do it:

from rpy2.robjects.packages import importr
methods = importr('methods')
r_subset_expressionset = methods.selectMethod("[", "ExpressionSet")
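
To make the step from "fetch the method" to "call it" concrete, here is a minimal sketch. The function name `subset_expressionset` and the `select_method` parameter are hypothetical, introduced only for illustration; the sketch assumes that `rows` and `cols` are already rpy2 vectors and that the fetched method can be called directly with `(eset, rows, cols)`.

```python
def subset_expressionset(eset, rows, cols, select_method=None):
    """Apply R's S4 "[" method for ExpressionSet to `eset`.

    `rows` and `cols` are expected to be rpy2 vectors (for example
    IntVector or BoolVector). `select_method` is a hypothetical hook,
    added here only so the function can be exercised without R.
    """
    if select_method is None:
        # Imported lazily so that merely defining this function does not need R.
        from rpy2.robjects.packages import importr
        select_method = importr("methods").selectMethod

    # Same lookup as above: the "[" method registered for ExpressionSet.
    r_subset = select_method("[", "ExpressionSet")
    # Equivalent in spirit to eset[rows, cols] in R.
    return r_subset(eset, rows, cols)
```

With R and Biobase available, something like `subset_expressionset(eset1, robjects.IntVector(range(1, 11)), robjects.IntVector(range(1, 6)))` would correspond to `eset1[1:10, 1:5]`; R indices are 1-based, hence the ranges starting at 1. The return value is whatever rpy2 converts the R result to, so it may need re-wrapping in the Python-side ExpressionSet class.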

Thanks to @lgautier's answer, here is my snippet from above, modified to allow subsetting of an RS4 object:

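from multipledispatch import dispatch
from rpy2 import robjects
from rpy2.robjects.methods import RS4
from rpy2.robjects.packages import importr

methods = importr('methods')  # R's 'methods' package, which provides selectMethod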

@dispatch(RS4)
def eset_subset(eset, features=None, samples=None):
    """
    subset an RS4 eset object
    """
    features = features if features else eset.exprs.rownames
    samples  = samples if samples else eset.exprs.colnames

    fx = robjects.BoolVector([f in features for f in eset.exprs.rownames])
    sx = robjects.BoolVector([s in samples for s in eset.exprs.colnames])  

    # fetch the S4 "[" method for ExpressionSet and call it with the two masks
    esub = methods.selectMethod("[", signature="ExpressionSet")(eset, fx, sx)
    return esub
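
The mask-building step can be tried in plain Python without R. The helper below is a sketch, not part of the snippet above: the name `name_mask` is hypothetical, and it deliberately differs from the `features if features else ...` idiom by treating only `None` as "keep everything", so that an empty list selects nothing.

```python
def name_mask(all_names, wanted=None):
    """Return one bool per entry of `all_names`: True where the name is wanted.

    wanted=None keeps everything; an empty collection keeps nothing.
    """
    all_names = list(all_names)
    if wanted is None:
        return [True] * len(all_names)
    wanted = set(wanted)  # set membership avoids rescanning a list per name
    return [name in wanted for name in all_names]


# Example with made-up feature names (assumption: names are plain strings).
mask = name_mask(["g1", "g2", "g3"], ["g3", "g1"])
```

A list like `mask` could then be wrapped as `robjects.BoolVector(mask)` before being passed to the R method. Note that a boolean mask keeps rows in their original order, not in the order the names were requested.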