Read R function output as columns

I'm trying to work out a way around this question I asked yesterday:

My goal is to check from Python whether certain packages are installed in R.

Following the recommendation Dirk Eddelbuettel gave in a comment on that question, I'm using R's installed.packages() function to list all available packages.

Here's what I have so far:

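from rpy2.rinterface import RRuntimeError
from rpy2.robjects.packages import importr

# Load R's "utils" package, which provides installed.packages()
utils = importr('utils')

def importr_tryhard(packname, contriburl):
    # packname and contriburl are not used yet; for now this only
    # returns the matrix produced by R's installed.packages()
    try:
        rpack = utils.installed_packages()
    except RRuntimeError:
        rpack = []
    return rpack

packname = 'rgl'  # example package name to check
contriburl = 'http://cran.stat.ucla.edu/'
rpack = importr_tryhard(packname, contriburl)
print(rpack)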

This returns rather large output of the form:

           Package      LibPath                         Version   
ks         "ks"         "/usr/local/lib/R/site-library" "1.8.13"  
misc3d     "misc3d"     "/usr/local/lib/R/site-library" "0.8-4"   
mvtnorm    "mvtnorm"    "/usr/local/lib/R/site-library" "0.9-9996"
rgl        "rgl"        "/usr/local/lib/R/site-library" "0.93.986"
base       "base"       "/usr/lib/R/library"            "3.0.1"   
boot       "boot"       "/usr/lib/R/library"            "1.3-9"   
class      "class"      "/usr/lib/R/library"            "7.3-9"   
cluster    "cluster"    "/usr/lib/R/library"            "1.14.4"  
codetools  "codetools"  "/usr/lib/R/library"            "0.2-8"   
compiler   "compiler"   "/usr/lib/R/library"            "3.0.1"   
datasets   "datasets"   "/usr/lib/R/library"            "3.0.1"   
foreign    "foreign"    "/usr/lib/R/library"            "0.8-49"  
graphics   "graphics"   "/usr/lib/R/library"            "3.0.1"   
grDevices  "grDevices"  "/usr/lib/R/library"            "3.0.1"   
grid       "grid"       "/usr/lib/R/library"            "3.0.1"   
KernSmooth "KernSmooth" "/usr/lib/R/library"            "2.23-10" 
lattice    "lattice"    "/usr/lib/R/library"            "0.20-23" 
MASS       "MASS"       "/usr/lib/R/library"            "7.3-29"  
Matrix     "Matrix"     "/usr/lib/R/library"            "1.0-14"  
methods    "methods"    "/usr/lib/R/library"            "3.0.1"   
mgcv       "mgcv"       "/usr/lib/R/library"            "1.7-26"  
nlme       "nlme"       "/usr/lib/R/library"            "3.1-111" 
nnet       "nnet"       "/usr/lib/R/library"            "7.3-7"   
parallel   "parallel"   "/usr/lib/R/library"            "3.0.1"   
rpart      "rpart"      "/usr/lib/R/library"            "4.1-3"   
spatial    "spatial"    "/usr/lib/R/library"            "7.3-6"   
splines    "splines"    "/usr/lib/R/library"            "3.0.1"   
stats      "stats"      "/usr/lib/R/library"            "3.0.1"   
stats4     "stats4"     "/usr/lib/R/library"            "3.0.1"   
survival   "survival"   "/usr/lib/R/library"            "2.37-4"  
tcltk      "tcltk"      "/usr/lib/R/library"            "3.0.1"   
tools      "tools"      "/usr/lib/R/library"            "3.0.1"   
utils      "utils"      "/usr/lib/R/library"            "3.0.1"   
           Priority     
ks         NA           
misc3d     NA           
mvtnorm    NA           
rgl        NA           
base       "base"       
boot       "recommended"
class      "recommended"
cluster    "recommended"
...

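I only need to extract the names of the installed packages, so either the first or the second column would be enough for me.

I've tried using np.loadtxt(), np.genfromtxt() and with open(rpack) as csvfile:, but none of them returns a list/array with the columns or rows properly separated (in fact, they all fail with different errors).

How can I read this output as columns, or more precisely, extract the names of the installed packages into a list/array?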

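I haven't used rpy2 before, but this looks like some kind of rpy2 object, and there may be an option to get just the first column.

You can happily parse it like a text file, though: when you call print XXX, it uses the object's string representation.

Try something like this: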

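s = str(rpack)
# Data rows start with the package name; header lines start with spaces, blank lines are skipped
names = [line.split()[0] for line in s.splitlines() if line.strip() and not line[0].isspace()]
packages = list(dict.fromkeys(names))  # wrapped column blocks repeat each name; dedupe, keep order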

You should try both str and repr to get the string representation; some classes don't define both, or define them differently.
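To try the text-parsing approach without an R session, here is a minimal sketch that parses a hard-coded excerpt of the printout above. It assumes, based only on that printout, that column-header lines start with whitespace, that data rows start with the package name, and that R wraps wide matrices into several column blocks that repeat the row names. The function name package_names_from_printout is hypothetical.

```python
def package_names_from_printout(text):
    """Extract row names from the printed form of installed.packages().

    Assumes header lines start with whitespace and data rows start with the
    package name; names repeated across wrapped column blocks are kept once.
    """
    names = []
    seen = set()
    for line in text.splitlines():
        if not line.strip() or line[0].isspace():
            continue  # skip blank lines and column-header lines
        name = line.split()[0]
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names


# A shortened excerpt of the printout, with columns wrapped as R prints them
sample = '''           Package      LibPath                         Version
ks         "ks"         "/usr/local/lib/R/site-library" "1.8.13"
base       "base"       "/usr/lib/R/library"            "3.0.1"
           Priority
ks         NA
base       "base"
'''

print(package_names_from_printout(sample))  # ['ks', 'base']
```

This stays fragile: it breaks if R changes its print layout, which is why extracting the column from the matrix object itself is preferable.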

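This doesn't feel like the cleanest approach, though, and you'd have to make sure you parse the data correctly. Try printing dir(rpack) to see whether any of its attributes sound like they contain what you want.

A bit of digging, the installed.packages documentation and a quick look at an R tutorial suggest you can do this in R: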

rpack <- installed.packages()
print(rpack[, "Package"])

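In your example, rpack is an rpy2.robjects.vectors.Matrix object, so you can simply use its .rx() method (rpy2's counterpart of R's [ operator) to extract a column:

# Equivalent to R's rpack[TRUE, 1]: all rows of the first ("Package") column
mylist = list(rpack.rx(True, 1))

Give it a try.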
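Once the names are in a plain Python list (for example mylist above, assumed to contain Python strings), the original goal of checking whether certain packages are installed comes down to a membership test. A minimal sketch; the helper name missing_packages and the sample list are hypothetical.

```python
def missing_packages(installed, required):
    """Return the names in `required` that do not appear in `installed`.

    `installed` is any iterable of package names, e.g. list(rpack.rx(True, 1)).
    The comparison is case-sensitive, like R package names, and the order of
    `required` is preserved in the result.
    """
    installed_set = set(installed)
    return [name for name in required if name not in installed_set]


installed = ['ks', 'misc3d', 'mvtnorm', 'rgl', 'base', 'utils']  # hypothetical sample
print(missing_packages(installed, ['rgl', 'ggplot2', 'utils']))  # ['ggplot2']
```

Building a set first keeps each lookup cheap even when hundreds of packages are installed.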