Named list equivalent in rpy2/dataframe access
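I am trying to replicate the example from the MNP package in R from rpy2 in two different ways. First, I simply use robjects.r
with a string that is a straight copy and paste of the R code: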
import rpy2.robjects as robjects
import rpy2.robjects.numpy2ri
import rpy2.robjects.pandas2ri
import rpy2.robjects.packages as rpackages
robjects.pandas2ri.activate()
mnp = rpackages.importr('MNP')
base = rpackages.importr('base')
r = robjects.r
r.data('detergent')
rcmd = '''\
mnp(choice ~ 1, choiceX = list(Surf=SurfPrice, Tide=TidePrice,
Wisk=WiskPrice, EraPlus=EraPlusPrice,
Solo=SoloPrice, All=AllPrice),
cXnames = "price", data = detergent, n.draws = 500, burnin = 100,
thin = 3, verbose = TRUE)'''
res = r(rcmd)
This works fine and reproduces what I can do directly in R. I would also like to run this code using Python-accessible objects, passing the data from a dataframe:
import rpy2.rlike.container as rlc
df = robjects.pandas2ri.ri2py(r['detergent'])
choiceX = rlc.TaggedList(['SurfPrice', 'TidePrice', 'WiskPrice', 'EraPlusPrice', 'SoloPrice', 'AllPrice'],
tags=('Surf', 'Tide', 'Wisk', 'EraPlus', 'Solo', 'All'))
res = mnp.mnp('choice ~ 1',
choiceX=['SurfPrice', 'TidePrice', 'WiskPrice', 'EraPlusPrice', 'SoloPrice', 'AllPrice'],
cXnames='price',
data=df, n_draws=500, burnin=100,
thin=3, verbose=True)
This fails with the error:
Error in xmatrix.mnp(formula, data = eval.parent(data), choiceX = call$choiceX, :
Error: Invalid input for `choiceX.'
You must specify the choice-specific varaibles at least for all non-base categories.
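Another SO response suggested that an rpy2 TaggedList replaces an R named list. If I remove the choiceX
and cXnames
arguments to MNP (they are optional), the code runs, so it seems the pandas dataframe is being passed correctly.
I am not sure whether the TaggedList is not being interpreted as a named list once it gets to R, or whether MNP has some issue associating the contents of choiceX
with the pandas dataframe.
Does anyone know what could be happening here?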
Update
Following @lgautier's suggestion, I modified my code to:
choiceX = rlc.TaggedList([base.as_symbol('SurfPrice'), base.as_symbol('TidePrice'),
base.as_symbol('WiskPrice'), base.as_symbol('EraPlusPrice'),
base.as_symbol('SoloPrice'), base.as_symbol('AllPrice')],
tags=('Surf', 'Tide', 'Wisk', 'EraPlus', 'Solo', 'All'))
res = mnp.mnp(robjects.Formula('choice ~ 1'),
choiceX=choiceX,
cXnames='price',
data=df, n_draws=500, burnin=100,
thin=3, verbose=True)
However, I receive the same error as posted previously.
Update 2
Following the workaround suggested by @lgautier, the following code:
choiceX = rlc.TaggedList([base.as_symbol('SurfPrice'),
base.as_symbol('TidePrice'),
base.as_symbol('WiskPrice'),
base.as_symbol('EraPlusPrice'),
base.as_symbol('SoloPrice'),
base.as_symbol('AllPrice')],
tags=('Surf', 'Tide', 'Wisk',
'EraPlus', 'Solo', 'All'))
choiceX = robjects.conversion.py2ro(choiceX)
# add the names
choiceX.names = robjects.vectors.StrVector(('Surf', 'Tide',
'Wisk', 'EraPlus',
'Solo', 'All'))
res = mnp.mnp(robjects.Formula('choice ~ 1'),
choiceX=choiceX,
cXnames='price',
data=df, n_draws=500, burnin=100,
thin=3, verbose=True)
still yields an error (although a different one):
Error in as.vector(x, mode) :
cannot coerce type 'symbol' to vector of type 'any'
---------------------------------------------------------------------------
RRuntimeError Traceback (most recent call last)
<ipython-input-21-7de5ad805801> in <module>()
3 cXnames='price',
4 data=df, n_draws=500, burnin=100,
----> 5 thin=3, verbose=True)
/Users/lev/anaconda/envs/rmnptest/lib/python2.7/site-packages/rpy2-2.5.6-py2.7-macosx-10.5-x86_64.egg/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
168 v = kwargs.pop(k)
169 kwargs[r_k] = v
--> 170 return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
171
172 pattern_link = re.compile(r'\link\{(.+?)\}')
/Users/lev/anaconda/envs/rmnptest/lib/python2.7/site-packages/rpy2-2.5.6-py2.7-macosx-10.5-x86_64.egg/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
98 for k, v in kwargs.items():
99 new_kwargs[k] = conversion.py2ri(v)
--> 100 res = super(Function, self).__call__(*new_args, **new_kwargs)
101 res = conversion.ri2ro(res)
102 return res
RRuntimeError: Error in as.vector(x, mode) :
cannot coerce type 'symbol' to vector of type 'any'
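The Python code does not correspond to your R code. You figured that out after posting; details are below. In short, R symbols and Python strings are not equivalent (although R confuses its own users by allowing both notations in some places; for example, library("MNP")
and library(MNP)
both work).
This is not unlike this question:
... except that choiceX
would be an unevaluated R expression rather than just a symbol.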
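As a rough analogy in plain Python (this is not rpy2 code, and the `columns` dictionary is a made-up stand-in for the data frame), Python's own parser draws the same distinction between a bare name and a string literal. The sketch below evaluates both against a namespace of columns, roughly the way an R function can evaluate an unevaluated argument inside `data`:

```python
import ast

# Hypothetical stand-in for the columns of a data frame.
columns = {"SurfPrice": [1.99, 2.09], "TidePrice": [2.49, 2.39]}

# A bare name plays the role of an R symbol; a quoted literal plays the
# role of an R character vector. Neither is evaluated at parse time.
name_expr = ast.parse("SurfPrice", mode="eval")
text_expr = ast.parse("'SurfPrice'", mode="eval")

kind_of_name = type(name_expr.body).__name__  # node type of the bare name

# Evaluate each unevaluated expression with the columns as the namespace.
from_name = eval(compile(name_expr, "<expr>", "eval"), {}, columns)
from_text = eval(compile(text_expr, "<expr>", "eval"), {}, columns)

print(kind_of_name, from_name)  # the name resolves to the column's values
print(from_text)                # the literal stays a plain string
```

Only the bare name is looked up in the namespace; the string literal evaluates to itself, which is the situation the string-valued TaggedList ends up in.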
The R code is:
data(detergent)
mnp(choice ~ 1,
# ^- this is a "formula", which is an expression in R
choiceX = list(Surf=SurfPrice, Tide=TidePrice,
Wisk=WiskPrice, EraPlus=EraPlusPrice,
Solo=SoloPrice, All=AllPrice),
# ^- this is a list of objects, but with the cautionary note
# that R evaluates expressions in argument lazily. Therefore
# the safest is to have it as an R expression (it may or may
# not work if evaluated, but this depends on the code in
# `mnp`)
cXnames = "price",
# ^- this is a string
data = detergent,
n.draws = 500, burnin = 100,
thin = 3, verbose = TRUE)
Your Python is (with comments on the differences):
choiceX = rlc.TaggedList(['SurfPrice', 'TidePrice', 'WiskPrice',
'EraPlusPrice', 'SoloPrice', 'AllPrice'],
tags=('Surf', 'Tide', 'Wisk',
'EraPlus', 'Solo', 'All'))
# ^- this is a "tagged list", and the R equivalent would be
# list(Surf="SurfPrice", Tide="TidePrice", Wisk="WiskPrice",
# EraPlus="EraPlusPrice", Solo="SoloPrice", All="AllPrice")
# Something closer to your R code above would be:
# rlc.TaggedList([as_symbol('SurfPrice'), as_symbol('TidePrice'),
# ...
# tags=('Surf', 'Tide', ...))
res = mnp.mnp('choice ~ 1',
# ^- this is a string. To make it an R formula, do
# robjects.Formula('choice ~ 1')
choiceX=['SurfPrice', 'TidePrice', 'WiskPrice',
'EraPlusPrice', 'SoloPrice', 'AllPrice'],
# ^- this should be choiceX defined above, I guess
cXnames='price',
# ^- this is a string, like in R
data=df,
n_draws=500, burnin=100,
thin=3, verbose=True)
Edit:
This means that the following should work:
choiceX = robjects.rinterface.parse("""
list(Surf=SurfPrice, Tide=TidePrice,
Wisk=WiskPrice, EraPlus=EraPlusPrice,
Solo=SoloPrice, All=AllPrice)""")
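Currently rpy2
does not provide many utilities for building R expressions. If the variable names are parameters at the Python level,
you could consider something like this: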
# Join the tag=symbol pairs with ", " so the string is valid R source.
rcode = 'list(' + ', '.join('%s=%s' % (k, v)
                            for k, v in
                            (('Surf', 'SurfPrice'),
                             ('Tide', 'TidePrice'),
                             ('Wisk', 'WiskPrice'),
                             ('EraPlus', 'EraPlusPrice'),
                             ('Solo', 'SoloPrice'),
                             ('All', 'AllPrice'))) + ')'
choiceX = robjects.rinterface.parse(rcode)
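The string-building idea can also be extended to the whole call, which keeps every argument unevaluated until R sees it. The following is a minimal sketch, not a tested recipe for MNP: the helper names `r_named_list`, `mnp_call_source` and `run_mnp`, the name pattern, and the R variable name `py_detergent` are all made up for illustration, and `run_mnp` assumes that rpy2, R and MNP are installed and that pandas conversion has been activated as in the question.

```python
import re

# Conservative pattern for a plain R name. This is an assumption made for
# the sketch and is stricter than what R itself accepts.
_R_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9._]*$")


def r_named_list(pairs):
    """Return R source for list(tag=symbol, ...) from a sequence of pairs."""
    for tag, column in pairs:
        if not (_R_NAME.match(tag) and _R_NAME.match(column)):
            raise ValueError("not a plain R name: %r=%r" % (tag, column))
    return "list(" + ", ".join("%s=%s" % (t, c) for t, c in pairs) + ")"


def mnp_call_source(pairs, data_name="detergent"):
    """Return the R source of the mnp() call from the question."""
    return (
        'mnp(choice ~ 1, choiceX = %s, cXnames = "price", data = %s, '
        "n.draws = 500, burnin = 100, thin = 3, verbose = TRUE)"
        % (r_named_list(pairs), data_name)
    )


def run_mnp(df, pairs, data_name="py_detergent"):
    """Bind df to an R variable and evaluate the call (needs rpy2, R, MNP)."""
    import rpy2.robjects as robjects
    from rpy2.robjects.packages import importr

    importr("MNP")                      # same import as in the question
    robjects.globalenv[data_name] = df  # assumes pandas conversion is active
    return robjects.r(mnp_call_source(pairs, data_name))


pairs = [("Surf", "SurfPrice"), ("Tide", "TidePrice"), ("Wisk", "WiskPrice"),
         ("EraPlus", "EraPlusPrice"), ("Solo", "SoloPrice"), ("All", "AllPrice")]
print(mnp_call_source(pairs))
```

Evaluating the whole call as source mirrors the first snippet in the question, which is the variant reported to work. Whether a parsed expression passed as the `choiceX` argument behaves the same way depends on how `mnp` handles its arguments.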