Retain R data frame index values when converting to a pandas DataFrame
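I am fitting a mixed-effects model with the R (base version 3.5.2) package lme4, run through rpy2 2.9.4 from Python 3.6.
I can print the random effects as an indexed data frame, where the index values are the values of the categorical variable used to define the groups (using the radon data):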
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri, default_converter
from rpy2.robjects.conversion import localconverter
from rpy2.robjects.packages import importr
lme4 = importr('lme4')
mod = lme4.lmer(**kwargs) # Omitting arguments for brevity
r_ranef = ro.r['ranef']
re = r_ranef(mod)
print(re[1])
Uppm (Intercept) floor (Intercept)
AITKIN -0.0026783361 -2.588735e-03 1.742426e-09 -0.0052003670
ANOKA -0.0056688495 -6.418760e-03 -4.482764e-09 -0.0128942943
BECKER 0.0021906431 1.190746e-03 1.211201e-09 0.0023920238
BELTRAMI 0.0093246041 8.190172e-03 5.135196e-09 0.0164527872
BENTON 0.0018747838 1.049496e-03 1.746748e-09 0.0021082742
BIG STONE -0.0073756824 -2.430404e-03 0.000000e+00 -0.0048823057
BLUE EARTH 0.0112939204 4.176931e-03 5.507525e-09 0.0083908075
BROWN 0.0069223055 2.544912e-03 4.911563e-11 0.0051123339
When I convert this to a pandas DataFrame, the categorical values are lost from the index and replaced with integers:
pandas2ri.ri2py_dataframe(re[1]) # re is a list of data frames
Uppm (Intercept) floor (Intercept)
0 -0.002678 -0.002589 1.742426e-09 -0.005200
1 -0.005669 -0.006419 -4.482764e-09 -0.012894
2 0.002191 0.001191 1.211201e-09 0.002392
3 0.009325 0.008190 5.135196e-09 0.016453
4 0.001875 0.001049 1.746748e-09 0.002108
5 -0.007376 -0.002430 0.000000e+00 -0.004882
6 0.011294 0.004177 5.507525e-09 0.008391
7 0.006922 0.002545 4.911563e-11 0.005112
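How can I retain the values of the original index?
The docs suggest that as.data.frame can include grp, which may be the values I am after, but I am struggling to implement that through rpy2; for example, r_ranef = ro.r['ranef.as.data.frame'] does not work.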
Consider adding row.names as a new column in the R data frame, then using that column for set_index in the pandas DataFrame:
base = importr('base')
# ADD NEW COLUMN TO R DATA FRAME
re[1] = base.transform(re[1], index = base.row_names(re[1]))
# SET INDEX IN PANDAS DATA FRAME
py_df = (pandas2ri.ri2py_dataframe(re[1])
         .set_index('index')
         .rename_axis(None)
         )
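To see what the set_index / rename_axis step does on its own, here is a minimal pandas-only sketch. The frame is a hand-built stand-in for the converted data (the values and the single "(Intercept)" column are made up for illustration); no R or rpy2 is involved.

```python
import pandas as pd

# Stand-in for the converted frame: default integer index plus the
# "index" column that was added on the R side (values are made up).
converted = pd.DataFrame({
    "(Intercept)": [-0.0027, -0.0057, 0.0022],
    "index": ["AITKIN", "ANOKA", "BECKER"],
})

# Promote the column to the index, then drop the leftover axis name
# so the frame prints the same way the R data frame did.
py_df = converted.set_index("index").rename_axis(None)

print(py_df.index.tolist())
print(py_df.index.name)
```

Without rename_axis(None) the index would keep the name "index", which shows up as an extra header line when the frame is printed.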
To do this for every data frame in the list, loop over it with R's lapply, then use a Python list comprehension to build a new list of indexed pandas DataFrames.
base = importr('base')
mod = lme4.lmer(**kwargs) # Omitting arguments for brevity
r_ranef = lme4.ranef(mod)
# R LAPPLY
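# rpy2's default converter does not turn a Python lambda into an R function,
# so define the function in R
add_index = ro.r('function(df) transform(df, index = row.names(df))')
new_r_ranef = base.lapply(r_ranef, add_index)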
# PYTHON LIST COMPREHENSION
py_df_list = [(pandas2ri.ri2py_dataframe(df)
               .set_index('index')
               .rename_axis(None)
               ) for df in new_r_ranef]
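The list comprehension drops the names of the R list, so it may be handy to key the result by grouping factor. A rough sketch with stand-in frames follows; the names in group_names are hypothetical, and with rpy2 they would presumably come from the names of the R list (for example r_ranef.names).

```python
import pandas as pd

# Stand-ins for the frames in py_df_list (values are made up).
py_df_list = [
    pd.DataFrame({"(Intercept)": [0.1, -0.2]}, index=["AITKIN", "ANOKA"]),
    pd.DataFrame({"(Intercept)": [0.3]}, index=["BECKER"]),
]

# Hypothetical grouping-factor names, in the same order as the frames.
group_names = ["county", "region"]

# Pair each name with its frame so a frame can be looked up by group.
ranef_by_group = dict(zip(group_names, py_df_list))

print(ranef_by_group["county"])
```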
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri, default_converter
from rpy2.robjects.conversion import localconverter
r_dataf = ro.r("""
data.frame(
  Uppm = rnorm(5),
  row.names = letters[1:5]
)
""")
with localconverter(default_converter + pandas2ri.converter) as conv:
    pd_dataf = conv.rpy2py(r_dataf)
# row names are "a".."e"
print(r_dataf)
# row names / indexes are now 0..4
print(pd_dataf)
This may be a small bug or missing feature in rpy2, but the workaround is fairly simple:
with localconverter(default_converter + pandas2ri.converter) as conv:
    pd_dataf = conv.rpy2py(r_dataf)
pd_dataf.index = r_dataf.rownames
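If the workaround is needed in several places, the index assignment could be wrapped in a small helper. The sketch below is pandas-only: with_row_names is a hypothetical name, not part of rpy2 or pandas, and the row names are passed in as a plain sequence (with rpy2 that would be r_dataf.rownames).

```python
import pandas as pd

def with_row_names(pd_df, row_names):
    """Return a copy of pd_df indexed by row_names (hypothetical helper)."""
    names = [str(name) for name in row_names]
    # Fail early instead of letting a length mismatch surface later.
    if len(names) != len(pd_df):
        raise ValueError("row_names must have one entry per row")
    out = pd_df.copy()
    out.index = pd.Index(names)
    return out

# Stand-in for a converted frame that came back with a 0..n-1 index.
converted = pd.DataFrame({"Uppm": [0.1, -0.2, 0.3]})
indexed = with_row_names(converted, ["a", "b", "c"])

print(indexed)
```

Working on a copy leaves the converted frame untouched, which makes it easier to compare the two while debugging.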