How can I import a matrix from R saved as RData to a pandas data frame without losing the column names of the R matrix?

For example, if I save this matrix in R:

A = matrix( 
     c(2, 4, 3, 1, 5, 7), # the data elements 
     nrow=2,              # number of rows 
     ncol=3,              # number of columns 
     byrow = TRUE)        # fill matrix by rows 

dimnames(A) = list( 
     c("row1", "row2"),         # row names 
     c("col1", "col2", "col3")) # column names 

A
save (A, file = 'matrix.RData')

Output:

> A
     col1 col2 col3
row1    2    4    3
row2    1    5    7

and then load it in Python with rpy2 as follows:

from __future__ import print_function
from rpy2.robjects import pandas2ri,r
import rpy2.robjects as robjects

def main():
    pandas2ri.activate()
    r['load']('matrix.RData')
    variables = tuple(robjects.globalenv.keys())
    print('variables: {0}'.format(variables))
    matrix = robjects.globalenv['A']
    frame = pandas2ri.ri2py(matrix)
    print(frame)
    print('type(frame): {0}'.format(type(frame)))

if __name__ == "__main__":
    main()

it prints:

variables: ('A',)
[[ 2.  4.  3.]
 [ 1.  5.  7.]]
type(frame): <type 'numpy.ndarray'>

The matrix loses its column names. I would like to keep them when loading the R matrix into a pandas data frame.

There is a package called feather that saves data frames in a format readable by both R and pandas.

In R:

library(feather)
write_feather(as.data.frame(A), 'path/df.feather')

In Python:

import pandas as pd
df = pd.read_feather('path/df.feather')

You can find more details here:

You can use colnames (tested with Python 2.7):

from __future__ import print_function
from rpy2.robjects import pandas2ri,r
import rpy2.robjects as robjects
import pandas as pd

def load_r_matrix_into_pandas_dataframe(r_matrix):
    '''
    Import a matrix from R saved as RData to a pandas data frame without losing the column names of the R matrix
    
     - Input: R matrix object
     - Output: Pandas DataFrame
    '''
    numpy_matrix = pandas2ri.ri2py(r_matrix)
    frame_column_names = r_matrix.colnames
    frame = pd.DataFrame(data=numpy_matrix, columns=list(frame_column_names))
    return frame

def main():
    pandas2ri.activate()
    r['load']('matrix.RData')
    variables = tuple(robjects.globalenv.keys())
    print('variables: {0}'.format(variables))
    matrix = robjects.globalenv['A']

    frame = load_r_matrix_into_pandas_dataframe(matrix)
    print('frame: {0}'.format(frame))

if __name__ == "__main__":
    main()
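
The same idea can be extended to keep the row names as well. Below is a minimal sketch, not part of the original answer, that leaves rpy2 out so it runs on its own: matrix_to_frame is a hypothetical helper, and the array and name lists are stand-ins for what the conversion of A would give you. With rpy2, the labels would presumably come from r_matrix.rownames and r_matrix.colnames; a matrix without dimnames returns R's NULL there rather than a Python list, which this sketch does not handle.

```python
import numpy as np
import pandas as pd

def matrix_to_frame(values, row_names=None, col_names=None):
    """Wrap a 2-D array in a DataFrame, attaching row and column labels if given.

    Hypothetical helper for illustration; not part of rpy2 or pandas.
    """
    values = np.asarray(values)
    if values.ndim != 2:
        raise ValueError("expected a 2-D array")
    index = list(row_names) if row_names is not None else None
    columns = list(col_names) if col_names is not None else None
    return pd.DataFrame(values, index=index, columns=columns)

# Stand-ins for the converted matrix A and its dimnames (assumed values,
# copied from the R example in the question).
values = np.array([[2.0, 4.0, 3.0], [1.0, 5.0, 7.0]])
frame = matrix_to_frame(values, ["row1", "row2"], ["col1", "col2", "col3"])
print(frame)
```

Passing the row names as the index means frame.loc["row2", "col3"] addresses the same cell as A["row2", "col3"] does in R.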