How can I import a matrix from R saved as RData into a pandas data frame without losing the column names of the R matrix?
For example, if I save this matrix in R:
    A = matrix(
      c(2, 4, 3, 1, 5, 7),          # the data elements
      nrow = 2,                     # number of rows
      ncol = 3,                     # number of columns
      byrow = TRUE)                 # fill matrix by rows
    dimnames(A) = list(
      c("row1", "row2"),            # row names
      c("col1", "col2", "col3"))    # column names
    A
    save(A, file = 'matrix.RData')
Output:
    > A
         col1 col2 col3
    row1    2    4    3
    row2    1    5    7
Then I load it in Python with rpy2 as follows:
    from __future__ import print_function
    from rpy2.robjects import pandas2ri, r
    import rpy2.robjects as robjects

    def main():
        pandas2ri.activate()
        r['load']('matrix.RData')
        variables = tuple(robjects.globalenv.keys())
        print('variables: {0}'.format(variables))
        matrix = robjects.globalenv['A']
        frame = pandas2ri.ri2py(matrix)
        print(frame)
        print('type(frame): {0}'.format(type(frame)))

    if __name__ == "__main__":
        main()
This prints:
    variables: ('A',)
    [[ 2.  4.  3.]
     [ 1.  5.  7.]]
    type(frame): <type 'numpy.ndarray'>
The matrix has lost its column names. I would like to keep them by loading the R matrix into a pandas data frame.
There is a package called feather that saves data frames in a format readable by both R and pandas.
In R:
    library(feather)
    # A matrix must be converted to a data frame before it can be written.
    write_feather(as.data.frame(A), 'path/df.feather')
In Python:
    import pandas as pd
    df = pd.read_feather('path/df.feather')
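The column names travel with the data frame, since they are ordinary data frame columns. If the row names are needed too, one option is to carry them as a regular column and turn that column back into the index after reading. This is a minimal sketch under an assumption that is not part of the answer: the R side is assumed to have added a column named `row` (a hypothetical name) holding the row names. The frame built by hand below stands in for the result of `pd.read_feather`, so the snippet runs without a feather file.

```python
import pandas as pd

# Stand-in for pd.read_feather('path/df.feather'); the "row" column is a
# hypothetical column holding the former R row names.
df = pd.DataFrame(
    {
        "row": ["row1", "row2"],
        "col1": [2.0, 1.0],
        "col2": [4.0, 5.0],
        "col3": [3.0, 7.0],
    }
)

# Promote the name column to the index and drop the leftover index label.
frame = df.set_index("row").rename_axis(None)
print(frame)
```

After this step `frame.loc["row1", "col2"]` addresses a cell by the original R names.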
You can find more details here:
You can use colnames (tested with Python 2.7):
    from __future__ import print_function
    from rpy2.robjects import pandas2ri, r
    import rpy2.robjects as robjects
    import pandas as pd

    def load_r_matrix_into_pandas_dataframe(r_matrix):
        '''
        Import a matrix from R saved as RData to a pandas data frame
        without losing the column names of the R matrix
        - Input: R matrix object
        - Output: Pandas DataFrame
        '''
        # The converted value is a plain numpy array, which carries no labels.
        # ri2py is the rpy2 2.x name of this converter (rpy2py in rpy2 3.x).
        numpy_matrix = pandas2ri.ri2py(r_matrix)
        # Read the column names from the R object itself.
        frame_column_names = r_matrix.colnames
        frame = pd.DataFrame(data=numpy_matrix, columns=list(frame_column_names))
        return frame

    def main():
        pandas2ri.activate()
        # load() restores the saved objects into R's global environment.
        r['load']('matrix.RData')
        variables = tuple(robjects.globalenv.keys())
        print('variables: {0}'.format(variables))
        matrix = robjects.globalenv['A']
        frame = load_r_matrix_into_pandas_dataframe(matrix)
        print('frame: {0}'.format(frame))

    if __name__ == "__main__":
        main()
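The same idea extends to the row names, which can become the DataFrame index. The following is a minimal sketch rather than part of the answer above: `matrix_to_frame` is a hypothetical helper, and the literal values stand in for the converted matrix `A` and its dimnames so that the snippet runs without R or rpy2.

```python
import numpy as np
import pandas as pd


def matrix_to_frame(values, row_names=None, col_names=None):
    """Build a DataFrame from a 2-D array plus optional R-style dimnames."""
    values = np.asarray(values)
    if values.ndim != 2:
        raise ValueError(
            "expected a 2-D matrix, got {0} dimension(s)".format(values.ndim)
        )
    # Missing names fall back to the default integer labels of pandas.
    index = None if row_names is None else [str(name) for name in row_names]
    columns = None if col_names is None else [str(name) for name in col_names]
    return pd.DataFrame(values, index=index, columns=columns)


# Stand-in for the converted matrix A and its dimnames from the question.
frame = matrix_to_frame(
    [[2.0, 4.0, 3.0], [1.0, 5.0, 7.0]],
    row_names=["row1", "row2"],
    col_names=["col1", "col2", "col3"],
)
print(frame)
```

With the rpy2 objects from the answer, the call would presumably look like `matrix_to_frame(pandas2ri.ri2py(r_matrix), r_matrix.rownames, r_matrix.colnames)`, assuming the matrix has both sets of dimnames.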