Trouble converting a matrix stored as a string in Excel to a numpy array when using a pandas DataFrame

Question

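I am having a hard time reading an Excel file into a pandas DataFrame and converting a stored matrix to a numpy array. I think part of the problem is that the matrix is stored improperly. I have no control over the spreadsheet, though; this is how it was sent to me.

For example, this is the string stored in one cell: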

[[[ 0.        0.        0.107851]
  [ 0.        0.       -0.862809]]]

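I read in a row of the DataFrame and save each cell to a variable. I then try to convert this particular variable to an np.array, since the numbers represent two sets of x, y, z coordinates.

I have tried np.fromstring and np.asarray, to no avail. They do turn the string into a numpy array, but the result is a mess because the brackets are still in there as characters. I have tried np.squeeze to get rid of the brackets, but it says the dimension is not 1.

If I use np.asarray(item._coord, dtype=float), it fails, saying it cannot convert the string to float.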

ValueError: could not convert string to float: '[[[ 0. 0. 0.107851] [ 0. 0. -0.862809]]]'
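
The failure can be reproduced without Excel. The following is a minimal sketch, assuming the cell holds exactly the text shown in the error message; the variable names cell and error_message are only illustrative.

```python
import numpy as np

# Cell contents as they appear in the error message above
cell = '[[[ 0. 0. 0.107851] [ 0. 0. -0.862809]]]'

try:
    np.asarray(cell, dtype=float)
    error_message = None
except ValueError as exc:
    # NumPy treats the whole string as one scalar and tries to parse it as a single float
    error_message = str(exc)

print(error_message)
```

The brackets and the inner spaces make the string an invalid float literal, so the string has to be parsed into nested lists before numpy can build an array from it.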

There is a '\n' in the middle of it, between the two lists. I use df = df.replace(r'\n', ' ', regex=True) to clean out the '\n' characters before attempting the data conversion.

I am stuck.
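
A sketch of that cleanup step on a one-cell frame, for reference. The column name coord and the variable cleaned are assumptions made for illustration, not names from the real spreadsheet.

```python
import pandas as pd

# Hypothetical one-cell frame; 'coord' is an assumed column name
df = pd.DataFrame(
    {'coord': ['[[[ 0.        0.        0.107851]\n  [ 0.        0.       -0.862809]]]']}
)

# Replace every newline inside string cells with a single space
cleaned = df.replace(r'\n', ' ', regex=True)
print(repr(cleaned.loc[0, 'coord']))
```

The newline is gone afterwards, but the brackets are still part of the string, so the cell is still not directly convertible to float.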

Answer 1

Use a custom function to convert the strings to numpy arrays after read_excel:

import numpy as np
import pandas as pd

# Sample data: one array with two x, y, z points
a = np.array([[[ 0.,        0.,        0.107851],
               [ 0.,        0.,       -0.862809]]])
print (a)
[[[ 0.        0.        0.107851]
  [ 0.        0.       -0.862809]]]

df = pd.DataFrame({'col':[a,a,a]})
print (df)
                                               col
0  [[[0.0, 0.0, 0.107851], [0.0, 0.0, -0.862809]]]
1  [[[0.0, 0.0, 0.107851], [0.0, 0.0, -0.862809]]]
2  [[[0.0, 0.0, 0.107851], [0.0, 0.0, -0.862809]]]

df.to_excel('test.xlsx', index=False)

import re
import ast
import numpy as np

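# Convert the printed form of a numpy array (as stored in the cell) back to an array
def str2array(s):
    # Remove the spaces that follow an opening bracket
    s = re.sub(r'\[ +', '[', s.strip())
    # Replace each run of commas and whitespace (including newlines) with ', '
    s = re.sub(r'[,\s]+', ', ', s)
    # Parse the resulting nested-list literal safely, then build the array
    return np.array(ast.literal_eval(s))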

df = pd.read_excel('test.xlsx')

df['col'] = df['col'].apply(str2array)
print (df)
                                               col
0  [[[0.0, 0.0, 0.107851], [0.0, 0.0, -0.862809]]]
1  [[[0.0, 0.0, 0.107851], [0.0, 0.0, -0.862809]]]
2  [[[0.0, 0.0, 0.107851], [0.0, 0.0, -0.862809]]]
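
As a quick check against the exact cell text from the question, here is a self-contained sketch. It repeats str2array so that it runs on its own; the names cell, coords and points are illustrative.

```python
import ast
import re

import numpy as np


def str2array(s):
    # Same steps as the function above, repeated so this snippet is standalone
    s = re.sub(r'\[ +', '[', s.strip())
    s = re.sub(r'[,\s]+', ', ', s)
    return np.array(ast.literal_eval(s))


# Cell text from the question, including the newline between the two rows
cell = '[[[ 0.        0.        0.107851]\n  [ 0.        0.       -0.862809]]]'

coords = str2array(cell)
print(coords.shape)  # (1, 2, 3)

# Drop the outer length-1 axis to get one row per x, y, z point
points = coords.squeeze(axis=0)
print(points.shape)  # (2, 3)
```

Because \s also matches the newline, no separate df.replace step should be needed before the conversion, and np.squeeze works here because it is applied to a real array rather than to the string.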

Tags: numpy, string-conversion, dataframe, python-3.x, pandas