Elements in numpy array iteration aren't affected by replace()

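For some reason the CSV exporter writes the decimal point as a comma. I need these values as floats, or rounded to ints, but I can't find a way to do that while the decimal mark is a comma. I tried iterating and replacing every comma with a dot, but it seems that some values already come with dots before I do that, and the ones with commas aren't replaced after I do it (the loop goes over the whole matrix/array/thing, and the values are all strings). Please point out my mistake and unravel these mysteries, or give me a better way to do what I need.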

I'm using a Jupyter notebook, so the following code comes from three cells:

data = pd.read_csv(processed_data_path, header=None)
print(data)

data = pd.DataFrame.to_numpy(data)

print(data)

for row in data:
  for cell in row:
    cell = cell.replace(",", ".")

print(data)

This is what the first print outputs:

        0       1       2       3       4       5       6       7       8    \
0      27,4  27,471  27,458  27,478  27,491  27,491  27,523  27,503  27,516   
1    27,433  27,433  27,433  27,503  27,491   27,51  27,491  27,503  27,516   
2    27,381  27,407    27,4  27,471  27,452  27,458  27,536  27,561  27,497   
3    27,413  27,413  27,426  27,439  27,426  27,491   27,51  27,465  27,471   
4    27,375  27,388  27,355  27,368  27,458  27,433  27,439  27,433  27,458   
..      ...     ...     ...     ...     ...     ...     ...     ...     ...   
475   26,17  26,183  26,176   26,17  26,183  26,196  26,222  26,241  26,261   
476   26,17  26,144  26,157  26,183  26,196  26,215  26,222  26,235  26,254   
477   26,17  26,144  26,176  26,183  26,196  26,209  26,235  26,248  26,261   
478  26,157  26,157  26,176  26,196  26,189  26,209  26,235  26,241  26,261   
479  26,105   26,15  26,157  26,183  26,196  26,209  26,228  26,248  26,281   

        9    ...     630     631     632     633     634     635     636  \
0    27,536  ...  32,863  32,863  32,912  32,851  32,845  32,832  32,826   
1    27,516  ...  32,857  32,881  32,832  32,845  32,839  32,851  32,845   
2    27,529  ...  32,851  32,857  32,826  32,857  32,845  32,851  32,839   
3     27,51  ...  32,863  32,826   32,82  32,839  32,796   32,79  32,808   
4    27,355  ...  32,839  32,814  32,802  32,814  32,783   32,82  32,802   
..      ...  ...     ...     ...     ...     ...     ...     ...     ...   
475  26,241  ...   27,31   27,31  27,336  27,349  27,343  27,355  27,317   
476  26,254  ...  27,304   27,31   27,33  27,349  27,355  27,355  27,349   
477  26,274  ...  27,291  27,323  27,349  27,349  27,381  27,362  27,388   
478  26,294  ...  27,297  27,317   27,33  27,349  27,368  27,407    27,4   
479    26,3  ...  27,297  27,323  27,336  27,349  27,394  27,394  27,394   

        637     638     639  
0     32,82  32,814  32,888  
1    32,826  32,826  32,851  
2    32,777   32,82  32,845  
3    32,796   32,82  32,808  
4    32,808  32,802   32,82  
..      ...     ...     ...  
475  27,362   27,33  27,394  
476  27,375  27,381  27,368  
477  27,394  27,388    27,4  
478  27,407    27,4    27,4  
479  27,413  27,407    27,4  

[480 rows x 640 columns]

This is the output of the second print:

[['27.4' '27.471' '27.458' ... '32,82' '32,814' '32,888']
 ['27.433' '27.433' '27.433' ... '32,826' '32,826' '32,851']
 ['27.381' '27.407' '27.4' ... '32,777' '32,82' '32,845']
 ...
 ['26.17' '26.144' '26.176' ... '27,394' '27,388' '27,4']
 ['26.157' '26.157' '26.176' ... '27,407' '27,4' '27,4']
 ['26.105' '26.15' '26.157' ... '27,413' '27,407' '27,4']]

This is the output of the third print:

[['27.4' '27.471' '27.458' ... '32,82' '32,814' '32,888']
 ['27.433' '27.433' '27.433' ... '32,826' '32,826' '32,851']
 ['27.381' '27.407' '27.4' ... '32,777' '32,82' '32,845']
 ...
 ['26.17' '26.144' '26.176' ... '27,394' '27,388' '27,4']
 ['26.157' '26.157' '26.176' ... '27,407' '27,4' '27,4']
 ['26.105' '26.15' '26.157' ... '27,413' '27,407' '27,4']]

Use the pandas replace function:

data = pd.read_csv(processed_data_path, header=None)
data = data.replace(',','.', regex=True)
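
As a minimal sketch of what this call does, the snippet below runs the same replace on a small made-up frame (the sample values are illustrative, not the asker's file) and then converts the resulting strings to numbers. The regex=True argument matters: it makes pandas substitute inside each string, whereas the default only matches cells whose whole value equals ','.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data: every cell is a string
# that uses a comma as the decimal mark.
sample = pd.DataFrame([["27,4", "27,471"], ["26,17", "26,183"]])

# regex=True makes replace() substitute inside each string.
fixed = sample.replace(",", ".", regex=True)

# The cells are still strings at this point; convert them explicitly.
as_float = fixed.astype(float)

# If whole numbers are wanted, round first and then cast.
as_int = as_float.round().astype(int)
```

The replace call on its own only changes the text, so the astype(float) step is what actually yields floats.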

Also, consider the decimal option of read_csv, which handles the , for you:

data = pd.read_csv(processed_data_path, header=None, decimal=',')
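
A small self-contained sketch of the decimal option follows. The file contents and the ';' field separator are assumptions made for illustration, since the question does not show the raw file; a file that uses ',' as the decimal mark usually needs some other field separator.

```python
import io

import pandas as pd

# Hypothetical file contents. The ';' separator is an assumption:
# the real file's separator is not shown in the question.
csv_text = "27,4;27,471\n26,17;26,183\n"

# decimal=',' tells the parser to read '27,4' as the number 27.4,
# so the columns come back as floats with no replace step.
parsed = pd.read_csv(io.StringIO(csv_text), header=None, sep=";", decimal=",")
```

With this approach the columns are numeric straight from the parser, so rounding or casting to int can follow directly.
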

for row in data:
  for cell in row:
    cell = cell.replace(",", ".")

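These lines modify the variable cell, which is not "attached" to your data object at all. str.replace returns a new string, and the assignment only rebinds the loop variable cell to it, so you are really just replacing , with . in some variable cell while data stays as it was. You could do something like this, replacing all instances of , with . in the pandas.DataFrame before converting it to NumPy (although it's unclear why you opted for the basic NumPy representation of this data in the first place).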

data = pd.read_csv(processed_data_path, header=None)
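# regex=True is required for substring replacement; without it only cells equal to ',' are matched
data = data.replace(',', '.', regex=True)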
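
To see why the loop from the question leaves the array untouched, here is a minimal sketch on a made-up 2x2 array (the values are illustrative only). It runs the same loop and compares the array before and after, then shows on a single string that replace hands back a new object instead of editing the old one.

```python
import numpy as np

# Hypothetical object array of strings, similar to what
# DataFrame.to_numpy() returns for text columns.
arr = np.array([["27,4", "27,471"], ["26,17", "26,183"]], dtype=object)
before = arr.tolist()

# Same pattern as in the question: cell is rebound to a new string
# on every pass, but nothing is written back into arr.
for row in arr:
    for cell in row:
        cell = cell.replace(",", ".")

after = arr.tolist()

# Python strings are immutable, so replace() returns a new string
# and the original is left as it was.
original = "27,4"
replaced = original.replace(",", ".")
```

Here before and after hold identical values, which matches the unchanged third print in the question.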

If you really want to modify the NumPy object itself in this iterative fashion, you can do:

# For each row
for i in range(data.shape[0]):
    # For each column
    for j in range(data.shape[1]):
        data[i, j] = data[i, j].replace(',', '.')

That said, it's strongly recommended to stick with the "vectorized" pandas.DataFrame.replace approach shown above.
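
If the data does have to stay in NumPy, the explicit double loop can also be avoided there. The following is a sketch only, assuming every element is a string with a comma decimal mark; the sample array is made up, and np.char.replace is swapped in for the loop rather than being anything the answers above used.

```python
import numpy as np

# Hypothetical object array of comma-decimal strings.
arr = np.array([["27,4", "27,471"], ["26,17", "26,183"]], dtype=object)

# np.char.replace works on string-typed arrays, so cast from object first.
dotted = np.char.replace(arr.astype(str), ",", ".")

# Parse the strings as floats, then round to the nearest integer if needed.
values = dotted.astype(float)
rounded = np.rint(values).astype(int)
```

This covers both things the question asks for: values holds the floats, and rounded holds the same numbers rounded to ints.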