Elements in numpy array iteration aren't affected by replace()
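Why does an English CSV exporter mark the decimal point with a comma? I need these values as floats, or rounded to integers, but I can't find a way to do that while the decimal mark is a comma. I tried iterating and replacing every comma with a dot, but some values seem to have dots before I do so, and the values that still have commas are not replaced afterwards (the loop does go through the whole matrix/array/thing, and the values are strings). Please point out my mistakes and demystify this, or offer me a better way to accomplish what I need.
I'm using a Jupyter notebook, so the following code comes from three cells: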
data = pd.read_csv(processed_data_path, header=None)
print(data)
data = pd.DataFrame.to_numpy(data)
print(data)
for row in data:
    for cell in row:
        cell = cell.replace(",", ".")
print(data)
This is what the first print outputs:
0 1 2 3 4 5 6 7 8 \
0 27,4 27,471 27,458 27,478 27,491 27,491 27,523 27,503 27,516
1 27,433 27,433 27,433 27,503 27,491 27,51 27,491 27,503 27,516
2 27,381 27,407 27,4 27,471 27,452 27,458 27,536 27,561 27,497
3 27,413 27,413 27,426 27,439 27,426 27,491 27,51 27,465 27,471
4 27,375 27,388 27,355 27,368 27,458 27,433 27,439 27,433 27,458
.. ... ... ... ... ... ... ... ... ...
475 26,17 26,183 26,176 26,17 26,183 26,196 26,222 26,241 26,261
476 26,17 26,144 26,157 26,183 26,196 26,215 26,222 26,235 26,254
477 26,17 26,144 26,176 26,183 26,196 26,209 26,235 26,248 26,261
478 26,157 26,157 26,176 26,196 26,189 26,209 26,235 26,241 26,261
479 26,105 26,15 26,157 26,183 26,196 26,209 26,228 26,248 26,281
9 ... 630 631 632 633 634 635 636 \
0 27,536 ... 32,863 32,863 32,912 32,851 32,845 32,832 32,826
1 27,516 ... 32,857 32,881 32,832 32,845 32,839 32,851 32,845
2 27,529 ... 32,851 32,857 32,826 32,857 32,845 32,851 32,839
3 27,51 ... 32,863 32,826 32,82 32,839 32,796 32,79 32,808
4 27,355 ... 32,839 32,814 32,802 32,814 32,783 32,82 32,802
.. ... ... ... ... ... ... ... ... ...
475 26,241 ... 27,31 27,31 27,336 27,349 27,343 27,355 27,317
476 26,254 ... 27,304 27,31 27,33 27,349 27,355 27,355 27,349
477 26,274 ... 27,291 27,323 27,349 27,349 27,381 27,362 27,388
478 26,294 ... 27,297 27,317 27,33 27,349 27,368 27,407 27,4
479 26,3 ... 27,297 27,323 27,336 27,349 27,394 27,394 27,394
637 638 639
0 32,82 32,814 32,888
1 32,826 32,826 32,851
2 32,777 32,82 32,845
3 32,796 32,82 32,808
4 32,808 32,802 32,82
.. ... ... ...
475 27,362 27,33 27,394
476 27,375 27,381 27,368
477 27,394 27,388 27,4
478 27,407 27,4 27,4
479 27,413 27,407 27,4
[480 rows x 640 columns]
This is what the second print outputs:
[['27.4' '27.471' '27.458' ... '32,82' '32,814' '32,888']
['27.433' '27.433' '27.433' ... '32,826' '32,826' '32,851']
['27.381' '27.407' '27.4' ... '32,777' '32,82' '32,845']
...
['26.17' '26.144' '26.176' ... '27,394' '27,388' '27,4']
['26.157' '26.157' '26.176' ... '27,407' '27,4' '27,4']
['26.105' '26.15' '26.157' ... '27,413' '27,407' '27,4']]
This is what the third print outputs:
[['27.4' '27.471' '27.458' ... '32,82' '32,814' '32,888']
['27.433' '27.433' '27.433' ... '32,826' '32,826' '32,851']
['27.381' '27.407' '27.4' ... '32,777' '32,82' '32,845']
...
['26.17' '26.144' '26.176' ... '27,394' '27,388' '27,4']
['26.157' '26.157' '26.176' ... '27,407' '27,4' '27,4']
['26.105' '26.15' '26.157' ... '27,413' '27,407' '27,4']]
Use the pandas replace function:
data = pd.read_csv(processed_data_path, header=None)
data = data.replace(',','.', regex=True)
Also, consider the decimal option of read_csv, which handles the , for you:
data = pd.read_csv(processed_data_path, header=None, decimal=',')
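As a rough illustration of the decimal option, here is a minimal sketch that parses comma-decimal text straight into floats. The sample text and the semicolon field separator are assumptions made for the example; the real file's delimiter is not shown in the question.

```python
import io
import pandas as pd

# Hypothetical sample: semicolon-separated fields, comma as the decimal mark.
csv_text = "27,4;27,471;27,458\n27,433;27,433;27,503\n"

# decimal=',' tells the parser to treat the comma as the decimal mark,
# so the columns are read as floats rather than strings.
data = pd.read_csv(io.StringIO(csv_text), header=None, sep=";", decimal=",")

print(data.dtypes)

# Round to the nearest integer if whole numbers are wanted.
rounded = data.round().astype(int)
print(rounded)
```

With the values parsed as numbers at read time, no string replacement step is needed afterwards.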
for row in data:
    for cell in row:
        cell = cell.replace(",", ".")
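These lines modify the variable cell, which is not "attached" to your data object at all. You are really just replacing , with . in some variable cell. You could do something like this instead, replacing all instances of , with . in the pandas.DataFrame before converting it to NumPy (though it is not clear why you chose a bare NumPy representation of this data in the first place).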
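data = pd.read_csv(processed_data_path, header=None)
# regex=True replaces the comma inside each string; without it,
# only cells that are exactly ',' would match.
data = data.replace(',', '.', regex=True)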
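The rebinding issue described above can be reproduced on a small array. This is only a sketch: the array contents are made up, and np.char.replace is shown as one possible NumPy-side alternative rather than something the question used.

```python
import numpy as np

# Hypothetical 2x2 object array of comma-decimal strings.
arr = np.array([["27,4", "27,471"], ["32,82", "32,814"]], dtype=object)

for row in arr:
    for cell in row:
        # str.replace returns a new string; assigning it to `cell` only
        # rebinds the loop variable and leaves `arr` unchanged.
        cell = cell.replace(",", ".")

unchanged = arr[0, 0]  # still holds the comma version

# np.char.replace works element-wise on string arrays and returns a new
# array, so its result has to be assigned to a name.
fixed = np.char.replace(arr.astype(str), ",", ".")
as_float = fixed.astype(float)
```

Here arr keeps its commas after the loop, while fixed and as_float hold the converted values.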
If you really want to modify the NumPy object itself in this iterative way, you can do it like this:
# For each row
for i in range(data.shape[0]):
    # For each column
    for j in range(data.shape[1]):
        data[i, j] = data[i, j].replace(',', '.')
That said, it is strongly recommended to stick with the "vectorized" pandas.DataFrame.replace approach shown above.
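Since the question asks for floats or rounded integers, the vectorized replacement can be followed by a type conversion. The sketch below uses a small made-up frame in place of the CSV contents, so the values are assumptions.

```python
import pandas as pd

# Hypothetical frame of comma-decimal strings, standing in for the CSV data.
data = pd.DataFrame([["27,4", "27,471"], ["32,82", "32,814"]])

# regex=True makes replace() act on substrings instead of whole-cell matches;
# astype(float) then converts the dotted strings to numbers.
as_float = data.replace(",", ".", regex=True).astype(float)

values = as_float.to_numpy()         # float64 NumPy array
as_int = values.round().astype(int)  # rounded to the nearest integer
```

Doing the conversion before to_numpy() keeps the string handling in pandas and hands NumPy a purely numeric array.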