Summing the values row-wise

Question

I have three columns of data arranged as follows:

Input file:

>>>>>
1.0 2.0 3.0
2.0 2.0 4.0
3.0 4.5 8.0
>>>>>
1.0 2.5 6.8
2.0 3.5 6.8
3.0 1.2 1.9
>>>>>
1.0 1.2 1.3
2.0 2.7 1.8
3.0 4.5 8.5

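In the input file above, the values in the first column repeat, so I want to take each of those values only once. I want to sum the third-column values row-wise across the blocks, and I do not want to use any of the second-column values.

I also want to append a third column with the fixed value 1.0.

Finally, I want to save the result in another text file named output.txt.

Output: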

1.0  11.1  1.0
2.0  12.6  1.0
3.0  18.4  1.0

The second-column values in the output are obtained as follows:

3.0+6.8+1.3
4.0+6.8+1.8
8.0+1.9+8.5

I tried using numpy but got an error:

import numpy as np
import pandas as pd
import glob
data=np.loadtxt("input.txt")

Answer 1

Try:

df[2].groupby(np.arange(len(df)) % 3).sum()
# or df.iloc[:, 2].groupby(np.arange(len(df)) % 3).sum()

0    11.1
1    12.6
2    18.4
Name: 2, dtype: float64
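
The snippet above assumes `df` already exists. A minimal sketch of how it might be built is shown here, assuming whitespace-separated columns and `>>>>>` separator lines as in the question. The sample data is inlined through `io.StringIO` only to keep the sketch self-contained. Note that the modulo-3 grouping relies on every block having exactly three rows in the same order.

```python
import io

import numpy as np
import pandas as pd

# Sample data copied from the question; in practice pass the file path
# (for example "input.txt") to read_csv instead of the StringIO object.
raw = """>>>>>
1.0 2.0 3.0
2.0 2.0 4.0
3.0 4.5 8.0
>>>>>
1.0 2.5 6.8
2.0 3.5 6.8
3.0 1.2 1.9
>>>>>
1.0 1.2 1.3
2.0 2.7 1.8
3.0 4.5 8.5
"""

# comment=">" makes pandas skip the ">>>>>" separator lines.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None, comment=">")

# Row i of every block gets the label i % 3, so matching rows are summed.
sums = df[2].groupby(np.arange(len(df)) % 3).sum()
print(sums)
```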

Answer 2

Use groupby together with reset_index:

dfNew = df.groupby(0)[2].sum().reset_index()
dfNew.to_csv('output.txt', index= False)
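
As written, those two lines would likely produce a comma-separated file with a header row and without the constant column from the question. A possible extension is sketched below, under the assumption that `df` is read from the question's data with the `>>>>>` lines skipped. The column name `const` is a placeholder chosen for illustration.

```python
import io

import pandas as pd

# Sample data copied from the question; replace the StringIO object
# with the real file path when reading from disk.
raw = """>>>>>
1.0 2.0 3.0
2.0 2.0 4.0
3.0 4.5 8.0
>>>>>
1.0 2.5 6.8
2.0 3.5 6.8
3.0 1.2 1.9
>>>>>
1.0 1.2 1.3
2.0 2.7 1.8
3.0 4.5 8.5
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None, comment=">")

# Group on the first column, sum the third, and restore the key as a column.
result = df.groupby(0)[2].sum().reset_index()

# Append the fixed-value column ("const" is a hypothetical name).
result["const"] = 1.0

# Space-separated, no header, no index, one decimal place as in the question.
result.to_csv("output.txt", sep=" ", header=False, index=False,
              float_format="%.1f")
```

Grouping on the first column, rather than on row position, does not depend on the blocks having equal length.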

Answer 3

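You need to read the input file with pandas.read_csv, setting the delimiter to " ", specifying no header, and treating ">" as the comment character so that the separator lines are skipped.

Then perform the groupby/sum operation and export the result without a header using DataFrame.to_csv: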

import pandas as pd

# input
df = pd.read_csv('filename.csv', delimiter=' ', header=None, comment='>')

# output
(df.groupby(0)[[2]].sum()
   .assign(col=1.0)
   .to_csv('output.txt', header=False, sep=' ', float_format='%.2f')
)

output.txt:

1.00 11.10 1.00
2.00 12.60 1.00
3.00 18.40 1.00
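
Since the question started from numpy, a numpy-only variant may also be of interest. This is a sketch rather than a definitive fix: it assumes the original error came from `np.loadtxt` trying to parse the `>>>>>` lines, which the `comments` parameter can be told to skip. The data is inlined here for self-containment.

```python
import io

import numpy as np

# Sample data copied from the question; np.loadtxt also accepts a file path.
raw = """>>>>>
1.0 2.0 3.0
2.0 2.0 4.0
3.0 4.5 8.0
>>>>>
1.0 2.5 6.8
2.0 3.5 6.8
3.0 1.2 1.9
>>>>>
1.0 1.2 1.3
2.0 2.7 1.8
3.0 4.5 8.5
"""

# comments=">" tells loadtxt to ignore everything from ">" to end of line,
# so the separator lines become empty and are skipped.
data = np.loadtxt(io.StringIO(raw), comments=">")

# Unique first-column values, plus the group index of every row.
keys, inverse = np.unique(data[:, 0], return_inverse=True)

# Sum the third column within each group.
sums = np.bincount(inverse, weights=data[:, 2])

# Columns: key, sum, constant 1.0.
out = np.column_stack([keys, sums, np.ones_like(keys)])
np.savetxt("output.txt", out, fmt="%.1f")
```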

python

glob

numpy

pandas