(python) Add elements in consecutive rows if the condition holds true

Question

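I have a list with N rows and 3 columns. If the first two elements of consecutive rows are the same, I want to add up their third-column elements and return a single row containing the summed third-column value.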

For example,

120.638000      -21.541700      0.3  
120.638000      -21.541700      0.8       
121.331001      -21.795500      0.5       
120.688004      -21.587400      0.1        
120.688004      -21.587400      0.5      
120.688004      -21.587400      0.9     
121.525002      -21.504200      0.9

should become

120.638000      -21.541700      1.1  (sum of the third column of rows 1 and 2)
121.331001      -21.795500      0.5
120.688004      -21.587400      1.5  (0.1 + 0.5 + 0.9)
121.525002      -21.504200      0.9

Any suggestions on how to implement this in Python?
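The requested behaviour can be pinned down with a minimal sketch, assuming the data is already an in-memory list of `[x, y, value]` rows holding floats (the helper name `merge_consecutive` is hypothetical). It walks the rows once and either extends the last output row or starts a new one, so only *consecutive* matches are merged. It compares the first two columns with exact `==`, which is safe here because identical parsed values produce identical floats.

```python
def merge_consecutive(rows):
    """Hypothetical helper: merge runs of consecutive rows whose first two
    columns match, summing their third column."""
    merged = []
    for x, y, value in rows:
        if merged and merged[-1][0] == x and merged[-1][1] == y:
            # Same (x, y) as the previous output row: accumulate the third column.
            merged[-1][2] += value
        else:
            # New (x, y) pair: start a new output row (a fresh list, input untouched).
            merged.append([x, y, value])
    return merged


# Sample data from the question.
rows = [
    [120.638000, -21.541700, 0.3],
    [120.638000, -21.541700, 0.8],
    [121.331001, -21.795500, 0.5],
    [120.688004, -21.587400, 0.1],
    [120.688004, -21.587400, 0.5],
    [120.688004, -21.587400, 0.9],
    [121.525002, -21.504200, 0.9],
]

for x, y, total in merge_consecutive(rows):
    # round() only tidies the printed sum; the stored value is unchanged.
    print(x, y, round(total, 6))
```

A later row that repeats an earlier (x, y) pair but is not adjacent to it starts a new output row, which matches the "consecutive rows" wording of the question.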

Answer 1

You can read the data with csv.reader, then use a defaultdict to sum column 3 for each identical (column 1, column 2) tuple:

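from collections import defaultdict
import csv

# Sum column 3 for every distinct (column 1, column 2) pair.
# Assumes a tab-separated file; "<datafile>" is a placeholder path.
result = defaultdict(float)
with open("<datafile>", newline='') as f:
    data = csv.reader(f, delimiter='\t')
    for a, b, c in data:
        result[(a, b)] += float(c)

for (a, b), c in result.items():
    print(a, b, c)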

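On older Python versions this won't necessarily come out in the same order as the input, because dictionaries there are unordered. From Python 3.7 on, dicts preserve insertion order, so each key appears in the order it was first seen.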
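The sample in the question is aligned with runs of spaces rather than tabs, so a tab delimiter would not split it. A minimal sketch of the same defaultdict idea on that sample, assuming whitespace-separated columns and using `str.split()` with no argument instead of `csv.reader` (the `SAMPLE` string is just the question's data pasted in). Note that a dict merges *every* row with the same key, not only consecutive ones.

```python
from collections import defaultdict

# Sample data copied from the question; columns are separated by runs of
# spaces, so str.split() (no argument) is used instead of a tab delimiter.
SAMPLE = """\
120.638000      -21.541700      0.3
120.638000      -21.541700      0.8
121.331001      -21.795500      0.5
120.688004      -21.587400      0.1
120.688004      -21.587400      0.5
120.688004      -21.587400      0.9
121.525002      -21.504200      0.9
"""

result = defaultdict(float)
for line in SAMPLE.splitlines():
    fields = line.split()
    if len(fields) != 3:
        continue  # skip blank or malformed lines
    a, b, c = fields
    # Keys stay as the original strings, so identical text means identical key.
    result[(a, b)] += float(c)

for (a, b), total in result.items():
    print(a, b, round(total, 6))
```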

Answer 2

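Use the csv library to produce a list of rows.

Use dictionary keys to keep track of the unique column 1-2 value pairs, and accumulate the third-column totals in the dictionary values.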

# list_of_rows: rows from csv.reader, so every field is a string
totals = {}
for (a, b, c) in list_of_rows:
    if (a, b) in totals:
        totals[(a, b)] += float(c)
    else:
        totals[(a, b)] = float(c)

If you want the results as a list of 3-element lists:

totals_list = [[key[0], key[1], totals[key]] for key in totals]
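The if/else above can be collapsed with `dict.get`, and the key tuple can be unpacked directly when building the 3-element lists. A small sketch, assuming a hypothetical `list_of_rows` whose values have already been converted to floats:

```python
# Hypothetical rows already converted to floats (e.g. via float() on csv output).
list_of_rows = [
    (120.638, -21.5417, 0.3),
    (120.638, -21.5417, 0.8),
    (121.331001, -21.7955, 0.5),
]

totals = {}
for a, b, c in list_of_rows:
    # dict.get returns 0.0 for a key not seen yet, replacing the if/else.
    totals[(a, b)] = totals.get((a, b), 0.0) + c

# Unpack each (a, b) key together with its total.
totals_list = [[a, b, total] for (a, b), total in totals.items()]
print(totals_list)
```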

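I don't think it is reliable to count on floating-point values being exactly equal. If the precision is always the same as in the example data (and processing time is not a big enough concern to warrant investigating), I would be more confident in my approach with a key built from scaled integers:

for (a, b, c) in list_of_rows:
    key = (round(float(a) * 1e6), round(float(b) * 1e6))  # use key in place of (a, b)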
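A small demonstration of the concern, assuming 6 decimal places as in the sample data (the helper `scaled_key` is hypothetical). A value produced by arithmetic can differ from the "same" literal in its last bits, so exact `==` fails, while the scaled-and-rounded keys still match. `round()` is used rather than `int()` because `int()` truncates, so a product landing just below a whole number would drop to the integer below.

```python
def scaled_key(a, b, places=6):
    """Hypothetical helper: map two coordinates to integers at a fixed
    number of decimal places so nearly-equal floats share one key."""
    factor = 10 ** places
    return (round(a * factor), round(b * factor))


x = 0.1 + 0.2  # computed value, not exactly 0.3 in binary floating point
print(x == 0.3)                                    # False: exact comparison fails
print(scaled_key(x, 1.0) == scaled_key(0.3, 1.0))  # True: rounded keys match
```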

Answer 3

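import csv
import itertools
import operator

# groupby only groups *adjacent* rows with the same key, i.e. consecutive rows
# whose first two columns match; the last column of each group is summed.
with open('blah', newline='') as infile, open('blahout', 'w', newline='') as outfile:
    writer = csv.writer(outfile, delimiter='\t')
    for k, group in itertools.groupby(csv.reader(infile, delimiter='\t'), operator.itemgetter(0, 1)):
        writer.writerow(list(k) + [sum(float(r[-1]) for r in group)])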
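The same `itertools.groupby` technique can be tried on an in-memory list without any files. A sketch, assuming the question's rows as floats plus one extra row (added only for illustration) that repeats the first key later on; because `groupby` only groups adjacent items, that row stays separate instead of being folded into the first group.

```python
import itertools
import operator

rows = [
    [120.638000, -21.541700, 0.3],
    [120.638000, -21.541700, 0.8],
    [121.331001, -21.795500, 0.5],
    [120.688004, -21.587400, 0.1],
    [120.688004, -21.587400, 0.5],
    [120.688004, -21.587400, 0.9],
    [121.525002, -21.504200, 0.9],
    [120.638000, -21.541700, 0.2],  # illustrative extra row: same key as row 1, not adjacent
]

# Group consecutive rows by their first two columns and sum the third.
merged = [
    list(key) + [sum(row[2] for row in group)]
    for key, group in itertools.groupby(rows, key=operator.itemgetter(0, 1))
]

for row in merged:
    print(row)
```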
