Find duplicates, add to variable and remove

Question

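I have a script that writes sales values to separate lines in a file; the end goal is to save the data to a database. The problem I'm running into is duplicate entries with the same salesperson, date, product, price and quantity.

My code writes to the file like this: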

John 07-15-2016 Tool Belt 0 2
Sara 07-15-2016 Hammer 0 3
John 07-15-2016 Tool Belt 0 2
John 07-15-2016 Tool Belt 0 2
Sara 07-15-2016 Hammer 0 3

How can I remove the duplicates and add them together? That is, the output should be:

John 07-15-2016 Tool Belt 0 6
Sara 07-15-2016 Hammer 0 6

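I tried using a Counter, but it doesn't catch the multiple instances, and I can't find a way to add the two together.

Any help would be appreciated.

Script: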

    for line in s:
        var = re.compile(r'($)',re.M)
        line = re.sub(var, "", line)
        var = re.compile(r'(\,)',re.M)
        line = re.sub(var, "", line)
        line = line.rstrip('\n')
        line = line.split("|")
        if line[0] != '':
            salesperson = str(salesperson)
            date = dt.now()
            t = line[0].split()
            print t
            t = str(t[0])
            try:
                s = dt.strptime(t, "%H:%M:%S")
            except:
                s = dt.strptime(t, "%H:%M")
            s = s.time()
            date = dt.combine(date, s)
            date = str(date)
            price = line[1]
            quantity = line[2]
        fn.write("%s %s %s %s \n" % (salesperson, date, price, quantity))
    fn.close()

Answer 1

Assuming your file is named records.txt

To split the file into a separate file for each salesperson:

awk '{print > $1}' records.txt

Then, to total a specific item for a given salesperson:

cat Sara | grep 'Hammer' | awk '{print $NF,sum}' | awk '{s+=$1} END {print s}'
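
For comparison, the same filter-and-sum step can be sketched in Python. This is only an illustration under assumptions: the name total_quantity and the inline RECORDS sample are hypothetical, the salesperson is taken to be the first field and the quantity the last, and the product check is a plain substring match, much like grep.

```python
# Sample records in the same layout as records.txt
# (salesperson first, quantity last). Hypothetical inline data.
RECORDS = """\
John 07-15-2016 Tool Belt 0 2
Sara 07-15-2016 Hammer 0 3
John 07-15-2016 Tool Belt 0 2
John 07-15-2016 Tool Belt 0 2
Sara 07-15-2016 Hammer 0 3
"""


def total_quantity(lines, salesperson, product):
    """Sum the last column of one salesperson's lines that mention a product."""
    total = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # fields[0] plays the role of awk's $1, fields[-1] the role of $NF.
        if fields[0] == salesperson and product in line:
            total += int(fields[-1])
    return total


hammer_total = total_quantity(RECORDS.splitlines(), "Sara", "Hammer")
print(hammer_total)
```

Unlike the awk version, this reads the records once and does not need one intermediate file per salesperson.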

Answer 2

sample.csv

John 07-15-2016 Tool Belt 0 2
Sara 07-15-2016 Hammer 0 3
John 07-15-2016 Tool Belt 0 2
John 07-15-2016 Tool Belt 0 2
Sara 07-15-2016 Hammer 0 3

test.py

with open("sample.csv") as inputs:
    mydict = dict()
    for line in inputs:
        elements = line.strip().split()
        key = " ".join(elements[0: len(elements) - 1]) 
        mydict[key] = mydict.get(key, 0) + int(elements[-1])

    # iterate the dictionary and print out result
    for key, value in mydict.iteritems():
        print "{0} {1}".format(key, value)

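I use a dictionary: split each line, use the first len(elements) - 1 elements as the key, and add the last element to that key's running total while iterating over the lines.

mydict.get(key, 0) returns the stored value if the key exists in the dictionary; otherwise it returns 0.

Result: python2.7 test.py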

Sara 07-15-2016 Hammer 0 6
John 07-15-2016 Tool Belt 0 6
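
Note that the result lists Sara before John, because a plain dict in Python 2.7 does not guarantee any particular iteration order. If the merged lines should come out in the order they first appear in the input, one option is collections.OrderedDict, which exists in both Python 2.7 and 3. The sketch below is an assumption-laden variant of test.py, not part of the original answer: the name sum_in_input_order and the inline SAMPLE_LINES list are hypothetical.

```python
from collections import OrderedDict

# Hypothetical inline copy of sample.csv, so the sketch runs on its own.
SAMPLE_LINES = [
    "John 07-15-2016 Tool Belt 0 2",
    "Sara 07-15-2016 Hammer 0 3",
    "John 07-15-2016 Tool Belt 0 2",
    "John 07-15-2016 Tool Belt 0 2",
    "Sara 07-15-2016 Hammer 0 3",
]


def sum_in_input_order(lines):
    """Merge duplicate lines, summing the last column, keeping first-seen order."""
    totals = OrderedDict()
    for line in lines:
        elements = line.strip().split()
        if not elements:
            continue  # ignore blank lines
        key = " ".join(elements[:-1])  # everything except the quantity
        totals[key] = totals.get(key, 0) + int(elements[-1])
    return ["{0} {1}".format(key, value) for key, value in totals.items()]


merged = sum_in_input_order(SAMPLE_LINES)
for row in merged:
    print(row)
```

On Python 3.7 and later a regular dict also keeps insertion order, so OrderedDict mainly matters for older interpreters.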

So in your case you need:

elements = line.strip().split()
key = " ".join(elements[0: len(elements) - 1]) 
mydict[key] = mydict.get(key, 0) + int(elements[-1])
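
Since the question mentions trying a Counter: calling Counter on the raw lines only counts how often each identical line occurs (3 for John's line), it does not sum the quantities. A Counter can still be used as the accumulator if the quantity is added explicitly. The following is a minimal Python 3 sketch under assumptions: merge_sales and write_totals are hypothetical names, an in-memory io.StringIO stands in for the output file, and the totals are written in sorted key order.

```python
import io
from collections import Counter


def merge_sales(lines):
    """Return a Counter mapping 'salesperson date product price' to total quantity."""
    totals = Counter()
    for line in lines:
        elements = line.strip().split()
        if not elements:
            continue  # ignore blank lines
        key = " ".join(elements[:-1])
        totals[key] += int(elements[-1])  # missing keys start at 0
    return totals


def write_totals(totals, out):
    """Write one merged line per key to a file-like object, in sorted key order."""
    for key in sorted(totals):
        out.write("{0} {1}\n".format(key, totals[key]))


sample = [
    "John 07-15-2016 Tool Belt 0 2",
    "Sara 07-15-2016 Hammer 0 3",
    "John 07-15-2016 Tool Belt 0 2",
    "John 07-15-2016 Tool Belt 0 2",
    "Sara 07-15-2016 Hammer 0 3",
]

buffer = io.StringIO()  # stand-in for the real output file
write_totals(merge_sales(sample), buffer)
print(buffer.getvalue())
```

In the original script this would mean accumulating totals inside the loop and writing to the file only once, after the loop has finished.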

python

django

counter