Sorting a text file by the cumulative total of one column, grouped by the values in another column, using only the standard library?
I have a file containing lines like this:
id, car_type, cost
1, benz, 60000
2, benz, 55000
3, bmw, 30000
4, benz, 25000
5, bmw, 26000
6, ford, 5000
I want to sort this file by the total cost of each car_type. For example, the total cost for "benz" is 60000 + 55000 + 25000 = 140000, so the final output should be:
benz, 140000
bmw, 56000
ford, 5000
Here is what I have so far:
file = "small_sample.txt"
f=open(file,"r")
lines=f.readlines()[1:]
car_and_cost ={}
for x in lines:
    cost = x.split(',')[4].rstrip('\n')
    car_and_cost.update({x.split(',')[3]:float(cost)})
f.close()
print(car_and_cost)
new_dic = {}
for key,lis in car_and_cost.items():
    new_dic[key] = sum(lis)
print(new_dic)
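I'm pretty much stuck. First, the dictionary this produces does not hold the correct totals, and I also have no idea how to sort a dictionary by value.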
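Here is one approach using the csv and collections modules.
For example:
import csv
from collections import defaultdict, OrderedDict

filename = "small_sample.txt"  # placeholder: path to your input file
result = defaultdict(int)
with open(filename) as infile:
    reader = csv.DictReader(infile)
    for row in reader:  # iterate over each row
        # keys keep the leading space because the file separates fields with ", "
        result[row[" car_type"]] += int(row[" cost"])  # add up costs per car type
# sort the (car_type, total) pairs by total, largest first
print(OrderedDict(sorted(result.items(), key=lambda x: x[1], reverse=True)))
Output: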
OrderedDict([(' benz', 140000), (' bmw', 56000), (' ford', 5000)])
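The output above keeps the leading space left by the ", " separators and prints an OrderedDict rather than the "benz, 140000" lines the question asks for. Below is a minimal sketch of one way to tidy that up with the standard library only. The helper name total_cost_by_type, the inline SAMPLE string, and the choice of skipinitialspace=True are assumptions added for illustration, not part of the answer above.

```python
import csv
import io
from collections import defaultdict

# Sample data copied from the question; io.StringIO stands in for an open file.
SAMPLE = """id, car_type, cost
1, benz, 60000
2, benz, 55000
3, bmw, 30000
4, benz, 25000
5, bmw, 26000
6, ford, 5000
"""


def total_cost_by_type(infile):
    """Return (car_type, total_cost) pairs, largest total first."""
    totals = defaultdict(int)
    # skipinitialspace=True drops the blank that follows each comma,
    # so the header names become "car_type" and "cost" without a leading space.
    reader = csv.DictReader(infile, skipinitialspace=True)
    for row in reader:
        totals[row["car_type"]] += int(row["cost"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# With a real file this would be: with open("small_sample.txt") as infile: ...
for car_type, total in total_cost_by_type(io.StringIO(SAMPLE)):
    print(f"{car_type}, {total}")
```

Passing an already open file object keeps the helper easy to try against an in-memory string before pointing it at the real file.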
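Using pandas:
import pandas as pd

logFile = "tem.csv"  # path to the input file
df = pd.read_csv(logFile)
# column names keep the leading space from the ", " separators
result = df.groupby(' car_type').sum()
print(result)
Output:
           id    cost
 car_type
 benz       7  140000
 bmw        8   56000
 ford       6    5000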
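Edit:
import csv

logFile = "tem.csv"
array = []
with open(logFile, "r") as fin:
    reader = csv.reader(fin)
    next(reader)  # skip the header row so int() never sees " cost"
    for row in reader:
        array.append(row[1:])  # keep only [car_type, cost]
dd = {k: 0 for k in dict(array).keys()}  # one zeroed total per car type
for x in array:
    dd[x[0]] += int(x[1])  # accumulate the cost for this car type
print(dd)
Output: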
{' benz': 140000, ' bmw': 56000, ' ford': 5000}
Or, if you would like them in a list:
print([[k,v] for k,v in dd.items()])
Output:
[[' benz', 140000], [' bmw', 56000], [' ford', 5000]]
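The question also asks how to sort a dictionary by value, and the totals above only come out largest-first because of the order of rows in the file. Below is a hedged sketch of sorting such a dictionary explicitly. The function name sort_totals is hypothetical, and the dictionary is deliberately reordered so that the sort has something to do.

```python
from operator import itemgetter

# Totals like the ones printed above, reordered on purpose for the demo.
dd = {' ford': 5000, ' benz': 140000, ' bmw': 56000}


def sort_totals(totals):
    """Return (name, total) pairs sorted by total, largest first."""
    # itemgetter(1) selects the value from each (key, value) pair.
    pairs = sorted(totals.items(), key=itemgetter(1), reverse=True)
    # strip() removes the leading space inherited from the ", " separators.
    return [(name.strip(), total) for name, total in pairs]


# Format each pair the way the question's desired output shows it.
lines = [f"{name}, {total}" for name, total in sort_totals(dd)]
print("\n".join(lines))
```

The same sorted(...) call works on any dictionary of numeric totals; dropping reverse=True gives ascending order.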