Generating sums, and/or sums of products etc., from the results of itertools.groupby aggregation
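Is there a built-in (or naive) way to process (sum, count) the aggregations produced by itertools.groupby?
For example, given the table in the sample code and a discount of 10%...
I would like to write: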
# Select each city... (wished-for pseudo-syntax, not valid Python)
for city,city_purchases_d in itertools.groupby(transaction_l,
                                               lambda d: d["city"]):
    print Aggregate( city,sum(|qty|),sum(|qty * price|)*(1-discount) ) * city_purchases_d
Input data:
discount=0.10 # 10%
transaction_l=(
dict(trans=201, name="Anne", city="LAX", item="Apple", qty=10, price=1.33),
dict(trans=202, name="Betty", city="LAX", item="Banana",qty=20, price=2.33),
dict(trans=203, name="Carol", city="LAX", item="Cherry",qty=30, price=3.33),
dict(trans=101, name="Andy", city="NYC", item="Avodado",qty=1, price=1.32),
dict(trans=102, name="Andy", city="NYC", item=u"Açaí", qty=1, price=1.70),
dict(trans=103, name="Bob", city="NYC", item="Bacuri", qty=3, price=2.10),
dict(trans=104, name="Cliff", city="NYC", item="Carrot", qty=4, price=2.22),
dict(trans=105, name="David", city="NYC", item="Donut", qty=5, price=3.00)
)
The output would be:
('LAX',60,143.82)
('NYC',14,29.88)
i.e.
In LAX purchased 60 fruit at the total price of $143.82
In NYC purchased 14 fruit at the total price of $29.88
P.S. I notice there are many similar questions, but none of them simply takes (something like) a naive expression city,sum(|qty|),sum(|qty * price|)*(1-discount) and performs the aggregation.
Edit: (at the cost of using generator comprehensions) one can almost get this effect as follows:
import itertools

discount=0.10 # 10%
desc_f="In %s purchased %s fruit at the total price of $%.2f"
for city,city_purchases_d in itertools.groupby(transaction_l, lambda d: d["city"]):
    # alternatively - Plan B: manually creating aggregation DOES also work:
    # a groupby group is a one-shot iterator, so materialize it once and reuse the list
    city_purchases_l=list(city_purchases_d)
    qty_x_price=list(trans["qty"]*trans["price"] for trans in city_purchases_l)
    qty=(trans["qty"] for trans in city_purchases_l)
    print(desc_f%(city,sum(qty),sum(qty_x_price)*(1-discount)))
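I assume that you need some flexibility in the aggregate processing of your data, perhaps driven by user input? Otherwise, this is easily done with itertools.groupby: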
from itertools import groupby
discount=0.10
transaction_l=(
dict(trans=201, name="Anne", city="LAX", item="Apple", qty=10, price=1.33),
dict(trans=202, name="Betty", city="LAX", item="Banana",qty=20, price=2.33),
dict(trans=203, name="Carol", city="LAX", item="Cherry",qty=30, price=3.33),
dict(trans=101, name="Andy", city="NYC", item="Avodado",qty=1, price=1.32),
dict(trans=102, name="Andy", city="NYC", item=u"Açaí", qty=1, price=1.70),
dict(trans=103, name="Bob", city="NYC", item="Bacuri", qty=3, price=2.10),
dict(trans=104, name="Cliff", city="NYC", item="Carrot", qty=4, price=2.22),
dict(trans=105, name="David", city="NYC", item="Donut", qty=5, price=3.00)
)
desc_f = 'In %s purchased %s fruit at the total price of $%.2f'
for city, transactions in groupby(transaction_l, key=lambda d: d['city']):
    # keep the group in a list: it is iterated twice below
    transactions = list(transactions)
    print(desc_f % (city,
                    sum(t['qty'] for t in transactions),
                    sum( (t['qty']*t['price'])*(1-discount)
                         for t in transactions)))
Output
In LAX purchased 60 fruit at the total price of $143.82
In NYC purchased 14 fruit at the total price of $29.88
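One caveat about the approach above: itertools.groupby only merges adjacent items that share a key, so it relies on transaction_l already being ordered by city, as it is here. The following is a minimal sketch, assuming the rows might arrive in any order, that sorts by the same key first. The helper name totals_by_city and the shortened sample rows are illustrative only and are not part of the original code.

```python
from itertools import groupby
from operator import itemgetter

def totals_by_city(transactions, discount):
    """Return [(city, total_qty, discounted_total), ...] for any input order."""
    by_city = itemgetter("city")
    results = []
    # groupby only merges adjacent rows, so sort by the grouping key first
    for city, group in groupby(sorted(transactions, key=by_city), key=by_city):
        rows = list(group)  # a group is a one-shot iterator; keep it in a list
        total_qty = sum(t["qty"] for t in rows)
        total_price = sum(t["qty"] * t["price"] for t in rows) * (1 - discount)
        results.append((city, total_qty, total_price))
    return results

# hypothetical, deliberately unordered sample rows
sample = [
    dict(city="NYC", qty=1, price=1.32),
    dict(city="LAX", qty=10, price=1.33),
    dict(city="NYC", qty=3, price=2.10),
    dict(city="LAX", qty=20, price=2.33),
]
for city, qty, total in totals_by_city(sample, discount=0.10):
    print("In %s purchased %s fruit at the total price of $%.2f" % (city, qty, total))
```

Sorting costs O(n log n); if the data is large and order does not matter, accumulating into a dict keyed by city would avoid the sort.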
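If you do need the flexibility to run arbitrary "queries" on your data, this may be a naive (or even strange) suggestion, but how about using SQL against an in-memory SQLite database?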
import sqlite3
discount=0.10 # 10%
transaction_l=(
dict(trans=201, name="Anne", city="LAX", item="Apple", qty=10, price=1.33),
dict(trans=202, name="Betty", city="LAX", item="Banana",qty=20, price=2.33),
dict(trans=203, name="Carol", city="LAX", item="Cherry",qty=30, price=3.33),
dict(trans=101, name="Andy", city="NYC", item="Avodado",qty=1, price=1.32),
dict(trans=102, name="Andy", city="NYC", item=u"Açaí", qty=1, price=1.70),
dict(trans=103, name="Bob", city="NYC", item="Bacuri", qty=3, price=2.10),
dict(trans=104, name="Cliff", city="NYC", item="Carrot", qty=4, price=2.22),
dict(trans=105, name="David", city="NYC", item="Donut", qty=5, price=3.00)
)
memdb = sqlite3.connect(':memory:')
cursor = memdb.cursor()
# create an in-memory table
r = cursor.execute('create table transactions (trans int, name varchar(30), city char(3), item varchar(20), qty int, price numeric)')
result = cursor.executemany('insert into transactions (trans, name, city, item, qty, price) values (:trans, :name, :city, :item, :qty, :price)', transaction_l)
assert result.rowcount == len(transaction_l)
result = cursor.execute('select city, sum(qty), sum(qty*price)*(1-{}) from transactions group by city'.format(discount))
desc_f = 'In {} purchased {} fruit at the total price of ${:.2f}'
for row in result:
    print(desc_f.format(*row))
memdb.close()
Output
In LAX purchased 60 fruit at the total price of $143.82
In NYC purchased 14 fruit at the total price of $29.88
So, your task now is to create an SQL query such as this:
select city, sum(qty), sum(qty*price)*(1-0.1) from transactions group by city
from this:
city,sum(|qty|),sum(|qty * price|)*(1-discount)
which looks quite doable.
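As a rough illustration of that translation step, here is a minimal sketch. It assumes the first comma-separated item of the expression is the grouping column and that the only free variable is discount; the function name expr_to_sql and the cut-down table are hypothetical. The expression text is spliced into the SQL string, so this is only reasonable for trusted expressions; the discount value itself is bound as a named parameter.

```python
import sqlite3

def expr_to_sql(expr, table):
    """Turn 'city,sum(|qty|),...' into a GROUP BY query (sketch only)."""
    columns = expr.replace("|", "")  # SQL aggregates need no |...| markers
    columns = columns.replace("discount", ":discount")  # bind the value by name
    group_col = columns.split(",")[0].strip()  # assumption: first item is the key
    return "select %s from %s group by %s" % (columns, table, group_col)

memdb = sqlite3.connect(":memory:")
memdb.execute("create table transactions (city text, qty int, price real)")
# hypothetical cut-down rows based on the LAX and NYC data
memdb.executemany(
    "insert into transactions values (?, ?, ?)",
    [("LAX", 10, 1.33), ("LAX", 20, 2.33), ("NYC", 1, 1.32), ("NYC", 3, 2.10)],
)
query = expr_to_sql("city,sum(|qty|),sum(|qty * price|)*(1-discount)",
                    "transactions")
rows = memdb.execute(query, {"discount": 0.10}).fetchall()
for row in rows:
    print("In {} purchased {} fruit at the total price of ${:.2f}".format(*row))
memdb.close()
```

A plain string replace is enough for this one expression, but it would also rewrite any column whose name happens to contain "discount", so a real translator would need proper tokenizing.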
The desired result can be achieved more simply with the pandas module:
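import pandas as pd

discount = 0.2  # note: 20% here, not the 10% used in the question
df = pd.DataFrame(list(transaction_l))
df['total_price'] = df.qty*df.price*(1-discount)
# select the numeric columns before summing, so the text columns are not aggregated
res = df.groupby('city')[['qty', 'total_price']].sum()
print(res)
#       qty  total_price
# city
# LAX    60       127.84
# NYC    14        26.56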
I am adding the following code example just out of interest...
def eval_iter(expr_str, global_d, local_d, dict_iter, sep="|"):
    # the odd-numbered pieces (those between separators) are per-row expressions
    expr_l=expr_str.split(sep)
    if isinstance(dict_iter, dict): dict_iter=dict_iter.values()
    row_expr=",".join(expr_l[1::2]).join("[]")
    aggregation_l=[]
    for eval_locals in dict_iter:
        # evaluate the per-row expressions with the row's fields visible as names
        aggregation_l.append(
            eval(row_expr, global_d, dict(local_d, **eval_locals)))
    expr_l[1::2]=["aggregation_l[%d]"%enum for enum in range(len(expr_l)//2)]
    # transpose the rows: one sequence of values per per-row expression
    eval_d=dict(local_d, aggregation_l=list(zip(*aggregation_l)))
    return eval("".join(expr_l), global_d, eval_d)
import itertools

discount=0.10 # 10%
desc_f="In %s purchased %s fruit at the total price of $%.2f"
# The QUERY: -------- 8>< - - - - cut here - - - -
for city,city_purchases_d in itertools.groupby(transaction_l,
                                               lambda d: d["city"]):
    print(desc_f%eval_iter("city,sum(|qty|),sum(|qty * price|)*(1-discount)",
                           globals(), locals(), city_purchases_d))
Output: as required...
In LAX purchased 60 fruit at the total price of $143.82
In NYC purchased 14 fruit at the total price of $29.88
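For comparison, the same "evaluate per row, then aggregate" idea can be sketched without eval by passing the per-row expressions as functions. This is only an illustration under my own assumptions: the helper name aggregate, its keyword-argument interface, and the shortened sample rows are hypothetical and not part of the code above.

```python
from itertools import groupby

def aggregate(rows, **column_funcs):
    """Apply each per-row function to every row; return one list per name."""
    rows = list(rows)  # materialize so that every function sees all rows
    return {name: [func(row) for row in rows]
            for name, func in column_funcs.items()}

discount = 0.10  # 10%
# hypothetical cut-down rows, already ordered by city
sample = [
    dict(city="LAX", qty=10, price=1.33),
    dict(city="LAX", qty=20, price=2.33),
    dict(city="NYC", qty=1, price=1.32),
    dict(city="NYC", qty=3, price=2.10),
]
results = []
for city, group in groupby(sample, key=lambda d: d["city"]):
    cols = aggregate(group,
                     qty=lambda t: t["qty"],
                     qty_x_price=lambda t: t["qty"] * t["price"])
    results.append((city,
                    sum(cols["qty"]),
                    sum(cols["qty_x_price"]) * (1 - discount)))
for result in results:
    print("In %s purchased %s fruit at the total price of $%.2f" % result)
```

This gives up the compact |...| notation, but nothing is evaluated from a string, which matters if the expression could come from user input.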