Percentage of Total with Groupby for two columns
I have a DataFrame:
df = pd.DataFrame({
'Product': ['AA', 'AA', 'AA', 'AA', 'BB', 'BB', 'BB', 'BB'],
'Type': ['AC', 'AC', 'AD', 'AD', 'BC', 'BC', 'BD', 'BD'],
'Sales': [ 200, 100, 400, 100, 300, 100, 200, 500],
'Qty': [ 5, 3, 3, 6, 4, 7, 4, 1]})
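I'd like to get the percentage of total by "Product" and "Type" for both "Sales" and "Qty". I can get the percentage of total for "Sales" and for "Qty" separately, but I was wondering whether there is a way to do it for both columns.
To get the percentage of total for one column, the code is: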
df['Sales'] = df['Sales'].astype(float)
df['Qty'] = df['Qty'].astype(float)
df = df[['Product', 'Type', 'Sales']]
df = df.groupby(['Product', 'Type']).agg({'Sales': 'sum'})
pcts = df.groupby(level= [0]).apply(lambda x: 100 * x / float(x.sum()))
Is there a way to get this for both columns in one go?
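You can groupby "Product" and "Type" to get the sum for each group. Then groupby "Product" again (level=0) and transform sum; then divide the sums from the previous step by it: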
sm = df.groupby(['Product','Type']).sum()
out = sm / sm.groupby(level=0).transform('sum') * 100
Output:
                  Sales        Qty
Product Type
AA      AC    37.500000  47.058824
        AD    62.500000  52.941176
BB      BC    36.363636  68.750000
        BD    63.636364  31.250000
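For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same transform approach. The names `product_totals` and `check` are only illustrative, and the last step is an added sanity check (not part of the answer above) that the shares within each "Product" add up to 100.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['AA', 'AA', 'AA', 'AA', 'BB', 'BB', 'BB', 'BB'],
    'Type': ['AC', 'AC', 'AD', 'AD', 'BC', 'BC', 'BD', 'BD'],
    'Sales': [200, 100, 400, 100, 300, 100, 200, 500],
    'Qty': [5, 3, 3, 6, 4, 7, 4, 1],
})

# Totals per (Product, Type) pair, for both numeric columns at once.
sm = df.groupby(['Product', 'Type']).sum()

# Per-Product totals, broadcast back to the same shape and index as sm.
product_totals = sm.groupby(level='Product').transform('sum')

# Share of each (Product, Type) within its Product, in percent.
out = sm / product_totals * 100

# Sanity check: the percentages within each Product should sum to 100.
check = out.groupby(level='Product').sum()
print(out)
print(check)
```

Because `transform('sum')` returns a frame with the same index as `sm`, the division lines up row by row without any extra alignment arguments.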
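You can chain groupby calls:
pct = lambda x: 100 * x / x.sum()
# group_keys=False keeps the (Product, Type) index; with pandas >= 2.0 the default prepends the group key as an extra level
out = df.groupby(['Product', 'Type']).sum().groupby('Product', group_keys=False).apply(pct)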
print(out)
# Output
                  Sales        Qty
Product Type
AA      AC    37.500000  47.058824
        AD    62.500000  52.941176
BB      BC    36.363636  68.750000
        BD    63.636364  31.250000
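One option is to compute each groupby separately and divide:
numerator = df.groupby(["Product", "Type"]).sum()
# select the numeric columns explicitly; with pandas >= 2.0 a plain .sum() would also concatenate the string column "Type"
denominator = df.groupby("Product")[["Sales", "Qty"]].sum()
numerator.div(denominator, level = 0, axis = 'index') * 100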
                  Sales        Qty
Product Type
AA      AC    37.500000  47.058824
        AD    62.500000  52.941176
BB      BC    36.363636  68.750000
        BD    63.636364  31.250000
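As a rough cross-check, the three approaches in this thread can be run side by side on the question's data. This is only a sketch: the variable names (`via_transform`, `via_apply`, `via_div`, `totals`) are made up for illustration, and `group_keys=False` plus the explicit column selection are assumptions added so the code behaves the same way on recent pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['AA', 'AA', 'AA', 'AA', 'BB', 'BB', 'BB', 'BB'],
    'Type': ['AC', 'AC', 'AD', 'AD', 'BC', 'BC', 'BD', 'BD'],
    'Sales': [200, 100, 400, 100, 300, 100, 200, 500],
    'Qty': [5, 3, 3, 6, 4, 7, 4, 1],
})

# Shared first step: totals per (Product, Type) pair.
sm = df.groupby(['Product', 'Type']).sum()

# 1) transform: per-Product totals broadcast to the shape of sm.
via_transform = sm / sm.groupby(level=0).transform('sum') * 100

# 2) apply: group_keys=False keeps the original (Product, Type) index.
via_apply = sm.groupby(level=0, group_keys=False).apply(
    lambda g: 100 * g / g.sum()
)

# 3) div: align the per-Product totals on index level 0.
totals = df.groupby('Product')[['Sales', 'Qty']].sum()
via_div = sm.div(totals, level=0, axis='index') * 100

# Largest absolute difference of each variant from the transform result.
diff_apply = (via_transform - via_apply).abs().max().max()
diff_div = (via_transform - via_div).abs().max().max()
print(diff_apply, diff_div)
```

If both differences print as zero (or within floating-point noise), the choice between the three comes down to readability; the transform version avoids a Python-level lambda.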