Python, Pandas: mark the top rows that make up 80 percent of total sales

Question

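I have a table of products and their sales amounts in dollars. I have the total sales and want to know which products account for 80% of it, and flag them with 1 in a label column. The 1s must be assigned in order from the largest sale to the smallest. In the table below, total sales is 32, and 80% of that is 25.6. So if we add rows 2, 4, 5 and 7, going from the largest number in the sold$ column down to the smallest, we get 26, which covers 80% of the total of 32. Those rows should be labeled 1 and all others 0. I want to do this with Python and pandas. Thanks in advance. Best regards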
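As a quick check of the arithmetic in the question, here is a minimal sketch. The sample values (2, 4, 3, 8, 5, 1, 9 for products 1 to 7) are an assumption, chosen because they match the totals quoted above.

```python
import pandas as pd

# Assumed sample table, consistent with the totals quoted in the question.
df = pd.DataFrame({"productID": [1, 2, 3, 4, 5, 6, 7],
                   "sold$": [2, 4, 3, 8, 5, 1, 9]})

total = df["sold$"].sum()   # 32
target = 0.8 * total        # 25.6

# Running total when walking from the largest sale down to the smallest.
running = df["sold$"].sort_values(ascending=False).cumsum()
print(total, target)
print(running.tolist())     # [9, 17, 22, 26, 29, 31, 32]
```

The running total first passes 25.6 at the fourth value (26), which corresponds to the sales 9, 8, 5 and 4, i.e. rows 7, 4, 5 and 2.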

Answer 1

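Compute each product's share of total sales, sort the shares in ascending order, and take their cumulative sum. Rows whose cumulative share is at most 20% sit at the bottom of the ranking and get 0; everything else makes up the top 80% and gets 1.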

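# Ascending cumulative share of total sales (the sales column is assumed to be named "sold")
cumsum = (df["sold"] / df["sold"].sum()).sort_values().cumsum()
# Keep 0 where the cumulative share is at most 20%, put 1 everywhere else
df["label"] = pd.Series(0, index=cumsum.index).where(cumsum <= 0.2, 1)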
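To see why the 0.2 cutoff picks out the right rows, it may help to look at the intermediate cumulative shares. The sketch below runs the same two lines on assumed sample data (the seven values from the question's totals, in a column named "sold" to match this answer).

```python
import pandas as pd

# Assumed sample data; the column name "sold" follows this answer's snippet.
df = pd.DataFrame({"productID": [1, 2, 3, 4, 5, 6, 7],
                   "sold": [2, 4, 3, 8, 5, 1, 9]})

# Ascending cumulative share, still indexed by the original row labels.
cumsum = (df["sold"] / df["sold"].sum()).sort_values().cumsum()
print(cumsum)

# Assignment aligns on the index, so each label lands on its own row.
df["label"] = pd.Series(0, index=cumsum.index).where(cumsum <= 0.2, 1)
print(df)
```

The three smallest sales (1, 2, 3) reach a cumulative share of 6/32 = 0.1875, which is still under 0.2, so they get 0. The next sale (4) pushes the share to 0.3125, so it and everything larger gets 1.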

Answer 2

You can do it like this:

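import pandas as pd
import numpy as np

data = {'productID': [1, 2, 3, 4, 5, 6, 7], 'sold$': [2, 4, 3, 8, 5, 1, 9]}

df = pd.DataFrame(data)
# Sort ascending so the cumulative sum starts from the smallest sales
df.sort_values('sold$', inplace=True)

# Rows within the bottom 20% of total sales get 0; the remaining top 80% get 1
df['Label'] = np.where(df['sold$'].cumsum() <= df['sold$'].sum() * 0.2, 0, 1)
# Restore the original row order
df.sort_index(inplace=True)

print(df)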

Result:

   productID  sold$  Label
0          1      2      0
1          2      4      1
2          3      3      0
3          4      8      1
4          5      5      1
5          6      1      0
6          7      9      1
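Both answers work from the bottom 20% upward. A variant that follows the question's wording more literally (walk from the largest sale down and stop once 80% is covered) could look like the sketch below. The helper name `label_top_share` and its `share` parameter are hypothetical, and the sketch assumes non-negative sales values.

```python
import pandas as pd

def label_top_share(df, column, share=0.8):
    """Hypothetical helper: 1 for the largest rows needed to reach `share` of the total, else 0."""
    ordered = df[column].sort_values(ascending=False)
    target = share * ordered.sum()
    # Running total of the rows ranked above each row (excluding the row itself).
    before = ordered.cumsum() - ordered
    # A row is needed only if the rows above it have not reached the target yet.
    labels = (before < target).astype(int)
    # Return the labels in the original row order.
    return labels.reindex(df.index)

df = pd.DataFrame({"productID": [1, 2, 3, 4, 5, 6, 7],
                   "sold$": [2, 4, 3, 8, 5, 1, 9]})
df["Label"] = label_top_share(df, "sold$")
print(df)
```

On this sample data it produces the same labels as the result table above. Comparing the running total before each row against the target means the row that crosses the 80% line is itself included.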
