Is there a simpler way to get objects from a groupby and put them in a dictionary?

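I'm trying to find a simpler way to get objects out of a groupby and put them in a dictionary. With my dataframe (a sample is given below), I currently have to get the index and then run a for loop to fetch the exact string in Product for each row.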

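In case more detail is needed: my goal is to find duplicated Order IDs, then take the products from the column and add them to a dictionary.

(I'm not looking for an optimized way to find duplicates; I know I can use df.duplicated.)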

Code:

product_dict = {}

for date, df in df1.groupby('Order Date'):
    # only look at groups that contain more than one product
    if df.Product.count() > 1:
        indice = df.Product.index
        for data in indice:
            product = df.loc[data].at['Product']
            # update dictionary counter (default to 0 for a product not seen yet)
            product_dict[product] = product_dict.get(product, 0) + 1

For convenience, you can use this df instead. I've listed it as a dictionary:

{'Order ID': ['147268', '148041', '149343', '149964', '149350', '141732', '149620', '142451', '146039', '143498', '141316', '144804', '144804', '145270', '142789'],
 'Product': ['Wired Headphones', 'USB-C Charging Cable', 'Apple Airpods Headphones', 'AAA Batteries (4-pack)', 'USB-C Charging Cable', 'iPhone', 'Lightning Charging Cable', 'AAA Batteries (4-pack)', '34in Ultrawide Monitor', 'AA Batteries (4-pack)', 'AAA Batteries (4-pack)', 'Wired Headphones', 'iPhone', 'Google Phone', 'AAA Batteries (4-pack)']}

Expected output:

{'Wired Headphones': 8090, 'USB-C Charging Cable': 9425, 'Apple Airpods Headphones': 6374, 'AAA Batteries (4-pack)': 8266, 'iPhone': 3663, 'Lightning Charging Cable': 9074, '34in Ultrawide Monitor': 2500, 'AA Batteries (4-pack)': 8167, 'Google Phone': 3091, 'Macbook Pro Laptop': 1878, 'ThinkPad Laptop': 1605, '27in FHD Monitor': 3010, 'Bose SoundSport Headphones': 5459, 'Flatscreen TV': 1827, '27in 4K Gaming Monitor': 2457, 'LG Dryer': 257, '20in Monitor': 1635, 'LG Washing Machine': 268, 'Vareebadd Phone': 1120}

# number of products per order
prods_per_order = df.groupby("Order ID")["Product"].transform("count")

res = (
    df.loc[prods_per_order > 1, "Product"]  # keep only products ordered together with at least one other product
      .value_counts()  # count how many times each product appears
      .to_dict()       # convert the result to a dict
)

Input

import pandas as pd

df = pd.DataFrame({
    'Order ID': ['147268', '148041', '149343', '149964', '149350', 
                 '141732', '149620', '142451', '146039', '143498', 
                 '141316', '144804', '144804', '145270', '142789'],
    'Product': ['Wired Headphones', 'USB-C Charging Cable', 'Apple Airpods Headphones', 
                'AAA Batteries (4-pack)', 'USB-C Charging Cable', 'iPhone', 
                'Lightning Charging Cable', 'AAA Batteries (4-pack)', '34in Ultrawide Monitor', 
                'AA Batteries (4-pack)', 'AAA Batteries (4-pack)', 'Wired Headphones', 
                'iPhone', 'Google Phone', 'AAA Batteries (4-pack)']
})

df = df.sort_values(['Order ID', 'Product'])
>>> df 

   Order ID                   Product
10   141316    AAA Batteries (4-pack)
5    141732                    iPhone
7    142451    AAA Batteries (4-pack)
14   142789    AAA Batteries (4-pack)
9    143498     AA Batteries (4-pack)
11   144804          Wired Headphones  # <-- Note that only these two products
12   144804                    iPhone  # <--    were ordered together 
13   145270              Google Phone
8    146039    34in Ultrawide Monitor
0    147268          Wired Headphones
1    148041      USB-C Charging Cable
2    149343  Apple Airpods Headphones
4    149350      USB-C Charging Cable
6    149620  Lightning Charging Cable
3    149964    AAA Batteries (4-pack)

Output

>>> res

{'iPhone': 1, 'Wired Headphones': 1}
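
To see why the boolean selection works, here is a minimal sketch of the intermediate values on a four-row subset of the sample data. `transform("count")` returns a Series aligned with the original rows, so every row carries the number of products in its own order. The names `sample`, `per_order`, and `res_check` are only illustrative and do not come from the question.

```python
import pandas as pd

# Small subset of the sample data: only order 144804 has two products.
sample = pd.DataFrame({
    "Order ID": ["141732", "144804", "144804", "145270"],
    "Product": ["iPhone", "Wired Headphones", "iPhone", "Google Phone"],
})

# One value per original row: the number of products in that row's order.
per_order = sample.groupby("Order ID")["Product"].transform("count")
print(per_order.tolist())  # [1, 2, 2, 1]

# Keep rows from orders with more than one product, then count each product.
res_check = sample.loc[per_order > 1, "Product"].value_counts().to_dict()
print(res_check)
```

Because the mask has the same index as the frame, it can be passed straight to `.loc` without any explicit loop over groups.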

Maybe I've misunderstood, but it looks like you could achieve what you want by using Counter:

from collections import Counter

mask = (
    df.groupby(["Order Date", "Order ID"], sort=False)["Product"]
      .transform("count")
      .gt(1)
)
product_dict = Counter(df.loc[mask, "Product"])

Result for a slightly modified version of the sample dataframe (with an Order Date column added):

   Order Date Order ID                   Product
0  2021-11-11   147268          Wired Headphones
1  2021-11-11   148041      USB-C Charging Cable
2  2021-11-11   149343  Apple Airpods Headphones
3  2021-11-11   149964    AAA Batteries (4-pack)
4  2021-11-11   149350      USB-C Charging Cable
5  2021-11-12   141732                    iPhone
6  2021-11-12   149620  Lightning Charging Cable
7  2021-11-12   142451    AAA Batteries (4-pack)
8  2021-11-12   146039    34in Ultrawide Monitor
9  2021-11-12   143498     AA Batteries (4-pack)
10 2021-11-12   141316    AAA Batteries (4-pack)
11 2021-11-12   144804          Wired Headphones
12 2021-11-12   144804                    iPhone
13 2021-11-12   145270              Google Phone
14 2021-11-12   142789    AAA Batteries (4-pack)

Counter({'Wired Headphones': 1, 'iPhone': 1})
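
A plain dictionary needs a default the first time a key is seen (`dict.get(key)` without a default returns `None`), whereas `Counter` handles that case itself. A minimal sketch, using a hypothetical list `products` as a stand-in for `df.loc[mask, "Product"]`:

```python
from collections import Counter

# Hypothetical stand-in for df.loc[mask, "Product"]
products = ["Wired Headphones", "iPhone"]

product_dict = Counter(products)

# Missing keys read as 0 instead of raising, so no explicit default is needed.
print(product_dict["Google Phone"])  # 0

# Counts can also be accumulated incrementally, e.g. from a later batch.
product_dict.update(["iPhone"])
print(product_dict["iPhone"])  # 2

# Counter is a dict subclass; convert it if a plain dict is required.
plain = dict(product_dict)
```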

Maybe grouping by Order ID alone would be enough, but since you grouped on Order Date, I suspect it isn't.
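
Whether the extra key matters depends on the data: the two groupings differ only if the same Order ID can appear under more than one Order Date. A small sketch with made-up rows (the frame `orders` and its values are hypothetical and not taken from the question's data):

```python
import pandas as pd

# Hypothetical data: order 144804 appears on two different dates.
orders = pd.DataFrame({
    "Order Date": ["2021-11-11", "2021-11-12", "2021-11-12", "2021-11-12"],
    "Order ID": ["144804", "144804", "145270", "145270"],
    "Product": ["iPhone", "Wired Headphones", "Google Phone", "USB-C Charging Cable"],
})

# Grouping by Order ID alone treats both 144804 rows as one order.
by_id = orders.groupby("Order ID")["Product"].transform("count").gt(1)

# Grouping by date and ID keeps the two 144804 rows in separate groups.
by_date_and_id = (
    orders.groupby(["Order Date", "Order ID"], sort=False)["Product"]
          .transform("count")
          .gt(1)
)

print(by_id.tolist())           # [True, True, True, True]
print(by_date_and_id.tolist())  # [False, False, True, True]
```

If Order IDs are unique across dates in the real data, both masks are identical and grouping by Order ID alone is sufficient.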