Python pandas if statement based on two conditions

Question

This piggybacks on an earlier question.

import pandas as pd

# code to re-create my example data
df = pd.DataFrame({'customer_id': ['abc','abc','xyz','xyz','xyz','xyz','thr','thr','abc','abc','urt','urt'],
                   'transaction_id': ['A123','A123','B345','B345','C567','C567','D678','D678','E789','E789','D903','F865'], 
                   'product_id': [255472, 251235, 253764,257344,221577,209809,223551,290678,908354,909238,436758,346577],
                   'product_category': ['X','X','Y','Y','X','Y','Y','X','Y','Z','X','X']})

#example data
customer_id|    transaction_id | product_id | product_category
abc             A123              255472             X
abc             A123              251235             X
xyz             B345              253764             Y
xyz             B345              257344             Y
xyz             C567              221577             X
xyz             C567              209809             Y
thr             D678              223551             Y
thr             D678              290678             X
abc             E789              908354             Y
abc             E789              909238             Z
urt             D903              436758             X
urt             F865              346577             X

I want to flag all transactions for customer_ids that have both X and Y, but in different transactions (not within the same transaction).

#expected output
customer_id|    transaction_id | product_id | product_category | flag 
abc             A123              255472             X            1
abc             A123              251235             X            1
xyz             B345              253764             Y            0
xyz             B345              257344             Y            0
xyz             C567              221577             X            0
xyz             C567              209809             Y            0
thr             D678              223551             Y            0
thr             D678              290678             X            0
abc             E789              908354             Y            1
abc             E789              909238             Z            1
urt             D903              436758             X            0
urt             F865              346577             X            0

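I can't come up with a clean solution. In the example above, customer abc buys only product category X in one transaction and then product categories Y and Z in another. These are the customers I want to flag: they have both X and Y, but under different transaction_ids.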
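To pin the rule down, here is a minimal plain-Python sketch. It is inferred from the expected output above rather than stated explicitly in the question: categories other than X and Y are ignored, and a customer qualifies only if one transaction reduces to exactly X and another to exactly Y. The helper name `customer_qualifies` is hypothetical.

```python
def customer_qualifies(transactions):
    """transactions: one iterable of product categories per transaction."""
    # Ignore every category except X and Y (e.g. Z is dropped).
    reduced = [set(t) & {"X", "Y"} for t in transactions]
    # Need one transaction that is X-only and another that is Y-only.
    return {"X"} in reduced and {"Y"} in reduced


print(customer_qualifies([["X", "X"], ["Y", "Z"]]))  # abc -> True
print(customer_qualifies([["Y", "Y"], ["X", "Y"]]))  # xyz -> False
print(customer_qualifies([["Y", "X"]]))              # thr -> False
print(customer_qualifies([["X"], ["X"]]))            # urt -> False
```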

One approach I thought of is to reuse the code from the answer to my previous question:

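# sorted() makes the joined label deterministic, e.g. always 'X & Y' and never 'Y & X'
df['pre_flag'] = df.groupby('transaction_id')['product_category'].transform(lambda x: x + ' only' if len(set(x)) < 2 else ' & '.join(sorted(set(x))))

Then split the dataset in two:

# the label must match the ' & ' separator used in the join above
df_1 = df.loc[df['pre_flag'] == 'X & Y'].copy()
df_2 = df.loc[df['pre_flag'] != 'X & Y'].copy()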

... and then use an isin statement, but that is messy; there must be a better way. Thanks!
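For comparison, the groupby/transform idea can be pushed further so that no split and no isin step is needed. The following is only a sketch under the rule inferred from the expected output (categories other than X and Y are ignored); the intermediate names `has_x`, `has_y`, `x_only` and `y_only` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'customer_id': ['abc', 'abc', 'xyz', 'xyz', 'xyz', 'xyz',
                    'thr', 'thr', 'abc', 'abc', 'urt', 'urt'],
    'transaction_id': ['A123', 'A123', 'B345', 'B345', 'C567', 'C567',
                       'D678', 'D678', 'E789', 'E789', 'D903', 'F865'],
    'product_id': [255472, 251235, 253764, 257344, 221577, 209809,
                   223551, 290678, 908354, 909238, 436758, 346577],
    'product_category': ['X', 'X', 'Y', 'Y', 'X', 'Y',
                         'Y', 'X', 'Y', 'Z', 'X', 'X']})

# Group keys for "one transaction of one customer".
txn_keys = [df['customer_id'], df['transaction_id']]

# Does the transaction contain any X / any Y? Broadcast back to every row.
has_x = df['product_category'].eq('X').groupby(txn_keys).transform('any')
has_y = df['product_category'].eq('Y').groupby(txn_keys).transform('any')

# Transactions with X but no Y, and with Y but no X.
x_only = has_x & ~has_y
y_only = has_y & ~has_x

# A customer is flagged if they have at least one of each kind.
cust_x = x_only.groupby(df['customer_id']).transform('any')
cust_y = y_only.groupby(df['customer_id']).transform('any')

df['flag'] = (cust_x & cust_y).astype(int)
print(df)
```

On the sample data this flags the four abc rows and nothing else, which matches the expected output.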

Answer 1

Here is one way using groupby and pd.Series.apply.

import pandas as pd

df = pd.DataFrame({'customer_id': ['abc','abc','xyz','xyz','xyz','xyz','thr','thr','abc','abc','urt','urt'],
                   'transaction_id': ['A123','A123','B345','B345','C567','C567','D678','D678','E789','E789','D903','F865'], 
                   'product_id': [255472, 251235, 253764,257344,221577,209809,223551,290678,908354,909238,436758,346577],
                   'product_category': ['X','X','Y','Y','X','Y','Y','X','Y','Z','X','X']})

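# Per (customer, transaction): the set of categories present, restricted to X and Y
g = df.groupby(['customer_id', 'transaction_id'])['product_category']\
      .apply(lambda x: {i for i in x if i in ('X', 'Y')}).reset_index()

# Per customer: True only if one transaction reduces to exactly {'X'} and another to exactly {'Y'}
g2 = g.groupby('customer_id')['product_category']\
      .apply(list).apply(lambda x: ({'X'} in x) and ({'Y'} in x))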

print(g2)
# customer_id
# abc     True
# thr    False
# urt    False
# xyz    False
# Name: product_category, dtype: bool

df['flag'] = df['customer_id'].isin(g2[g2].index)

print(df)

#    customer_id product_category  product_id transaction_id   flag
# 0          abc                X      255472           A123   True
# 1          abc                X      251235           A123   True
# 2          xyz                Y      253764           B345  False
# 3          xyz                Y      257344           B345  False
# 4          xyz                X      221577           C567  False
# 5          xyz                Y      209809           C567  False
# 6          thr                Y      223551           D678  False
# 7          thr                X      290678           D678  False
# 8          abc                Y      908354           E789   True
# 9          abc                Z      909238           E789   True
# 10         urt                X      436758           D903  False
# 11         urt                X      346577           F865  False
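The flag above is boolean, while the question's expected output uses 1 and 0. If integers are wanted, casting with astype(int) should be enough. The sketch below uses a small hypothetical stand-in for g2 and df so that it runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for g2: one boolean per customer_id.
g2 = pd.Series([True, False],
               index=pd.Index(['abc', 'xyz'], name='customer_id'))

# Hypothetical stand-in for the transactions frame.
df = pd.DataFrame({'customer_id': ['abc', 'xyz', 'abc']})

# g2[g2] keeps only the True entries; isin marks rows of those customers.
df['flag'] = df['customer_id'].isin(g2[g2].index).astype(int)

print(df['flag'].tolist())  # [1, 0, 1]
```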
