Delete rows with dates after conversion value in another column for making attribution chains
I have a pd.DataFrame that looks like this:
cookie date channel goal_reached
cookie_1 2020-01-12 paid 0
cookie_1 2020-02-17 organic 0
cookie_1 2020-04-02 referral 1
cookie_1 2020-05-13 direct 0
cookie_1 2020-05-16 direct 0
cookie_2 2020-01-18 referral 0
cookie_2 2020-03-13 paid 1
cookie_2 2020-04-01 organic 0
cookie_2 2020-05-16 organic 0
cookie_2 2020-05-22 paid 0
cookie_3 2020-01-13 direct 0
cookie_3 2020-04-14 organic 0
cookie_3 2020-06-10 organic 0
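I want to group by cookie and, within each group, drop every row dated after the row where goal_reached is 1. If a cookie never has goal_reached equal to 1, I need all of its rows.
The final output I want looks like this: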
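A minimal sketch of that rule, assuming the data has exactly the columns shown above. For each row, it checks whether a conversion already happened on an *earlier* row of the same cookie (a per-group shifted cumulative max) and keeps only rows where it has not. This shifted-cummax mask is one possible technique, not the asker's or answerer's code, and the names `seen_before`, `kept` and `chains` are illustrative.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand.
df = pd.DataFrame({
    "cookie": ["cookie_1"] * 5 + ["cookie_2"] * 5 + ["cookie_3"] * 3,
    "date": pd.to_datetime([
        "2020-01-12", "2020-02-17", "2020-04-02", "2020-05-13", "2020-05-16",
        "2020-01-18", "2020-03-13", "2020-04-01", "2020-05-16", "2020-05-22",
        "2020-01-13", "2020-04-14", "2020-06-10",
    ]),
    "channel": [
        "paid", "organic", "referral", "direct", "direct",
        "referral", "paid", "organic", "organic", "paid",
        "direct", "organic", "organic",
    ],
    "goal_reached": [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
})

# "After the conversion" only makes sense in date order within each cookie.
df = df.sort_values(["cookie", "date"])

# 1 if some EARLIER row of the same cookie has goal_reached == 1, else 0.
# shift() pushes each flag down one row, so the conversion row itself still sees 0.
seen_before = df.groupby("cookie")["goal_reached"].transform(
    lambda s: s.shift(fill_value=0).cummax()
)
kept = df[seen_before.eq(0)]

# One attribution chain per cookie, plus whether it ended in a conversion.
chains = kept.groupby("cookie").agg(
    channel=("channel", lambda x: " > ".join(x)),
    goal_reached=("goal_reached", "max"),
)
print(chains)
```

A cookie with no conversion never gets a nonzero flag, so all of its rows survive. This covers the second requirement without special-casing.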
cookie channel goal_reached
cookie_1 paid > organic > referral 1
cookie_2 referral > paid 1
cookie_3 direct > organic > organic 0
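I have the following code, but it groups all rows, including the ones after the conversion:
# sort touchpoints chronologically within each cookie
df = df.sort_values(['cookie', 'date'], ascending=[False, True])
# join every channel per cookie; rows after the conversion are still included
df = df.groupby('cookie', as_index=False).agg({'channel': lambda x: ' > '.join(x), 'goal_reached': 'max'})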
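You can try this:
# the cumulative sums assume rows are in date order within each cookie
df = df.sort_values(['cookie', 'date'])
# keep rows up to and including the first goal_reached == 1 of each cookie
df = df[df.groupby('cookie')['goal_reached'].transform(lambda x: x.cumsum().cumsum()).lt(2)]
df = df.groupby('cookie').agg({'channel': lambda x: ' > '.join(x), 'goal_reached': 'max'})
print(df)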
channel goal_reached
cookie
cookie_1 paid > organic > referral 1
cookie_2 referral > paid 1
cookie_3 direct > organic > organic 0
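Here is why `cumsum().cumsum()` followed by `.lt(2)` selects the right rows. The first cumulative sum is 0 before the first conversion and at least 1 from that row onward. Summing again gives 0 before the conversion, exactly 1 on the first conversion row, and at least 2 on every later row. The sketch below prints those intermediate values for the flag sequences from the question. The helper `keep_mask` is hypothetical, not part of pandas, and it assumes the flags are already in date order.

```python
import pandas as pd


def keep_mask(flags):
    """Hypothetical helper: True for rows up to and including the first 1 in flags."""
    s = pd.Series(flags)
    once = s.cumsum()      # 0 before the first conversion, >= 1 from it onward
    twice = once.cumsum()  # 0 before, exactly 1 on the first conversion, >= 2 afterwards
    return twice.lt(2)


# goal_reached flags per cookie from the question, in date order
sequences = {
    "cookie_1": [0, 0, 1, 0, 0],
    "cookie_2": [0, 1, 0, 0, 0],
    "cookie_3": [0, 0, 0],
}
for cookie, flags in sequences.items():
    s = pd.Series(flags)
    print(cookie, s.cumsum().tolist(), s.cumsum().cumsum().tolist(), keep_mask(flags).tolist())
```

A second conversion later in the same chain also produces a value of 2 or more, so only the touchpoints up to the first conversion are kept.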