Conditional between duplicated values through different columns
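A customer appears in duplicate rows when they have more than one subscription.
I want to generate one new_status for the customer as a whole, rather than one new_status per subscription:
for customers who have reactivated a subscription,
and for customers who canceled one subscription but still have another active one.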
df:
Customer | Status   | Canceled_at | Created    | New_status
X        | Active   |             | 8/9/2017   |
X        | Canceled | 8/3/2017    | 6/19/2017  |
Y        | Active   |             | 2/13/2019  |
Y        | Canceled | 11/28/2018  | 10/14/2018 |
Z        | Active   |             | 3/29/2018  |
Z        | Canceled | 8/8/2018    | 7/10/2018  |
A        | Canceled | 9/2/2018    | 7/10/2018  |
A        | Canceled | 9/29/2018   | 7/12/2018  |
A        | Active   |             | 5/31/2018  |
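The conditions for these cases are:
If the canceled duplicate's 'canceled_at' date > the active one's 'created' date: new_status will be 'Downgrade'
If the canceled duplicate's 'canceled_at' date < the active one's 'created' date: new_status will be 'Reactivate'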
Expected output:
Customer | Status   | Canceled_at | Created    | New_status
X        | Active   |             | 8/9/2017   | Reactivate
X        | Canceled | 8/3/2017    | 6/19/2017  | Reactivate
Y        | Active   |             | 2/13/2019  | Reactivate
Y        | Canceled | 11/28/2018  | 10/14/2018 | Reactivate
Z        | Active   |             | 3/29/2018  | Downgrade
Z        | Canceled | 8/8/2018    | 7/10/2018  | Downgrade
A        | Canceled | 9/2/2018    | 7/10/2018  | Downgrade
A        | Canceled | 9/29/2018   | 7/12/2018  | Downgrade
A        | Active   |             | 5/31/2018  | Downgrade
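The two conditions can be checked directly against the sample rows. The following is only a sketch of one literal reading of the rule, assuming each customer has exactly one 'Active' row and that the latest cancellation is the one that matters; the variable names active_created and last_cancel are made up for illustration.

```python
import pandas as pd

# Sample data from the table above, rebuilt by hand
df = pd.DataFrame({
    "Customer": ["X", "X", "Y", "Y", "Z", "Z", "A", "A", "A"],
    "Status": ["Active", "Canceled", "Active", "Canceled", "Active",
               "Canceled", "Canceled", "Canceled", "Active"],
    "Canceled_at": [None, "8/3/2017", None, "11/28/2018", None,
                    "8/8/2018", "9/2/2018", "9/29/2018", None],
    "Created": ["8/9/2017", "6/19/2017", "2/13/2019", "10/14/2018", "3/29/2018",
                "7/10/2018", "7/10/2018", "7/12/2018", "5/31/2018"],
})
for col in ("Canceled_at", "Created"):
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y")

# 'Created' date of the customer's active subscription, repeated on every row
# of that customer (assumption: one active row per customer)
active_created = (
    df["Created"].where(df["Status"] == "Active")
    .groupby(df["Customer"]).transform("max")
)
# Latest cancellation date of the customer, repeated on every row
last_cancel = df.groupby("Customer")["Canceled_at"].transform("max")

df["New_status"] = None
df.loc[last_cancel > active_created, "New_status"] = "Downgrade"
df.loc[last_cancel < active_created, "New_status"] = "Reactivate"
print(df)
```

On the sample this labels X and Y as 'Reactivate' and Z and A as 'Downgrade', matching the expected output. Customers with no canceled row, or with equal dates, are left unlabeled.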
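I'm too new to comment, but I need more information: why is customer 'Y' a Reactivate? Maybe I don't understand your explanation, because customer 'A' is in a similar situation and you gave it 'Downgrade'. Maybe just re-type your question, but pretend it's written for an 8-year-old to read (me).
Here is the code you want, and it works: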
import pandas as pd

# convert columns to dates
df['Canceled_at'] = pd.to_datetime(df['Canceled_at'])
df['Created'] = pd.to_datetime(df['Created'])
# make customer a list of unique values so we can loop through it
customer = list(df['Customer'].drop_duplicates())
# super awesome for loop that gives us the largest date per customer (this is the part where maybe your logic is different than what I read it as)
for c in customer:
    df.loc[(df['Customer'] == c), 'Most Recent Cancel'] = df.loc[(df['Customer'] == c)]['Canceled_at'].max()
    df.loc[(df['Customer'] == c), 'Most Recent Created'] = df.loc[(df['Customer'] == c)]['Created'].max()
# make the 'New_status' column: 'Reactivate' where the newest subscription was created after the latest cancel, 'Downgrade' everywhere else
df.loc[(df['Most Recent Created'] > df['Most Recent Cancel']), 'New_status'] = 'Reactivate'
df.loc[(df['New_status'] != 'Reactivate'), 'New_status'] = 'Downgrade'
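The same per-customer maximum can probably be had without the explicit loop. This is a rough sketch, not a tested drop-in replacement: it assumes the sample data from the question rebuilt by hand, and it uses groupby with transform("max") so each row gets its customer's latest date. The names latest_cancel and latest_created are just for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    "Customer": ["X", "X", "Y", "Y", "Z", "Z", "A", "A", "A"],
    "Status": ["Active", "Canceled", "Active", "Canceled", "Active",
               "Canceled", "Canceled", "Canceled", "Active"],
    "Canceled_at": [None, "8/3/2017", None, "11/28/2018", None,
                    "8/8/2018", "9/2/2018", "9/29/2018", None],
    "Created": ["8/9/2017", "6/19/2017", "2/13/2019", "10/14/2018", "3/29/2018",
                "7/10/2018", "7/10/2018", "7/12/2018", "5/31/2018"],
})
df["Canceled_at"] = pd.to_datetime(df["Canceled_at"], format="%m/%d/%Y")
df["Created"] = pd.to_datetime(df["Created"], format="%m/%d/%Y")

# Latest cancel and latest created date per customer, aligned to every row
latest_cancel = df.groupby("Customer")["Canceled_at"].transform("max")
latest_created = df.groupby("Customer")["Created"].transform("max")

# Same rule as the loop version: 'Reactivate' if the newest subscription was
# created after the latest cancel, otherwise 'Downgrade'
df["New_status"] = np.where(latest_created > latest_cancel, "Reactivate", "Downgrade")
print(df)
```

Like the loop, this compares the latest 'Created' date across all of a customer's rows, not only the active one, so it only agrees with the question's rule when that reading is acceptable.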