Fill NA values in ordered manner based on condition

Question

CustomerID	Buying_Round	Date	Purchase_amount$
1001	2	01/02/2020	20
1001	NaN	07/03/2020	42
1001	NaN	01/01/2020	15
1002	NaN	01/07/2020	10
1002	NaN	07/04/2020	40
1002	NaN	12/11/2020	25
1003	1	22/02/2020	30
1003	NaN	14/03/2020	52
1003	NaN	10/10/2020	45

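CustomerID is a unique number assigned to each customer based on some confidential data. The business is a grocery store that is trying to understand customers' buying tendencies from how often they come in and how much they buy, in order to improve its inventory. Buying_Round is the n-th time the customer has visited the store.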

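What I have done so far: within this messy data I sorted by CustomerID. I could also sort by Date, but I would like to keep the problem as close to the original as possible. After sorting by date it is clear that the date determines the buying round, and I intend to keep Buying_Round so I can compute round versus purchase amount. Now I want to fill in Buying_Round in increasing order, from 1 up to the number of times each unique customer appears, and then start again from 1 for the next customer.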
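The point about row order can be seen directly on the sample. The sketch below rebuilds the question's table by hand (values copied from above, assuming Date is day-first DD/MM/YYYY, which 22/02/2020 implies) and numbers rows with a plain groupby cumcount in their current order. That counts by position rather than by date, so for 1001 it gives 1, 2, 3 instead of the expected 2, 3, 1.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; Date strings are assumed day-first (DD/MM/YYYY).
df = pd.DataFrame({
    "CustomerID": [1001, 1001, 1001, 1002, 1002, 1002, 1003, 1003, 1003],
    "Buying_Round": [2, np.nan, np.nan, np.nan, np.nan, np.nan, 1, np.nan, np.nan],
    "Date": ["01/02/2020", "07/03/2020", "01/01/2020",
             "01/07/2020", "07/04/2020", "12/11/2020",
             "22/02/2020", "14/03/2020", "10/10/2020"],
    "Purchase_amount$": [20, 42, 15, 10, 40, 25, 30, 52, 45],
})

# Numbering by current row position within each customer, ignoring Date entirely.
row_order = df.groupby("CustomerID").cumcount() + 1
print(row_order.tolist())  # [1, 2, 3, 1, 2, 3, 1, 2, 3]
```

This is why the answers below parse Date first and number visits in date order.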

Expected output:

CustomerID	Buying_Round	Date	Purchase_amount$
1001	2	01/02/2020	20
1001	3	07/03/2020	42
1001	1	01/01/2020	15
1002	2	01/07/2020	10
1002	1	07/04/2020	40
1002	3	12/11/2020	25
1003	1	22/02/2020	30
1003	2	14/03/2020	52
1003	3	10/10/2020	45

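Note: 1001 is just an example. In the original data, 1001 appears 12 times, 1002 appears 4 times and 1003 appears 15 times, and there are 11,000 unique customers this year in total. There is no pattern or fixed number of rows per unique ID. We do have value_counts, but would like a simpler approach than hard-coding the counts.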
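On the value_counts point: groupby never needs to know in advance how many rows each customer has, so nothing has to be hard-coded. As a rough check, this sketch builds a synthetic frame with the per-customer counts from the note (12, 4 and 15; the dates are randomly generated, not real data), shuffles it, applies a sort-then-cumcount numbering, and confirms that each customer's highest round equals their value_counts entry.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
counts = {1001: 12, 1002: 4, 1003: 15}  # per-customer visit counts taken from the note

# Synthetic data: repeat each ID by its count and give every row a random 2020 date.
ids = np.repeat(list(counts), list(counts.values()))
days = rng.integers(0, 366, size=len(ids))
dates = pd.Timestamp("2020-01-01") + pd.to_timedelta(days, unit="D")
df = pd.DataFrame({"CustomerID": ids, "Date": dates.strftime("%d/%m/%Y")})
df = df.sample(frac=1, random_state=0).reset_index(drop=True)  # shuffle rows

# Number visits in date order within each customer; no counts are hard-coded.
parsed = pd.to_datetime(df["Date"], dayfirst=True)
df["Buying_Round"] = parsed.sort_values(kind="stable").groupby(df["CustomerID"]).cumcount() + 1

# The highest round per customer should equal how often that customer appears.
max_round = df.groupby("CustomerID")["Buying_Round"].max()
print((max_round == df["CustomerID"].value_counts().sort_index()).all())  # True
```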

Answer 1

We can try groupby rank after converting the Date column with to_datetime:

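```python
# Parse the day-first date strings, then rank the dates within each customer
df['Buying_Round'] = (
    pd.to_datetime(df['Date'], dayfirst=True)
    .groupby(df['CustomerID'])
    .rank(method='dense')  # earliest date -> 1; same-day visits share a rank
    .astype(int)
)
```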
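One detail the rank version leaves implicit is what happens when a customer visits twice on the same date. With method='dense', both visits get the same round and the next date continues from the following integer; method='first' breaks the tie by row order, so every visit gets its own round. The small frame below is hypothetical and only illustrates the difference.

```python
import pandas as pd

# Hypothetical data: customer 1001 has two visits on the same day (01/02/2020).
df = pd.DataFrame({
    "CustomerID": [1001, 1001, 1001, 1002],
    "Date": ["01/02/2020", "01/02/2020", "07/03/2020", "01/07/2020"],
})
dates = pd.to_datetime(df["Date"], dayfirst=True)
grouped = dates.groupby(df["CustomerID"])

dense = grouped.rank(method="dense").astype(int)  # ties share a round
first = grouped.rank(method="first").astype(int)  # ties broken by row order
print(dense.tolist())  # [1, 1, 2, 1]
print(first.tolist())  # [1, 2, 3, 1]
```

Which one is right depends on whether two same-day receipts should count as one visit or two; the question does not say.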

Or sort_values and groupby cumcount:

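```python
# Sort the parsed dates, number each customer's visits 0, 1, 2, ... and shift to start at 1;
# the assignment aligns the result back to df by index label
df['Buying_Round'] = (
    pd.to_datetime(df['Date'], dayfirst=True)
    .sort_values()
    .groupby(df['CustomerID']).cumcount() + 1
)
```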
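The cumcount version works because sort_values keeps each row's original index label: groupby(df['CustomerID']) aligns the customer IDs to the sorted series by label, and the final assignment aligns the result back to df the same way. A quick check on the question's dates (rebuilt by hand, day-first assumed) shows the two versions agree on this data. Passing kind='stable' to sort_values is an optional addition that keeps same-day visits in their original row order; the default sort does not guarantee that.

```python
import pandas as pd

# Dates copied from the question, assumed day-first.
df = pd.DataFrame({
    "CustomerID": [1001, 1001, 1001, 1002, 1002, 1002, 1003, 1003, 1003],
    "Date": ["01/02/2020", "07/03/2020", "01/01/2020",
             "01/07/2020", "07/04/2020", "12/11/2020",
             "22/02/2020", "14/03/2020", "10/10/2020"],
})
dates = pd.to_datetime(df["Date"], dayfirst=True)

# Sorted series keeps the original index labels, only the order changes.
ordered = dates.sort_values(kind="stable")
via_cumcount = ordered.groupby(df["CustomerID"]).cumcount() + 1  # still in date order
via_rank = dates.groupby(df["CustomerID"]).rank(method="dense").astype(int)

# Put the cumcount result back in original row order before comparing.
print(via_cumcount.sort_index().tolist())  # [2, 3, 1, 2, 1, 3, 1, 2, 3]
print(via_rank.tolist())                   # [2, 3, 1, 2, 1, 3, 1, 2, 3]
```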

Both produce:

```
   CustomerID  Buying_Round        Date  Purchase_amount$
0        1001             2  01/02/2020                20
1        1001             3  07/03/2020                42
2        1001             1  01/01/2020                15
3        1002             2  01/07/2020                10
4        1002             1  07/04/2020                40
5        1002             3  12/11/2020                25
6        1003             1  22/02/2020                30
7        1003             2  14/03/2020                52
8        1003             3  10/10/2020                45
```
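Both answers overwrite the whole column. If the goal is literally what the title says, filling only the NaN cells and leaving the recorded rounds (2 for 1001, 1 for 1003) untouched, the computed ranks can be passed to fillna instead. The assert that the recorded rounds agree with the date order is an extra sanity check added here, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (Date assumed day-first).
df = pd.DataFrame({
    "CustomerID": [1001, 1001, 1001, 1002, 1002, 1002, 1003, 1003, 1003],
    "Buying_Round": [2, np.nan, np.nan, np.nan, np.nan, np.nan, 1, np.nan, np.nan],
    "Date": ["01/02/2020", "07/03/2020", "01/01/2020",
             "01/07/2020", "07/04/2020", "12/11/2020",
             "22/02/2020", "14/03/2020", "10/10/2020"],
    "Purchase_amount$": [20, 42, 15, 10, 40, 25, 30, 52, 45],
})

computed = (
    pd.to_datetime(df["Date"], dayfirst=True)
    .groupby(df["CustomerID"])
    .rank(method="dense")
    .astype(int)
)

# Sanity check: rounds that are already recorded should match the date-based ones.
known = df["Buying_Round"].notna()
assert (df.loc[known, "Buying_Round"] == computed[known]).all()

# Fill only the missing cells (aligned by index), then restore an integer dtype.
df["Buying_Round"] = df["Buying_Round"].fillna(computed).astype(int)
print(df["Buying_Round"].tolist())  # [2, 3, 1, 2, 1, 3, 1, 2, 3]
```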

