How to calculate an ordinal number on columns?

I have a dataset with the columns user_id and type (ordinal_number shows the desired result):

user_id  type     ordinal_number
1        request  1
1        request  1
1        request  1
1        request  1
1        payment  1
2        request  1
2        request  1
2        payment  1
2        request  2
2        payment  2

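I want to fill in the ordinal_number column. Whenever type == payment, assign the next ordinal number for that user_id, and give the same number to all preceding rows of that user with type == request.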

Some users may have only requests, and some may have several payments in a row.

IIUC, you want a counter on the "payment" rows, back-filled per group:

m = df['type'].eq('payment')

df['ordinal_number'] = (m.cumsum().where(m)
                         .groupby(df['user_id'])
                         .bfill().astype(int)
                       )

Or, using the value of "user_id" as the starting value:

df['ordinal_number'] = (df['user_id'].where(df['type'].eq('payment'))
                         .groupby(df['user_id'])
                         .bfill().astype(int)
                       )

Output:

   user_id     type  ordinal_number
0        1  request               1
1        1  request               1
2        1  request               1
3        1  request               1
4        1  payment               1
5        2  request               2
6        2  request               2
7        2  payment               2
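Note that `m.cumsum()` counts payments across the whole frame, so the numbers keep increasing from one user to the next. If the counter should instead restart for each user, and users with no payment at all should be allowed, one possible variant is sketched below. This is only a sketch under assumptions: the sample frame (including the hypothetical user 3 with a single request) is made up for illustration, and the nullable `Int64` dtype is my choice so that rows with no following payment can stay `<NA>` instead of making `astype(int)` fail.

```python
import pandas as pd

# Illustrative data: the question's rows plus a hypothetical user 3
# who only has a request and never pays.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3],
    "type": ["request", "request", "request", "request", "payment",
             "request", "request", "payment", "request", "payment",
             "request"],
})

m = df["type"].eq("payment")

# Count payments within each user rather than across the whole frame.
per_user = m.astype(int).groupby(df["user_id"]).cumsum()

df["ordinal_number"] = (
    per_user.where(m)            # keep the counter only on payment rows
    .groupby(df["user_id"])
    .bfill()                     # copy it back onto the preceding requests
    .astype("Int64")             # nullable integers: unmatched requests stay <NA>
)

print(df)
```

Requests that are never followed by a payment of the same user are left missing here; whether they should get a number at all is a decision the question leaves open.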

You can identify the "payment" rows, group by "user_id", and within each group reverse the Series, take the cumsum, and then reverse it back.

def assign_num(x):
    s = x[::-1].cumsum()
    # must subtract from max value to get an ascending Series
    return s.iat[-1] + 1 - s[::-1]

# group_keys=False keeps the original row index so the result aligns with df
df['ordinal_number'] = df['type'].eq('payment').groupby(df['user_id'], group_keys=False).apply(assign_num)

Output:

   user_id     type  ordinal_number
0        1  request               1
1        1  request               1
2        1  request               1
3        1  request               1
4        1  payment               1
5        2  request               1
6        2  request               1
7        2  payment               1
8        2  request               2
9        2  payment               2
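The reverse, cumsum, reverse-back idea can also be written without a Python-level `apply`. The following is a minimal sketch of that, not the answer's own code: it rebuilds the question's sample data, and the names `rev_count` and `total` are introduced here purely for illustration.

```python
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "type": ["request", "request", "request", "request", "payment",
             "request", "request", "payment", "request", "payment"],
})

m = df["type"].eq("payment").astype(int)

# Walk the rows bottom-up: for each row, the number of payments
# at or after it within the same user.
rev_count = m[::-1].groupby(df["user_id"][::-1]).cumsum()

# Total number of payments per user, broadcast to every row.
total = m.groupby(df["user_id"]).transform("sum")

# Subtracting from (total + 1) turns the descending count into an ascending one.
# The arithmetic and the assignment both align on the row index.
df["ordinal_number"] = total + 1 - rev_count

print(df)
```

As with `assign_num`, requests that come after a user's last payment get the next number (total + 1) rather than a missing value.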