Python Group By and Case When Equivalent Relative to R

Question

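I am trying to apply a case when / if-else statement to a grouped DataFrame in Python to create a new variable. Below is what I would write in R; I am looking for a similar, vectorized operation in Python. R code: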

dt %>% group_by(user,merchant,date) %>%
mutate(
new_variable = case_when(-amount == lag(amount) ~ 2,
                         TRUE ~ 1)
) %>% ungroup()

In Python, I tried using np.select:

    conditions = [
        -us_trans['real_amount'] == us_trans['real_amount'].shift(-1),
        -us_trans['real_amount'] != us_trans['real_amount'].shift(-1),
    ]

    values = [2, 1]

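But I don't know how to use np.select on a grouped DataFrame to create the new variable.

I know I could use groupby(['user','merchant','date']).apply and pass an if-else statement, but I believe that would run as a loop, and I am trying to do this in a vectorized way to optimize my code.

Thanks!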

Answer 1

The slow option, using pandas:

df["new_variable"] = np.where(df.groupby(['user', 'merchant','date'])['amount'].apply(lambda g: g.shift(-1)==-g),2,1)
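The apply call above runs a Python lambda once per group. As a rough sketch of a fully vectorized alternative in pandas, assuming the column names from the question (user, merchant, date, amount) and made-up sample data: groupby(...).shift() shifts within each group and returns a Series aligned to the original index, so it can be compared against the column directly. Note that shift(1) corresponds to R's lag(), whereas the shift(-1) used in the snippets above looks at the next row, like lead().

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "user": ["a", "a", "a", "b", "b"],
    "merchant": ["m1", "m1", "m1", "m1", "m1"],
    "date": ["2021-01-01"] * 5,
    "amount": [10.0, -10.0, 5.0, -5.0, 7.0],
})

# Previous row's amount within each (user, merchant, date) group.
# The first row of every group gets NaN, so its comparison is False.
prev_amount = df.groupby(["user", "merchant", "date"])["amount"].shift(1)

# 2 where the row exactly negates the previous row in its group, else 1.
df["new_variable"] = np.where(-df["amount"] == prev_amount, 2, 1)

print(df)
```

In this sample the fourth row (user b, amount -5) directly follows a row with amount 5 that belongs to user a, so an ungrouped shift would flag it; the grouped shift does not.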

However, datatable is much faster, using shift(), ifelse() and by():

from datatable import dt, f, by

df = dt.Frame(df)

df[:,
   dt.update(new_variable=dt.ifelse(-1*dt.shift(f.amount)==f.amount,2,1)),
   by(f.user,f.merchant,f.date)
]
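For a case_when with more than two branches, the np.select attempt from the question can be combined with a grouped shift in pandas. The following is only a sketch on hypothetical sample data; the extra branch for zero amounts is invented purely to illustrate multiple conditions and is not part of the original question. np.select evaluates the conditions in order and the first match wins, much like case_when, and default plays the role of the TRUE ~ 1 fallback.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "user": ["a", "a", "a", "b", "b"],
    "merchant": ["m1"] * 5,
    "date": ["2021-01-01"] * 5,
    "amount": [10.0, -10.0, 0.0, -5.0, 5.0],
})

# Lagged amount within each group (NaN on the first row of a group).
prev_amount = df.groupby(["user", "merchant", "date"])["amount"].shift(1)

conditions = [
    df["amount"] == 0,             # illustrative extra branch (assumption)
    -df["amount"] == prev_amount,  # row negates the previous row in its group
]
values = [0, 2]

# Rows matching no condition fall through to the default value.
df["new_variable"] = np.select(conditions, values, default=1)

print(df)
```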


Tags: python, if-statement, r, pandas-groupby