Pandas: get the cumulative sum of a column only if the timestamp is greater than that of another column

Question

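For each customer, I want the cumulative sum of a column (Dollar Value), counting a row only when Timestamp 1 is less than Timestamp 2. I could do a Cartesian join on customer or iterate over the DataFrame, but I would like to know whether there is a simpler way to do this with groupby and apply.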

DataFrame:

import pandas as pd

df = pd.DataFrame({'Customer': ['Alice','Brian','Alice','Alice','Alice','Brian', 'Brian'], 'Timestamp': [1,2,3,4,5,3,6], 'Timestamp 2': [2,5,4,6,7,5,7], 'Dollar Value':[0,1,3,5,3,2,3]})

Sort the values:

df = df.sort_values(['Customer','Timestamp'])

Expected result:

df['Desired_result'] = [0,0,0,3,0,0,3]

Answer 1

This works. Build a condition for the matching rows, zero out the rest, then take a cumulative sum per customer:

cond = df["Timestamp"]>df["Timestamp 2"]
df["Dollar Value"].where(cond, 0).groupby([cond, df["Customer"]]).cumsum()
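As a quick check, here is a minimal self-contained sketch that runs the snippet above on the sample data from the question, sorted the same way. On that sample `Timestamp` is never greater than `Timestamp 2`, so the mask selects nothing and every running total comes out as zero. The name `masked_cumsum` is only an illustrative label, not something from the original post.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer": ["Alice", "Brian", "Alice", "Alice", "Alice", "Brian", "Brian"],
    "Timestamp": [1, 2, 3, 4, 5, 3, 6],
    "Timestamp 2": [2, 5, 4, 6, 7, 5, 7],
    "Dollar Value": [0, 1, 3, 5, 3, 2, 3],
})
df = df.sort_values(["Customer", "Timestamp"])

# True where a row's own "Timestamp" is later than its "Timestamp 2"
cond = df["Timestamp"] > df["Timestamp 2"]

# Replace non-matching values with 0, then take a running total per (cond, customer) group
masked_cumsum = (
    df["Dollar Value"].where(cond, 0).groupby([cond, df["Customer"]]).cumsum()
)

print(cond.any())              # False: no sample row satisfies the condition
print(masked_cumsum.tolist())  # [0, 0, 0, 0, 0, 0, 0]
```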

Edit: based on your comment, this may be what you want:

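import numpy as np
import pandas as pd

df = pd.DataFrame({'Customer': ['Alice','Brian','Alice','Alice','Alice','Brian', 'Brian'], 'Timestamp': [1,2,3,4,5,3,6], 'Timestamp 2': [2,5,4,6,7,5,7], 'Dollar Value':[0,1,3,5,3,2,3]})

def sum_dollar_value(group):
    group = group.copy()
    # Last row of this customer's group (assumes rows are ordered by Timestamp)
    last_row = group.iloc[-1, :]
    # Rows whose "Timestamp 2" is earlier than the last row's "Timestamp"
    cond = group["Timestamp 2"] < last_row["Timestamp"]
    # Store the conditional sum on the last row only; the other rows stay NaN
    group.loc[last_row.name, "result"] = np.sum(group["Dollar Value"].where(cond, 0))
    return group

df.groupby("Customer").apply(sum_dollar_value).reset_index(level=0, drop=True)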
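The function above fills `result` only on the last row of each customer. If the goal is the per-row column from the question, one possible sketch is below. It assumes the rule implied by the expected output: for each row, add up `Dollar Value` over the same customer's rows whose `Timestamp 2` is earlier than that row's `Timestamp`. The pairwise comparison via NumPy broadcasting is an assumption of this sketch rather than part of the original answer, and it needs memory quadratic in the size of each group.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer": ["Alice", "Brian", "Alice", "Alice", "Alice", "Brian", "Brian"],
    "Timestamp": [1, 2, 3, 4, 5, 3, 6],
    "Timestamp 2": [2, 5, 4, 6, 7, 5, 7],
    "Dollar Value": [0, 1, 3, 5, 3, 2, 3],
})
df = df.sort_values(["Customer", "Timestamp"])

# Collect the per-row sums in a separate Series so df is not mutated while iterating
result = pd.Series(0, index=df.index, dtype="int64")
for _, group in df.groupby("Customer"):
    ts = group["Timestamp"].to_numpy()
    ts2 = group["Timestamp 2"].to_numpy()
    value = group["Dollar Value"].to_numpy()
    # mask[i, j] is True when row j's "Timestamp 2" is earlier than row i's "Timestamp"
    mask = ts2[None, :] < ts[:, None]
    # For each row i, sum the dollar values of the rows j selected by the mask
    result.loc[group.index] = (mask * value).sum(axis=1)

df["result"] = result
print(df["result"].tolist())  # [0, 0, 0, 3, 0, 0, 3]
```

On the sample data this matches the `Desired_result` column given in the question.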

Answer 2

I suggest setting up the condition first, then grouping by customer:

# set condition
cond = df["Timestamp"]<df["Timestamp 2"]
df[cond].groupby('Customer')['Dollar Value'].sum()

Note: I borrowed the condition syntax from the previous answer.
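To make the behavior concrete, here is a small sketch on the question's sample data. With this condition, `.sum()` returns one total per customer, while swapping in `.cumsum()` gives a running total per row. The variable names `totals` and `running` are illustrative only. Both variants compare the two timestamps within the same row, so on this sample neither reproduces the expected column from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer": ["Alice", "Brian", "Alice", "Alice", "Alice", "Brian", "Brian"],
    "Timestamp": [1, 2, 3, 4, 5, 3, 6],
    "Timestamp 2": [2, 5, 4, 6, 7, 5, 7],
    "Dollar Value": [0, 1, 3, 5, 3, 2, 3],
})
df = df.sort_values(["Customer", "Timestamp"])

# True where a row's "Timestamp" is earlier than its own "Timestamp 2"
cond = df["Timestamp"] < df["Timestamp 2"]

# One total per customer over the matching rows
totals = df[cond].groupby("Customer")["Dollar Value"].sum()
print(totals.to_dict())  # {'Alice': 11, 'Brian': 6}

# Running total per customer over the matching rows
running = df[cond].groupby("Customer")["Dollar Value"].cumsum()
print(running.tolist())  # [0, 3, 8, 11, 1, 3, 6]
```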

