Column operations on pandas groupby object

Question

I have a DataFrame df that looks like this:

     id   Category   Time
1    176       12      00:00:00
2    4956      2       00:00:00
3    583       4       00:00:04
4    9395      2       00:00:24
5    176       12      00:03:23

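This is basically a set of ids and the Category of the item each one used at a particular Time. I use df.groupby('id'), and then I want to check whether each id used the same category as in its previous row or a different one, and assign False or True respectively (or NaN if that row is the first item for that particular id). I have also filtered the data to remove every id that has only one Time.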

For example, one of the groups might look like this:

      id   Category   Time
1    176       12      00:00:00
2    176       12      00:03:23
3    176       2       00:04:34
4    176       2       00:04:54
5    176       2       00:05:23

I want to perform an operation that produces:

      id   Category   Time          Transition
1    176       12      00:00:00       NaN
2    176       12      00:03:23       False
3    176       2       00:04:34       True
4    176       2       00:04:54       False
5    176       2       00:05:23       False

I would like to do some kind of apply on the Category column after the groupby, but I can't work out the right function.

Answer 1

You don't need groupby here; sort and shift are enough.

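import numpy as np
df = df.sort_values(['id', 'Time'])  # df.sort() no longer exists in current pandas
# True where Category differs from the previous row (object dtype so NaN fits)
df['Transition'] = (df.Category != df.Category.shift(1)).astype(object)
# The first row of each id has no predecessor
df.loc[df.id != df.id.shift(1), 'Transition'] = np.nan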

I haven't tested this, but it should do the trick.
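For anyone who would rather keep the per-id logic explicit, the same idea can be sketched with a grouped shift instead of a second comparison on id. This is only a sketch under stated assumptions: the sample rows below are made up to resemble the question's example (the second id and its times are invented), and Time is assumed to be a zero-padded HH:MM:SS string so that it sorts correctly as text.

```python
import pandas as pd

# Hypothetical sample data modeled on the question's example group,
# plus a second id so the group boundary is exercised.
df = pd.DataFrame({
    "id": [176, 4956, 176, 176, 4956, 176, 176],
    "Category": [12, 2, 12, 2, 2, 2, 2],
    "Time": ["00:00:00", "00:00:00", "00:03:23", "00:04:34",
             "00:04:40", "00:04:54", "00:05:23"],
})

# Order rows by id and then by time so "previous row" means "previous use".
df = df.sort_values(["id", "Time"]).reset_index(drop=True)

# Previous Category within the same id; NaN on each id's first row.
prev = df.groupby("id")["Category"].shift(1)

# True where the category changed, False where it stayed the same,
# and NaN where there is no previous row to compare against.
df["Transition"] = df["Category"].ne(prev).astype(object).where(prev.notna())

print(df)
```

The grouped shift never looks across an id boundary, so no separate step is needed to reset the first row of each id; the object dtype is what lets True, False and NaN share one column.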

python

pandas

pandas-groupby