Aggregating and summarizing pandas data, slicing between non-contiguous values in a column
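In short, I can't work out how to use groupby for this, because most of the groupby examples I have seen do not distinguish between separate, non-contiguous runs of the same value.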
Timestamp Signal Value
00:00:00 1 12
00:00:01 1 12.2
00:00:02 1 2.1
00:00:03 0 1.1
00:00:04 1 6.2
00:00:05 1 1.0
00:00:06 0 4.4
00:00:07 0 1.6
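I want to take the last value and, in a separate column, the sum of the first 3 rows, since Signal is 1 for them. Then I want to start over with a new sum/last for the next two rows where Signal is 1 (00:00:04 and 00:00:05).
So something like this: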
Timestamp Signal Value Sum Last
00:00:00 1 12
00:00:01 1 12.2
00:00:02 1 2.1 26.3 2.1
00:00:03 0 1.1
00:00:04 1 6.2
00:00:05 1 1.0 7.2 1.0
00:00:06 0 4.4
00:00:07 0 1.6
Thanks in advance!
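You need a helper Series that labels each consecutive run. Create it by comparing the Signal column with its shifted copy (shift) and taking the cumulative sum (cumsum) of the result: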
a = df['Signal'].ne(df['Signal'].shift()).cumsum()
print (a)
0 1
1 1
2 1
3 2
4 3
5 3
6 4
7 4
Name: Signal, dtype: int32
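To reproduce this step, here is a minimal sketch that rebuilds the sample frame from the question. It assumes Timestamp is stored as plain strings; the intermediate name `changed` is only for illustration.

```python
import pandas as pd

# Sample data copied from the question; Timestamp kept as strings (an assumption).
df = pd.DataFrame({
    'Timestamp': ['00:00:00', '00:00:01', '00:00:02', '00:00:03',
                  '00:00:04', '00:00:05', '00:00:06', '00:00:07'],
    'Signal': [1, 1, 1, 0, 1, 1, 0, 0],
    'Value': [12, 12.2, 2.1, 1.1, 6.2, 1.0, 4.4, 1.6],
})

# True wherever Signal differs from the previous row.
# The first row is compared against NaN, so it always counts as a change.
changed = df['Signal'].ne(df['Signal'].shift())

# The running count of changes gives every consecutive run its own label.
a = changed.cumsum()
print(a.tolist())
```

Because the label increases at every change, the two runs of 1s get different labels (1 and 3), which is what a plain groupby on Signal cannot provide.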
Then build a mask: use duplicated to find the last row of each run and combine it with the Signal column, whose 0 values are treated as False and 1 values as True:
# Cast Signal to bool explicitly so that & combines two boolean Series.
m = ~a.duplicated(keep='last') & df['Signal'].astype(bool)
print (m)
0 False
1 False
2 True
3 False
4 False
5 True
6 False
7 False
Name: Signal, dtype: bool
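The mask can be read as two separate conditions. The sketch below splits them apart on the Signal values from the question; the names `signal` and `is_run_end` are illustrative and not part of the answer.

```python
import pandas as pd

signal = pd.Series([1, 1, 1, 0, 1, 1, 0, 0], name='Signal')

# Run labels: increase by one every time the value changes.
a = signal.ne(signal.shift()).cumsum()

# duplicated(keep='last') flags every row except the last one of each label,
# so negating it marks the final row of each run.
is_run_end = ~a.duplicated(keep='last')

# Keep only the run ends where Signal is 1.
m = is_run_end & signal.eq(1)
print(is_run_end.tolist())
print(m.tolist())
```

Rows 3 and 7 also end a run, but those are runs of 0, so the second condition drops them.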
Finally, group by the Series a and use transform with sum, copy Value into Last, and then replace every row outside the mask with NaNs using where:
df['Sum'] = df.groupby(a)['Value'].transform('sum')
df['Last'] = df['Value']
df[['Sum','Last']] = df[['Sum','Last']].where(m)
print (df)
Timestamp Signal Value Sum Last
0 00:00:00 1 12.0 NaN NaN
1 00:00:01 1 12.2 NaN NaN
2 00:00:02 1 2.1 26.3 2.1
3 00:00:03 0 1.1 NaN NaN
4 00:00:04 1 6.2 NaN NaN
5 00:00:05 1 1.0 7.2 1.0
6 00:00:06 0 4.4 NaN NaN
7 00:00:07 0 1.6 NaN NaN
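If one row per run is enough, rather than columns aligned with the original frame, the same run labels can feed a regular aggregation. This is a variation on the answer and not part of it; the names `on` and `summary` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Timestamp': ['00:00:00', '00:00:01', '00:00:02', '00:00:03',
                  '00:00:04', '00:00:05', '00:00:06', '00:00:07'],
    'Signal': [1, 1, 1, 0, 1, 1, 0, 0],
    'Value': [12, 12.2, 2.1, 1.1, 6.2, 1.0, 4.4, 1.6],
})

# Label consecutive runs of equal Signal values.
a = df['Signal'].ne(df['Signal'].shift()).cumsum()

# Restrict both the rows and the labels to Signal == 1, then aggregate per run.
on = df['Signal'].eq(1)
summary = df[on].groupby(a[on])['Value'].agg(['sum', 'last'])
print(summary)
```

The result is indexed by run label, so it has to be joined back on that label if the values are needed next to the original rows.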