How to drop rows with a value of less than a percentage of the maximum per group
I have a pandas DataFrame containing a time series of a signal in which some peaks have been identified:
Time (s) Intensity Peak
1 1 a
2 10 a
3 30 a
4 100 a
5 40 a
6 20 a
7 2 a
1 20 b
2 100 b
3 300 b
4 80 b
5 20 b
6 2 b
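To reproduce the problem, the table above can be built roughly as follows. This is only a sketch: the variable name df matches what the answers use, while peak_max is a hypothetical helper added here to make the 10% thresholds easy to check by hand.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "Time (s)": [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6],
    "Intensity": [1, 10, 30, 100, 40, 20, 2, 20, 100, 300, 80, 20, 2],
    "Peak": ["a"] * 7 + ["b"] * 6,
})

# Maximum intensity per peak (hypothetical helper name, not from the question).
peak_max = df.groupby("Peak")["Intensity"].max()
print(peak_max)
```

With these values the thresholds are 10 for peak a and 30 for peak b.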
I would like to drop the rows whose intensity is less than 10% of the maximum intensity of each peak, to obtain:
Time (s) Intensity Peak
3 30 a
4 100 a
5 40 a
6 20 a
2 100 b
3 300 b
4 80 b
How can I do this? I tried looking for a groupby function that would do it, but I can't seem to find anything suitable.
Thanks!
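Use groupby to generate a mask:
# group_keys=False keeps the original row index, so the mask aligns with df on recent pandas versions
filtered = df[df.groupby('Peak', group_keys=False)['Intensity'].apply(lambda x: x > x.max() / 10)]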
Output:
>>> filtered
    Time (s)  Intensity Peak
2          3         30    a
3          4        100    a
4          5         40    a
5          6         20    a
8          2        100    b
9          3        300    b
10         4         80    b
You can use GroupBy.transform with max to get the maximum of each group, then take 10% of it using Series.div. Compare the result with df['Intensity'] and use it for boolean indexing.
max_vals = df.groupby('Peak')['Intensity'].transform('max').div(10)
mask = df['Intensity'] > max_vals
df[mask]
# Time (s) Intensity Peak
# 2 3 30 a
# 3 4 100 a
# 4 5 40 a
# 5 6 20 a
# 8 2 100 b
# 9 3 300 b
# 10 4 80 b
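The same transform-based idea can be wrapped in a small reusable function. The sketch below is an illustration rather than part of the original answer: the function name drop_below_fraction_of_group_max and its fraction parameter are assumptions introduced here, and the sample frame is rebuilt so the snippet runs on its own.

```python
import pandas as pd


def drop_below_fraction_of_group_max(df, group_col, value_col, fraction=0.1):
    """Keep rows whose value is strictly greater than `fraction` of the group maximum."""
    # transform("max") broadcasts each group's maximum back onto its rows,
    # so the threshold Series shares df's original index.
    threshold = df.groupby(group_col)[value_col].transform("max") * fraction
    return df[df[value_col] > threshold]


# Sample data from the question.
df = pd.DataFrame({
    "Time (s)": [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6],
    "Intensity": [1, 10, 30, 100, 40, 20, 2, 20, 100, 300, 80, 20, 2],
    "Peak": ["a"] * 7 + ["b"] * 6,
})

result = drop_below_fraction_of_group_max(df, "Peak", "Intensity", fraction=0.1)
print(result.index.tolist())  # [2, 3, 4, 5, 8, 9, 10]
```

Note that the comparison is strict, so the row with Intensity 10 in peak a (exactly 10% of the maximum) is dropped, which matches the desired output in the question. Switching to >= would keep rows that sit exactly on the threshold.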