Normalize values in data frame given unsorted other constraints
I have a data frame that looks like this:
counter leg_rate pose_rate component approach rmse
0 proc/stat-stime d d test Baseline 1.583097
1 proc/stat-stime d r test AEW - MTEN 0.516108
2 proc/stat-stime d d test ASDF 0.705861
3 proc/stat-stime r r test ASDF 0.345816
4 proc/stat-utime d r test Baseline 1.128632
5 proc/stat-stime d r test Baseline 1.579803
6 proc/stat-stime r r test Baseline 1.345895
7 proc/stat-utime r r test AEW - MTEN 0.187236
8 proc/stat-utime d d test Baseline 1.193776
9 proc/stat-stime r d test ASDF 0.014975
10 proc/stat-utime r r test ASDF 0.985493
11 proc/stat-utime r d test AEW - MTEN 0.897336
12 proc/stat-stime r d test Baseline 1.415103
13 proc/stat-utime r d test Baseline 1.724266
14 proc/stat-utime r r test Baseline 1.294654
15 proc/stat-utime d d test AEW - MTEN 0.263845
16 proc/stat-utime r d test ASDF 0.497368
17 proc/stat-stime d d test AEW - MTEN 0.143402
18 proc/stat-utime d r test AEW - MTEN 0.233437
19 proc/stat-stime r d test AEW - MTEN 0.431739
20 proc/stat-utime d r test ASDF 0.002475
21 proc/stat-stime d r test ASDF 0.331700
22 proc/stat-stime r r test AEW - MTEN 0.985123
23 proc/stat-utime d d test ASDF 0.464989
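To make this reproducible, the frame can be rebuilt by typing in the table above as a list of tuples. This is a minimal sketch. The names rows and baseline_counts are introduced here for illustration. The last lines check the stated assumption that every combination of the four context columns has exactly one Baseline row.

```python
import pandas as pd

# The question's data frame, typed in from the table above.
rows = [
    ('proc/stat-stime', 'd', 'd', 'test', 'Baseline', 1.583097),
    ('proc/stat-stime', 'd', 'r', 'test', 'AEW - MTEN', 0.516108),
    ('proc/stat-stime', 'd', 'd', 'test', 'ASDF', 0.705861),
    ('proc/stat-stime', 'r', 'r', 'test', 'ASDF', 0.345816),
    ('proc/stat-utime', 'd', 'r', 'test', 'Baseline', 1.128632),
    ('proc/stat-stime', 'd', 'r', 'test', 'Baseline', 1.579803),
    ('proc/stat-stime', 'r', 'r', 'test', 'Baseline', 1.345895),
    ('proc/stat-utime', 'r', 'r', 'test', 'AEW - MTEN', 0.187236),
    ('proc/stat-utime', 'd', 'd', 'test', 'Baseline', 1.193776),
    ('proc/stat-stime', 'r', 'd', 'test', 'ASDF', 0.014975),
    ('proc/stat-utime', 'r', 'r', 'test', 'ASDF', 0.985493),
    ('proc/stat-utime', 'r', 'd', 'test', 'AEW - MTEN', 0.897336),
    ('proc/stat-stime', 'r', 'd', 'test', 'Baseline', 1.415103),
    ('proc/stat-utime', 'r', 'd', 'test', 'Baseline', 1.724266),
    ('proc/stat-utime', 'r', 'r', 'test', 'Baseline', 1.294654),
    ('proc/stat-utime', 'd', 'd', 'test', 'AEW - MTEN', 0.263845),
    ('proc/stat-utime', 'r', 'd', 'test', 'ASDF', 0.497368),
    ('proc/stat-stime', 'd', 'd', 'test', 'AEW - MTEN', 0.143402),
    ('proc/stat-utime', 'd', 'r', 'test', 'AEW - MTEN', 0.233437),
    ('proc/stat-stime', 'r', 'd', 'test', 'AEW - MTEN', 0.431739),
    ('proc/stat-utime', 'd', 'r', 'test', 'ASDF', 0.002475),
    ('proc/stat-stime', 'd', 'r', 'test', 'ASDF', 0.331700),
    ('proc/stat-stime', 'r', 'r', 'test', 'AEW - MTEN', 0.985123),
    ('proc/stat-utime', 'd', 'd', 'test', 'ASDF', 0.464989),
]
cols = ['counter', 'leg_rate', 'pose_rate', 'component']
df = pd.DataFrame(rows, columns=cols + ['approach', 'rmse'])

# Every combination of the key columns should have exactly one Baseline row.
baseline_counts = df[df['approach'] == 'Baseline'].groupby(cols).size()
print(df.shape, df.groupby(cols).ngroups, baseline_counts.eq(1).all())
```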
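I want to normalize rmse by dividing it by the value from the approach named Baseline. In the end there should be a new column rmse-norm containing the respective normalized values. All the other columns provide the context that has to match when dividing rmse. This means that the row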
1 proc/stat-stime d r test AEW - MTEN 0.516108
needs to be divided by the row whose other columns match:
5 proc/stat-stime d r test Baseline 1.579803
There is always a matching row for the Baseline approach.
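Written as a formula, let b(i) stand for the Baseline row whose counter, leg_rate, pose_rate and component equal those of row i. Here b is notation introduced for illustration, not from the question. The requested column is then as follows, worked through for the row 1 / row 5 pair above:

```latex
\text{rmse-norm}_i = \frac{\text{rmse}_i}{\text{rmse}_{b(i)}},
\qquad
\text{rmse-norm}_1 = \frac{0.516108}{1.579803} \approx 0.326691
```

For a Baseline row, b(i) = i, so its normalized value is exactly 1.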
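I tried various approaches with groupby and used the other columns as an index, but since the rows are in no particular order, I could not come up with a concise way to assign the right values in the right order.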
I think you can use:
# keep only the Baseline rows, as a Series indexed by the key columns (a MultiIndex)
cols = ['counter','leg_rate','pose_rate','component']
s = df[df.approach == 'Baseline'].set_index(cols)['rmse']
print (s)
counter leg_rate pose_rate component
proc/stat-stime d d test 1.583097
proc/stat-utime d r test 1.128632
proc/stat-stime d r test 1.579803
r r test 1.345895
proc/stat-utime d d test 1.193776
proc/stat-stime r d test 1.415103
proc/stat-utime r d test 1.724266
r test 1.294654
Name: rmse, dtype: float64
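# sort by the key columns: aligning on the non-unique MultiIndex returns the
# divided values in sorted key order, so df has to be in that order too
df = df.sort_values(cols)
# divide by s (pandas aligns the two on the MultiIndex) and assign the result
# positionally via .values, which is why the row order above matters
df['rmse'] = df.set_index(cols)['rmse'].div(s).values
# restore the original row order
print (df.sort_index())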
counter leg_rate pose_rate component approach rmse
0 proc/stat-stime d d test Baseline 1.000000
1 proc/stat-stime d r test AEW - MTEN 0.326691
2 proc/stat-stime d d test ASDF 0.445873
3 proc/stat-stime r r test ASDF 0.256941
4 proc/stat-utime d r test Baseline 1.000000
5 proc/stat-stime d r test Baseline 1.000000
6 proc/stat-stime r r test Baseline 1.000000
7 proc/stat-utime r r test AEW - MTEN 0.144622
8 proc/stat-utime d d test Baseline 1.000000
9 proc/stat-stime r d test ASDF 0.010582
10 proc/stat-utime r r test ASDF 0.761202
11 proc/stat-utime r d test AEW - MTEN 0.520416
12 proc/stat-stime r d test Baseline 1.000000
13 proc/stat-utime r d test Baseline 1.000000
14 proc/stat-utime r r test Baseline 1.000000
15 proc/stat-utime d d test AEW - MTEN 0.221017
16 proc/stat-utime r d test ASDF 0.288452
17 proc/stat-stime d d test AEW - MTEN 0.090583
18 proc/stat-utime d r test AEW - MTEN 0.206832
19 proc/stat-stime r d test AEW - MTEN 0.305094
20 proc/stat-utime d r test ASDF 0.002193
21 proc/stat-stime d r test ASDF 0.209963
22 proc/stat-stime r r test AEW - MTEN 0.731946
23 proc/stat-utime d d test ASDF 0.389511
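A variant of the same idea that skips the sort-and-restore step looks up each row's Baseline value in s directly with reindex. This is a sketch on six rows of the question's data (rows 0, 1, 2, 5, 17 and 21, with their original labels). It writes to the rmse-norm column the question asks for instead of overwriting rmse. The names keys and baseline are introduced here. pd.MultiIndex.from_frame requires pandas 0.24 or later.

```python
import pandas as pd

# Six rows of the question's frame (original row labels kept as the index).
df = pd.DataFrame(
    {
        'counter': ['proc/stat-stime'] * 6,
        'leg_rate': ['d'] * 6,
        'pose_rate': ['d', 'r', 'd', 'r', 'd', 'r'],
        'component': ['test'] * 6,
        'approach': ['Baseline', 'AEW - MTEN', 'ASDF', 'Baseline', 'AEW - MTEN', 'ASDF'],
        'rmse': [1.583097, 0.516108, 0.705861, 1.579803, 0.143402, 0.331700],
    },
    index=[0, 1, 2, 5, 17, 21],
)

cols = ['counter', 'leg_rate', 'pose_rate', 'component']
# Baseline rmse per key combination, as in the answer (its index must be unique).
s = df[df['approach'] == 'Baseline'].set_index(cols)['rmse']

# One key tuple per row of df, in df's own row order.
keys = pd.MultiIndex.from_frame(df[cols])
# reindex returns the matching Baseline value for each key in that same order,
# so no sorting is needed; a key without a Baseline row would give NaN.
baseline = s.reindex(keys).to_numpy()

df['rmse-norm'] = df['rmse'].to_numpy() / baseline
print(df[['approach', 'rmse', 'rmse-norm']])
```

The normalized values agree with the corresponding rows of the output above (for example 0.326691 for row 1).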
Another solution, using groupby with a custom function f:
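def f(x):
    # the group's single Baseline value (.item() raises if there is not exactly one)
    base = x.loc[x['approach'] == 'Baseline', 'rmse'].item()
    # return a modified copy instead of mutating the group passed to apply
    return x.assign(rmse=x['rmse'] / base)

# group_keys=False keeps the original row index instead of prepending the group keys
df = df.groupby(['counter','leg_rate','pose_rate','component'], group_keys=False).apply(f)
print (df)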
counter leg_rate pose_rate component approach rmse
0 proc/stat-stime d d test Baseline 1.000000
1 proc/stat-stime d r test AEW - MTEN 0.326691
2 proc/stat-stime d d test ASDF 0.445873
3 proc/stat-stime r r test ASDF 0.256941
4 proc/stat-utime d r test Baseline 1.000000
5 proc/stat-stime d r test Baseline 1.000000
6 proc/stat-stime r r test Baseline 1.000000
7 proc/stat-utime r r test AEW - MTEN 0.144622
8 proc/stat-utime d d test Baseline 1.000000
9 proc/stat-stime r d test ASDF 0.010582
10 proc/stat-utime r r test ASDF 0.761202
11 proc/stat-utime r d test AEW - MTEN 0.520416
12 proc/stat-stime r d test Baseline 1.000000
13 proc/stat-utime r d test Baseline 1.000000
14 proc/stat-utime r r test Baseline 1.000000
15 proc/stat-utime d d test AEW - MTEN 0.221017
16 proc/stat-utime r d test ASDF 0.288452
17 proc/stat-stime d d test AEW - MTEN 0.090583
18 proc/stat-utime d r test AEW - MTEN 0.206832
19 proc/stat-stime r d test AEW - MTEN 0.305094
20 proc/stat-utime d r test ASDF 0.002193
21 proc/stat-stime d r test ASDF 0.209963
22 proc/stat-stime r r test AEW - MTEN 0.731946
23 proc/stat-utime d d test ASDF 0.389511
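The groupby idea can also be written without a custom function. Mask rmse to the Baseline rows, then broadcast that value to every row of its group with transform("first"), which picks the first non-null entry per group. This is a sketch on six rows of the question's data (original labels kept). It assumes the rmse-norm column name from the question. The names is_base, base_only and baseline are introduced here for illustration.

```python
import pandas as pd

# Six rows of the question's frame (original row labels kept as the index).
df = pd.DataFrame(
    {
        'counter': ['proc/stat-stime'] * 6,
        'leg_rate': ['d'] * 6,
        'pose_rate': ['d', 'r', 'd', 'r', 'd', 'r'],
        'component': ['test'] * 6,
        'approach': ['Baseline', 'AEW - MTEN', 'ASDF', 'Baseline', 'AEW - MTEN', 'ASDF'],
        'rmse': [1.583097, 0.516108, 0.705861, 1.579803, 0.143402, 0.331700],
    },
    index=[0, 1, 2, 5, 17, 21],
)

cols = ['counter', 'leg_rate', 'pose_rate', 'component']
is_base = df['approach'].eq('Baseline')
# rmse on Baseline rows, NaN everywhere else
base_only = df['rmse'].where(is_base)
# GroupBy.first skips NaN, so every row receives its group's Baseline value;
# transform returns a result aligned to df's original index.
baseline = base_only.groupby([df[c] for c in cols]).transform('first')

df['rmse-norm'] = df['rmse'] / baseline
print(df[['approach', 'rmse', 'rmse-norm']])
```

Because transform runs a built-in per-group aggregation instead of calling a Python function for each group, it is usually faster than apply on large frames. One difference from f: a key combination without a Baseline row would get NaN here, whereas .item() raises an error.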