DataFrame apply function using another DataFrame
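I'm trying to apply a function to all columns of a pandas DataFrame. The function divides each column (treated as a pandas Series) by a parameter stored in another DataFrame (df_reference), which I look up by the column name (Series.name).
However, the operation doesn't work: the resulting DataFrame is full of NaN values. I suspect the way I infer the column name on each iteration is failing.
Here is the code: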
import numpy as np
import pandas as pd

# This is an example of the df I'd like to operate over:
df = pd.DataFrame({'P01': np.random.random(50),
                   'P02': np.random.random(50)},
                  index=pd.period_range(start='2015-03-09', periods=50))
>>> df
P01 P02
2015-03-09 0.575955 0.735709
2015-03-10 0.290656 0.989249
2015-03-11 0.859850 0.387678
2015-03-12 0.939810 0.085914
2015-03-13 0.278855 0.031567
... ... ...
# This is an example of the reference df I'd like to look up values in:
df_reference = pd.DataFrame({'ID': ['P01', 'P02'], 'Lat': [37.261, 37.258],
                             'Lon': [-6.431, -6.433], 'Z': [-0.63, -0.825]})
>>> df_reference
ID Lat Lon Z
0 P01 37.261 -6.431 -0.630
1 P02 37.258 -6.433 -0.825
Applying the operation:
df.apply(lambda x: x/df_reference.loc[df_reference['ID']==x.name]['Z'], axis=1)
Result:
P01 P02
2015-03-09 NaN NaN
2015-03-10 NaN NaN
2015-03-11 NaN NaN
2015-03-12 NaN NaN
... ... ...
Any clue about what might be happening?
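The NaN result can be traced with a small check. This is a sketch that rebuilds the question's frames (with a fixed seed and 5 rows only so the run is repeatable); it shows that with `axis=1` the lambda receives rows, so `x.name` is a `Period` row label rather than a column name, and that even per column the boolean lookup returns a Series rather than a scalar. Using `.item()` to extract the scalar is one possible fix among several, not the only one.

```python
import numpy as np
import pandas as pd

# Same shapes as in the question; the seed is added only for repeatability.
rng = np.random.default_rng(0)
df = pd.DataFrame({'P01': rng.random(5), 'P02': rng.random(5)},
                  index=pd.period_range(start='2015-03-09', periods=5))
df_reference = pd.DataFrame({'ID': ['P01', 'P02'], 'Lat': [37.261, 37.258],
                             'Lon': [-6.431, -6.433], 'Z': [-0.63, -0.825]})

# With axis=1, apply() passes one ROW at a time,
# so x.name is the row label (a Period), not a column name.
row = df.iloc[0]
print(row.name)  # 2015-03-09

# No ID equals that Period, so the lookup is an empty Series...
lookup = df_reference.loc[df_reference['ID'] == row.name, 'Z']
print(len(lookup))  # 0

# ...and dividing by an empty Series aligns on labels, giving NaN everywhere.
result = df.apply(
    lambda x: x / df_reference.loc[df_reference['ID'] == x.name]['Z'], axis=1)
print(result.isna().all().all())  # True

# Per column (default axis=0) x.name is 'P01'/'P02', but the boolean lookup
# still returns a Series indexed by df_reference's row labels, not a scalar.
col_lookup = df_reference.loc[df_reference['ID'] == 'P01', 'Z']
print(col_lookup.index.tolist())  # [0]

# Extracting the scalar with .item() avoids the label alignment.
fixed = df.apply(
    lambda x: x / df_reference.loc[df_reference['ID'] == x.name, 'Z'].item())
print(np.allclose(fixed['P01'], df['P01'] / -0.63))  # True
```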
Try (the numbers below come from a different run, since `df` is built from unseeded random data):
>>> df / df_reference.set_index('ID')['Z']
# OR
>>> df.apply(lambda x: x/(df_reference.set_index('ID').loc[x.name].Z))
P01 P02
2015-03-09 -1.130257 -0.633978
2015-03-10 -0.367410 -0.655255
2015-03-11 -1.358091 -0.405920
2015-03-12 -0.085972 -0.637737
2015-03-13 -0.031896 -0.306626
2015-03-14 -0.934217 -0.257150
2015-03-15 -0.081206 -0.461807
2015-03-16 -1.100641 -1.202574
2015-03-17 -0.523478 -0.354512
2015-03-18 -0.303866 -1.030580
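`df / df_reference.set_index('ID')['Z']` works because dividing a DataFrame by a Series aligns the Series index with the DataFrame's columns. A sketch of what that alignment does when the two frames do not match exactly, assuming hypothetical extra stations `'P03'` (data but no reference row) and `'P04'` (reference row but no data) that are not in the question: plain division takes the union of labels, while `reindex(df.columns)` followed by `DataFrame.div(..., axis='columns')` keeps `df`'s shape and makes missing IDs easy to spot.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# 'P03' and 'P04' are hypothetical extras, added only to show the mismatch case.
df = pd.DataFrame({'P01': rng.random(5), 'P02': rng.random(5), 'P03': rng.random(5)},
                  index=pd.period_range(start='2015-03-09', periods=5))
df_reference = pd.DataFrame({'ID': ['P01', 'P02', 'P04'],
                             'Z': [-0.63, -0.825, -1.0]})

# One divisor per station, indexed by ID so it can align with df's column labels.
z = df_reference.set_index('ID')['Z']

# Plain division aligns z's index with df's columns using the union of labels:
# 'P03' becomes NaN (no divisor) and an all-NaN 'P04' column appears.
naive = df / z
print(sorted(naive.columns))  # ['P01', 'P02', 'P03', 'P04']

# Reindexing to df's columns keeps the original columns and exposes missing IDs.
z_aligned = z.reindex(df.columns)
print(z_aligned.index[z_aligned.isna()].tolist())  # ['P03']

out = df.div(z_aligned, axis='columns')
print(out.columns.tolist())  # ['P01', 'P02', 'P03']
print(np.allclose(out['P01'], df['P01'] / -0.63))  # True
```

When every column has exactly one matching ID, both forms give the same values; the reindex step only matters for mismatched labels.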