Weighted average - omit rows where the value or weight is missing
I have code like this:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df1 = pd.DataFrame({'value':[10,20,np.nan,40],
... 'weight':[1,np.nan,3,4]})
>>> df1
value weight
0 10.0 1.0
1 20.0 NaN
2 NaN 3.0
3 40.0 4.0
>>> (df1["value"] * df1["weight"]).sum() / df1["weight"].sum()
21.25
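If either the value or the weight is missing, I want to omit that row from the calculation, i.e. I want the weighted average
(10*1 + 40*4) / (1+4) = 34
If this can be done with a single expression in pandas, please help.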
You can first filter with boolean indexing, using a mask created by notnull and all to check that every value in a row is True (i.e. non-null):
df1 = df1[df1.notnull().all(axis=1)]
print (df1)
value weight
0 10.0 1.0
3 40.0 4.0
df2 = (df1["value"] * df1["weight"]).sum() / df1["weight"].sum()
print (df2)
34.0
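To see why the unfiltered expression returns 21.25, here is a minimal sketch that splits it into numerator and denominator. It assumes the same sample data as the question; the names df, numerator, naive_denominator, mask and weighted_avg are illustrative only. It also keeps the original frame intact instead of overwriting df1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'value': [10, 20, np.nan, 40],
                   'weight': [1, np.nan, 3, 4]})

# Numerator: the products for rows 1 and 2 are NaN and sum() skips them,
# so only rows 0 and 3 contribute (10*1 + 40*4).
numerator = (df['value'] * df['weight']).sum()

# Denominator: the weight 3.0 of row 2 is still counted even though
# its value is missing (1 + 3 + 4).
naive_denominator = df['weight'].sum()

# Keep only rows where both columns are present, without modifying df.
mask = df.notnull().all(axis=1)
weighted_avg = ((df.loc[mask, 'value'] * df.loc[mask, 'weight']).sum()
                / df.loc[mask, 'weight'].sum())

print(numerator / naive_denominator)  # 21.25
print(weighted_avg)                   # 34.0
```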
Or check the two columns separately:
df1 = df1[df1["value"].notnull() & df1["weight"].notnull()]
print (df1)
value weight
0 10.0 1.0
3 40.0 4.0
A simpler solution using dropna:
df1 = df1.dropna()
print (df1)
value weight
0 10.0 1.0
3 40.0 4.0
Or, if you need to specify the columns:
df1 = df1.dropna(subset=['value','weight'])
print (df1)
value weight
0 10.0 1.0
3 40.0 4.0
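Since the question asks for a single expression, the dropna step can probably be chained straight into the calculation. The sketch below shows two possible ways, assuming the same sample data: one with DataFrame.pipe, one with numpy's np.average inside a small helper. The helper name weighted_mean and its parameters are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'value': [10, 20, np.nan, 40],
                   'weight': [1, np.nan, 3, 4]})

# Variant 1: one chained expression; pipe passes the filtered frame to the lambda.
result_pipe = (df.dropna(subset=['value', 'weight'])
                 .pipe(lambda d: (d['value'] * d['weight']).sum() / d['weight'].sum()))


def weighted_mean(frame, value_col='value', weight_col='weight'):
    """Hypothetical helper: weighted mean over rows where both columns are present."""
    clean = frame.dropna(subset=[value_col, weight_col])
    # np.average computes sum(values * weights) / sum(weights).
    return np.average(clean[value_col], weights=clean[weight_col])


# Variant 2: the same calculation through the helper.
result_avg = weighted_mean(df)

print(result_pipe)  # 34.0
print(result_avg)   # 34.0
```

Neither variant handles the case where no complete rows remain or the remaining weights sum to zero; that would need an explicit check.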