Fill Pandas dataframe rows whose value is 0 or NaN with a formula that has to be calculated on specific rows of another column
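I have a dataframe in which the values in the "price" column differ depending on the values in the "quantity" and "year" columns. For example, for a quantity equal to 2, the price is 2 in 2017 and 4 in 2018. I want to fill the 2019 rows, which contain 0 and NaN values, with the values from 2018.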
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'quantity': pd.Series([1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9]),
    'year': pd.Series([2017,2017,2017,2017,2017,2017,2017,2017,2017,2018,2018,2018,2018,2018,2018,2018,2018,2018,2019,2019,2019,2019,2019,2019,2019,2019,2019]),
    # np.nan rather than np.NaN: the capitalized alias was removed in NumPy 2.0
    'price': pd.Series([1,2,3,4,5,6,7,8,9,2,4,6,8,10,12,14,16,18,np.nan,np.nan,0,0,np.nan,0,np.nan,0,np.nan])
})
And what if, instead of using the 2018 values, I computed the average of 2017 and 2018?
I tried adapting a solution to the first case (to apply the 2018 data), but it doesn't work:
df['price'][df['year']==2019].fillna(df['price'][df['year'] == 2018], inplace = True)
Can you help me?
The expected output should be a dataframe like the following:
Df with the 2018 values
df = pd.DataFrame({
'quantity': pd.Series([1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9]),
'year': pd.Series([2017,2017,2017,2017,2017,2017,2017,2017,2017,2018,2018,2018,2018,2018,2018,2018,2018,2018,2019,2019,2019,2019,2019,2019,2019,2019,2019,]),
'price': pd.Series([1,2,3,4,5,6,7,8,9,2,4,6,8,10,12,14,16,18,2,4,6,8,10,12,14,16,18])
})
Df with the average of the 2017 and 2018 values
df = pd.DataFrame({
'quantity': pd.Series([1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9]),
'year': pd.Series([2017,2017,2017,2017,2017,2017,2017,2017,2017,2018,2018,2018,2018,2018,2018,2018,2018,2018,2019,2019,2019,2019,2019,2019,2019,2019,2019,]),
'price': pd.Series([1,2,3,4,5,6,7,8,9,2,4,6,8,10,12,14,16,18,1.5,3,4.5,6,7.5,9,10.5,12,13.5])
})
Here's one way to fill with the mean of 2017 and 2018.
Start by grouping the previous years' data by quantity and aggregating with the mean:
m = df[df.year.isin([2017, 2018])].groupby('quantity').price.mean()
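To make the intermediate result concrete, here is a small self-contained sketch that rebuilds the question's frame and inspects m. The frame construction is shortened with list repetition for brevity, which is an assumption of this sketch and not how the question builds it; the values are the same.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame so the snippet runs on its own.
df = pd.DataFrame({
    'quantity': list(range(1, 10)) * 3,
    'year': [2017] * 9 + [2018] * 9 + [2019] * 9,
    'price': [1, 2, 3, 4, 5, 6, 7, 8, 9,
              2, 4, 6, 8, 10, 12, 14, 16, 18,
              np.nan, np.nan, 0, 0, np.nan, 0, np.nan, 0, np.nan],
})

# Mean price per quantity over the two reference years.
m = df[df.year.isin([2017, 2018])].groupby('quantity').price.mean()

print(m.index.name)  # quantity
print(m.loc[2])      # 3.0, i.e. (2 + 4) / 2
```

The point to notice is that m is indexed by quantity, not by the original row labels, which is what the next step relies on.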
Use set_index to set the quantity column as the index, replace 0s with NaNs, and use fillna, which also accepts a dictionary (or, as here, a Series) to map values according to the index:
ix = df[df.year.eq(2019)].index
df.loc[ix, 'price'] = (df.loc[ix].set_index('quantity').price
.replace(0, np.nan).fillna(m).values)
quantity year price
0 1 2017 1.0
1 2 2017 2.0
2 3 2017 3.0
3 4 2017 4.0
4 5 2017 5.0
5 6 2017 6.0
6 7 2017 7.0
7 8 2017 8.0
8 9 2017 9.0
9 1 2018 2.0
10 2 2018 4.0
11 3 2018 6.0
12 4 2018 8.0
13 5 2018 10.0
14 6 2018 12.0
15 7 2018 14.0
16 8 2018 16.0
17 9 2018 18.0
18 1 2019 1.5
19 2 2019 3.0
20 3 2019 4.5
21 4 2019 6.0
22 5 2019 7.5
23 6 2019 9.0
24 7 2019 10.5
25 8 2019 12.0
26 9 2019 13.5
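For the first case in the question (filling 2019 with the 2018 values only), the attempted line most likely fails for two reasons. First, df['price'][df['year']==2019] is chained indexing that returns a copy, so an in-place fillna on it never reaches df. Second, fillna with a Series aligns on the index, and the 2018 rows carry labels 9 to 17 while the 2019 rows carry 18 to 26, so no label matches. Below is a minimal sketch of one way around both, assuming each quantity appears exactly once in 2018; the names lookup and to_fill are illustrative and not from the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame so the snippet runs on its own.
df = pd.DataFrame({
    'quantity': list(range(1, 10)) * 3,
    'year': [2017] * 9 + [2018] * 9 + [2019] * 9,
    'price': [1, 2, 3, 4, 5, 6, 7, 8, 9,
              2, 4, 6, 8, 10, 12, 14, 16, 18,
              np.nan, np.nan, 0, 0, np.nan, 0, np.nan, 0, np.nan],
})

# 2018 price for each quantity (assumes one 2018 row per quantity).
lookup = df[df.year.eq(2018)].set_index('quantity').price

# Rows to overwrite: 2019 rows whose price is NaN or 0.
to_fill = df.year.eq(2019) & (df.price.isna() | df.price.eq(0))

# Map each quantity to its 2018 price and assign through .loc in one step,
# so the write goes to df itself and not to a temporary copy.
df.loc[to_fill, 'price'] = df.loc[to_fill, 'quantity'].map(lookup)

print(df.loc[df.year.eq(2019), 'price'].tolist())
# [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0]
```

Replacing lookup with the grouped mean of 2017 and 2018 should give the averaged result with the same masking and mapping steps.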