How can I replace null values within a group?

Question

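I created this DataFrame and computed the price gap I am looking for, but some apartments have the same price, so for those I get a price difference of 0. How can I replace that 0 with the difference to the next lower price in the same group?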

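For example:

neighborhood:a, bed:1, bath:1, price:5

neighborhood:a, bed:1, bath:1, price:5

neighborhood:a, bed:1, bath:1, price:3

neighborhood:a, bed:1, bath:1, price:2

I get price differences of 0, 2, 1, NaN, and I am looking for 2, 2, 1, NaN (in short, I don't want to compare two units that have the same price).

Thanks in advance, and have a nice day.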

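import pandas as pd

data=[
    [1,'a',1,1,5],[2,'a',1,1,5],[3,'a',1,1,4],[4,'a',1,1,2],[5,'b',1,2,6],[6,'b',1,2,6],[7,'b',1,2,3]
]
df = pd.DataFrame(data, columns = ['id','neighborhoodname', 'beds', 'baths', 'price']) 

# Sort from highest to lowest price, then take each row's gap to the next row in its group.
# The sample data has no 'city' column, so the grouping uses 'neighborhoodname'.
df['difference_price'] = ( df.dropna()
                             .sort_values('price',ascending=False)
                             .groupby(['neighborhoodname','beds','baths'])['price'].diff(-1) )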

Answer 1

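I think you can first drop duplicates across all the columns used for the groupby and the diff, create the new column on the filtered data, and finally merge it back into the original data with a left join. This starts from the DataFrame as built in the question, before any `difference_price` column has been added: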

df1 = (df.dropna()
         .sort_values('price',ascending=False)
         .drop_duplicates(['neighborhoodname','beds','baths', 'price']))

df1['difference_price']  = df1.groupby(['neighborhoodname','beds','baths'])['price'].diff(-1)

df = df.merge(df1[['neighborhoodname','beds','baths','price', 'difference_price']], how='left')
print (df)
   id neighborhoodname  beds  baths  price  difference_price
0   1                a     1      1      5               1.0
1   2                a     1      1      5               1.0
2   3                a     1      1      4               2.0
3   4                a     1      1      2               NaN
4   5                b     1      2      6               3.0
5   6                b     1      2      6               3.0
6   7                b     1      2      3               NaN
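
As a quick check, the same drop-duplicates-then-merge idea can be run on the four-row example from the question (prices 5, 5, 3, 2). This is only a minimal sketch; the names `example`, `keys`, `unique_prices` and `result` are made up for illustration:

```python
import pandas as pd

# The question's four-row example: one group, prices 5, 5, 3, 2.
example = pd.DataFrame({
    'neighborhoodname': ['a', 'a', 'a', 'a'],
    'beds': [1, 1, 1, 1],
    'baths': [1, 1, 1, 1],
    'price': [5, 5, 3, 2],
})
keys = ['neighborhoodname', 'beds', 'baths']

# Keep one row per distinct price in each group, highest price first.
unique_prices = (example.sort_values('price', ascending=False)
                        .drop_duplicates(keys + ['price']))

# Gap to the next lower distinct price; the cheapest row of a group gets NaN.
unique_prices['difference_price'] = unique_prices.groupby(keys)['price'].diff(-1)

# Attach the gap to every original row, including the duplicated prices.
result = example.merge(unique_prices, on=keys + ['price'], how='left')
print(result['difference_price'].tolist())  # [2.0, 2.0, 1.0, nan]
```

Both rows priced at 5 now get 2.0, which is the 2, 2, 1, NaN the question asks for.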

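Alternatively, you can use a lambda function to replace the 0 values and back-fill them within each group. Filling per group avoids wrong output where a value would otherwise be pulled in from another group, for example when a group has only one row: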

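import numpy as np

# group_keys=False keeps the original row index on the result, so it aligns with df
# on assignment (pandas 2.x would otherwise prepend the group keys to the index).
df['difference_price'] = (df.sort_values('price',ascending=False)
                            .groupby(['neighborhoodname','beds','baths'], group_keys=False)['price']
                            .apply(lambda x: x.diff(-1).replace(0, np.nan).bfill()))

print(df)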
   id neighborhoodname  beds  baths  price  difference_price
0   1                a     1      1      5               1.0
1   2                a     1      1      5               1.0
2   3                a     1      1      4               2.0
3   4                a     1      1      2               NaN
4   5                b     1      2      6               3.0
5   6                b     1      2      6               3.0
6   7                b     1      2      3               NaN
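
The per-group back-fill can also be written with `groupby(...).transform`, which returns a result indexed like its input and so sidesteps the question of how `apply` labels its output. This variant is not from the answer itself; it is a sketch under the assumption that the same sample data is used, and `keys` is a name introduced here for illustration:

```python
import numpy as np
import pandas as pd

data = [
    [1, 'a', 1, 1, 5], [2, 'a', 1, 1, 5], [3, 'a', 1, 1, 4], [4, 'a', 1, 1, 2],
    [5, 'b', 1, 2, 6], [6, 'b', 1, 2, 6], [7, 'b', 1, 2, 3],
]
df = pd.DataFrame(data, columns=['id', 'neighborhoodname', 'beds', 'baths', 'price'])
keys = ['neighborhoodname', 'beds', 'baths']

# Sort highest price first, then within each group:
#   diff(-1)  -> gap to the next (lower or equal) price
#   replace   -> turn the 0 gaps between equal prices into NaN
#   bfill     -> copy the next non-zero gap of the same group upward
df['difference_price'] = (
    df.sort_values('price', ascending=False)
      .groupby(keys)['price']
      .transform(lambda x: x.diff(-1).replace(0, np.nan).bfill())
)

print(df)
```

The assignment aligns on the row index, so the values land on the right rows even though the computation ran on the sorted frame. On this sample it gives the same `difference_price` column as the table above.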

Tags: python, pandas, replace, difference