Run SQL query on a pandas DataFrame

Question

I have a data frame df:

ID	Price	Region
1	23	AUS
1	45	DXB
2	25	GER
2	18	TUN

I want to write Python code that produces the following output, i.e. the row with the highest Price for each ID:

ID	Price	Region
1	45	DXB
2	25	GER

I tried using pandasql, but it does not give the output I want.

The code I tried is:

import pandas as pd
import pandasql as ps

#to read table
df=pd.read_excel("test.xlsx")

ps.sqldf("select ID, max(Price), Region from df order by ID")

If there is another way to get the output above in plain Python (without pandasql), please let me know.

Answer 1

You can do it like this:

df.sort_values('Price').drop_duplicates('ID', keep='last')
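
As a minimal, self-contained sketch of the line above: the data frame here is built inline from the question's sample rows (an assumption, standing in for test.xlsx), and the final sort_values("ID") and reset_index calls are added only to tidy the result. Sorting by Price puts each ID's highest price last, so keep='last' retains that row.

```python
import pandas as pd

# Sample rows from the question, built inline (assumption: replaces test.xlsx)
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Price": [23, 45, 25, 18],
    "Region": ["AUS", "DXB", "GER", "TUN"],
})

# Sort ascending by Price, then keep only the last (highest-priced) row per ID
result = (
    df.sort_values("Price")
      .drop_duplicates("ID", keep="last")
      .sort_values("ID")          # tidy-up: order the result by ID
      .reset_index(drop=True)     # tidy-up: renumber rows from 0
)
print(result)
```

Note that this keeps exactly one row per ID, even if several rows share the maximum price.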

Answer 2

You can use groupby.transform:

output_df = df[df['Price'].eq(df.groupby("ID")['Price'].transform("max"))]
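
To make the intermediate step visible, here is a small sketch with the question's sample rows built inline (an assumption, standing in for test.xlsx). The name group_max is introduced only for illustration. The idxmax variant at the end is not part of the answer above; it is a possible alternative when exactly one row per ID is wanted.

```python
import pandas as pd

# Sample rows from the question, built inline (assumption: replaces test.xlsx)
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Price": [23, 45, 25, 18],
    "Region": ["AUS", "DXB", "GER", "TUN"],
})

# transform("max") returns one value per original row: the maximum of that row's group
group_max = df.groupby("ID")["Price"].transform("max")

# Keep every row whose Price equals its group's maximum (tied rows are all kept)
output_df = df[df["Price"].eq(group_max)]

# Alternative, not from the answer: idxmax yields one row label per ID
one_per_id = df.loc[df.groupby("ID")["Price"].idxmax()]

print(output_df)
print(one_per_id)
```

The practical difference is in how ties are handled: the boolean mask returns all rows tied for the maximum, while idxmax returns a single row per group.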

Or use ps.sqldf with a window function to compute the maximum price per ID, then return the rows whose price equals that maximum:

output_df = ps.sqldf("""select ID, Price, Region from
                        (select *, max(Price) over (partition by ID) as max_Price from df)
                        where Price = max_Price""")

    ID  Price Region
0   1     45    DXB
1   2     25    GER
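
As for why the query in the question did not work: it has an aggregate, max(Price), but no GROUP BY, so the whole table is treated as a single group and only one row comes back. The sketch below illustrates this with the standard-library sqlite3 module instead of pandasql, on the assumption that pandasql runs its queries on SQLite by default. The sample rows are built inline, and the names no_group and grouped are hypothetical.

```python
import sqlite3
import pandas as pd

# Sample rows from the question, built inline (assumption: replaces test.xlsx)
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Price": [23, 45, 25, 18],
    "Region": ["AUS", "DXB", "GER", "TUN"],
})

# Load the frame into an in-memory SQLite database as table "df"
conn = sqlite3.connect(":memory:")
df.to_sql("df", conn, index=False)

# Query from the question: aggregate without GROUP BY collapses to one row
no_group = pd.read_sql_query(
    "select ID, max(Price), Region from df order by ID", conn
)

# Adding GROUP BY gives one row per ID. SQLite fills the bare column Region
# from the row that holds max(Price); other databases may reject this query.
grouped = pd.read_sql_query(
    "select ID, max(Price) as Price, Region from df group by ID order by ID",
    conn,
)
conn.close()

print(no_group)
print(grouped)
```

Taking Region from the max-price row is SQLite-specific behavior, so the window-function query above is the more portable choice if the SQL ever needs to run on another engine.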

python

pandasql