drop_duplicates even more for a specific column with latest value?

Question

Is there a way to customize drop_duplicates so that it also removes "certain kinds" of duplicates?

Example: a pandas df

Year	Name	ID	City
2011	Superman	101	Metropolis
2011	Batman	102	Gotham
2012	The Batman	102	Gotham
2011	Noobmaster69	103	Online
2011	Noobmaster69	103	Online

I tried drop_duplicates and got this:

Year	Name	ID	City
2011	Superman	101	Metropolis
2011	Batman	102	Gotham
2012	The Batman	102	Gotham
2011	Noobmaster69	103	Online
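
For reference, a minimal sketch that reproduces the result above. It assumes the table is built inline (the question does not show how the data was loaded) and that drop_duplicates was called with no arguments, in which case every column is compared and only fully identical rows are dropped.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "Year": [2011, 2011, 2012, 2011, 2011],
    "Name": ["Superman", "Batman", "The Batman", "Noobmaster69", "Noobmaster69"],
    "ID": [101, 102, 102, 103, 103],
    "City": ["Metropolis", "Gotham", "Gotham", "Online", "Online"],
})

# With no arguments, all columns are compared, so only the exact
# duplicate Noobmaster69 row is removed; both ID 102 rows remain
# because their Year and Name differ.
deduped = df.drop_duplicates()
print(deduped)
```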

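I actually want to condense it further: for ID 102 I only want the "The Batman" row to appear in the dataframe, because it is the newer information (2012 > 2011). I'm expecting something like this: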

Year	Name	ID	City
2011	Superman	101	Metropolis
2012	The Batman	102	Gotham
2011	Noobmaster69	103	Online

Answer 1

Try this; it easily removes duplicates based on the ID column.

import pandas as pd

# Read your table data; pd.read_csv already returns a DataFrame
df = pd.read_csv("your_filename.csv")

# Keep only the last row for each ID
df = df.drop_duplicates(subset='ID', keep='last')

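subset = "specific_col" limits the duplicate check to that column, and keep = "last" keeps the last occurrence of each duplicate and drops the earlier ones. "Last" means last in row order, so this returns the newest row only when the rows for each ID are already ordered by Year, as they are in the example.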
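
If the rows are not guaranteed to be in chronological order, one option is to sort by Year before dropping duplicates. The following is a sketch under that assumption, not part of the original answer; it builds the question's sample data inline instead of reading a CSV, and the final sort_index call is only there to restore the original row order.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "Year": [2011, 2011, 2012, 2011, 2011],
    "Name": ["Superman", "Batman", "The Batman", "Noobmaster69", "Noobmaster69"],
    "ID": [101, 102, 102, 103, 103],
    "City": ["Metropolis", "Gotham", "Gotham", "Online", "Online"],
})

# Sort by Year so the newest row for each ID comes last, keep that
# row per ID, then put the surviving rows back in their original order.
latest = (
    df.sort_values("Year", kind="stable")
      .drop_duplicates(subset="ID", keep="last")
      .sort_index()
)
print(latest)
```

On the sample data this leaves one row per ID, with the 2012 "The Batman" row kept for ID 102.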

duplicates

dataframe

python-3.x

pandas