How can I delete the remaining duplicate rows while keeping the first and last row based on Column A?

Question

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column A': [12, 12, 12, 15, 16, 141, 141, 141, 141],
    'Column B': ['Apple', 'Apple', 'Apple', 'Red', 'Blue', 'Yellow', 'Yellow', 'Yellow', 'Yellow'],
    'Column C': [100, 50, np.nan, 23, np.nan, 199, np.nan, 1, np.nan],
})

Or, shown as a table:


    | Column A | Column B |Column C 
----| -------- | ---------|--------
0   | 12       | Apple    |100     
1   | 12       | Apple    |50      
2   | 12       | Apple    |NaN      
3   | 15       | Red      |23       
4   | 16       | Blue     |NaN      
5   | 141      | Yellow   |199      
6   | 141      | Yellow   |NaN      
7   | 141      | Yellow   |1        
8   | 141      | Yellow   |NaN

The expected result is:


    | Column A | Column B |Column C 
----| -------- | ---------|--------
0   | 12       | Apple    |100         
2   | 12       | Apple    |NaN      
3   | 15       | Red      |23       
4   | 16       | Blue     |NaN      
5   | 141      | Yellow   |199           
8   | 141      | Yellow   |NaN

Answer 1

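df.drop_duplicates(subset=['Column A'], keep='first')

Then do the same for the last row:

df.drop_duplicates(subset=['Column A'], keep='last')

Each call returns a new DataFrame, so the two results still have to be combined to get the expected output.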
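
As a minimal sketch of how the "first" and "last" selections could be combined in one step, a boolean mask can be built with `DataFrame.duplicated`, which accepts the same `subset` and `keep` arguments as `drop_duplicates`. The names `is_first`, `is_last`, and `result` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column A': [12, 12, 12, 15, 16, 141, 141, 141, 141],
    'Column B': ['Apple', 'Apple', 'Apple', 'Red', 'Blue', 'Yellow', 'Yellow', 'Yellow', 'Yellow'],
    'Column C': [100, 50, np.nan, 23, np.nan, 199, np.nan, 1, np.nan],
})

# duplicated(keep='first') marks every row except the first of each value,
# so negating it selects the first occurrence of each 'Column A' value.
is_first = ~df.duplicated('Column A', keep='first')

# Likewise, negating duplicated(keep='last') selects the last occurrence.
is_last = ~df.duplicated('Column A', keep='last')

# Keep a row if it is the first OR the last occurrence; values that
# appear only once satisfy both conditions and are kept exactly once.
result = df[is_first | is_last]
print(result)
```

Because this filters the original frame in place, the row order and the original index labels are preserved without any sorting.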

Answer 2

Here is one possible way to achieve what you want:

result = (
    pd.concat([
        df.drop_duplicates('Column A', keep='first'),
        df.drop_duplicates('Column A', keep='last'),
    ]).reset_index()
      .drop_duplicates('index')
      .sort_values('index')
      .set_index('index')
      .rename_axis(None)
)

Result:

   Column A Column B  Column C
0        12    Apple     100.0
2        12    Apple       NaN
3        15      Red      23.0
4        16     Blue       NaN
5       141   Yellow     199.0
8       141   Yellow       NaN
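
The chain above moves the index into a column so that rows selected by both `keep='first'` and `keep='last'` (values that occur only once) can be de-duplicated. A somewhat shorter sketch of the same idea works on the index directly. It assumes the original index labels are unique, as they are in the question's data; the names `first`, `last`, and `combined` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column A': [12, 12, 12, 15, 16, 141, 141, 141, 141],
    'Column B': ['Apple', 'Apple', 'Apple', 'Red', 'Blue', 'Yellow', 'Yellow', 'Yellow', 'Yellow'],
    'Column C': [100, 50, np.nan, 23, np.nan, 199, np.nan, 1, np.nan],
})

first = df.drop_duplicates('Column A', keep='first')
last = df.drop_duplicates('Column A', keep='last')

# Stack both selections; single-occurrence rows now appear twice.
combined = pd.concat([first, last])

# Drop the repeated index labels (assumes the original index is unique),
# then restore the original row order.
result = combined[~combined.index.duplicated()].sort_index()
print(result)
```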

Answer 3

One option is to use groupby together with the nth function:

df.groupby('Column A', sort=False, as_index=False).nth([0, -1])

   Column A Column B  Column C
0        12    Apple     100.0
2        12    Apple       NaN
3        15      Red      23.0
4        16     Blue       NaN
5       141   Yellow     199.0
8       141   Yellow       NaN
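
For comparison, the same selection can be sketched as an explicit mask using `GroupBy.cumcount`, which numbers the rows within each group from the start, or from the end when `ascending=False`. This is an alternative formulation rather than what the answer above uses, and the names `from_start` and `from_end` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column A': [12, 12, 12, 15, 16, 141, 141, 141, 141],
    'Column B': ['Apple', 'Apple', 'Apple', 'Red', 'Blue', 'Yellow', 'Yellow', 'Yellow', 'Yellow'],
    'Column C': [100, 50, np.nan, 23, np.nan, 199, np.nan, 1, np.nan],
})

groups = df.groupby('Column A', sort=False)

# Position of each row within its group, counted from the start (0, 1, 2, ...)
from_start = groups.cumcount()

# Position of each row within its group, counted from the end (..., 2, 1, 0)
from_end = groups.cumcount(ascending=False)

# A row is kept if it is the first or the last row of its group.
result = df[(from_start == 0) | (from_end == 0)]
print(result)
```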

