python pandas drop duplicate columns by condition

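I want to drop duplicated rows by a condition: wherever "type" is the same (duplicated), I want to drop the row whose "number" is "one".

I have this: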

import pandas as pd

data={"col1":[2,3,4,5,9,2,6],
"col2":[4,2,4,6,0,1,5],
"col3":[7,6,0,11,3,6,7],
"col4":[14,11,22,8,6,3,9],
"col5":[0,5,7,3,8,2,9],
"type":["A","A","C","D","B","B","E"],
"number":["one","two","two","one","one","two","two"]}
df=pd.DataFrame.from_dict(data)

I want this:

data={"col1":[3,4,5,2,6],
"col2":[2,4,6,1,5],
"col3":[6,0,11,6,7],
"col4":[11,22,8,3,9],
"col5":[5,7,3,2,9],
"type":["A","C","D","B","E"],
"number":["two","two","one","two","two"]}
df=pd.DataFrame.from_dict(data)

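You can chain 2 conditions with | (OR): keep every row whose number is not 'one', tested with Series.ne, or whose type is not duplicated at all, tested with the inverted mask from Series.duplicated(keep=False):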

df1 = df[df['number'].ne('one') | ~df['type'].duplicated(keep=False)]
print (df1)
   col1  col2  col3  col4  col5 type number
1     3     2     6    11     5    A    two
2     4     4     0    22     7    C    two
3     5     6    11     8     3    D    one
5     2     1     6     3     2    B    two
6     6     5     7     9     9    E    two
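
To see why the combined mask works, it can help to look at the two conditions separately. The following is only a minimal sketch on a trimmed copy of the question's sample data; the names not_one and unique_type are made up for illustration and do not appear in the answer above.

```python
import pandas as pd

# Trimmed copy of the sample data (only the columns the masks need)
df = pd.DataFrame({
    "col1": [2, 3, 4, 5, 9, 2, 6],
    "type": ["A", "A", "C", "D", "B", "B", "E"],
    "number": ["one", "two", "two", "one", "one", "two", "two"],
})

# True where number is anything other than 'one'
not_one = df["number"].ne("one")

# True where the type occurs exactly once in the whole column
unique_type = ~df["type"].duplicated(keep=False)

# A row survives if either condition holds
df1 = df[not_one | unique_type]
print(df1.index.tolist())
```

Row 3 (type D, number 'one') survives only because of the second condition: its type is not duplicated, so there is nothing to prefer over it.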

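Another idea, using an ordered categorical:

# Put 'one' first so it sorts lowest (newer pandas versions deprecate passing a plain list to pd.unique, hence the Series)
cats = pd.unique(pd.Series(['one'] + df['number'].unique().tolist()))

# Make 'number' an ordered categorical in which 'one' is smaller than every other value
df['number'] = pd.Categorical(df['number'], categories=cats, ordered=True)

# Sort so 'one' rows come first, keep the last row per type, then restore the original row order
df2 = df.sort_values('number').drop_duplicates(subset=['type'], keep='last').sort_index()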
print (df2)
   col1  col2  col3  col4  col5 type number
1     3     2     6    11     5    A    two
2     4     4     0    22     7    C    two
3     5     6    11     8     3    D    one
5     2     1     6     3     2    B    two
6     6     5     7     9     9    E    two
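
The two approaches agree on the sample data, but they are not equivalent in general. The sketch below uses a small made-up frame (not the question's data) in which one type has two non-'one' rows. Under that assumption the boolean mask keeps both of them, while sorting plus drop_duplicates always leaves exactly one row per type.

```python
import pandas as pd

# Hypothetical data: type A has one 'one' row and two 'two' rows
df = pd.DataFrame({
    "type": ["A", "A", "A", "B"],
    "number": ["one", "two", "two", "one"],
})

# Mask approach: drops only the 'one' rows of duplicated types
mask_result = df[df["number"].ne("one") | ~df["type"].duplicated(keep=False)]

# Categorical approach: 'one' sorts first, then one row per type is kept
ordered = df.assign(
    number=pd.Categorical(df["number"], categories=["one", "two"], ordered=True)
)
sorted_result = (
    ordered.sort_values("number", kind="stable")  # stable sort keeps ties in row order
    .drop_duplicates(subset=["type"], keep="last")
    .sort_index()
)

print(mask_result.index.tolist())
print(sorted_result.index.tolist())
```

Which behavior is right depends on whether several non-'one' rows of the same type should all be kept. The question does not say.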

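Try this (note that it only works for the sample data because, within each duplicated type, the 'one' row happens to come before the row you want to keep):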

df = df.drop_duplicates(subset=['type'],keep='last')
print(df)

Output:

    col1    col2    col3    col4    col5    type    number
1   3       2       6       11      5       A       two
2   4       4       0       22      7       C       two
3   5       6       11      8       3       D       one
5   2       1       6       3       2       B       two
6   6       5       7       9       9       E       two
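
Because keep='last' chooses by position rather than by the value in number, the result changes when the rows arrive in a different order. The following sketch uses a small made-up frame in which the 'one' row comes after the 'two' row, and compares the positional approach with the condition-based mask. The names by_position and by_condition are for illustration only.

```python
import pandas as pd

# Hypothetical data: for type A the 'one' row now comes AFTER the 'two' row
df = pd.DataFrame({
    "type": ["A", "A", "C"],
    "number": ["two", "one", "two"],
})

# Positional: keeps whichever row of each type appears last
by_position = df.drop_duplicates(subset=["type"], keep="last")

# Condition-based: drops 'one' only where the type is duplicated
by_condition = df[df["number"].ne("one") | ~df["type"].duplicated(keep=False)]

print(by_position["number"].tolist())
print(by_condition["number"].tolist())
```

With this ordering the positional version keeps the 'one' row for type A, which is the row the question wants to drop.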