python pandas drop duplicate columns by condition
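I want to drop duplicate rows by condition.
Wherever "type" is the same (duplicated), I want to drop the row whose "number" is "one".
I have this: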
import pandas as pd

data = {"col1": [2, 3, 4, 5, 9, 2, 6],
        "col2": [4, 2, 4, 6, 0, 1, 5],
        "col3": [7, 6, 0, 11, 3, 6, 7],
        "col4": [14, 11, 22, 8, 6, 3, 9],
        "col5": [0, 5, 7, 3, 8, 2, 9],
        "type": ["A", "A", "C", "D", "B", "B", "E"],
        "number": ["one", "two", "two", "one", "one", "two", "two"]}
df = pd.DataFrame.from_dict(data)
I want this:
data = {"col1": [3, 4, 5, 2, 6],
        "col2": [2, 4, 6, 1, 5],
        "col3": [6, 0, 11, 6, 7],
        "col4": [11, 22, 8, 3, 9],
        "col5": [5, 7, 3, 2, 9],
        "type": ["A", "C", "D", "B", "E"],
        "number": ["two", "two", "one", "two", "two"]}
df = pd.DataFrame.from_dict(data)
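You can chain 2 conditions with | (OR): select all rows whose "number" is not "one" by comparing with Series.ne, and combine that with the inverted mask from Series.duplicated. With keep=False, every row of a repeated "type" is flagged, so the inverted mask keeps the rows whose "type" occurs only once: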
df1 = df[df['number'].ne('one') | ~df['type'].duplicated(keep=False)]
print(df1)
   col1  col2  col3  col4  col5 type number
1     3     2     6    11     5    A    two
2     4     4     0    22     7    C    two
3     5     6    11     8     3    D    one
5     2     1     6     3     2    B    two
6     6     5     7     9     9    E    two
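To make the combined mask easier to follow, here is a minimal sketch that builds each condition separately. It uses a trimmed copy of the question's data (only col1, type and number), and the variable names not_one, repeated_type and keep are illustrative, not part of the original answer.

```python
import pandas as pd

# Trimmed copy of the sample data from the question
df = pd.DataFrame({
    "col1": [2, 3, 4, 5, 9, 2, 6],
    "type": ["A", "A", "C", "D", "B", "B", "E"],
    "number": ["one", "two", "two", "one", "one", "two", "two"],
})

# True where the row's number is NOT "one"
not_one = df["number"].ne("one")

# True for every row whose type occurs more than once
# (keep=False flags all members of a repeated group, not just the later ones)
repeated_type = df["type"].duplicated(keep=False)

# Keep a row if it is not "one", or if its type is unique
keep = not_one | ~repeated_type
result = df[keep]
print(result)
```

Row 3 (type D, number "one") survives because its type is not repeated, while rows 0 and 4 are dropped because they are "one" inside a repeated type.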
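Another idea, using an ordered categorical:
# Put 'one' first so it sorts lowest; the list is wrapped in a Series because
# recent pandas versions deprecate passing a plain list to pd.unique
cats = pd.unique(pd.Series(['one'] + df['number'].unique().tolist()))
df['number'] = pd.Categorical(df['number'], categories=cats, ordered=True)
# After sorting, 'one' rows come first, so keep='last' prefers any other value
df2 = df.sort_values('number').drop_duplicates(subset=['type'], keep='last').sort_index()
print(df2)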
   col1  col2  col3  col4  col5 type number
1     3     2     6    11     5    A    two
2     4     4     0    22     7    C    two
3     5     6    11     8     3    D    one
5     2     1     6     3     2    B    two
6     6     5     7     9     9    E    two
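The same ranking idea can be sketched without overwriting the "number" column, in case the original dtype should be preserved. This is an assumption-laden variant rather than the answer's own code: the priority list and the temporary _rank column are hypothetical names introduced here for illustration.

```python
import pandas as pd

# Trimmed copy of the sample data from the question
df = pd.DataFrame({
    "col1": [2, 3, 4, 5, 9, 2, 6],
    "type": ["A", "A", "C", "D", "B", "B", "E"],
    "number": ["one", "two", "two", "one", "one", "two", "two"],
})

# Hypothetical priority order: values listed later win when a type is repeated
priority = ["one", "two"]

# Temporary ordered-categorical helper column; df itself is left unchanged
ranked = df.assign(
    _rank=pd.Categorical(df["number"], categories=priority, ordered=True)
)

df2 = (
    ranked.sort_values("_rank", kind="stable")    # "one" rows first
          .drop_duplicates(subset="type", keep="last")  # highest rank per type
          .sort_index()                           # restore original row order
          .drop(columns="_rank")
)
print(df2)
```

Note that this approach keeps exactly one row per "type". If a type had several rows that are not "one", the boolean-mask approach would keep all of them, while this one keeps only the last.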
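Try this (it keeps the last row of each "type", which gives the desired result here because the "two" row comes after the "one" row in every repeated group):
df = df.drop_duplicates(subset=['type'], keep='last')
print(df)
Output:
   col1  col2  col3  col4  col5 type number
1     3     2     6    11     5    A    two
2     4     4     0    22     7    C    two
3     5     6    11     8     3    D    one
5     2     1     6     3     2    B    two
6     6     5     7     9     9    E    two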
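Because keep='last' selects by position rather than by the value of "number", the result depends on row order. The sketch below uses a small made-up frame (not the question's data) in which the "two" row of group A comes first, and compares the positional approach with the condition-based mask; the names by_position and by_condition are illustrative.

```python
import pandas as pd

# Hypothetical variation of the sample: in group "A" the "two" row comes first
df = pd.DataFrame({
    "type": ["A", "A", "B", "B"],
    "number": ["two", "one", "one", "two"],
})

# Positional: keeps whichever row happens to be last within each type
by_position = df.drop_duplicates(subset=["type"], keep="last")

# Condition-based: drops "one" only where the type is repeated
by_condition = df[df["number"].ne("one") | ~df["type"].duplicated(keep=False)]

print(by_position)
print(by_condition)
```

With this ordering, the positional version keeps the "one" row for group A, whereas the condition-based version still keeps the "two" row in both groups.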