Remove duplicate values in a pandas record
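I want to remove the duplicates within each row of the animals column.
I need something like this post, but in Python. For some reason I can't work it out right now; I've hit a wall.
I've tried drop_duplicates, unique, nunique, etc. No luck.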
import pandas as pd

df = pd.DataFrame({'animals':['pink pig, pink pig, pink pig','brown cow, brown cow','pink pig, black cow','brown horse, pink pig, brown cow, black cow, brown cow']})
df.drop_duplicates(subset=None, keep="first", inplace=False)
df
#input:
animals
0 pink pig, pink pig, pink pig
1 brown cow, brown cow
2 pink pig, black cow
3 brown horse, pink pig, brown cow, black cow, brown cow
#I would like the output to look like this:
animals
0 pink pig
1 brown cow
2 pink pig, black cow
3 brown horse, pink pig, brown cow, black cow
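For context, the calls tried above compare whole rows or whole cell values, which is probably why they seemed to do nothing here: every cell is a different string, even though some repeat items internally. A minimal sketch of that behaviour, assuming the same frame as in the question (the names `deduped_rows` and `distinct_cells` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'animals': [
    'pink pig, pink pig, pink pig',
    'brown cow, brown cow',
    'pink pig, black cow',
    'brown horse, pink pig, brown cow, black cow, brown cow',
]})

# drop_duplicates compares entire rows. All four strings differ,
# so no row is removed and the cell contents are untouched.
deduped_rows = df.drop_duplicates()

# unique() returns the distinct cell values, again without looking
# inside each comma separated string.
distinct_cells = df['animals'].unique()

print(len(deduped_rows), len(distinct_cells))
```

So the de-duplication has to happen inside each string, which is what the answers do.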
Do this:
import pandas as pd
df = pd.DataFrame({'animals':['pink pig, pink pig, pink pig','brown cow, brown cow','pink pig, black cow','brown horse, pink pig, brown cow, black cow, brown cow']})
df['animals2'] = df.animals.apply(lambda x: ', '.join(list(set(x.split(', ')))))
Output:
0 pink pig
1 brown cow
2 pink pig, black cow
3 brown cow, brown horse, pink pig, black cow
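Explanation:
I turn your string into a list. Then I turn the list into a set to remove the duplicates. Then I turn the set back into a list, and finally I join the list into a string. Let me know if anything is unclear!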
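To make those steps concrete, here is a rough walk-through on a single cell, without pandas. The variable names are made up for illustration, and the `sorted` call is my own addition so the result is repeatable; a plain set can come back in any order.

```python
# One cell from the example column.
cell = 'brown horse, pink pig, brown cow, black cow, brown cow'

# Step 1: string -> list of items.
parts = cell.split(', ')

# Step 2: list -> set, which collapses the repeated 'brown cow'.
unique_parts = set(parts)

# Step 3: set -> string. Sets have no defined order, so the items are
# sorted here (an assumption, not part of the one-liner) before joining.
joined = ', '.join(sorted(unique_parts))

print(joined)
```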
If you want to keep the items in their original order (converting to a set makes them unordered), the following function should work.
def drop_duplicates(items):
    # `items` is a comma separated string, e.g. "dog, dog, cat".
    result = []
    seen = set()
    for item in items.split(','):
        item = item.strip()
        if item not in seen:
            seen.add(item)
            result.append(item)
    return ', '.join(result)
>>> df['animals'].apply(drop_duplicates)
0 pink pig
1 brown cow
2 pink pig, black cow
3 brown horse, pink pig, brown cow, black cow
Name: animals, dtype: object
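A shorter variant of the same idea, in case it helps: since Python 3.7 a dict keeps its keys in insertion order, so `dict.fromkeys` can stand in for the `seen`/`result` bookkeeping. This is only a sketch; `dedupe_keep_order`, `rows` and `cleaned` are hypothetical names, and the list below just mirrors the column from the question.

```python
def dedupe_keep_order(items):
    # `items` is a comma separated string, e.g. "dog, dog, cat".
    stripped = (item.strip() for item in items.split(','))
    # dict keys are unique and remember insertion order (Python 3.7+),
    # so the first occurrence of each item is kept.
    return ', '.join(dict.fromkeys(stripped))


rows = [
    'pink pig, pink pig, pink pig',
    'brown cow, brown cow',
    'pink pig, black cow',
    'brown horse, pink pig, brown cow, black cow, brown cow',
]
cleaned = [dedupe_keep_order(row) for row in rows]

for row in cleaned:
    print(row)
```

On the DataFrame itself, `df['animals'].apply(dedupe_keep_order)` should give the same strings back as a Series.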