pandas: removing duplicate values in rows with same index in two columns
I have a DataFrame like this:
import numpy as np
import pandas as pd
df = pd.DataFrame({'text':['she is good', 'she is bad'], 'label':['she is good', 'she is good']})
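I want to compare the two columns row by row. If both columns hold the same value at the same index, the duplicate in the 'label' column should be replaced with the word 'same'.
Desired output: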
          text        label
0  she is good         same
1   she is bad  she is good
So far I have tried the following, but it raises this error:
ValueError: Length of values (1) does not match length of index (2)
df['label'] =np.where(df.query("text == label"), df['label']== ' ',df['label']==df['label'] )
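Your syntax is incorrect; see the documentation for numpy.where. np.where(condition, x, y) expects a boolean condition with one entry per row and returns x where the condition is True and y elsewhere. df.query("text == label") does not produce such a mask: it returns the filtered rows (only one here), which is why the lengths don't match. Likewise, df['label'] == ' ' yields booleans rather than the values you want to write.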
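A small diagnostic sketch, not part of the original answer, contrasting what df.query returns with the per-row boolean mask that np.where actually needs (it reuses the question's df):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'text': ['she is good', 'she is bad'],
                   'label': ['she is good', 'she is good']})

# df.query() filters rows; it does not return a per-row True/False mask.
matches = df.query("text == label")
print(len(matches))  # 1 row, while df has 2 rows

# np.where(condition, x, y) needs one boolean per row:
# it takes x where the condition is True and y elsewhere.
cond = (df['text'] == df['label']).to_numpy()
result = np.where(cond, 'same', df['label'])
print(result)
```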
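Check the two columns for equality and replace the values in the label column:
import numpy as np
# 'same' where text equals label in that row, otherwise keep the original label
df['label'] = np.where(df['text'].eq(df['label']),'same',df['label'])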
Prints:
          text        label
0  she is good         same
1   she is bad  she is good
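A pandas-only variant, swapped in here and not taken from the answer above: Series.mask replaces values where a boolean condition is True and leaves the rest unchanged, so it avoids the round trip through NumPy. This sketch assumes the same two-column df as in the question.

```python
import pandas as pd

df = pd.DataFrame({'text': ['she is good', 'she is bad'],
                   'label': ['she is good', 'she is good']})

# Boolean Series aligned on the index: True where both columns match in that row.
same_row = df['text'].eq(df['label'])

# Replace 'label' with 'same' only where the mask is True; other rows keep their value.
df['label'] = df['label'].mask(same_row, 'same')
print(df)
```

An equivalent in-place form is df.loc[same_row, 'label'] = 'same'; both keep index alignment, so the result matches the np.where version.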