Concat two columns and keep unique values if repeating

I have a DataFrame as follows:

import pandas as pd

df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', None, 'dessert', None], 'type':[None, None, 'strawberry-tart', 'dessert', None]})

df
Out[10]:
    foodstuff       type
0   apple-martini   None
1   apple-pie       None
2   None            strawberry-tart
3   dessert         dessert
4   None            None

I want to achieve the following:

df
Out[10]:
    Combined
0   apple-martini   
1   apple-pie       
2   strawberry-tart
3   dessert         
4   None            

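This solution handles combining the columns when one of the two is definitely None. I am trying to cover the case where both columns hold the same value in a row, in which case only one copy of the value should be kept.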

We can do this with fillna:

df['combine'] = df['foodstuff'].fillna(df['type'])
df['combine']
0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4               None
Name: combine, dtype: object
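
One caveat worth illustrating: fillna only fills missing values (None/NaN), so an empty string in the first column is kept as-is. The sketch below uses a small hypothetical frame (`demo`, with made-up column names `col_1` and `col_2`) and blanks out empty strings with `mask` before filling. It is one possible way to handle this, not part of the answer above.

```python
import pandas as pd

# Hypothetical frame: row 1 has an empty string rather than None in col_1
demo = pd.DataFrame({'col_1': ['apple-pie', '', None],
                     'col_2': [None, 'cupcake', 'strawberry-tart']})

# Plain fillna leaves the '' in place because '' is not a missing value
plain = demo['col_1'].fillna(demo['col_2'])

# Turn empty strings into NaN first (mask blanks the matching cells), then fill
cleaned = demo['col_1'].mask(demo['col_1'] == '').fillna(demo['col_2'])

print(plain.tolist())
print(cleaned.tolist())
```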

You can use combine_first:

df['combined'] = df['foodstuff'].combine_first(df['type'])
print(df)

# Output:
       foodstuff             type         combined
0  apple-martini             None    apple-martini
1      apple-pie             None        apple-pie
2           None  strawberry-tart  strawberry-tart
3        dessert          dessert          dessert
4           None             None             None
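
As a rough check, the sketch below rebuilds the question's frame and compares fillna with combine_first. It also adds a hypothetical extra row (`extra`, not in the question) where the two columns disagree, to show that both approaches simply keep the left-hand column in that case rather than merging the values.

```python
import pandas as pd

df = pd.DataFrame({
    'foodstuff': ['apple-martini', 'apple-pie', None, 'dessert', None],
    'type': [None, None, 'strawberry-tart', 'dessert', None],
})

via_fillna = df['foodstuff'].fillna(df['type'])
via_combine_first = df['foodstuff'].combine_first(df['type'])

# Replace the remaining missing value with '' so the two lists compare cleanly
same = via_fillna.fillna('').tolist() == via_combine_first.fillna('').tolist()
print(same)

# Hypothetical row where the columns differ: the left column wins
extra = pd.DataFrame({'foodstuff': ['brown'], 'type': ['rice']})
left_wins = extra['foodstuff'].combine_first(extra['type']).tolist()
print(left_wins)
```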

If you need to handle cases where some column values are '' and take the union of the column values when they are not equal:

test_df = pd.DataFrame({'col_1':['apple-martini', 'apple-pie', None, 'dessert', None, '', 'brown'], 'col_2':[None, None, 'strawberry-tart', 'dessert', None, 'cupcake', 'rice']})
# Treat missing values as empty strings so every cell can be split
test_df = test_df.fillna('')
# Split each cell into a list of whitespace-separated tokens ('' becomes [])
test_df['col_1'] = test_df['col_1'].str.split()
test_df['col_2'] = test_df['col_2'].str.split()
# Union of the tokens from both columns, which drops duplicates
test_df['set'] = test_df.apply(lambda x: set(x['col_1'] + x['col_2']), axis=1)
# Sort for a deterministic order and join with no separator (use ' '.join to keep values apart)
test_df['combined'] = test_df.apply(lambda x: ''.join(sorted(x['set'])), axis=1)

print(test_df['combined'])
# Result:
0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4                   
5            cupcake
6          brownrice
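
If the run-together 'brownrice' is not what you want, here is a minimal alternative sketch. It assumes each cell is a single value (no whitespace splitting) and uses a hypothetical helper, `combine_unique`, that skips None/NaN and '' cells, drops repeats, and joins what is left with a separator in first-seen order. The helper name and the `sep` parameter are illustrative, not from the answer above.

```python
import pandas as pd

def combine_unique(row, sep=' '):
    """Join the distinct non-empty values of a row, keeping first-seen order."""
    seen = []
    for value in row:
        # Skip missing values and empty strings
        if pd.isna(value) or value == '':
            continue
        if value not in seen:
            seen.append(value)
    # Return None when the row had nothing to combine
    return sep.join(seen) if seen else None

test_df = pd.DataFrame({
    'col_1': ['apple-martini', 'apple-pie', None, 'dessert', None, '', 'brown'],
    'col_2': [None, None, 'strawberry-tart', 'dessert', None, 'cupcake', 'rice'],
})
test_df['combined'] = test_df[['col_1', 'col_2']].apply(combine_unique, axis=1)
print(test_df['combined'])
```

Unlike the set-based version, this keeps the column order (col_1 before col_2) instead of sorting alphabetically, and leaves fully empty rows as missing rather than ''.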