Concatenate two columns and keep a single value when they repeat
I have a DataFrame as follows:
import pandas as pd
df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', None, 'dessert', None], 'type':[None, None, 'strawberry-tart', 'dessert', None]})
df
Out[10]:
       foodstuff             type
0  apple-martini             None
1      apple-pie             None
2           None  strawberry-tart
3        dessert          dessert
4           None             None
I would like to achieve the following:
df
Out[10]:
          Combined
0    apple-martini
1        apple-pie
2  strawberry-tart
3          dessert
4             None
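This solution covers combining columns when one of the two is guaranteed to be None. I am trying to handle the additional case where both columns hold the same value in a row, in which case only one copy should be kept.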
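We can do this with fillna, which takes the value from foodstuff and falls back to type only where foodstuff is missing:
df['combine'] = df['foodstuff'].fillna(df['type'])
df['combine']
0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4               None
Name: combine, dtype: object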
You can use combine_first:
df['combined'] = df['foodstuff'].combine_first(df['type'])
print(df)
# Output:
       foodstuff             type         combined
0  apple-martini             None    apple-martini
1      apple-pie             None        apple-pie
2           None  strawberry-tart  strawberry-tart
3        dessert          dessert          dessert
4           None             None             None
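To make the behaviour of the two answers above concrete, here is a minimal sketch that runs both on the question's data and compares them. The variable names `via_fillna`, `via_combine_first` and `combined_list` are only illustrative. Missing values are checked with `pd.isna` because, depending on the pandas version, the last row may come back as None or NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'foodstuff': ['apple-martini', 'apple-pie', None, 'dessert', None],
    'type': [None, None, 'strawberry-tart', 'dessert', None],
})

# Both calls keep the value from 'foodstuff' and fall back to 'type'
# only in rows where 'foodstuff' is missing.
via_fillna = df['foodstuff'].fillna(df['type'])
via_combine_first = df['foodstuff'].combine_first(df['type'])

# Row 3 holds 'dessert' in both columns, so a single 'dessert' is kept.
combined_list = via_fillna.tolist()
print(combined_list[:4])

# The two approaches agree on which rows are missing and on every other value.
print(via_fillna.isna().tolist() == via_combine_first.isna().tolist())
print(via_fillna.dropna().tolist() == via_combine_first.dropna().tolist())
```

One caveat: neither call compares the two columns. If a row held two different non-null values, the one in foodstuff would win and the one in type would be dropped silently.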
If you also need to handle cells that contain '' and want the union of the two column values when they differ:
test_df = pd.DataFrame({'col_1':['apple-martini', 'apple-pie', None, 'dessert', None, '', 'brown'], 'col_2':[None, None, 'strawberry-tart', 'dessert', None, 'cupcake', 'rice']})
# Treat missing values as empty strings so every cell can be split
test_df.fillna('', inplace=True)
# Split each cell on whitespace; an empty string becomes an empty list
test_df['col_1'] = test_df['col_1'].str.split()
test_df['col_2'] = test_df['col_2'].str.split()
# Union of the tokens from both columns, so a repeated value is kept once
test_df['set'] = test_df.apply(lambda x: set(x['col_1'] + x['col_2']), axis=1)
# Sort for a deterministic order; the tokens are joined with no separator
test_df['combined'] = test_df.apply(lambda x: ''.join(sorted(x['set'])), axis=1)
print(test_df['combined'])
# Result:
0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4
5            cupcake
6          brownrice
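In the result above, the last row runs the two values together as 'brownrice' because they are joined with an empty string. If a separator is wanted, a variant along the following lines might work. This is only a sketch: `combine_unique` and its `sep` parameter are hypothetical names introduced here, not part of pandas. Unlike the answer above, it keeps the values in column order rather than sorting them, and it does not split cells on whitespace.

```python
import pandas as pd

def combine_unique(row, sep=' '):
    """Join the distinct non-empty values of a row, in column order.

    Hypothetical helper for illustration; not a pandas function.
    """
    seen = []
    for value in row:
        # Treat None/NaN and the empty string as missing.
        if pd.isna(value) or value == '':
            continue
        # Keep only the first occurrence of a repeated value.
        if value not in seen:
            seen.append(value)
    return sep.join(seen)

test_df = pd.DataFrame({
    'col_1': ['apple-martini', 'apple-pie', None, 'dessert', None, '', 'brown'],
    'col_2': [None, None, 'strawberry-tart', 'dessert', None, 'cupcake', 'rice'],
})

# Apply the helper row by row to the two columns being combined.
test_df['combined'] = test_df[['col_1', 'col_2']].apply(combine_unique, axis=1)
result = test_df['combined'].tolist()
print(result)
```

Rows where both cells are missing or empty come out as an empty string, matching the answer above; replace them afterwards if None is preferred.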