Using Python - How can I create a new column ("new_col") by returning the value of "colA" if "colA" can be found in "colB"

Question

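I'm stuck on a project. I'm trying to create a new unique column by checking two columns (A and B): if a value in A exists anywhere in B, or a value in B exists anywhere in A, return that value; otherwise return an empty string (""). For example, I have: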

    colA colB
0    x     
1    y     
2         c
3         d
4         x
5    d     
6
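To experiment with the answers, the sample input can be rebuilt as a DataFrame. This is a minimal sketch that assumes the blank cells are missing values (NaN); if they are empty strings in your data, use `''` instead of `np.nan`.

```python
import numpy as np
import pandas as pd

# Sample input from the question; blank cells are assumed to be NaN
df = pd.DataFrame({
    "colA": ["x", "y", np.nan, np.nan, np.nan, "d", np.nan],
    "colB": [np.nan, np.nan, "c", "d", "x", np.nan, np.nan],
})
print(df)
```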

After comparing colA and colB the first time, I expect a result like this:

  colA colB new_colA
0    x             x
1    y             y
2         c         
3         d        d
4         x        x
5    d             d
6

And this after the second time:

  colA colB new_colA new_colB
0    x             x         
1    y             y         
2         c                 c
3         d        d         
4         x        x         
5    d             d         
6

I don't know how to do this in Python. I tried Excel, where I just used conditional formatting to highlight the duplicates.
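For comparison with the Excel approach, this sketch flags each cell whose value also occurs somewhere in the other column, which is roughly what highlighting duplicates across the two columns shows. It assumes blank cells are NaN, and the flag columns `dup_in_B` and `dup_in_A` are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "colA": ["x", "y", np.nan, np.nan, np.nan, "d", np.nan],
    "colB": [np.nan, np.nan, "c", "d", "x", np.nan, np.nan],
})

# True where colA's value also appears somewhere in colB (blank cells excluded)
df["dup_in_B"] = df["colA"].notna() & df["colA"].isin(df["colB"])
# True where colB's value also appears somewhere in colA (blank cells excluded)
df["dup_in_A"] = df["colB"].notna() & df["colB"].isin(df["colA"])
print(df)
```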

Answer 1

If the empty cells are NaN, you can use:

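# True where colB's value also appears somewhere in colA
m = df['colB'].isin(df['colA'])
# Take colB where it matched; fill the remaining rows from colA
df['new_colA'] = df['colB'].where(m).fillna(df['colA'])
# Keep only the colB values that were not found in colA
df['new_colB'] = df['colB'].mask(m)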

Output:

  colA colB new_colA new_colB
0    x  NaN        x      NaN
1    y  NaN        y      NaN
2  NaN    c      NaN        c
3  NaN    d        d      NaN
4  NaN    x        x      NaN
5    d  NaN        d      NaN
6  NaN  NaN      NaN      NaN
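A self-contained version of the NaN snippet, with the sample data built in, might look like this; it assumes blank cells are NaN, matching the output shown.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "colA": ["x", "y", np.nan, np.nan, np.nan, "d", np.nan],
    "colB": [np.nan, np.nan, "c", "d", "x", np.nan, np.nan],
})

# True where colB's value also appears somewhere in colA
m = df["colB"].isin(df["colA"])
# Matched colB values move into new_colA; every other row falls back to colA
df["new_colA"] = df["colB"].where(m).fillna(df["colA"])
# Unmatched colB values stay behind in new_colB
df["new_colB"] = df["colB"].mask(m)
print(df)
```

Note that `new_colA` is not the literal "value in both columns" test: row 1 (`y`) is copied even though `y` never appears in colB, because that is what the question's expected output asks for.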

Variant for empty strings:

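# Ignore empty colB cells, which would otherwise "match" the empty cells in colA
m = df['colB'].isin(df['colA']) & df['colB'].ne('')
df['new_colA'] = df['colB'].where(m).fillna(df['colA'])
# mask() leaves NaN behind, so turn it back into empty strings
df['new_colB'] = df['colB'].mask(m).fillna('')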

Output:

  colA colB new_colA new_colB
0    x             x         
1    y             y         
2         c                 c
3         d        d         
4         x        x         
5    d             d         
6
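The empty-string variant can be checked the same way. This sketch assumes every blank cell is exactly `''` (cells containing spaces would not be treated as empty).

```python
import pandas as pd

# Same sample, but blank cells are empty strings instead of NaN
df = pd.DataFrame({
    "colA": ["x", "y", "", "", "", "d", ""],
    "colB": ["", "", "c", "d", "x", "", ""],
})

# Exclude empty colB cells, which would otherwise match the empty cells in colA
m = df["colB"].isin(df["colA"]) & df["colB"].ne("")
df["new_colA"] = df["colB"].where(m).fillna(df["colA"])
# mask() leaves NaN behind, so convert it back to empty strings
df["new_colB"] = df["colB"].mask(m).fillna("")
print(df)
```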

