Iteration through two Pandas Dataframes + create new column

Question

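I am new to Pandas and I am trying to iterate through two columns from different dataframes. If both columns contain the same word, I want to append "Yes" to another column; if not, append "No".

This is what I have: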

    for row in df1.iterrows():
     for word in df2.iterrows():
       if df1['word1'] == df2['word2']:
         df1.column1.append('Yes') #I just want to have two columns in binary form, if one is yes the other must be no
         df2.column2.append('No')

       else:
         df1.column1.append('No')
         df2.column2.append('Yes')

What I have now:

      column1      column2  column3   
       apple        None    None
       orange       None    None
       banana       None    None
       tomato       None    None
       sugar        None    None
       grapes       None    None
       fig          None    None

What I want:

      column1      column2  column3   
       apple           Yes       No
       orange          No        No
       banana          No        No
       tomato          No        No
       sugar           No        Yes
       grapes          No        Yes
       figs            No        Yes


    Sample of words from df1: apple, orange, pear
    Sample of words from df2: yellow, orange, green

I am getting this error: Can only compare identically-labeled Series objects

Note: df2 has 2500 words and df1 has 500. Thanks for your help!

Answer 1

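I think it would be a better idea to build a set of words from each of the two columns and then do a lookup against it. It will also be faster. Something like this:

    words_df1 = set(df1['word1'].tolist())
    words_df2 = set(df2['word2'].tolist())

Then do

    df1['has_word2'] = df1['word1'].isin(words_df2)
    df2['has_word1'] = df2['word2'].isin(words_df1)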
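
Note that isin returns booleans rather than the "Yes"/"No" strings the question asks for. A minimal sketch of one way to convert them, assuming the column names word1 and word2 from the question's code and the sample words listed in the question (the labels dict is just an illustrative name):

```python
import pandas as pd

# Sample words taken from the question; column names follow the question's code.
df1 = pd.DataFrame({'word1': ['apple', 'orange', 'pear']})
df2 = pd.DataFrame({'word2': ['yellow', 'orange', 'green']})

# Build one set per column so each membership test is a hash lookup.
words_df1 = set(df1['word1'])
words_df2 = set(df2['word2'])

# isin yields True/False; map translates those into 'Yes'/'No' labels.
labels = {True: 'Yes', False: 'No'}
df1['has_word2'] = df1['word1'].isin(words_df2).map(labels)
df2['has_word1'] = df2['word2'].isin(words_df1).map(labels)

print(df1)
print(df2)
```

With these sample words, only "orange" appears in both columns, so it is the only row labelled "Yes" in either dataframe.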

Answer 2

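What you actually want is to fill in:

df1.column1 with:
- Yes - if word1 from this row occurs in df2.word2 (in any row),
- No - otherwise,
df2.column2 with:
- Yes - if word2 from this row occurs in df1.word1 (in any row),
- No - otherwise.

To do this, you can run: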

    import numpy as np

    df1['column1'] = np.where(df1.word1.isin(df2.word2), 'Yes', 'No')
    df2['column2'] = np.where(df2.word2.isin(df1.word1), 'Yes', 'No')

To test my code, I used the following dataframes:

df1:                 df2:
        word1                word2
0       apple        0      yellow
1      orange        1      orange
2        pear        2       green
3  strawberry        3  strawberry
                     4        plum

The result of my code is:

df1:                       df2:
        word1 column1              word2 column2
0       apple      No      0      yellow      No
1      orange     Yes      1      orange     Yes
2        pear      No      2       green      No
3  strawberry     Yes      3  strawberry     Yes
                           4        plum      No
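
For anyone who wants to reproduce this test, here is a self-contained sketch built from the same test data. It also shows why the loop in the question fails: `df1['word1'] == df2['word2']` compares two whole Series whose indexes differ, which pandas rejects, whereas isin only checks membership and does not care about alignment. The try/except is added purely for illustration.

```python
import numpy as np
import pandas as pd

# Test dataframes from this answer.
df1 = pd.DataFrame({'word1': ['apple', 'orange', 'pear', 'strawberry']})
df2 = pd.DataFrame({'word2': ['yellow', 'orange', 'green', 'strawberry', 'plum']})

# Element-wise comparison of two Series with different indexes raises
# ValueError, which is the error reported in the question.
try:
    df1['word1'] == df2['word2']
except ValueError as exc:
    print('Comparison failed:', exc)

# Membership test per row, then translate True/False into 'Yes'/'No'.
df1['column1'] = np.where(df1['word1'].isin(df2['word2']), 'Yes', 'No')
df2['column2'] = np.where(df2['word2'].isin(df1['word1']), 'Yes', 'No')

print(df1)
print(df2)
```

Because both steps are vectorized, this should stay fast at the sizes mentioned in the question (500 and 2500 words), without any explicit loop over rows.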
