How to flag multiple dataframe rows based on a column value in Python

Question

I have the following dataframe:

ID Reviews              Sorted  pairwise         scores
A   This is great         0     [(0, 1)]         [0.26386763883335373]
A   works well            1     []               []
B   can this be changed   0     [(0, 1), (0, 2)] [0.1179287227608669, 0.36815020951152794]
B   how to perform that   1     [(1, 2)]         [0.03299057711398918]
B   summarize it          2     []               []
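
For anyone who wants to reproduce this, the table above can be built roughly as follows. This is only a sketch: storing `pairwise` as lists of tuples and `scores` as lists of floats is an assumption about the column types, which the printed table does not confirm.

```python
import pandas as pd

# Sample data copied from the table above. The element types (lists of
# tuples, lists of floats) are assumed, not confirmed by the original post.
df = pd.DataFrame({
    "ID": ["A", "A", "B", "B", "B"],
    "Reviews": ["This is great", "works well", "can this be changed",
                "how to perform that", "summarize it"],
    "Sorted": [0, 1, 0, 1, 2],
    "pairwise": [[(0, 1)], [], [(0, 1), (0, 2)], [(1, 2)], []],
    "scores": [[0.26386763883335373], [],
               [0.1179287227608669, 0.36815020951152794],
               [0.03299057711398918], []],
})
print(df)
```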

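Sorted is the position of each row among the rows that share the same ID. pairwise holds the pairwise combinations of those positions within each ID group, and I computed the scores column from those pairs. Now I need to create a flag column based on the pairwise column: if a pair's score is > 0.15, both rows in that pair should be flagged 'Yes'. For example, within group B the score above 0.15 is 0.36, which belongs to the pair (0, 2) in the pairwise column, so rows 0 and 2 should be flagged 'yes'.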

My desired output is:

ID Reviews              Sorted  pairwise         scores                                    Flag
A   This is great         0     [(0, 1)]         [0.26386763883335373]                      yes
A   works well            1     []               []                                         yes
B   can this be changed   0     [(0, 1), (0, 2)] [0.1179287227608669, 0.36815020951152794]  yes
B   how to perform that   1     [(1, 2)]         [0.03299057711398918]                      No
B   summarize it          2     []               []                                         yes

I tried using np.where on the scores, but it did not work for me.

Can anyone suggest a solution or any ideas? Thanks in advance!

Answer 1

We can explode the list columns, then merge the result back:

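s = df.scores.explode()  # one score per element; NaN for empty lists
# keep only the pairs scoring above 0.15, then split each pair into its two positions
s = df.set_index('ID').pairwise.explode()[(s > 0.15).values].explode()
# flag each (ID, Sorted) found in a kept pair; drop_duplicates avoids repeated rows after the merge
flagged = s.to_frame('Sorted').reset_index().drop_duplicates().assign(flag='Yes')
df = df.merge(flagged, how='left')
df['flag'] = df['flag'].fillna('No')  # rows that matched no pair get 'No'
df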
                                      scores          pairwise Sorted ID flag
0                      [0.26386763883335373]          [(0, 1)]      0  A  Yes
1                                         []                []      1  A  Yes
2  [0.1179287227608669, 0.36815020951152794]  [(0, 1), (0, 2)]      0  B  Yes
3                      [0.03299057711398918]          [(1, 2)]      1  B   No
4                                         []                []      2  B  Yes
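
If the explode/merge chain is hard to follow, the same rule can also be written as a plain loop. This is an alternative sketch, not the method used above; `flag_rows` and `THRESHOLD` are hypothetical names introduced for illustration, and the small frame only includes the columns the rule needs.

```python
import pandas as pd

# Trimmed copy of the sample data (only the columns the rule uses).
df = pd.DataFrame({
    "ID": ["A", "A", "B", "B", "B"],
    "Sorted": [0, 1, 0, 1, 2],
    "pairwise": [[(0, 1)], [], [(0, 1), (0, 2)], [(1, 2)], []],
    "scores": [[0.26386763883335373], [],
               [0.1179287227608669, 0.36815020951152794],
               [0.03299057711398918], []],
})

THRESHOLD = 0.15  # hypothetical constant name; the value comes from the question


def flag_rows(frame, threshold=THRESHOLD):
    """Return a copy of frame with a 'flag' column holding 'Yes' or 'No'."""
    flagged = set()  # (ID, Sorted) keys that occur in a pair scoring above threshold
    for group_id, pairs, scores in zip(frame["ID"], frame["pairwise"], frame["scores"]):
        # pairs and scores are parallel lists, so walk them together
        for pair, score in zip(pairs, scores):
            if score > threshold:
                flagged.update((group_id, position) for position in pair)
    out = frame.copy()
    out["flag"] = ["Yes" if key in flagged else "No"
                   for key in zip(frame["ID"], frame["Sorted"])]
    return out


result = flag_rows(df)
print(result[["ID", "Sorted", "flag"]])
```

Because the flagged positions are collected in a set, a row that appears in several qualifying pairs is still flagged only once.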

python

comparison

pandas