Add new column in dataframe based on multiple column conditions

I have the following dataframe with sentiment scores:

Text Negative Neutral Positive
I lost my phone. I am sad 0.8 0.15 0.05
How is your day? 0.1 0.8 0.1
Let's go out for dinner today. 0.06 0.55 0.39
I am super pissed at my friend for cancelling the party. 0.73 0.11 0.16
I am so happy  I want to dance 0 0.1 0.9
I am not sure if I should laugh or just smile 0.08 0.24 0.68

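This is based on a sentiment analysis I ran. Each text can now be labeled as any one of five categories:

Very Negative, Negative, Neutral, Positive, Very Positive.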

I want to add a new column to the dataframe that labels the sentiment according to the following rules:

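1. If the Negative or Positive value is the dominant one and is >= 0.8 (80%), label the text Very Negative or Very Positive.

2. If the Negative or Positive value is the dominant one and is >= 0.5 but less than 0.8, label it just Negative or Positive.

3. If the Neutral value is >= 0.5, label it Neutral. There is no such thing as Very Neutral.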

For the example above, the result should look like this:

Text Negative Neutral Positive Sentiment
I lost my phone. I am sad 0.8 0.15 0.05 Very Negative
How is your day? 0.1 0.8 0.1 Neutral
Let's go out for dinner today. 0.06 0.55 0.39 Neutral
I am super pissed at my friend for cancelling the party. 0.73 0.11 0.16 Negative
I am so happy  I want to dance 0 0.1 0.9 Very Positive
I am not sure if I should laugh or just smile 0.08 0.24 0.68 Positive

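How can I do this in the dataframe? Afterwards I want to plot a chart to see the distribution of each of these 5 sentiments. That part I can do, but I am struggling to get these multiple conditions to work in pandas.

Any help is greatly appreciated.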

You can use np.select():

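import numpy as np

# np.select uses the first condition that is True for each row
conditions = [df['Positive']>=0.80, df['Negative']>=0.80, ((df['Positive']>=0.50) & (df['Positive']<0.80)),
              ((df['Negative']>=0.50) & (df['Negative']<0.80)), df['Neutral']>=0.5]
values = ['Very Positive', 'Very Negative', 'Positive', 'Negative', 'Neutral']
# Use a string default: mixing string labels with np.nan can fail to find a common dtype on recent NumPy versions
df['Sentiment'] = np.select(conditions, values, default='Unknown')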

OUTPUT

                                               Text  Negative  Neutral  Positive      Sentiment
0                          I lost my phone. I am sad      0.80     0.15      0.05  Very Negative
1                                   How is your day?      0.10     0.80      0.10        Neutral
2                     Let's go out for dinner today.      0.06     0.55      0.39        Neutral
3  I am super pissed at my friend for cancelling ...      0.73     0.11      0.16       Negative
4                     I am so happy  I want to dance      0.00     0.10      0.90  Very Positive
5      I am not sure if I should laugh or just smile      0.08     0.24      0.68       Positive
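Because np.select() takes the first matching condition for each row, the upper bounds (`< 0.80`) are not strictly needed as long as the "Very ..." checks come first. Here is a minimal, self-contained sketch of that idea, assuming the scores from the question; the fallback label `"Unclassified"` is a hypothetical name for rows where no score reaches 0.5, and the Text column is left out for brevity.

```python
import numpy as np
import pandas as pd

# Scores copied from the question (Text column omitted for brevity).
df = pd.DataFrame({
    "Negative": [0.80, 0.10, 0.06, 0.73, 0.00, 0.08],
    "Neutral":  [0.15, 0.80, 0.55, 0.11, 0.10, 0.24],
    "Positive": [0.05, 0.10, 0.39, 0.16, 0.90, 0.68],
})

# np.select picks the first condition that is True for each row,
# so the "Very ..." checks must be listed before the plain ones.
conditions = [
    df["Positive"] >= 0.8,
    df["Negative"] >= 0.8,
    df["Positive"] >= 0.5,
    df["Negative"] >= 0.5,
    df["Neutral"] >= 0.5,
]
labels = ["Very Positive", "Very Negative", "Positive", "Negative", "Neutral"]

# "Unclassified" is a hypothetical fallback for rows matching no condition.
df["Sentiment"] = np.select(conditions, labels, default="Unclassified")
print(df)
```

If the order of the conditions were reversed, a row with Positive = 0.9 would match `>= 0.5` first and be labeled plain Positive, so the ordering carries the logic here.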

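You can write a function that maps the three values to a sentiment label, then use the apply method to run that function on every row of the table. This produces a column (a Series), which you then attach to the main table.

It should look something like this: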

def your_fn(values):
  pos = values["Positive"]
  neu = values["Neutral"]
  neg = values["Negative"]

  # 1. If the value of negative or positive is most dominating and >= 0.8 (80%) then mark it as very negative or very positive.
  if (pos >= .8):
    return "Very Positive"
  if (neg >= .8):
    return "Very Negative"
  
  # 2. If the value of negative or positive is most dominating but it is >= 0.5 but less than 0.8 then just negative or positive.
  if (pos >= .5): 
    return "Positive"
  if (neg >= .5):
    return "Negative"

  # 3. If the value of neutral is >= 0.5 then Neutral. There is no such thing as Very Neutral.
  if (neu >= .5):
    return "Neutral"
  
  return "-"

df['Sentiment'] = df.apply(your_fn, axis=1)
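The three rules do not say what happens when no score reaches 0.5, which is why the function above ends with a fallback. The following sketch shows that case in a compact, runnable form. It assumes the three scores sum to 1, so at most one of them can exceed 0.5; `label_sentiment` is a hypothetical name, and the second row is made-up data added only to trigger the fallback.

```python
import pandas as pd

def label_sentiment(row):
    """Hypothetical helper mirroring the three rules from the question."""
    for name in ("Positive", "Negative"):
        if row[name] >= 0.8:
            return f"Very {name}"
        if row[name] >= 0.5:
            return name
    if row["Neutral"] >= 0.5:
        return "Neutral"
    return "-"  # no score reached 0.5

df = pd.DataFrame({
    "Text": ["I lost my phone. I am sad", "made-up ambiguous row"],
    "Negative": [0.80, 0.40],
    "Neutral":  [0.15, 0.30],
    "Positive": [0.05, 0.30],
})

# axis=1 passes each row to the function as a Series indexed by column name.
df["Sentiment"] = df.apply(label_sentiment, axis=1)
print(df[["Text", "Sentiment"]])
```

Row-wise apply is easy to read but runs a Python function per row, so for large tables a vectorized approach is usually faster.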

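Another approach is to take idxmax/max across the score columns and prefix "Very " to the label when the maximum is ≥ 0.8 and the label is not "Neutral".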

s = df.drop('Text', axis=1).idxmax(1)
m = df.drop('Text', axis=1).max(1)
df['Sentiment'] = np.where(m.ge(0.8)&s.ne('Neutral'), 'Very '+s, s)

Output:

                                                       Text  Negative  Neutral  Positive      Sentiment
0                                 I lost my phone. I am sad      0.80     0.15      0.05  Very Negative
1                                          How is your day?      0.10     0.80      0.10        Neutral
2                            Let's go out for dinner today.      0.06     0.55      0.39        Neutral
3  I am super pissed at my friend for cancelling the party.      0.73     0.11      0.16       Negative
4                            I am so happy  I want to dance      0.00     0.10      0.90  Very Positive
5             I am not sure if I should laugh or just smile      0.08     0.24      0.68       Positive
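Note that the idxmax version labels every row by its largest score, even when that score is below 0.5. If the 0.5 threshold from the rules should be enforced, one possible extension is a second np.where pass, sketched below. The `"-"` fallback and the last row (0.40 / 0.30 / 0.30) are assumptions added for illustration and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# First three rows come from the question; the last row is made up
# to show a case where no score reaches 0.5.
scores = pd.DataFrame({
    "Negative": [0.80, 0.10, 0.73, 0.40],
    "Neutral":  [0.15, 0.80, 0.11, 0.30],
    "Positive": [0.05, 0.10, 0.16, 0.30],
})

top = scores.idxmax(axis=1)   # name of the column with the largest score
peak = scores.max(axis=1)     # the largest score itself

# Prefix "Very " only for non-neutral labels with a score of at least 0.8.
label = np.where(peak.ge(0.8) & top.ne("Neutral"), "Very " + top, top)

# Hypothetical extra step: fall back to "-" when nothing reaches 0.5.
label = np.where(peak.lt(0.5), "-", label)

scores["Sentiment"] = label
print(scores)
```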