Add new column in dataframe based on multiple column conditions

Question

I have the following dataframe with sentiment scores:

Text	Negative	Neutral	Positive
I lost my phone. I am sad	0.8	0.15	0.05
How is your day?	0.1	0.8	0.1
Let's go out for dinner today.	0.06	0.55	0.39
I am super pissed at my friend for cancelling the party.	0.73	0.11	0.16
I am so happy I want to dance	0	0.1	0.9
I am not sure if I should laugh or just smile	0.08	0.24	0.68

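This is based on the sentiment analysis I have done. Now, each text can be labelled as any one of these 5:

Very Negative, Negative, Neutral, Positive, Very Positive.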

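I want to add a new column to the dataframe that labels each text according to the following rules:

1. If the value of Negative or Positive is the most dominant and >= 0.8 (80%), mark it as Very Negative or Very Positive.

2. If the value of Negative or Positive is the most dominant and >= 0.5 but less than 0.8, mark it as just Negative or Positive.

3. If the value of Neutral is >= 0.5, mark it as Neutral. There is no such thing as Very Neutral.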

For the example above, the result should look like this:

Text	Negative	Neutral	Positive	Sentiment
I lost my phone. I am sad	0.8	0.15	0.05	Very Negative
How is your day?	0.1	0.8	0.1	Neutral
Let's go out for dinner today.	0.06	0.55	0.39	Neutral
I am super pissed at my friend for cancelling the party.	0.73	0.11	0.16	Negative
I am so happy I want to dance	0	0.1	0.9	Very Positive
I am not sure if I should laugh or just smile	0.08	0.24	0.68	Positive

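How can I do this in a dataframe? Afterwards I would like to plot a graph to see the distribution of each of these 5 sentiments. That part I can do, but I am struggling to get these multiple conditions to work in pandas.

Any help is much appreciated.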
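To try the answers below, the sample table can be rebuilt as a DataFrame. This is only a convenience sketch: the values are copied from the table in the question, and the variable name `df` is an assumption that matches what the answers use.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Text": [
        "I lost my phone. I am sad",
        "How is your day?",
        "Let's go out for dinner today.",
        "I am super pissed at my friend for cancelling the party.",
        "I am so happy I want to dance",
        "I am not sure if I should laugh or just smile",
    ],
    "Negative": [0.80, 0.10, 0.06, 0.73, 0.00, 0.08],
    "Neutral":  [0.15, 0.80, 0.55, 0.11, 0.10, 0.24],
    "Positive": [0.05, 0.10, 0.39, 0.16, 0.90, 0.68],
})
```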

Answer 1

You can use np.select():

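import numpy as np
# Conditions are evaluated in order; the first one that matches wins.
conditions = [df['Positive']>=0.80, df['Negative']>=0.80, ((df['Positive']>=0.50) & (df['Positive']<0.80)),
              ((df['Negative']>=0.50) & (df['Negative']<0.80)), df['Neutral']>=0.5]
values = ['Very Positive', 'Very Negative', 'Positive', 'Negative', 'Neutral']
# default=None instead of np.nan: recent NumPy cannot mix string choices with a float default.
df['Sentiment'] = np.select(conditions, values, default=None)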

OUTPUT

                                               Text  Negative  Neutral  Positive      Sentiment
0                          I lost my phone. I am sad      0.80     0.15      0.05  Very Negative
1                                   How is your day?      0.10     0.80      0.10        Neutral
2                     Let's go out for dinner today.      0.06     0.55      0.39        Neutral
3  I am super pissed at my friend for cancelling ...      0.73     0.11      0.16       Negative
4                     I am so happy  I want to dance      0.00     0.10      0.90  Very Positive
5      I am not sure if I should laugh or just smile      0.08     0.24      0.68       Positive
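Since np.select returns the value of the first condition that matches, the explicit upper bounds (`< 0.80`) are arguably redundant once the "Very" conditions are listed first. Here is a minimal, self-contained sketch of that simplification on the scores from the question. The `"Unclassified"` default is a hypothetical placeholder label, not part of the original answer; it is a string so that the choices and the default share one dtype.

```python
import numpy as np
import pandas as pd

# Scores copied from the question (Text column omitted for brevity).
df = pd.DataFrame({
    "Negative": [0.80, 0.10, 0.06, 0.73, 0.00, 0.08],
    "Neutral":  [0.15, 0.80, 0.55, 0.11, 0.10, 0.24],
    "Positive": [0.05, 0.10, 0.39, 0.16, 0.90, 0.68],
})

# Order matters: a row is labelled by the first condition it satisfies.
conditions = [
    df["Positive"] >= 0.80,
    df["Negative"] >= 0.80,
    df["Positive"] >= 0.50,  # only reached when Positive < 0.80
    df["Negative"] >= 0.50,  # only reached when Negative < 0.80
    df["Neutral"] >= 0.50,
]
values = ["Very Positive", "Very Negative", "Positive", "Negative", "Neutral"]

# "Unclassified" is a hypothetical label for rows that match no rule.
df["Sentiment"] = np.select(conditions, values, default="Unclassified")
labels = df["Sentiment"].tolist()
```

Rows in which no score reaches 0.5 would receive the placeholder label, which makes them easy to spot before plotting.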

Answer 2

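You can create a function that maps the three values to a sentiment label, then use the apply method to apply that function to every row of the table. This produces a column (a Series), which you then attach to the main table.

It should look something like this: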

def your_fn(values):
  pos = values["Positive"]
  neu = values["Neutral"]
  neg = values["Negative"]

  # 1. If the value of negative or positive is most dominating and >= 0.8 (80%) then mark it as very negative or very positive.
  if (pos >= .8):
    return "Very Positive"
  if (neg >= .8):
    return "Very Negative"
  
  # 2. If the value of negative or positive is most dominating but it is >= 0.5 but less than 0.8 then just negative or positive.
  if (pos >= .5): 
    return "Positive"
  if (neg >= .5):
    return "Negative"

  # 3. If the value of neutral is >= 0.5 then Neutral. There is no such thing as Very Neutral.
  if (neu >= .5):
    return "Neutral"
  
  return "-"

df['Sentiment'] = df.apply(your_fn, axis=1)
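As a follow-up to the row-wise approach, the sketch below applies a compact version of the same rules to the question's scores and then counts how often each of the five labels occurs, which is the input a distribution plot would need. The names `label_row` and `order` are hypothetical, and reindexing against the full label list is an assumption about what is wanted: it keeps labels with zero occurrences in the result.

```python
import pandas as pd

# Scores copied from the question (Text column omitted for brevity).
df = pd.DataFrame({
    "Negative": [0.80, 0.10, 0.06, 0.73, 0.00, 0.08],
    "Neutral":  [0.15, 0.80, 0.55, 0.11, 0.10, 0.24],
    "Positive": [0.05, 0.10, 0.39, 0.16, 0.90, 0.68],
})

def label_row(row):
    """Hypothetical helper mirroring the three rules from the question."""
    if row["Positive"] >= 0.8:
        return "Very Positive"
    if row["Negative"] >= 0.8:
        return "Very Negative"
    if row["Positive"] >= 0.5:
        return "Positive"
    if row["Negative"] >= 0.5:
        return "Negative"
    if row["Neutral"] >= 0.5:
        return "Neutral"
    return "-"  # no rule matched

df["Sentiment"] = df.apply(label_row, axis=1)

# Count each label in a fixed order; labels that never occur get 0.
order = ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]
counts = df["Sentiment"].value_counts().reindex(order, fill_value=0)
```

The resulting `counts` Series can be passed to a plotting call of your choice.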

Answer 3

Another way could be to take the idxmax/max of each row and prepend "Very " when the maximum is ≥ 0.8 and the label is not "Neutral".

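import numpy as np
# s: name of the column holding the row-wise maximum; m: that maximum value.
s = df.drop('Text', axis=1).idxmax(1)
m = df.drop('Text', axis=1).max(1)
# Prefix 'Very ' only when the top score is >= 0.8 and the label is not Neutral.
df['Sentiment'] = np.where(m.ge(0.8)&s.ne('Neutral'), 'Very '+s, s)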

Output:

                                                       Text  Negative  Neutral  Positive      Sentiment
0                                 I lost my phone. I am sad      0.80     0.15      0.05  Very Negative
1                                          How is your day?      0.10     0.80      0.10        Neutral
2                            Let's go out for dinner today.      0.06     0.55      0.39        Neutral
3  I am super pissed at my friend for cancelling the party.      0.73     0.11      0.16       Negative
4                            I am so happy  I want to dance      0.00     0.10      0.90  Very Positive
5             I am not sure if I should laugh or just smile      0.08     0.24      0.68       Positive
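One behaviour of the idxmax approach is worth checking: a row whose highest score is below 0.5 still receives the name of its top column, whereas the rules in the question leave such a row unlabelled. The sketch below shows this with one made-up row (0.40, 0.30, 0.30), which is not from the question. The final `where` step and the `"-"` placeholder are assumptions about how such rows might be handled, not part of the original answer.

```python
import numpy as np
import pandas as pd

# First row is from the question; the second is a made-up edge case.
scores = pd.DataFrame({
    "Negative": [0.80, 0.40],
    "Neutral":  [0.15, 0.30],
    "Positive": [0.05, 0.30],
})

s = scores.idxmax(axis=1)  # column name of the row-wise maximum
m = scores.max(axis=1)     # the row-wise maximum itself

# Same logic as the answer: the edge-case row is labelled "Negative".
raw = pd.Series(
    np.where(m.ge(0.8) & s.ne("Neutral"), "Very " + s, s),
    index=scores.index,
)

# Assumed extension: blank out rows whose top score is below 0.5.
label = raw.where(m.ge(0.5), "-")
```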
