Add a new column to a dataframe based on multiple column conditions
I have the following dataframe with sentiments:
Text | Negative | Neutral | Positive |
---|---|---|---|
I lost my phone. I am sad | 0.8 | 0.15 | 0.05 |
How is your day? | 0.1 | 0.8 | 0.1 |
Let's go out for dinner today. | 0.06 | 0.55 | 0.39 |
I am super pissed at my friend for cancelling the party. | 0.73 | 0.11 | 0.16 |
I am so happy I want to dance | 0 | 0.1 | 0.9 |
I am not sure if I should laugh or just smile | 0.08 | 0.24 | 0.68 |
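To make the answers below reproducible, the table above can be rebuilt as a DataFrame. This is a minimal sketch; the variable name `df` is assumed to match the name the answers use.

```python
import pandas as pd

# Rebuild the sample sentiment scores from the table above
df = pd.DataFrame({
    'Text': [
        'I lost my phone. I am sad',
        'How is your day?',
        "Let's go out for dinner today.",
        'I am super pissed at my friend for cancelling the party.',
        'I am so happy I want to dance',
        'I am not sure if I should laugh or just smile',
    ],
    'Negative': [0.8, 0.1, 0.06, 0.73, 0.0, 0.08],
    'Neutral': [0.15, 0.8, 0.55, 0.11, 0.1, 0.24],
    'Positive': [0.05, 0.1, 0.39, 0.16, 0.9, 0.68],
})
print(df)
```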
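This is based on a sentiment analysis I have done. Now, each text can be labelled as any one of 5 labels:
Very Negative, Negative, Neutral, Positive, Very Positive.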
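I want to add a new column to the dataframe that analyses the sentiment scores and labels each row according to the following rules:
1. If the value of Negative or Positive is the most dominant and >= 0.8 (80%), mark it as Very Negative or Very Positive.
2. If the value of Negative or Positive is the most dominant and is >= 0.5 but less than 0.8, it is just Negative or Positive.
3. If the value of Neutral is >= 0.5, it is Neutral. There is no such thing as Very Neutral.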
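One way to read these rules is: find the dominant column, then compare its score with the thresholds. A minimal plain-Python sketch of that reading for a single row follows. The function name `label_scores`, returning `None` when no rule applies, and resolving exact ties in favour of the first column (Negative, then Neutral, then Positive) are assumptions, not part of the question.

```python
from typing import Optional


def label_scores(neg: float, neu: float, pos: float) -> Optional[str]:
    """Apply the three labelling rules to one row of scores.

    Hypothetical helper: returns None when no rule matches
    (for example, when every score is below 0.5).
    """
    scores = {'Negative': neg, 'Neutral': neu, 'Positive': pos}
    # The most dominant class; on an exact tie, max() keeps the first key
    top = max(scores, key=scores.get)
    value = scores[top]
    if top == 'Neutral':
        # Rule 3: Neutral >= 0.5, and there is no "Very Neutral"
        return 'Neutral' if value >= 0.5 else None
    if value >= 0.8:
        # Rule 1: dominant Negative/Positive at or above 0.8
        return 'Very ' + top
    if value >= 0.5:
        # Rule 2: dominant Negative/Positive in [0.5, 0.8)
        return top
    return None


print(label_scores(0.8, 0.15, 0.05))
print(label_scores(0.08, 0.24, 0.68))
```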
For the example above, the result should look like this:
Text | Negative | Neutral | Positive | Sentiment |
---|---|---|---|---|
I lost my phone. I am sad | 0.8 | 0.15 | 0.05 | Very Negative |
How is your day? | 0.1 | 0.8 | 0.1 | Neutral |
Let's go out for dinner today. | 0.06 | 0.55 | 0.39 | Neutral |
I am super pissed at my friend for cancelling the party. | 0.73 | 0.11 | 0.16 | Negative |
I am so happy I want to dance | 0 | 0.1 | 0.9 | Very Positive |
I am not sure if I should laugh or just smile | 0.08 | 0.24 | 0.68 | Positive |
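How can I do this in a dataframe? Afterwards I want to plot a graph to see the distribution of each of these 5 sentiments. I can do that part, but I am struggling to get these multiple conditions working in pandas.

Any help is much appreciated.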
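For the plotting step the question mentions, the distribution is just a count per label. A minimal sketch, using the expected labels from the result table above as an illustrative input: reindexing by a fixed order keeps the bars ordered from most negative to most positive and shows labels that occur zero times.

```python
import pandas as pd

# Expected labels from the result table above (stand-in for df['Sentiment'])
sentiment = pd.Series(['Very Negative', 'Neutral', 'Neutral',
                       'Negative', 'Very Positive', 'Positive'])

# Fixed label order; labels with no rows get a count of 0
order = ['Very Negative', 'Negative', 'Neutral', 'Positive', 'Very Positive']
counts = sentiment.value_counts().reindex(order, fill_value=0)
print(counts)

# counts.plot(kind='bar')  # needs matplotlib installed
```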
You can use np.select():
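import numpy as np

# Conditions are checked in order; the first one that is True for a row wins
conditions = [df['Positive'] >= 0.80,
              df['Negative'] >= 0.80,
              (df['Positive'] >= 0.50) & (df['Positive'] < 0.80),
              (df['Negative'] >= 0.50) & (df['Negative'] < 0.80),
              df['Neutral'] >= 0.5]
values = ['Very Positive', 'Very Negative', 'Positive', 'Negative', 'Neutral']
# 'Unclassified' is a placeholder for rows matching no rule; a string default is used
# because mixing string choices with np.nan can fail type promotion in recent NumPy versions
df['Sentiment'] = np.select(conditions, values, default='Unclassified')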
Output:
Text Negative Neutral Positive Sentiment
0 I lost my phone. I am sad 0.80 0.15 0.05 Very Negative
1 How is your day? 0.10 0.80 0.10 Neutral
2 Let's go out for dinner today. 0.06 0.55 0.39 Neutral
3 I am super pissed at my friend for cancelling ... 0.73 0.11 0.16 Negative
4 I am so happy I want to dance 0.00 0.10 0.90 Very Positive
5 I am not sure if I should laugh or just smile 0.08 0.24 0.68 Positive
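Every row in the sample matches one of the conditions, so the default value never appears in the output above. A minimal check with a hypothetical extra row whose scores are all below 0.5 shows what `default` does; the extra row and the `'Unclassified'` label are illustrative assumptions. A string default is used because mixing string choices with `np.nan` may fail type promotion on recent NumPy versions.

```python
import numpy as np
import pandas as pd

# Two rows from the question plus a hypothetical row where no score reaches 0.5
df = pd.DataFrame({
    'Negative': [0.8, 0.08, 0.40],
    'Neutral': [0.15, 0.24, 0.35],
    'Positive': [0.05, 0.68, 0.25],
})

conditions = [df['Positive'] >= 0.80,
              df['Negative'] >= 0.80,
              (df['Positive'] >= 0.50) & (df['Positive'] < 0.80),
              (df['Negative'] >= 0.50) & (df['Negative'] < 0.80),
              df['Neutral'] >= 0.5]
values = ['Very Positive', 'Very Negative', 'Positive', 'Negative', 'Neutral']

# np.select picks the first matching condition per row; rows matching none get `default`
df['Sentiment'] = np.select(conditions, values, default='Unclassified')
print(df)
```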
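You can create a function that maps the three values to a sentiment, then use the apply method to apply that function to each row of the table. This produces a column (a Series), which you then assign back to the main table.
It should look something like this: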
def your_fn(values):
    pos = values["Positive"]
    neu = values["Neutral"]
    neg = values["Negative"]
    # 1. If Negative or Positive is the most dominant and >= 0.8 (80%), mark it as Very Negative or Very Positive.
    if pos >= .8:
        return "Very Positive"
    if neg >= .8:
        return "Very Negative"
    # 2. If Negative or Positive is the most dominant and >= 0.5 but less than 0.8, it is just Negative or Positive.
    if pos >= .5:
        return "Positive"
    if neg >= .5:
        return "Negative"
    # 3. If Neutral is >= 0.5, it is Neutral. There is no such thing as Very Neutral.
    if neu >= .5:
        return "Neutral"
    # No rule matched
    return "-"

# axis=1 passes each row to your_fn as a Series
df['Sentiment'] = df.apply(your_fn, axis=1)
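The apply version calls a Python function once per row, while np.select evaluates each condition on whole columns at once, so on large frames the vectorized form is usually faster. A sketch that checks both produce the same labels on the sample scores; `'-'` is used as the fallback in both so the comparison lines up, and the condensed `your_fn` follows the same rule order as the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Negative': [0.8, 0.1, 0.06, 0.73, 0.0, 0.08],
    'Neutral': [0.15, 0.8, 0.55, 0.11, 0.1, 0.24],
    'Positive': [0.05, 0.1, 0.39, 0.16, 0.9, 0.68],
})


def your_fn(values):
    # Same rule order as the apply answer, condensed
    pos, neu, neg = values['Positive'], values['Neutral'], values['Negative']
    if pos >= .8:
        return 'Very Positive'
    if neg >= .8:
        return 'Very Negative'
    if pos >= .5:
        return 'Positive'
    if neg >= .5:
        return 'Negative'
    if neu >= .5:
        return 'Neutral'
    return '-'


# Row-wise: one Python call per row
row_wise = df.apply(your_fn, axis=1)

# Vectorized: each condition is computed over whole columns
conditions = [df['Positive'] >= 0.80,
              df['Negative'] >= 0.80,
              (df['Positive'] >= 0.50) & (df['Positive'] < 0.80),
              (df['Negative'] >= 0.50) & (df['Negative'] < 0.80),
              df['Neutral'] >= 0.5]
labels = ['Very Positive', 'Very Negative', 'Positive', 'Negative', 'Neutral']
vectorized = pd.Series(np.select(conditions, labels, default='-'), index=df.index)

print((row_wise == vectorized).all())
```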
Another approach is to get the idxmax/max of each row, and prefix "Very " to the label when the maximum is ≥ 0.8 and the label is not "Neutral".
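import numpy as np

# Select the score columns explicitly, so an existing Sentiment column is not included
scores = df[['Negative', 'Neutral', 'Positive']]
s = scores.idxmax(axis=1)  # name of the largest column in each row
m = scores.max(axis=1)     # the largest score in each row
df['Sentiment'] = np.where(m.ge(0.8) & s.ne('Neutral'), 'Very ' + s, s)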
Output:
Text Negative Neutral Positive Sentiment
0 I lost my phone. I am sad 0.80 0.15 0.05 Very Negative
1 How is your day? 0.10 0.80 0.10 Neutral
2 Let's go out for dinner today. 0.06 0.55 0.39 Neutral
3 I am super pissed at my friend for cancelling the party. 0.73 0.11 0.16 Negative
4 I am so happy I want to dance 0.00 0.10 0.90 Very Positive
5 I am not sure if I should laugh or just smile 0.08 0.24 0.68 Positive
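As written, the idxmax approach labels every row with its largest column, even when that score is below 0.5, a case none of the three rules covers. A possible variant masks those rows with `Series.where`, leaving them as NaN; the third row below is hypothetical.

```python
import numpy as np
import pandas as pd

# Two rows from the question plus a hypothetical row where no score reaches 0.5
df = pd.DataFrame({
    'Negative': [0.8, 0.73, 0.40],
    'Neutral': [0.15, 0.11, 0.35],
    'Positive': [0.05, 0.16, 0.25],
})

scores = df[['Negative', 'Neutral', 'Positive']]
s = scores.idxmax(axis=1)
m = scores.max(axis=1)

labels = pd.Series(np.where(m.ge(0.8) & s.ne('Neutral'), 'Very ' + s, s), index=df.index)
# Keep a label only where the dominant score reaches 0.5; other rows become NaN
df['Sentiment'] = labels.where(m.ge(0.5))
print(df)
```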