Get sentiment score of emoji #Python

df
0        NaN
1        NaN
2         
3        NaN
4          ❤
        ... 
26368    NaN
26369    NaN
26370    NaN
26371     
26372    NaN
Name: emojis, Length: 26373, dtype: object

Given the df above, I want to compute the emoji sentiment score for each row. If the value is NaN, return NaN.

#!pip install emosent-py
from emosent import get_emoji_sentiment_rank
def emoji_sentiment(text):
    return get_emoji_sentiment_rank(text)["sentiment_score"]

emoji_sentiment("")
--> 0.221

Applying it to the whole column:

df['emoji_sentiment'] = df['emojis'].apply(emoji_sentiment)

The code above raises KeyError: nan

Expected result:

          df             emoji_sentiment
0        NaN         |         NaN
1        NaN         |         NaN
2               |  (a decimal number)
3        NaN         |         NaN
4          ❤        |   (a decimal number)
        ... 
26368    NaN         |         NaN
26369    NaN         |         NaN
26370    NaN         |         NaN
26371            |   (a decimal number)
26372    NaN         |         NaN

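Based on your error, I guess get_emoji_sentiment_rank(text)["sentiment_score"] fails when text is NaN. So you can apply the function only to the non-NaN rows and assign the result back to just those rows (the preferred option, but you first need to create the emoji_sentiment column with a default value of NaN):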

import numpy as np

df['emoji_sentiment'] = np.nan  # initialize every row to NaN (np.NaN was removed in NumPy 2.0)
not_na_idx = df['emojis'].notna()  # True where the row holds an emoji
df.loc[not_na_idx, 'emoji_sentiment'] = df.loc[not_na_idx, 'emojis'].apply(emoji_sentiment)
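To try the masked assignment without installing emosent, here is a minimal self-contained sketch. FAKE_SCORES is a hypothetical stand-in for the library's lookup, and its scores are made-up placeholders, not real sentiment values. It also assumes the library fails on NaN the way a plain dict lookup does, which is only a guess based on the error message.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for get_emoji_sentiment_rank; scores are placeholders.
FAKE_SCORES = {"❤": 0.5, "😂": 0.25}

def emoji_sentiment(text):
    # A dict lookup raises KeyError for any missing key, including NaN.
    return FAKE_SCORES[text]

df = pd.DataFrame({"emojis": [np.nan, "😂", np.nan, "❤"]})

# Applying to the whole column fails as soon as a NaN row is reached.
try:
    df["emojis"].apply(emoji_sentiment)
    raised = False
except KeyError:
    raised = True

# Masked assignment: score only the rows that actually hold an emoji.
df["emoji_sentiment"] = np.nan
not_na_idx = df["emojis"].notna()
df.loc[not_na_idx, "emoji_sentiment"] = df.loc[not_na_idx, "emojis"].apply(emoji_sentiment)
print(df)
```

Rows that were NaN in emojis keep NaN in emoji_sentiment, and the scoring function is never called on them.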

Or you can change what emoji_sentiment() returns:

# Assumes `import numpy as np` and `import pandas as pd`
def emoji_sentiment(text):
    return get_emoji_sentiment_rank(text)["sentiment_score"] if not pd.isna(text) else np.nan

(Uglier and slower, but it still works.)
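A third possibility, not part of the answer above, is to let pandas skip the missing values itself with Series.map and na_action="ignore". The sketch below is only an illustration: FAKE_SCORES is again a hypothetical stand-in for emosent with placeholder scores, so swap in the real emoji_sentiment to use it on your data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for get_emoji_sentiment_rank; scores are placeholders.
FAKE_SCORES = {"❤": 0.5, "😂": 0.25}

def emoji_sentiment(text):
    # Would raise KeyError if it were ever called with NaN.
    return FAKE_SCORES[text]

df = pd.DataFrame({"emojis": [np.nan, "😂", np.nan, "❤"]})

# na_action="ignore" passes NaN entries through without calling the function.
df["emoji_sentiment"] = df["emojis"].map(emoji_sentiment, na_action="ignore")
print(df)
```

This keeps emoji_sentiment() unchanged and avoids creating the column in a separate step.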