Get sentiment score of emoji #Python
df
0 NaN
1 NaN
2
3 NaN
4 ❤
...
26368 NaN
26369 NaN
26370 NaN
26371
26372 NaN
Name: emojis, Length: 26373, dtype: object
Given the df above, I want to compute the emoji sentiment score for each row.
If the value is NaN, it should return NaN.
#!pip install emosent-py
from emosent import get_emoji_sentiment_rank
def emoji_sentiment(text):
    return get_emoji_sentiment_rank(text)["sentiment_score"]
emoji_sentiment("")
--> 0.221
Applying it to the whole column:
df['emoji_sentiment'] = df['emojis'].apply(emoji_sentiment)
The code above raises KeyError: nan
Expected result:
df emoji_sentiment
0 NaN | NaN
1 NaN | NaN
2 | (a decimal number)
3 NaN | NaN
4 ❤ | (a decimal number)
...
26368 NaN | NaN
26369 NaN | NaN
26370 NaN | NaN
26371 | (a decimal number)
26372 NaN | NaN
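Judging by your error, I guess get_emoji_sentiment_rank(text)["sentiment_score"] fails when text is NaN. You can therefore apply the function only to the non-NaN rows and assign the results to just those rows (the best option, but you first need to create the emoji_sentiment column with a default value of NaN):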
import numpy as np
df['emoji_sentiment'] = np.nan  # init the value for all rows (np.NaN was removed in NumPy 2.0)
not_na_idx = ~df.emojis.isna()  # True where an emoji is present
df.loc[not_na_idx, 'emoji_sentiment'] = df.loc[not_na_idx, 'emojis'].apply(emoji_sentiment)
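To try the masked assignment without installing emosent-py, here is a minimal sketch on a toy column. Note that fake_score and FAKE_SCORES are hypothetical stand-ins for emoji_sentiment, and the scores are invented placeholders, not real sentiment values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for emoji_sentiment(); the scores are made-up placeholders.
FAKE_SCORES = {"❤": 0.5, "😂": 0.25}

def fake_score(text):
    # Plain dict lookup: raises KeyError for anything not in the table, NaN included
    return FAKE_SCORES[text]

df = pd.DataFrame({"emojis": [np.nan, "❤", np.nan, "😂"]})

df["emoji_sentiment"] = np.nan        # start with NaN in every row
not_na_idx = df["emojis"].notna()     # True only where an emoji is present
df.loc[not_na_idx, "emoji_sentiment"] = df.loc[not_na_idx, "emojis"].apply(fake_score)

print(df)
```

Rows where emojis is NaN never reach the function, so they keep the NaN they were initialised with.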
Or you change what emoji_sentiment() returns:
import numpy as np
import pandas as pd
def emoji_sentiment(text):
    return get_emoji_sentiment_rank(text)["sentiment_score"] if not pd.isna(text) else np.nan
(uglier and slower, but it still works)
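Another option that may be worth trying is Series.map with na_action="ignore", which propagates NaN without passing it to the function at all. This is only a sketch: fake_score and FAKE_SCORES are again hypothetical stand-ins for emoji_sentiment, with an invented placeholder score.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for emoji_sentiment(); the score is a made-up placeholder.
FAKE_SCORES = {"❤": 0.5}

def fake_score(text):
    # Would raise KeyError if it were ever called with NaN
    return FAKE_SCORES[text]

emojis = pd.Series([np.nan, "❤", np.nan], name="emojis")

# na_action="ignore" leaves NaN entries untouched instead of calling fake_score on them
scores = emojis.map(fake_score, na_action="ignore")

print(scores)
```

With the real function this would read df['emojis'].map(emoji_sentiment, na_action='ignore'). An emoji that is missing from the ranking would presumably still raise KeyError, so that case needs separate handling.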