Issue applying textblob to a dataframe series

Question

After splitting my dataset into training, test, and validation sets, I have an x_validation set, which is a series of strings. Calling x_validation.head() gives:

0    this drink is making my throat hurt more and need to convince corey to go to jacks mannequin concert obvs will be in need of advil
1                         there gonna be movie on no can see it not even the trailers hate thinking about it as it is ll have breakdown
2                                                 the wire on my braces is too long and is cutting through my cheek farrrrrrrk it hurts
3                                             finally have uploaded my documentary to an external site message me for link and password
4                                        lovely national day today hour children parade and hour citizens parade with ju jitsu training

There are about 15,000 strings in total. I am trying to create a new list, tbresult, containing the sentiment polarity score that TextBlob computes for each string:

tbresult = [TextBlob(i).sentiment.polarity for i in x_validation]

This gives me the following error:

TypeError: The `text` argument passed to `__init__(text)` must be a string, not <class 'float'>

I am confused, because when I do the following,

lst = [x for x in x_validation]
TextBlob(lst[0]).sentiment.polarity

it works and I get 0.5. I don't understand where the float type in the error comes from. How do I do this correctly?

Answer 1

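Try removing the rows that contain float values. The error suggests that some entries are not strings, most likely missing values (NaN is a float), which head() and lst[0] would not reveal if they sit further down the series. You can check how many there are with .isna().sum(); note that this only counts them, so you still need dropna or a filter like the one here to remove them.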

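import pandas as pd

def remove_floats(row):
  # Keep strings as-is; turn anything else (floats, NaN) into None
  if isinstance(row, str):
    return row
  else:
    return None

df = pd.DataFrame({'col':['balh_1', 'blah_2', 1.0, 'blah_3']})

# Apply the filter to every column
for key in df:
  df[key] = df[key].apply(remove_floats)

# None counts as missing, so dropna removes those rows
df.dropna(inplace=True)

df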

     col
0   balh_1
1   blah_2
3   blah_3
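
The same idea can be applied directly to the list comprehension from the question. The following is only a minimal sketch: the helper names textblob_polarity, non_string_positions and polarity_scores are made up for illustration, and it assumes the offending entries are NaN or other non-string values. It first reports where the non-strings are, then scores only the strings.

```python
def textblob_polarity(text):
    """Polarity score for one string, using TextBlob as in the question."""
    # Imported lazily so the filtering helpers below can be tried
    # even where textblob is not installed.
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


def non_string_positions(values):
    """Return (position, value) pairs for every entry that is not a str."""
    return [(i, v) for i, v in enumerate(values) if not isinstance(v, str)]


def polarity_scores(values, scorer=textblob_polarity):
    """Score string entries; non-strings (e.g. NaN) map to None."""
    return [scorer(v) if isinstance(v, str) else None for v in values]


# Stand-in data (hypothetical): NaN is a float, which is the kind of
# value that triggers the TypeError in the question.
sample = ["lovely national day today", float("nan"), "it hurts"]

bad = non_string_positions(sample)
print(len(bad), type(bad[0][1]).__name__)  # 1 float
```

With textblob installed, tbresult = polarity_scores(x_validation) should give one entry per row, with None wherever the input was not a string, so the result stays aligned with the original series. If dropping those rows is acceptable instead, calling x_validation.dropna() before the original list comprehension is the shorter route.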

python

sentiment-analysis