Issue applying TextBlob to a DataFrame Series
After splitting my dataset into training, test, and validation sets, I have an x_validation set, which is a collection of strings. Calling x_validation.head() gives:
0 this drink is making my throat hurt more and need to convince corey to go to jacks mannequin concert obvs will be in need of advil
1 there gonna be movie on no can see it not even the trailers hate thinking about it as it is ll have breakdown
2 the wire on my braces is too long and is cutting through my cheek farrrrrrrk it hurts
3 finally have uploaded my documentary to an external site message me for link and password
4 lovely national day today hour children parade and hour citizens parade with ju jitsu training
There are about 15,000 strings in total. I am trying to create a new list, tbresult, containing the sentiment polarity score that TextBlob computes for each string:
tbresult = [TextBlob(i).sentiment.polarity for i in x_validation]
This gives me the following error:
TypeError: The `text` argument passed to `__init__(text)` must be a string, not <class 'float'>
I'm confused, because when I do the following,
lst = [x for x in x_validation]
TextBlob(lst[0]).sentiment.polarity
it works, and I get 0.5. I don't understand where the float type in the error is coming from. How do I do this correctly?
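Try removing the rows that contain float values. The floats are most likely NaN (missing values, which pandas stores as floats); you can count them with .isna().sum() and then drop them with dropna.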
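import pandas as pd

def remove_floats(row):
    # Keep strings unchanged; replace anything else (e.g. a float) with None
    if isinstance(row, str):
        return row
    else:
        return None

df = pd.DataFrame({'col': ['balh_1', 'blah_2', 1.0, 'blah_3']})
for key in df:
    df[key] = df[key].apply(remove_floats)
# Drop the rows that were replaced with None
df.dropna(inplace=True)
df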
col
0 balh_1
1 blah_2
3 blah_3
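Applied to the Series from the question, the same idea might look like the sketch below. It is only an illustration under some assumptions: the three-row x_validation here is a made-up stand-in for the real data, score_strings is a hypothetical helper name, and len is used as a placeholder scorer so the snippet runs without TextBlob installed.

```python
import pandas as pd


def score_strings(series, scorer):
    """Apply `scorer` to the string entries of `series`, skipping everything else."""
    # Keep only real strings; NaN (a float) and other non-strings are dropped.
    is_text = series.apply(lambda value: isinstance(value, str))
    return series[is_text].apply(scorer)


# Hypothetical stand-in for x_validation, with one missing value.
x_validation = pd.Series(["good day", float("nan"), "it hurts"])

# Count the missing entries and list the Python types present in the Series.
print(x_validation.isna().sum())
print(x_validation.apply(type).value_counts())

# len is a placeholder scorer; see the note below for the TextBlob version.
scores = score_strings(x_validation, len)
print(scores)
```

With TextBlob installed, the call would presumably become score_strings(x_validation, lambda s: TextBlob(s).sentiment.polarity). The result keeps the original index labels, so each score can still be matched to its row after the non-string entries are skipped.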