AttributeError: 'WordList' object has no attribute 'split'
I tried to apply lemmatization after tokenizing my "script" column, but I get an AttributeError. I have tried different things.
Here is my "script" column:
df_toklem["script"][0:5]

Output:
id
1    [ext, street, day, ups, man, big, pot, belly, ...
2    [credits, still, life, tableaus, lawford, n, h...
3    [fade, ext, convent, day, whispering, nuns, pr...
4    [fade, int, c, hercules, turbo, prop, night, e...
5    [open, theme, jaws, plane, busts, clouds, like...
Name: script, dtype: object

type(df_toklem["script"])

Output:
pandas.core.series.Series
And the code where I try to apply lemmatization:
import nltk
from textblob import Word

nltk.download("wordnet")
df_toklem["script"].apply(lambda x: " ".join([Word(word).lemmatize() for word in x.split()]))
The error:
[nltk_data] Downloading package wordnet to
[nltk_data] C:\Users\PC\AppData\Roaming\nltk_data...
[nltk_data] Package wordnet is already up-to-date!
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-72-dbc80c619ec5> in <module>
1 from textblob import Word
2 nltk.download("wordnet")
----> 3 df_toklem["script"].apply(lambda x: " ".join([Word(word).lemmatize() for word in x.split()]))
~\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
4198 else:
4199 values = self.astype(object)._values
-> 4200 mapped = lib.map_infer(values, f, convert=convert_dtype)
4201
4202 if len(mapped) and isinstance(mapped[0], Series):
pandas\_libs\lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-72-dbc80c619ec5> in <lambda>(x)
1 from textblob import Word
2 nltk.download("wordnet")
----> 3 df_toklem["script"].apply(lambda x: " ".join([Word(word).lemmatize() for word in x.split()]))
AttributeError: 'WordList' object has no attribute 'split'
I have tried different approaches, but unfortunately I can't find a working solution. Thank you for your time.
What you are trying to do won't work, because you are applying a string method (split) to a list of words.
I would use nltk and create a new pandas column with my tokenized data:
import nltk

# axis=1 is needed so apply passes each row (not each column) to the lambda
df_toklem['tokenized'] = df_toklem.apply(lambda row: nltk.word_tokenize(row['script']), axis=1)
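Alternatively, since the "script" column already holds lists of tokens (as the question's output shows), the original lambda can keep working if it iterates over the list directly instead of calling .split(). A minimal runnable sketch, using an identity function as a stand-in for textblob's Word(word).lemmatize() so it runs without the wordnet corpus:

```python
import pandas as pd

# Toy frame mirroring the question: "script" holds lists of tokens, not strings.
df_toklem = pd.DataFrame({"script": [["ext", "street", "day"],
                                     ["credits", "still", "life"]]})

# Stand-in for textblob's Word(w).lemmatize(); swap in the real call
# once the wordnet corpus is downloaded.
def lemmatize(w):
    return w

# The fix: iterate over the token list directly -- no .split() needed,
# because each cell is already a list (a WordList in the question's data).
df_toklem["lemmatized"] = df_toklem["script"].apply(
    lambda tokens: " ".join(lemmatize(w) for w in tokens)
)

print(df_toklem["lemmatized"][0])  # → ext street day
```

The same .apply call works unchanged on a real WordList column, since a WordList is iterable like a plain list.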