NameError: name 'clean_text' is not defined

Question

I am learning how to implement NLP, so I started with data cleaning, and now I am trying to vectorize the data using bag-of-words. Here is my code:

import pandas as pd
import numpy as np
import string
import re
import nltk
stopword=nltk.corpus.stopwords.words('english')
wn=nltk.WordNetLemmatizer()
from sklearn.feature_extraction.text import CountVectorizer


count_vect=CountVectorizer(analyzer=clean_text)
x_count=count_vect.fit_transform(lematizing_words)
print(x_count.shape)

However, when I run this code, I get the following error:

NameError: name 'clean_text' is not defined

How can I fix this?

I have been following this blog post on NLP implementation.

Answer 1

The error message describes your problem well. What is clean_text? Did you define a clean_text function, or import the Python module that contains it?
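
To illustrate the point, here is a minimal sketch using only the standard library. The clean_text body and the run_analyzer helper are hypothetical placeholders, not the blog's code; the sketch only shows that the name has to exist before something calls it.

```python
# Python looks up names when the code runs, so calling a function that was
# never defined (or imported) raises NameError.
def run_analyzer():
    return clean_text("Hello, World!")

try:
    run_analyzer()
    error_message = None
except NameError as exc:
    error_message = str(exc)

print(error_message)

# Defining the function under exactly the same name resolves the error.
# This body is a hypothetical placeholder, not the blog's implementation.
def clean_text(text):
    return text.lower().split()

result = run_analyzer()
print(result)
```

The same applies to CountVectorizer(analyzer=clean_text): the function must be defined or imported above that line.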

Answer 2

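import re
import string
import nltk

ps = nltk.PorterStemmer()  # assumed: `ps` is not defined in the original snippet
stopwords = nltk.corpus.stopwords.words('english')

# To match the question, rename this to clean_text or pass analyzer=cleanText.
def cleanText(text):
    # Lowercase and drop punctuation, split into tokens, then stem non-stopwords.
    text = "".join([char.lower() for char in text if char not in string.punctuation])
    tokens = re.split(r'\W+', text)
    text = [ps.stem(word) for word in tokens if word not in stopwords]
    return text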

This is the function that Badreesh put on GitHub but left out of the blog post.
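
For reference, here is a rough sketch of how a custom analyzer plugs into CountVectorizer once it is defined under the name the vectorizer is given. To keep it runnable without downloading NLTK corpora, it makes two assumptions that differ from the function above: STOPWORDS is a tiny hand-written stand-in for the NLTK English stopword list, and stemming is skipped. The docs list is made-up sample data.

```python
import re
import string

from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for nltk.corpus.stopwords.words('english').
STOPWORDS = {"the", "is", "a", "and", "on"}

def clean_text(text):
    # Lowercase and drop punctuation, character by character.
    text = "".join(ch.lower() for ch in text if ch not in string.punctuation)
    # Split on runs of non-word characters.
    tokens = re.split(r"\W+", text)
    # Keep non-empty tokens that are not stopwords (no stemming in this sketch).
    return [tok for tok in tokens if tok and tok not in STOPWORDS]

# Made-up sample documents.
docs = ["The cat sat on the mat.", "The dog is on the mat!"]

# clean_text is defined above, so passing it as the analyzer no longer fails.
count_vect = CountVectorizer(analyzer=clean_text)
x_count = count_vect.fit_transform(docs)

print(x_count.shape)
print(sorted(count_vect.vocabulary_))
```

When analyzer is a callable, CountVectorizer calls it on each raw document and counts the tokens it returns, so the function should take one string and return a list of tokens.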

nlp

data-science

countvectorizer