How to remove common words from a paragraph using Python
I need a way to remove common words from text scraped from a web page. How can I integrate such a method into the code below?
third_headers = ' '.join([r.text for r in soup.find_all('h3')])  # soup is the parsed page
third_headers
I got this output: 'HTML and CSS Data Analytics XML Tutorials JavaScript Programming Server Side Web Building Data Analytics XML Tutorials HTML CSS JavaScript Programming Server Side XML Character Sets Exercises Quizzes Courses Certificates Example Example Explained'
I need a new output without the common words, where the words to remove come from a corpus of common words.
Assuming we have the corpus of common words in a list named CORPUS:
raw = 'HTML and CSS Data Analytics XML Tutorials JavaScript Programming Server Side Web Building Data Analytics XML Tutorials HTML CSS JavaScript Programming Server Side XML Character Sets Exercises Quizzes Courses Certificates Example Example Explained'
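CORPUS = ["And", "So", "If", "etc."]  # example common words; replace with your full corpus
corpus = {w.lower() for w in CORPUS}  # lowercase set for case-insensitive, fast lookups
words = raw.split()  # split the text on whitespace
processed = [w for w in words if w.lower() not in corpus]  # keep only words not in the corpus
print(' '.join(processed))  # rejoin the remaining words into one string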
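The filter can also be wrapped in a reusable function and applied directly to the joined `<h3>` text. This is a minimal sketch: the name `remove_common_words` is hypothetical, and stripping leading/trailing punctuation before comparing (so that `and,` or `etc.` still match) is an assumption about what should count as a match, not part of the answer above.

```python
import string
from typing import Iterable


def remove_common_words(text: str, common_words: Iterable[str]) -> str:
    """Return text with every word found in common_words removed.

    Hypothetical helper. Matching is case-insensitive and ignores
    leading/trailing punctuation, so "And", "and," and "AND" all match "and".
    """
    # Normalise the corpus once into a set for fast membership tests.
    stop = {w.lower().strip(string.punctuation) for w in common_words}
    kept = [
        w for w in text.split()
        if w.lower().strip(string.punctuation) not in stop
    ]
    return " ".join(kept)


CORPUS = ["And", "So", "If", "etc."]  # example corpus, as in the answer

text = "HTML and CSS Data Analytics XML Tutorials"
print(remove_common_words(text, CORPUS))  # HTML CSS Data Analytics XML Tutorials
```

With BeautifulSoup already in use as in the question, the integration would be a single call: `remove_common_words(' '.join(r.text for r in soup.find_all('h3')), CORPUS)`.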
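To show the scraping and the filtering together without depending on BeautifulSoup, the sketch below swaps in the standard library's `html.parser.HTMLParser` to collect the `<h3>` text and then applies the same case-insensitive filter. `H3TextCollector` and `common_word_free_headers` are hypothetical names and the sample HTML is made up for illustration; if BeautifulSoup is already available, only the filtering step is needed.

```python
from html.parser import HTMLParser
from typing import Iterable


class H3TextCollector(HTMLParser):
    """Hypothetical stdlib stand-in for soup.find_all('h3'): collects <h3> text."""

    def __init__(self) -> None:
        super().__init__()
        self._in_h3 = False  # True while parsing inside an <h3> element
        self._buf = []       # text fragments of the current <h3>
        self.headers = []    # finished <h3> texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3 = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_h3:
            self._in_h3 = False
            self.headers.append("".join(self._buf))

    def handle_data(self, data):
        # Text inside nested tags such as <a> or <b> is collected too.
        if self._in_h3:
            self._buf.append(data)


def common_word_free_headers(html: str, common_words: Iterable[str]) -> str:
    """Join all <h3> texts and drop words found in common_words (case-insensitive)."""
    parser = H3TextCollector()
    parser.feed(html)
    parser.close()
    stop = {w.lower() for w in common_words}
    words = " ".join(parser.headers).split()
    return " ".join(w for w in words if w.lower() not in stop)


# Illustrative HTML only, not the page from the question.
page = "<h3>HTML and CSS</h3><p>ignored</p><h3>Data <b>Analytics</b></h3>"
print(common_word_free_headers(page, ["And", "So", "If", "etc."]))
# HTML CSS Data Analytics
```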