Extracting main subject from a sentence in Python
I am trying to extract the main subject from sentences contained in a text file. For example, the file contains data like the following:
I never used tobacco
They smoke tobacco
I do not like today's weather
Good weather
Exercise 3 to 4 times a week
No exercise
Family history of Cancer
No Cancer
,,· Alcohol use
Amazing football match
Pathetic football match
Has Depression
I want to extract the main subject and print it as follows:
I never used tobacco | Tobacco | False
They smoke tobacco | Tobacco | True
I do not like today's weather | Weather | False
Good weather | Weather | True
Exercise 3 to 4 times a week | Exercise | True
No exercise | Exercise | False
Family history of Cancer | Cancer | True
No Cancer | Cancer | False
,,· Alcohol use. | Alcohol | True
Amazing football match | Football Match | True
Pathetic football match | Football Match | False
Has Depression | Depression | True
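I am trying spaCy for this but cannot get the desired output. I used spaCy to tokenize the sentences and then used POS tagging to extract the nouns, but I still do not get what I need.
Can anyone help with how this can be done?
There is no exact solution, but the following code, which I used, helps somewhat: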
from collections import Counter
import spacy

nlp = spacy.load('en_core_web_md')

# read_words_from_file() reads the words from a file (helper not shown here)
negatedwords = read_words_from_file('false.txt')  # file containing all the negation words

# `line` (the current sentence) and `file_write` (an open output file) are
# assumed to be defined by the surrounding code, which is not shown.
count = Counter(line.split())
negated_word_found = False
for key, val in count.items():
    key = key.rstrip('.,?!\n')  # remove trailing punctuation
    if key in negatedwords:
        negated_word_found = True

# Write "False" when a negation word was found, "True" otherwise
if negated_word_found:
    file_write.write("False")
else:
    file_write.write("True")
file_write.write(" | ")

document = nlp(line)
for word in document:
    look_for_word = word.text
    word_pos = word.pos_
    # 'use' is tagged as NOUN, so it is excluded explicitly
    if word_pos in ("NOUN", "ADJ", "PROPN") and look_for_word != "use":
        file_write.write(look_for_word)
        file_write.write(' ')
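The part-of-speech filter above also keeps adjectives, so "Amazing football match" is written out whole rather than as "Football Match". Below is a minimal sketch of a noun-only variant. It assumes the tokens have already been tagged, for example as `(token.text, token.pos_)` pairs from spaCy. The function name `subject_from_tags` and its `skip` parameter are hypothetical, and the tags in the usage lines are written by hand for illustration, not produced by a model.

```python
def subject_from_tags(tagged, skip=frozenset({"use"})):
    """Join the NOUN/PROPN tokens of one sentence into a title-cased subject.

    tagged: iterable of (text, pos) pairs, e.g. [(t.text, t.pos_) for t in nlp(line)].
    skip:   lowercase words to drop even when tagged as nouns (hypothetical default).
    """
    nouns = [
        text
        for text, pos in tagged
        if pos in {"NOUN", "PROPN"} and text.lower() not in skip
    ]
    return " ".join(word.capitalize() for word in nouns)


# Hand-written tags for illustration only; a real tagger may label tokens differently.
example = [("Amazing", "ADJ"), ("football", "NOUN"), ("match", "NOUN")]
print(subject_from_tags(example))  # prints: Football Match
```

This is still not a full solution: for a sentence such as "Family history of Cancer" it would return every token the tagger marks as a noun, not just "Cancer".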
Contents of false.txt:
never
Never
no
No
NO
not
NOT
Not
NEVER
don't
Don't
DON'T
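false.txt repeats each negation word in several casings. Here is a sketch of the same check done case-insensitively, so that each word only needs to be listed once. The names `NEGATION_WORDS` and `is_negated` are hypothetical, and the word set is simply the lowercase form of the list above.

```python
# Lowercase form of the words in false.txt (assumed to be the full list).
NEGATION_WORDS = {"never", "no", "not", "don't"}


def is_negated(sentence, negation_words=NEGATION_WORDS):
    """Return True if any whitespace-separated token is a negation word."""
    for token in sentence.split():
        # Strip surrounding punctuation, then compare in lowercase.
        word = token.strip(".,?!\n").lower()
        if word in negation_words:
            return True
    return False


for s in ("I never used tobacco", "They smoke tobacco", "NO exercise."):
    # The third column of the desired output is True when no negation is found.
    print(s, "|", not is_negated(s))
```

Like the word-list check it mirrors, this only detects explicit negation words. A line such as "Pathetic football match" would still come out as True, because telling it apart from "Amazing football match" requires some notion of sentiment, not negation.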