Highlight verb phrases using spaCy and HTML
I wrote some code that colors verb phrases red and writes the result out as HTML.
from __future__ import unicode_literals
import spacy, en_core_web_sm
import textacy
import codecs

nlp = en_core_web_sm.load()
sentence = 'The author is writing a new book. The dog is barking.'
# Optional auxiliary verb, any number of adverbs, then one or more verbs
pattern = r'<VERB>?<ADV>*<VERB>+'
doc = textacy.Doc(sentence, lang='en_core_web_sm')
lists = textacy.extract.pos_regex_matches(doc, pattern)
with open("my.html", "w") as fp:
    for list in lists:
        search_word = list.text
        # Each iteration writes the whole text again with only one phrase highlighted
        fp.write(sentence.replace(search_word, '<span style="color: red">{}</span>'.format(search_word)))
Current output:
The author **is writing** a new book. The dog is barking.The author is writing a new book. The dog **is barking.**
The text is written twice: the first copy highlights "is writing" and the second highlights "is barking".
Expected output:
The author **is writing** a new book. The dog **is barking.**
Do I have to tokenize the text into sentences before looping over the matches? Any help would be appreciated.
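I found another, more logical way of doing it. Instead of running the replacement on the whole text, run it only on the sentence that contains the match.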
with open("my.html", "w") as fp:
    for _list in lists:
        search_word = _list.text
        # First sentence (split on '.') that contains the matched phrase
        containing_sentence = [i for i in sentence.split('.') if str(search_word) in str(i)][0]
        fp.write(containing_sentence.replace(search_word, '<span style="color: red">{}</span>'.format(search_word)))
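The code above writes the sentences separately, and without their trailing periods, because split('.') drops them. If you want a single piece of text instead, append the modified sentences to a list and join them before writing to the file, as shown below.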
mod_sentence = []
for _list in lists:
    search_word = _list.text
    # Re-append the '.' that split('.') removed
    containing_sentence = [i for i in sentence.split('.') if str(search_word) in str(i)][0] + '.'
    mod_sentence.append(containing_sentence.replace(search_word, '<span style="color: red">{}</span>'.format(search_word)))
with open("my.html", "w") as fp:
    fp.write(''.join(mod_sentence))
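If the same phrase can occur more than once, or you would rather not split on '.', an alternative is to build the HTML in one pass from character offsets. The following is only a minimal sketch: `highlight_spans` is a hypothetical helper, not part of spaCy or textacy, and the offsets here are found with `str.find` as a stand-in for the matcher output. With spaCy spans, the `start_char` and `end_char` attributes should give the same kind of pairs.

```python
from html import escape


def highlight_spans(text, spans, color="red"):
    """Return `text` as HTML with each (start, end) character span colored."""
    pieces = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor:
            # Skip a span that overlaps one already emitted.
            continue
        pieces.append(escape(text[cursor:start]))  # untouched text before the span
        pieces.append(
            '<span style="color: {}">{}</span>'.format(color, escape(text[start:end]))
        )
        cursor = end
    pieces.append(escape(text[cursor:]))  # remaining text after the last span
    return "".join(pieces)


sentence = "The author is writing a new book. The dog is barking."

# Hypothetical stand-in for the matcher output: offsets found with str.find.
# With spaCy spans, (m.start_char, m.end_char) would supply the same pairs.
phrases = ["is writing", "is barking"]
spans = [(sentence.find(p), sentence.find(p) + len(p)) for p in phrases]

html = highlight_spans(sentence, spans)
print(html)
```

The resulting string can be written to my.html exactly as before. Because the text is assembled once from left to right, it is never duplicated, and the periods stay where they were.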
Hope this helps! Cheers!