Custom pattern to match phrases in spaCy's Matcher

Question

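I'm trying to use spaCy to match some example sentences. I got the sample code working, but now I need something more specific. Here is the sample code first, for context: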

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
# Add match ID "HelloWorld" with no callback and one pattern
pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]
matcher.add("HelloWorld", [pattern])

doc = nlp("Hello, world! Hello world!")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # Get string representation
    span = doc[start:end]  # The matched span
    print(match_id, string_id, start, end, span.text)

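This works fine, but now I need something more specific: I need Python to load phrases from a file (each sentence is different) and keep them in memory, then check whether phrase1 ("Hello, world! Hello world!" in the example) contains any of the patterns held in memory. Is this possible? If so, could someone help or point me in the right direction? I really don't know how to proceed. Thanks a lot!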

Answer 1

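If I understand correctly, you want to:

1. Read an external file containing the strings to match (in your example, Hello, world!).
2. Look for your pattern in the loaded file.
3. Return the matches found.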

This should work:

# File contents:
"""./myfile.txt
This is one sentence. Hello world! This is another sentence.
Yet another sentence. Hello world... Hello, world!
"""

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
# Add match ID "HelloWorld" with no callback and one pattern
pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]
matcher.add("HelloWorld", [pattern])

# Load file as string into memory: https://java2blog.com/python-read-file-into-string/
with open('myfile.txt') as f:
    doc = nlp(f.read())

# Use the pipeline's sentence recognizer: https://spacy.io/usage/linguistic-features#sbd
for sent in doc.sents:
    matches = matcher(sent)
    # From your code, just replace `doc` by `sent`
    for match_id, start, end in matches:
        string_id = nlp.vocab.strings[match_id]  # Get string representation
        span = sent[start:end]  # The matched span
        print(match_id, string_id, start, end, span.text)

Note that if your file is too large, you may need to read it line by line, like this:

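with open("myfile.txt") as f:
    for line in f:
        doc = nlp(line)  # process one line at a time (nlp and matcher as defined above)
        for match_id, start, end in matcher(doc):
            print(nlp.vocab.strings[match_id], start, end, doc[start:end].text)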
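
If the file instead holds the phrases to look for, one per line, as the question seems to describe, spaCy's PhraseMatcher may be a closer fit than hand-written token patterns. The following is only a minimal sketch under stated assumptions: the file name phrases.txt and the helpers load_phrases and find_phrases are hypothetical, the file is written to a temporary directory so the example is self-contained, and a blank English pipeline is used because only the tokenizer is needed.

```python
import tempfile
from pathlib import Path

import spacy
from spacy.matcher import PhraseMatcher


def load_phrases(path):
    """Return the non-empty lines of a text file, one phrase per line (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def find_phrases(nlp, phrases, text):
    """Return the text of every stored phrase found in `text` (hypothetical helper)."""
    # attr="LOWER" compares lowercased token text, so matching is case-insensitive
    matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
    # Each phrase is tokenized into a Doc and registered under one match ID
    matcher.add("PHRASES", [nlp.make_doc(p) for p in phrases])
    doc = nlp.make_doc(text)
    return [doc[start:end].text for _, start, end in matcher(doc)]


# Blank pipeline: tokenizer only, no trained model required
nlp = spacy.blank("en")

# Assumed phrase file, created here only so the sketch runs on its own
with tempfile.TemporaryDirectory() as tmp:
    phrase_file = Path(tmp) / "phrases.txt"
    phrase_file.write_text("hello world\ngood morning\n", encoding="utf-8")
    phrases = load_phrases(phrase_file)

found = find_phrases(nlp, phrases, "Hello, world! Hello world!")
print(found)
```

PhraseMatcher compares token sequences, so the stored phrase "hello world" matches "Hello world" but not "Hello, world", where the comma is a token of its own. If punctuation between words should be tolerated, a token pattern for the regular Matcher, like the one in the question, is the more flexible choice.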

Tags: python, python-3.x, spacy, spacy-3