How to get the span of a conjunct in spaCy?

Question

I am using spaCy's token.conjuncts to get the conjuncts of each token.

However, token.conjuncts returns a tuple, and I would like to get a Span instead. For example:

import spacy
nlp = spacy.load("en_core_web_lg")

sentence = "I like to eat food at the lunch time, or even at the time between a lunch and a dinner"
doc = nlp(sentence)
for token in doc:
    conj = token.conjuncts
    print(type(conj))

# output: <class 'tuple'>

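Does anyone know how to convert this tuple into a Span?

Or how can I get the conjuncts directly as a Span?

The reason I need a Span is that I want to use the conjunct (as a span) to locate where it sits in the sentence, for example which noun chunk or split it belongs to (whichever way I use to separate the text).

Currently, I convert the tuple to str and iterate over all the splits or noun chunks to check whether a split/noun chunk contains the conjunct.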

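However, this has a bug: when a conjunct (a single token) appears in more than one split/noun chunk, I cannot tell which split actually contains it, because I only compare the str and ignore the index or id of the conjunct. If I could get the span of the conjunct, I could locate its exact position.

Please feel free to comment. Thanks in advance!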

Answer 1

token.conjuncts returns a tuple of Token objects. To get a Span for a conjunct, slice the doc around its index: doc[conj.i: conj.i+1]

import spacy

nlp = spacy.load('en_core_web_sm')


sentence = "I like oranges and apples and lemons."


doc = nlp(sentence)

for token in doc:
    if token.conjuncts:
        conjuncts = token.conjuncts             # tuple of conjuncts
        print("Conjuncts for ", token.text)
        for conj in conjuncts:
            # conj is a Token; conj.i is its index in the doc
            span = doc[conj.i: conj.i+1]        # one-token Span covering the conjunct
            print(span.text, type(span))
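
Since the real goal in the question is to find which split or noun chunk a conjunct belongs to, comparing token indices rather than strings avoids the duplicate-word problem. Below is a minimal sketch of that idea, not a definitive implementation. Assumptions: spaCy v3, where the Doc constructor accepts heads and deps; the parse is written by hand so the example does not depend on a trained model; the helper names conjunct_spans and spans_containing are hypothetical; and the chunks list is a hand-made stand-in for doc.noun_chunks or your own splits.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab


def conjunct_spans(token):
    """Return one single-token Span per conjunct of `token` (hypothetical helper)."""
    doc = token.doc
    return [doc[c.i : c.i + 1] for c in token.conjuncts]


def spans_containing(token, spans):
    """Return the spans whose token range covers `token`, compared by index."""
    return [s for s in spans if s.start <= token.i < s.end]


# Hand-annotated parse (assumption: stands in for the output of a trained pipeline).
words = ["I", "like", "apples", "and", "oranges", "and", "apples", "."]
heads = [1, 1, 1, 2, 2, 4, 4, 1]  # absolute index of each token's head
deps = ["nsubj", "ROOT", "dobj", "cc", "conj", "cc", "conj", "punct"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

# Stand-in for doc.noun_chunks or custom splits; note "apples" occurs twice.
chunks = [doc[0:1], doc[2:3], doc[4:5], doc[6:7]]

first_apples = doc[2]
for conj in first_apples.conjuncts:
    # String matching would hit both "apples" chunks; index matching cannot.
    by_text = [s for s in chunks if conj.text in s.text]
    by_index = spans_containing(conj, chunks)
    print(conj.text, conj.i, len(by_text), [(s.start, s.end) for s in by_index])
```

With a real pipeline you would pass list(doc.noun_chunks) instead of the hand-made chunks. If character positions are more convenient than token positions, a Span also exposes start_char and end_char.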


python

nlp

conjunctive-normal-form

spacy