Mapping entity IDs to strings in spaCy 3.0
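I have trained a simple NER pipeline with spaCy 3.0. After training, I want to get the list of predicted IOB tags, among other things, from a Doc (doc = nlp(text)). For example: ["O", "O", "B", "I", "O"]
I can easily get the IOB ids (integers) with
>> doc.to_array("ENT_IOB")
array([2, 2, ..., 2], dtype=uint64)
But how can I get the mappings/lookup?
I did not find any lookup table in doc.vocab.lookups.tables.
I also understand that accessing ent_iob_ on each token ([token.ent_iob_ for token in doc]) gives the same result, but I was wondering whether there is a better way.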
Check the Token documentation:
ent_iob
IOB code of named entity tag. 3 means the token begins an entity, 2 means it is outside an entity, 1 means it is inside an entity, and 0 means no entity tag is set.
ent_iob_
IOB code of named entity tag. “B” means the token begins an entity, “I” means it is inside an entity, “O” means it is outside an entity, and "" means no entity tag is set.
So all you need is a simple dictionary, iob_map = {0: "", 1: "I", 2: "O", 3: "B"}, to map the IDs to their names:
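import spacy

# Assumption: a pretrained English pipeline such as en_core_web_sm; any pipeline with an NER component works
nlp = spacy.load("en_core_web_sm")
doc = nlp("John went to New York in 2010.")
print([x.text for x in doc.ents])
# => ['John', 'New York', '2010']
# Integer IOB codes documented for Token.ent_iob, mapped to their string tags
iob_map = {0: "", 1: "I", 2: "O", 3: "B"}
print(list(map(iob_map.get, doc.to_array("ENT_IOB").tolist())))
# => ['B', 'O', 'O', 'B', 'I', 'O', 'B', 'O']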
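If the mapping is needed in more than one place, it could be wrapped in a small helper. The following is only a minimal sketch: the names IOB_MAP and iob_ids_to_tags are hypothetical (not part of spaCy), and the codes are taken from the Token documentation quoted above. It works on a plain sequence of integers, so it runs without a loaded pipeline, and it raises an error on an unknown code instead of silently returning None.

```python
# Codes as documented for Token.ent_iob; IOB_MAP and iob_ids_to_tags are
# hypothetical names used for illustration, not part of the spaCy API.
IOB_MAP = {0: "", 1: "I", 2: "O", 3: "B"}


def iob_ids_to_tags(ids):
    """Convert a sequence of integer IOB codes to their string tags."""
    tags = []
    for value in ids:
        code = int(value)  # also accepts NumPy integer types such as uint64
        if code not in IOB_MAP:
            raise ValueError(f"unexpected IOB code: {code}")
        tags.append(IOB_MAP[code])
    return tags


print(iob_ids_to_tags([3, 2, 2, 3, 1, 2, 3, 2]))
# => ['B', 'O', 'O', 'B', 'I', 'O', 'B', 'O']
```

With a real Doc, the helper could presumably be called as iob_ids_to_tags(doc.to_array("ENT_IOB")), since the int() conversion handles the uint64 values in the array.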