catalogue.RegistryError: [E893] Could not find function 'Custom_Candidate_Gen.v1' in function registry 'misc'

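I am currently building a spaCy pipeline with custom NER, entity linker, and textcat components. For my entity linker component, I modified candidate_generator() to fit my use case. I used the ner_emersons demo project as a reference. Below is my custom_functions code.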

import spacy
from functools import partial
from pathlib import Path
from typing import Iterable, Callable
from spacy.training import Example
from spacy.tokens import DocBin
from spacy.kb import Candidate, KnowledgeBase, get_candidates

@spacy.registry.misc("Custom_Candidate_Gen.v1")
def create_candidates():
    return custom_get_candidates

def custom_get_candidates(kb, span):
    return kb.get_alias_candidates(span.text.lower())

@spacy.registry.readers("MyCorpus.v1")
def create_docbin_reader(file: Path) -> Callable[["Language"], Iterable[Example]]:
    return partial(read_files, file)


def read_files(file: Path, nlp: "Language") -> Iterable[Example]:
    # we run the full pipeline and not just nlp.make_doc to ensure we have entities and sentences
    # which are needed during training of the entity linker
    with nlp.select_pipes(disable="entity_linker"):
        doc_bin = DocBin().from_disk(file)
        docs = doc_bin.get_docs(nlp.vocab)
        for doc in docs:
            yield Example(nlp(doc.text), doc)

After training my entity linker and adding my textcat component to the pipeline, I get the following error:

catalogue.RegistryError: [E893] Could not find function 'Custom_Candidate_Gen.v1' in function registry 'misc'. If you're using a custom function, make sure the code is available. If the function is provided by a third-party package, e.g. spacy-transformers, make sure the package is installed in your environment.

Available names: spacy.CandidateGenerator.v1, spacy.EmptyKB.v1, spacy.KBFromFile.v1, spacy.LookupsDataLoader.v1, spacy.ngram_range_suggester.v1, spacy.ngram_suggester.v1

Why is my custom candidate generator not being registered?

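The @spacy.registry.misc decorator only runs when the module that contains it is imported, so the function is missing from the registry in any process that loads the model without importing custom_functions first. Options for loading and registering custom code when loading a model:

  • Import the code directly in your script before loading the model
  • Package it with your model using spacy package --code, then load the model by its installed package name (not from a directory)
  • Provide the code in a separate package that registers the functions through entry points in setup.cfg (this works well, but would not be my first choice in this case)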
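
To illustrate why the import matters, here is a minimal sketch of a decorator-based registry. It is a toy stand-in written for this answer, not spaCy's or catalogue's actual implementation; the names _MISC, register, lookup, and lookup_succeeds are all hypothetical. It only shows that a name becomes resolvable once the decorated definition has been executed, which is what importing custom_functions does.

```python
from typing import Callable, Dict

# Toy registry (hypothetical; NOT spaCy's real registry code).
_MISC: Dict[str, Callable] = {}


def register(name: str) -> Callable[[Callable], Callable]:
    """Return a decorator that stores the decorated function under `name`."""
    def decorator(func: Callable) -> Callable:
        _MISC[name] = func
        return func
    return decorator


def lookup(name: str) -> Callable:
    """Resolve a registered function, failing loudly when it is unknown."""
    if name not in _MISC:
        raise KeyError(
            f"Could not find function '{name}'. Available names: {sorted(_MISC)}"
        )
    return _MISC[name]


def lookup_succeeds(name: str) -> bool:
    """Report whether `name` can currently be resolved."""
    try:
        lookup(name)
    except KeyError:
        return False
    return True


# Before the decorated definition has run, the name cannot be resolved.
found_before = lookup_succeeds("Custom_Candidate_Gen.v1")


# Executing this definition plays the role of `import custom_functions`.
@register("Custom_Candidate_Gen.v1")
def create_candidates():
    def custom_get_candidates(kb, span):
        return kb.get_alias_candidates(span.text.lower())
    return custom_get_candidates


# After the definition has run, the same lookup succeeds.
found_after = lookup_succeeds("Custom_Candidate_Gen.v1")
print(found_before, found_after)
```

In the real pipeline, the equivalent of the first option is to put import custom_functions above the spacy.load(...) call in the script that loads the trained model.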

See also: