KeyBERT package is not working on Google Colab

Question

I am using KeyBERT on Google Colab to extract keywords from a text.

from keybert import KeyBERT

model = KeyBERT('distilbert-base-nli-mean-tokens')
text_keywords = model.extract_keywords(my_long_text)

But I get the following error:

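OSError: Model name 'distilbert-base-nli-mean-token' was not found in model name list (distilbert-base-uncased, distilbert-base-uncased-distilled-squad). We assumed 'distilbert-base-nli-mean-token' was a path or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.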

Any idea how to fix this?

Thanks

Exception when trying to download http://sbert.net/models/distilbert-base-nli-mean-token.zip. Response 404
SentenceTransformer-Model http://sbert.net/models/distilbert-base-nli-mean-token.zip not found. Try to create it from scratch
Try to create Transformer Model distilbert-base-nli-mean-token with mean pooling
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/sentence_transformers/SentenceTransformer.py in __init__(self, model_name_or_path, modules, device)
     78                         zip_save_path = os.path.join(model_path_tmp, 'model.zip')
---> 79                         http_get(model_url, zip_save_path)
     80                         with ZipFile(zip_save_path, 'r') as zip:

11 frames
/usr/local/lib/python3.7/dist-packages/sentence_transformers/util.py in http_get(url, path)
    241         print("Exception when trying to download {}. Response {}".format(url, req.status_code), file=sys.stderr)
--> 242         req.raise_for_status()
    243         return

/usr/local/lib/python3.7/dist-packages/requests/models.py in raise_for_status(self)
    940         if http_error_msg:
--> 941             raise HTTPError(http_error_msg, response=self)
    942 

HTTPError: 404 Client Error: Not Found for url: https://public.ukp.informatik.tu-darmstadt.de/reimers/sentence-transformers/v0.2/distilbert-base-nli-mean-token.zip

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    133           that will be used by default in the :obj:`generate` method of the model. In order to get the tokens of the
--> 134           words that should not appear in the generated text, use :obj:`tokenizer.encode(bad_word,
    135           add_prefix_space=True)`.

/usr/local/lib/python3.7/dist-packages/transformers/file_utils.py in cached_path(url_or_filename, cache_dir, force_download, proxies)
    181 except importlib_metadata.PackageNotFoundError:
--> 182     _timm_available = False
    183 

OSError: file distilbert-base-nli-mean-token not found

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
<ipython-input-59-d0fa7b6b7cd1> in <module>()
      1 doc = full_text
----> 2 model = KeyBERT('distilbert-base-nli-mean-token')

/usr/local/lib/python3.7/dist-packages/keybert/model.py in __init__(self, model)
     46                       * https://www.sbert.net/docs/pretrained_models.html
     47         """
---> 48         self.model = select_backend(model)
     49 
     50     def extract_keywords(self,

/usr/local/lib/python3.7/dist-packages/keybert/backend/_utils.py in select_backend(embedding_model)
     40     # Create a Sentence Transformer model based on a string
     41     if isinstance(embedding_model, str):
---> 42         return SentenceTransformerBackend(embedding_model)
     43 
     44     return SentenceTransformerBackend("xlm-r-bert-base-nli-stsb-mean-tokens")

/usr/local/lib/python3.7/dist-packages/keybert/backend/_sentencetransformers.py in __init__(self, embedding_model)
     33             self.embedding_model = embedding_model
     34         elif isinstance(embedding_model, str):
---> 35             self.embedding_model = SentenceTransformer(embedding_model)
     36         else:
     37             raise ValueError("Please select a correct SentenceTransformers model: \n"

/usr/local/lib/python3.7/dist-packages/sentence_transformers/SentenceTransformer.py in __init__(self, model_name_or_path, modules, device)
     93                             save_model_to = model_path
     94                             model_path = None
---> 95                             transformer_model = Transformer(model_name_or_path)
     96                             pooling_model = Pooling(transformer_model.get_word_embedding_dimension())
     97                             modules = [transformer_model, pooling_model]

/usr/local/lib/python3.7/dist-packages/sentence_transformers/models/Transformer.py in __init__(self, model_name_or_path, max_seq_length, model_args, cache_dir, tokenizer_args, do_lower_case)
     25         self.do_lower_case = do_lower_case
     26 
---> 27         config = AutoConfig.from_pretrained(model_name_or_path, **model_args, cache_dir=cache_dir)
     28         self.auto_model = AutoModel.from_pretrained(model_name_or_path, config=config, cache_dir=cache_dir)
     29         self.tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, cache_dir=cache_dir, **tokenizer_args)

/usr/local/lib/python3.7/dist-packages/transformers/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)

/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    144           after the :obj:`decoder_start_token_id`. Useful for multilingual models like :doc:`mBART
    145           <../model_doc/mbart>` where the first generated token needs to be the target language token.
--> 146         - **forced_eos_token_id** (:obj:`int`, `optional`) -- The id of the token to force as the last generated token
    147           when :obj:`max_length` is reached.
    148         - **remove_invalid_values** (:obj:`bool`, `optional`) -- Whether to remove possible `nan` and `inf` outputs of

OSError: Model name 'distilbert-base-nli-mean-token' was not found in model name list (distilbert-base-uncased, distilbert-base-uncased-distilled-squad). We assumed 'distilbert-base-nli-mean-token' was a path or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.

Answer 1

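I can't reproduce the issue with the code you posted, but judging from the error message, I think you are simply missing the 's' at the end of the model name. The traceback shows the call KeyBERT('distilbert-base-nli-mean-token'), so make sure the model name is: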

distilbert-base-nli-mean-tokens

and not:

distilbert-base-nli-mean-token

See also this link for a list of all available models.
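
If typos like this keep slipping in, a small guard in front of the KeyBERT constructor can catch them before any download is attempted. The following is only a minimal sketch: KNOWN_MODELS, check_model_name and build_keybert are hypothetical names of mine, not part of KeyBERT or sentence-transformers, and the list contains only the two model names that appear in this thread, so you would have to extend it yourself.

```python
import difflib

# Assumption: a hand-maintained list of model names. Only the two names
# that appear in this thread are included; extend it as needed.
KNOWN_MODELS = [
    "distilbert-base-nli-mean-tokens",
    "xlm-r-bert-base-nli-stsb-mean-tokens",
]


def check_model_name(name, known=KNOWN_MODELS):
    """Return `name` if it is in `known`, otherwise raise with a suggestion."""
    if name in known:
        return name
    # Look for the single closest known name to point out a likely typo.
    suggestions = difflib.get_close_matches(name, known, n=1, cutoff=0.8)
    hint = f" Did you mean '{suggestions[0]}'?" if suggestions else ""
    raise ValueError(f"Unknown model name '{name}'.{hint}")


def build_keybert(name):
    """Validate the model name first, then construct the KeyBERT model."""
    # Imported lazily so the name check works even without keybert installed.
    from keybert import KeyBERT
    return KeyBERT(check_model_name(name))
```

With this in place, build_keybert('distilbert-base-nli-mean-token') raises a ValueError that suggests 'distilbert-base-nli-mean-tokens' instead of failing later with the 404 and OSError shown in the question.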

google-colaboratory

keyword-extraction

bert-language-model